1
Xu G, Amei A, Wu W, Liu Y, Shen L, Oh EC, Wang Z. Retrospective varying coefficient association analysis of longitudinal binary traits: application to the identification of genetic loci associated with hypertension. Ann Appl Stat 2024; 18:487-505. [PMID: 38577266 PMCID: PMC10994004 DOI: 10.1214/23-aoas1798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Many genetic studies contain rich information on longitudinal phenotypes that require powerful analytical tools for optimal analysis. Genetic analysis of longitudinal data that incorporates temporal variation is important for understanding the genetic architecture and biological variation of complex diseases. Most of the existing methods assume that the contribution of genetic variants is constant over time and fail to capture the dynamic pattern of disease progression. However, the relative influence of genetic variants on complex traits fluctuates over time. In this study, we propose a retrospective varying coefficient mixed model association test, RVMMAT, to detect time-varying genetic effect on longitudinal binary traits. We model dynamic genetic effect using smoothing splines, estimate model parameters by maximizing a double penalized quasi-likelihood function, design a joint test using a Cauchy combination method, and evaluate statistical significance via a retrospective approach to achieve robustness to model misspecification. Through simulations we illustrated that the retrospective varying-coefficient test was robust to model misspecification under different ascertainment schemes and gained power over the association methods assuming constant genetic effect. We applied RVMMAT to a genome-wide association analysis of longitudinal measure of hypertension in the Multi-Ethnic Study of Atherosclerosis. Pathway analysis identified two important pathways related to G-protein signaling and DNA damage. Our results demonstrated that RVMMAT could detect biologically relevant loci and pathways in a genome scan and provided insight into the genetic architecture of hypertension.
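The abstract above mentions a joint test built with a Cauchy combination method. As a minimal sketch of that general technique only (not the authors' RVMMAT implementation), the following combines several p-values into one. The function name `cauchy_combination` and the default of equal weights are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Each p-value is mapped to a standard Cauchy variate via
    tan((0.5 - p) * pi); the weighted sum of these variates is then
    converted back to a p-value with the Cauchy tail probability.
    Equal weights are an assumption of this sketch.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and match p_values")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")

    # Weighted sum of Cauchy-transformed p-values (weights normalised to 1).
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of a standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values from two component tests.
    print(cauchy_combination([0.01, 0.5]))
```

A single p-value is returned unchanged by this transformation, which is a convenient sanity check when wiring the function into a larger pipeline.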
Affiliation(s)
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
2
Jin H, Kwak SH, Yoon JW, Lee S, Park KS, Won S, Cho NH. Genome-Wide Association Study on Longitudinal Change in Fasting Plasma Glucose in Korean Population. Diabetes Metab J 2023; 47:255-266. [PMID: 36653889 PMCID: PMC10040618 DOI: 10.4093/dmj.2021.0375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2021] [Accepted: 04/27/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND Genome-wide association studies (GWAS) on type 2 diabetes mellitus (T2DM) have identified more than 400 distinct genetic loci associated with diabetes and nearly 120 loci for fasting plasma glucose (FPG) and fasting insulin level to date. However, genetic risk factors for the longitudinal deterioration of FPG have not been thoroughly evaluated. We aimed to identify genetic variants associated with longitudinal change of FPG over time. METHODS We used two prospective cohorts in Korean population, which included a total of 10,528 individuals without T2DM. GWAS of repeated measure of FPG using linear mixed model was performed to investigate the interaction of genetic variants and time, and meta-analysis was conducted. Genome-wide complex trait analysis was used for heritability calculation. In addition, expression quantitative trait loci (eQTL) analysis was performed using the Genotype-Tissue Expression project. RESULTS A small portion (4%) of the genome-wide single nucleotide polymorphism (SNP) interaction with time explained the total phenotypic variance of longitudinal change in FPG. A total of four known genetic variants of FPG were associated with repeated measure of FPG levels. One SNP (rs11187850) showed a genome-wide significant association for genetic interaction with time. The variant is an eQTL for NOC3 like DNA replication regulator (NOC3L) gene in pancreas and adipose tissue. Furthermore, NOC3L is also differentially expressed in pancreatic β-cells between subjects with or without T2DM. However, this variant was not associated with increased risk of T2DM nor elevated FPG level. CONCLUSION We identified rs11187850, which is an eQTL of NOC3L, to be associated with longitudinal change of FPG in Korean population.
Affiliation(s)
- Heejin Jin
  - Institute of Health and Environment, Seoul National University, Seoul, Korea
- Soo Heon Kwak
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
- Ji Won Yoon
  - Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, Korea
- Sanghun Lee
  - Department of Bioconvergence & Engineering, Dankook University, Yongin, Korea
- Kyong Soo Park
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
  - Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea
- Sungho Won
  - Institute of Health and Environment, Seoul National University, Seoul, Korea
  - Department of Public Health Sciences, Seoul National University, Seoul, Korea
  - RexSoft Inc., Seoul, Korea
- Nam H. Cho
  - Department of Preventive Medicine, Ajou University School of Medicine, Suwon, Korea
3
Venkatesh SS, Ganjgahi H, Palmer DS, Coley K, Wittemans LBL, Nellaker C, Holmes C, Lindgren CM, Nicholson G. The genetic architecture of changes in adiposity during adulthood. medRxiv 2023:2023.01.09.23284364. [PMID: 36711652 PMCID: PMC9882550 DOI: 10.1101/2023.01.09.23284364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
Obesity is a heritable disease, characterised by excess adiposity that is measured by body mass index (BMI). While over 1,000 genetic loci are associated with BMI, less is known about the genetic contribution to adiposity trajectories over adulthood. We derive adiposity-change phenotypes from 1.5 million primary-care health records in over 177,000 individuals in UK Biobank to study the genetic architecture of weight-change. Using multiple BMI measurements over time increases power to identify genetic factors affecting baseline BMI. In the largest reported genome-wide study of adiposity-change in adulthood, we identify novel associations with BMI-change at six independent loci, including rs429358 (a missense variant in APOE). The SNP-based heritability of BMI-change (1.98%) is 9-fold lower than that of BMI, and higher in women than in men. The modest genetic correlation between BMI-change and BMI (45.2%) indicates that genetic studies of longitudinal trajectories could uncover novel biology driving quantitative trait values in adulthood.
Affiliation(s)
- Samvida S. Venkatesh
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Duncan S. Palmer
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, UK
- Kayesha Coley
  - Department of Population Health Sciences, University of Leicester, UK
- Laura B. L. Wittemans
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, UK
- Christoffer Nellaker
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, UK
- Chris Holmes
  - Department of Statistics, University of Oxford, UK
  - Nuffield Department of Medicine, Medical Sciences Division, University of Oxford, UK
  - The Alan Turing Institute, London, UK
- Cecilia M. Lindgren
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK
  - Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Sciences Division, University of Oxford, UK
  - Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
4
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover the causal variants for GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for regulating confounding factors, including population structure, resulting in increased computational proficiency and statistical power in GWAS studies. Recently more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we have demonstrated all possible LMMs-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we include the advantages and weaknesses of the LMMs in GWAS. Finally, we discuss the future perspective and conclusion. The present review paper would be helpful to the researchers for selecting appropriate LMM models and methods quickly for GWAS data analysis and would benefit the scientific society.
Affiliation(s)
- Md. Alamin
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
  - Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
5
Wang H, Zhang J, Klump KL, Alexandra Burt S, Cui Y. Multivariate partial linear varying coefficients model for gene-environment interactions with multiple longitudinal traits. Stat Med 2022; 41:3643-3660. [PMID: 35582816 PMCID: PMC9308731 DOI: 10.1002/sim.9440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 04/26/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022]
Abstract
Correlated phenotypes often share common genetic determinants. Thus, a multi‐trait analysis can potentially increase association power and help in understanding pleiotropic effect. When multiple traits are jointly measured over time, the correlation information between multivariate longitudinal responses can help to gain power in association analysis, and the longitudinal traits can provide insights on the dynamic gene effect over time. In this work, we propose a multivariate partially linear varying coefficients model to identify genetic variants with their effects potentially modified by environmental factors. We derive a testing framework to jointly test the association of genetic factors and illustrated with a bivariate phenotypic trait, while taking the time varying genetic effects into account. We extend the quadratic inference functions to deal with the longitudinal correlations and used penalized splines for the approximation of nonparametric coefficient functions. Theoretical results such as consistency and asymptotic normality of the estimates are established. The performance of the testing procedure is evaluated through Monte Carlo simulation studies. The utility of the method is demonstrated with a real data set from the Twin Study of Hormones and Behavior across the menstrual cycle project, in which single nucleotide polymorphisms associated with emotional eating behavior are identified.
Affiliation(s)
- Honglang Wang
  - Department of Mathematical Sciences, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
- Jingyi Zhang
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
  - Amazon Lab126, Sunnyvale, California, USA
- Kelly L Klump
  - Department of Psychology, Michigan State University, East Lansing, Michigan, USA
- Sybil Alexandra Burt
  - Department of Psychology, Michigan State University, East Lansing, Michigan, USA
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
6
Chung W, Cho Y. Bayesian mixed models for longitudinal genetic data: theory, concepts, and simulation studies. Genomics Inform 2022; 20:e8. [PMID: 35399007 PMCID: PMC9001998 DOI: 10.5808/gi.21080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Accepted: 03/03/2022] [Indexed: 01/02/2023]
Abstract
Despite the success of recent genome-wide association studies investigating longitudinal traits, a large fraction of overall heritability remains unexplained. This suggests that some of the missing heritability may be accounted for by gene-gene and gene-time/environment interactions. In this paper, we develop a Bayesian variable selection method for longitudinal genetic data based on mixed models. The method jointly models the main effects and interactions of all candidate genetic variants and non-genetic factors and has higher statistical power than previous approaches. To account for the within-subject dependence structure, we propose a grid-based approach that models only one fixed-dimensional covariance matrix, which is thus applicable to data where subjects have different numbers of time points. We provide the theoretical basis of our Bayesian method and then illustrate its performance using data from the 1000 Genome Project with various simulation settings. Several simulation studies show that our multivariate method increases the statistical power compared to the corresponding univariate method and can detect gene-time/environment interactions well. We further evaluate our method with different numbers of individuals, variants, and causal variants, as well as different trait-heritability, and conclude that our method performs reasonably well with various simulation settings.
Affiliation(s)
- Wonil Chung
  - Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Youngkwang Cho
  - Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea
7
Grid-based Gaussian process models for longitudinal genetic data. Communications for Statistical Applications and Methods 2022. [DOI: 10.29220/csam.2022.29.1.065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
8
Grid-based Gaussian process models for longitudinal genetic data. Communications for Statistical Applications and Methods 2022. [DOI: 10.29220/csam.2022.29.1.745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
9
Xu H, Li X, Yang Y, Li Y, Pinheiro J, Sasser K, Hamadeh H, Steven X, Yuan M. High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes. Bioinformatics 2020; 36:3004-3010. [PMID: 32096821 DOI: 10.1093/bioinformatics/btaa120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/29/2019] [Revised: 01/16/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION With the emerging of high-dimensional genomic data, genetic analysis such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, since limited analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits by coupling Empirical Bayesian Estimates from mixed-effects modeling with a novel ℓ0-norm algorithm. RESULTS Extensive simulations demonstrated that the proposed approach not only provided accurate selection of single nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, this novel approach is highly scalable and could be approximately >1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, our proposed approach can simultaneously analyze millions of SNPs if the computer memory allows, thereby potentially allowing a true multilocus analysis for high-dimensional genomic data. With application to the data from Alzheimer's Disease Neuroimaging Initiative, we confirmed that our approach can identify well-known SNPs associated with AD and were much faster than recently published approaches (≥6000 times). AVAILABILITY AND IMPLEMENTATION The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Huang Xu
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Xiang Li
  - Janssen Research and Development, Raritan, NJ 08869, USA
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Yi Li
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Jose Pinheiro
  - Janssen Research and Development, Raritan, NJ 08869, USA
- Xu Steven
  - Genmab US, Inc., Princeton, NJ 08540, USA
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China
10
An B, Xu L, Xia J, Wang X, Miao J, Chang T, Song M, Ni J, Xu L, Zhang L, Li J, Gao H. Multiple association analysis of loci and candidate genes that regulate body size at three growth stages in Simmental beef cattle. BMC Genet 2020; 21:32. [PMID: 32171250 PMCID: PMC7071762 DOI: 10.1186/s12863-020-0837-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/12/2019] [Accepted: 03/04/2020] [Indexed: 01/08/2023]
Abstract
Background Body size traits as one of the main breeding selection criteria was widely used to monitor cattle growth and to evaluate the selection response. In this study, body size was defined as body height (BH), body length (BL), hip height (HH), heart size (HS), abdominal size (AS), and cannon bone size (CS). We performed genome-wide association studies (GWAS) of these traits over the course of three growth stages (6, 12 and 18 months after birth) using three statistical models, single-trait GWAS, multi-trait GWAS and LONG-GWAS. The Illumina Bovine HD 770 K BeadChip was used to identify genomic single nucleotide polymorphisms (SNPs) in 1217 individuals. Results In total, 19, 29, and 10 significant SNPs were identified by the three models, respectively. Among these, 21 genes were promising candidate genes, including SOX2, SNRPD1, RASGEF1B, EFNA5, PTBP1, SNX9, SV2C, PKDCC, SYNDIG1, AKR1E2, and PRIM2 identified by single-trait analysis; SLC37A1, LAP3, PCDH7, MANEA, and LHCGR identified by multi-trait analysis; and P2RY1, MPZL1, LINGO2, CMIP, and WSCD1 identified by LONG-GWAS. Conclusions Multiple association analysis was performed for six growth traits at each growth stage. These findings offer valuable insights for the further investigation of potential genetic mechanism of growth traits in Simmental beef cattle.
Affiliation(s)
- Jiangwei Xia
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, 310000, China
- Xiaoqiao Wang
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Jian Miao
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Tianpeng Chang
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Meihua Song
  - Zhuang Yuan Veterinary Station of Qixia city, Yantai, 265300, China
- Junqing Ni
  - Heibei Livestock Breeding Workstation, Shijiazhuang, 050061, China
- Lingyang Xu
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Lupei Zhang
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Junya Li
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
- Huijiang Gao
  - Institute of Animal Science, Chinese Academy of Agricultural Science, Beijing, 100193, China
11
Wu W, Wang Z, Xu K, Zhang X, Amei A, Gelernter J, Zhao H, Justice AC, Wang Z. Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use. Genetics 2019; 213:1225-1236. [PMID: 31591132 PMCID: PMC6893384 DOI: 10.1534/genetics.119.302598] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/03/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022]
Abstract
Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.
Affiliation(s)
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Zhong Wang
  - Baker Institute for Animal Health, Cornell University, Ithaca, New York 14850
- Ke Xu
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Xinyu Zhang
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Joel Gelernter
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Amy C Justice
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06511
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
12
Hosseinzadeh N, Mehrabi Y, Daneshpour MS, Zayeri F, Guity K, Azizi F. Identifying new associated pleiotropic SNPs with lipids by simultaneous test of multiple longitudinal traits: An Iranian family-based study. Gene 2019; 692:156-169. [DOI: 10.1016/j.gene.2019.01.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/03/2018] [Revised: 01/05/2019] [Accepted: 01/11/2019] [Indexed: 02/08/2023]
13
Wang Z, Wang N, Wu R, Wang Z. fGWAS: An R package for genome-wide association analysis with longitudinal phenotypes. J Genet Genomics 2018; 45:411-413. [PMID: 30049619 PMCID: PMC6179436 DOI: 10.1016/j.jgg.2018.06.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/10/2018] [Revised: 06/18/2018] [Accepted: 06/27/2018] [Indexed: 10/28/2022]
Affiliation(s)
- Zhong Wang
  - College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY 14850, USA
- Nating Wang
  - College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Rongling Wu
  - Center for Computational Biology, Beijing Forestry University, Beijing 100083, China
  - Center for Statistical Genetics, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA 17033, USA
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
14
Rudra P, Broadaway KA, Ware EB, Jhun MA, Bielak LF, Zhao W, Smith JA, Peyser PA, Kardia SL, Epstein MP, Ghosh D. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits. Genet Epidemiol 2018; 42:320-332. [PMID: 29601641 PMCID: PMC5980726 DOI: 10.1002/gepi.22121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/04/2017] [Revised: 01/19/2018] [Accepted: 02/19/2018] [Indexed: 01/09/2023]
Abstract
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
Affiliation(s)
- Pratyaydipta Rudra
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO
- Erin B. Ware
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
- Min A. Jhun
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI
- Wei Zhao
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI
- Debashis Ghosh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO
15
Wu X, McPeek MS. L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals. Am J Hum Genet 2018; 102:574-591. [PMID: 29625022 DOI: 10.1016/j.ajhg.2018.02.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/21/2017] [Accepted: 02/20/2018] [Indexed: 01/11/2023]
Abstract
In complex-trait mapping, when each subject has multiple measurements of a quantitative trait over time, power for detecting genetic association can be gained by the inclusion of all measurements and not just single time points or averages in the analysis. To increase power and control type 1 error, one should account for dependence among observations for a single individual as well as dependence between observations of related individuals if they are present in the sample. We propose L-GATOR, a retrospective, mixed-effects method for association mapping of longitudinally measured traits in samples with related individuals. L-GATOR allows arbitrary time points for different individuals, incorporates both time-varying and static covariates, and properly addresses various types of dependence. In simulations, we show that L-GATOR outperforms existing prospective methods in terms of both type 1 error and power when there is phenotype model misspecification or missing data. Compared with the previously proposed longGWAS method, L-GATOR was more than ten times faster for association testing in our simulations and almost 100 times faster for parameter estimation. L-GATOR is applicable to essentially arbitrary combinations of related and unrelated individuals, including small families as well as large, complex pedigrees. We apply the method to data from the Framingham Heart Study to identify association between longitudinal systolic blood pressure measurements and genome-wide SNPs. Of the smallest p values, one-third occur in or near genes that have been previously identified as associated with pulse pressure (such as PIK3CG) and systolic and diastolic blood pressure (such as C10orf107), showing that L-GATOR is able to prioritize relevant loci in a genome screen.
16
He Z, Lee S, Zhang M, Smith JA, Guo X, Palmas W, Kardia SL, Ionita-Laza I, Mukherjee B. Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA). Genet Epidemiol 2017; 41:801-810. [PMID: 29076270 PMCID: PMC5696115 DOI: 10.1002/gepi.22081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/10/2017] [Revised: 08/24/2017] [Accepted: 08/24/2017] [Indexed: 11/09/2022]
Abstract
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one-at-a-time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within-subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
Affiliation(s)
- Zihuai He
  - Department of Biostatistics, Columbia University, New York, NY 10032
- Seunggeun Lee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Min Zhang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
- Jennifer A. Smith
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109
- Xiuqing Guo
  - Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90509
- Walter Palmas
  - Department of Medicine, Columbia University, New York, NY 10032
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
17
Qian J, Nunez S, Kim S, Reilly MP, Foulkes AS. A score test for genetic class-level association with nonlinear biomarker trajectories. Stat Med 2017; 36:3075-3091. [PMID: 28543585 DOI: 10.1002/sim.7314] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/12/2016] [Revised: 01/12/2017] [Accepted: 03/22/2017] [Indexed: 11/06/2022]
Abstract
Emerging data suggest that the genetic regulation of the biological response to inflammatory stress may be fundamentally different to the genetic underpinning of the homeostatic control (resting state) of the same biological measures. In this paper, we interrogate this hypothesis using a single-SNP score test and a novel class-level testing strategy to characterize protein-coding gene and regulatory element-level associations with longitudinal biomarker trajectories in response to stimulus. Using the proposed class-level association score statistic for longitudinal data, which accounts for correlations induced by linkage disequilibrium, the genetic underpinnings of evoked dynamic changes in repeatedly measured biomarkers are investigated. The proposed method is applied to data on two biomarkers arising from the Genetics of Evoked Responses to Niacin and Endotoxemia study, a National Institutes of Health-sponsored investigation of the genomics of inflammatory and metabolic responses during low-grade endotoxemia. Our results suggest that the genetic basis of evoked inflammatory response is different than the genetic contributors to resting state, and several potentially novel loci are identified. A simulation study demonstrates appropriate control of type-1 error rates, relative computational efficiency, and power. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Jing Qian
  - Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, U.S.A
- Sara Nunez
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
- Soohyun Kim
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
- Andrea S Foulkes
  - Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A
18
Longitudinal data analysis for rare variants detection with penalized quadratic inference function. Sci Rep 2017; 7:650. [PMID: 28381821 PMCID: PMC5429681 DOI: 10.1038/s41598-017-00712-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/09/2016] [Accepted: 03/08/2017] [Indexed: 11/08/2022]
Abstract
Longitudinal genetic data provide more information regarding genetic effects over time compared with cross-sectional data. Coupled with next-generation sequencing technologies, it becomes reality to identify important genes containing both rare and common variants in a longitudinal design. In this work, we adopted a weighted sum statistic (WSS) to collapse multiple variants in a gene region to form a gene score. When multiple genes in a pathway were considered together, a penalized longitudinal model under the quadratic inference function (QIF) framework was applied for efficient gene selection. We evaluated the estimation accuracy and model selection performance under different model settings, then applied the method to a real dataset from the Genetic Analysis Workshop 18 (GAW18). Compared with the unpenalized QIF method, the penalized QIF (pQIF) method achieved better estimation accuracy and higher selection efficiency. The pQIF remained optimal even when the working correlation structure was mis-specified. The real data analysis identified one important gene, angiotensin II receptor type 1 (AGTR1), in the Ca2+/AT-IIR/α-AR signaling pathway. The estimated effect implied that AGTR1 may have a protective effect for hypertension. Our pQIF method provides a general tool for longitudinal sequencing studies involving large numbers of genetic variants.
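The collapsing step described in this abstract can be illustrated with a minimal sketch. It assumes inverse-standard-deviation weights computed from a smoothed minor allele frequency, which is one common choice for a weighted sum; the abstract does not state the weights the authors used, and the function name `wss_gene_scores` is hypothetical.

```python
import math


def wss_gene_scores(genotypes):
    """Collapse the variants of one gene region into a single score per subject.

    genotypes: list of per-subject lists of minor-allele counts (0, 1 or 2),
    one entry per variant. Returns (scores, weights).
    Assumption: weights are 1 / sqrt(q * (1 - q)), with q a smoothed
    minor allele frequency, so that rarer variants count more.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    weights = []
    for j in range(m):
        allele_count = sum(row[j] for row in genotypes)
        # The +1 / +2 smoothing keeps q strictly between 0 and 1.
        q = (allele_count + 1) / (2 * n + 2)
        weights.append(1.0 / math.sqrt(q * (1.0 - q)))
    # Gene score: weighted sum of minor-allele counts for each subject.
    scores = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    return scores, weights


if __name__ == "__main__":
    toy = [[0, 0], [1, 0], [0, 0], [0, 2]]  # 4 subjects, 2 variants
    scores, weights = wss_gene_scores(toy)
    print(weights)
    print(scores)
```

In a setting like the one the abstract describes, the resulting per-subject gene scores would then enter the longitudinal model as covariates; that modeling step is not sketched here.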
19
Fusi N, Listgarten J. Flexible Modeling of Genetic Effects on Function-Valued Traits. J Comput Biol 2017; 24:524-535. [PMID: 28056190 DOI: 10.1089/cmb.2016.0174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Genome-wide association studies commonly examine one trait at a time. Occasionally they examine several related traits with the hope of increasing power; in such a setting, the traits are not generally smoothly varying in any way such as time or space. However, for function-valued traits, the trait is often smoothly varying along the axis of interest, such as space or time. For instance, in the case of longitudinal traits such as growth curves, the axis of interest is time; for spatially varying traits such as chromatin accessibility, it would be position along the genome. Although there have been efforts to perform genome-wide association studies with such function-valued traits, the statistical approaches developed for this purpose often have limitations such as requiring the trait to behave linearly in time or space, or constraining the genetic effect itself to be constant or linear in time. Herein, we present a flexible model for this problem, the Partitioned Gaussian Process, which removes many such limitations and is especially effective as the number of time points increases. The theoretical basis of this model provides machinery for handling missing and unaligned function values such as would occur when not all individuals are measured at the same time points. Furthermore, we make use of algebraic refactorizations to substantially reduce the time complexity of our model beyond the naive implementation. Finally, we apply our approach and several others to synthetic data before closing, with some directions for improved modeling and statistical testing.
Affiliation(s)
- Nicolo Fusi
  - Microsoft Research, Cambridge, Massachusetts
20
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2017; 41:81-93. [PMID: 27859628 PMCID: PMC5154867 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variant, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2.
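The contrast drawn in this abstract, between a burden test that does well when effects share a direction and a kernel test that holds up across genetic models, can be seen in a simplified sketch. The code below uses cross-sectional score statistics on null-model residuals; it is not the longitudinal mixed-model LSKAT or LBT, and all function names are hypothetical.

```python
def variant_scores(genotypes, residuals):
    """Per-variant score S_j = sum_i g_ij * r_i, with r the null-model residuals."""
    m = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, residuals))
            for j in range(m)]


def burden_statistic(scores, weights):
    """Burden-type statistic: square of the weighted sum of variant scores.

    Scores of opposite sign cancel before squaring, so power is lost
    when a region mixes deleterious and protective variants.
    """
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def kernel_statistic(scores, weights):
    """Kernel-type (linear weighted kernel) statistic: sum of squared weighted scores.

    Each variant contributes a non-negative term, so opposite signs do not cancel.
    """
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Toy data: variant 1 is carried by a subject with a positive residual,
    # variant 2 by a subject with a negative residual (opposite directions).
    genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
    residuals = [2.0, -2.0, 1.0, -1.0]
    weights = [1.0, 1.0]
    s = variant_scores(genotypes, residuals)
    print(burden_statistic(s, weights), kernel_statistic(s, weights))
```

With these toy values the burden statistic collapses to zero while the kernel statistic stays positive, which is the qualitative behavior the abstract reports. Significance evaluation is omitted.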
Affiliation(s)
- Zhong Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
  - Baker Institute for Animal Health, Cornell University, Ithaca, New York, United States of America
  - Center for Computational Biology, Beijing Forestry University, Beijing, China
- Ke Xu
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
  - VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xinyu Zhang
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
  - VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
- Xiaowei Wu
  - Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
21
Zhu H, Wang Z, Wang X, Sha Q. A novel statistical method for rare-variant association studies in general pedigrees. BMC Proc 2016; 10:193-196. [PMID: 27980635 PMCID: PMC5133499 DOI: 10.1186/s12919-016-0029-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to identify rare variants that underlie complex diseases. For any type of study design, the statistical power will be improved if rare variants can be enriched in the samples. Family-based designs, with ascertainment based on phenotype, may enrich the sample for causal rare variants and thus can be more powerful than population-based designs. Therefore, it is important to develop family-based statistical methods that can account for ascertainment. In this paper, we develop a novel statistical method for rare-variant association studies in general pedigrees for quantitative traits. This method uses a retrospective view that treats the traits as fixed and the genotypes as random, which allows us to account for complex and undefined ascertainment of families. We then apply the newly developed method to the Genetic Analysis Workshop 19 data set and compare the power of the new method with two other methods for general pedigrees. The results show that the newly proposed method increases power in most of the cases we consider, more than the other two methods.
Affiliation(s)
- Huanhuan Zhu
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Zhenchuan Wang
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203-5017 USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931 USA
22
Melton PE, Peralta JM, Almasy L. Constrained multivariate association with longitudinal phenotypes. BMC Proc 2016; 10:329-332. [PMID: 27980657 PMCID: PMC5133503 DOI: 10.1186/s12919-016-0051-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND The incorporation of longitudinal data into genetic epidemiological studies has the potential to provide valuable information regarding the effect of time on complex disease etiology. Yet, the majority of research focuses on variables collected from a single time point. The aim of this study was to test for main effects on a quantitative trait across time points using a constrained maximum-likelihood measured genotype approach. This method simultaneously accounts for all repeat measurements of a phenotype in families. We applied this method to systolic blood pressure (SBP) measurements from three time points using the Genetic Analysis Workshop 19 (GAW19) whole-genome sequence family simulated data set and 200 simulated replicates. Data consisted of 849 individuals from 20 extended Mexican American pedigrees. Comparisons were made among 3 statistical approaches: (a) constrained, where the effect of a variant or gene region on the mean trait value was constrained to be equal across all measurements; (b) unconstrained, where the variant or gene region effect was estimated separately for each time point; and (c) the average SBP measurement from three time points. These approaches were run for nine genetic variants with known effect sizes (>0.001) for SBP variability and a known gene-centric kernel (MAP4)-based test under the GAW19 simulation model across 200 replicates. RESULTS When compared to results using two time points, the constrained method utilizing all 3 time points increased power to detect association. Averaging SBP was equally effective when the variant has a large effect on the phenotype, but less powerful for variants with lower effect sizes. However, averaging SBP was far more effective than either the constrained or unconstrained approaches when using a gene-centric kernel-based test. CONCLUSION We determined that this constrained multivariate approach improves genetic signal over the bivariate method. However, this method is still only effective in those variants that explain a moderate to large proportion of the phenotypic variance but is not as effective for gene-centric tests.
Affiliation(s)
- Phillip E. Melton
  - The Curtin/UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Medicine Dentistry & Health Sciences, The University of Western Australia, Perth, Australia
- Juan M. Peralta
  - South Texas Diabetes and Obesity Institute, University of Texas at Brownsville, Brownsville, TX 78520 USA
- Laura Almasy
  - South Texas Diabetes and Obesity Institute, University of Texas Health Science Center, San Antonio, TX 78229 USA
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
23
Marchetti-Bowick M, Yin J, Howrylak JA, Xing EP. A time-varying group sparse additive model for genome-wide association studies of dynamic complex traits. Bioinformatics 2016; 32:2903-10. [PMID: 27296983 PMCID: PMC5942717 DOI: 10.1093/bioinformatics/btw347] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/22/2015] [Revised: 05/24/2016] [Accepted: 05/27/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION Despite the widespread popularity of genome-wide association studies (GWAS) for genetic mapping of complex traits, most existing GWAS methodologies are still limited to the use of static phenotypes measured at a single time point. In this work, we propose a new method for association mapping that considers dynamic phenotypes measured at a sequence of time points. Our approach relies on the use of Time-Varying Group Sparse Additive Models (TV-GroupSpAM) for high-dimensional, functional regression. RESULTS This new model detects a sparse set of genomic loci that are associated with trait dynamics, and demonstrates increased statistical power over existing methods. We evaluate our method via experiments on synthetic data and perform a proof-of-concept analysis for detecting single nucleotide polymorphisms associated with two phenotypes used to assess asthma severity: forced vital capacity, a sensitive measure of airway obstruction and bronchodilator response, which measures lung response to bronchodilator drugs. AVAILABILITY AND IMPLEMENTATION Source code for TV-GroupSpAM freely available for download at http://www.cs.cmu.edu/~mmarchet/projects/tv_group_spam, implemented in MATLAB. CONTACT epxing@cs.cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Junming Yin
  - Department of Management Information Systems, University of Arizona, Tucson, AZ, USA
- Judie A Howrylak
  - Division of Pulmonary and Critical Care Medicine, Penn State University, Milton S. Hershey Medical Center, Hershey, PA, USA
- Eric P Xing
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
24
Vujkovic M, Aplenc R, Alonzo TA, Gamis AS, Li Y. Comparing Analytic Methods for Longitudinal GWAS and a Case-Study Evaluating Chemotherapy Course Length in Pediatric AML. A Report from the Children's Oncology Group. Front Genet 2016; 7:139. [PMID: 27547214 PMCID: PMC4974249 DOI: 10.3389/fgene.2016.00139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2016] [Accepted: 07/19/2016] [Indexed: 12/11/2022]
Abstract
Regression analysis is commonly used in genome-wide association studies (GWAS) to test genotype-phenotype associations but restricts the phenotype to a single observation for each individual. There is an increasing need for analytic methods for longitudinally collected phenotype data. Several methods have been proposed to perform longitudinal GWAS for family-based studies but few methods are described for unrelated populations. We compared the performance of three statistical approaches for longitudinal GWAS in unrelated subjects: (1) principal component-based generalized estimating equations (PC-GEE); (2) principal component-based linear mixed effects model (PC-LMEM); (3) kinship coefficient matrix-based linear mixed effects model (KIN-LMEM), in a study of single-nucleotide polymorphisms (SNPs) on the duration of 4 courses of chemotherapy in 624 unrelated children with de novo acute myeloid leukemia (AML) genotyped on the Illumina 2.5 M OmniQuad from the COG studies AAML0531 and AAML1031. In this study we observed an exaggerated type I error with PC-GEE in SNPs with minor allele frequencies < 0.05, whereas KIN-LMEM produces more than expected type II errors. PC-LMEM showed balanced type I and type II errors for the observed vs. expected P-values in comparison to competing approaches. In general, a strong concordance was observed between the P-values with the different approaches, in particular among P < 0.01 where the between-method AUCs exceed 99%. PC-LMEM accounts for genetic relatedness and correlations among repeated phenotype measures, shows minimal genome-wide inflation of type I errors, and yields high power. We therefore recommend PC-LMEM as a robust analytic approach for GWAS of longitudinal data in unrelated populations.
Affiliation(s)
- Marijana Vujkovic
  - Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Richard Aplenc
  - Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Todd A Alonzo
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Alan S Gamis
  - Division of Hematology, Oncology Bone Marrow Transplantation, Children's Mercy Hospitals and Clinics, Kansas City, MO, USA
- Yimei Li
  - Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
25
Yan Q, Weeks DE, Tiwari HK, Yi N, Zhang K, Gao G, Lin WY, Lou XY, Chen W, Liu N. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples. Hum Hered 2016; 80:126-38. [PMID: 27161037 DOI: 10.1159/000445057] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/26/2015] [Accepted: 02/24/2016] [Indexed: 01/12/2023]
Abstract
OBJECTIVE The kernel machine (KM) test reportedly performs well in the set-based association test of rare variants. Many studies have been conducted to measure phenotypes at multiple time points, but the standard KM methodology has only been available for phenotypes at a single time point. In addition, family-based designs have been widely used in genetic association studies; therefore, the data analysis method used must appropriately handle familial relatedness. A rare-variant test does not currently exist for longitudinal data from family samples. Therefore, in this paper, we aim to introduce an association test for rare variants, which includes multiple longitudinal phenotype measurements for either population or family samples. METHODS This approach uses KM regression based on the linear mixed model framework and is applicable to longitudinal data from either population (L-KM) or family samples (LF-KM). RESULTS In our population-based simulation studies, L-KM has good control of Type I error rate and increased power in all the scenarios we considered compared with other competing methods. Conversely, in the family-based simulation studies, we found an inflated Type I error rate when L-KM was applied directly to the family samples, whereas LF-KM retained the desired Type I error rate and had the best power performance overall. Finally, we illustrate the utility of our proposed LF-KM approach by analyzing data from an association study between rare variants and blood pressure from the Genetic Analysis Workshop 18 (GAW18). CONCLUSION We propose a method for rare-variant association testing in population and family samples using phenotypes measured at multiple time points for each subject. The proposed method has the best power performance compared to competing approaches in our simulation study.
Affiliation(s)
- Qi Yan
  - Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa., USA
26
Abstract
BACKGROUND Longitudinal phenotypic data provides a rich potential resource for genetic studies which may allow for greater understanding of variants and their covariates over time. Herein, we review 3 longitudinal analytical approaches from the Genetic Analysis Workshop 19 (GAW19). These contributions investigated both genome-wide association (GWA) and whole genome sequence (WGS) data from odd numbered chromosomes on up to 4 time points for blood pressure-related phenotypes. The statistical models used included generalized estimating equations (GEEs), latent class growth modeling (LCGM), linear mixed-effect (LME), and variance components (VC). The goal of these analyses was to test statistical approaches that use repeat measurements to increase genetic signal for variant identification. RESULTS Two analytical methods were applied to the GAW19: GWA using real phenotypic data, and one approach to WGS using 200 simulated replicates. The first GWA approach applied a GEE-based model to identify gene-based associations with 4 derived hypertension phenotypes. This GEE model identified 1 significant locus, GRM7, which passed multiple test corrections for 2 hypertension-derived traits. The second GWA approach employed the LME to estimate genetic associations with systolic blood pressure (SBP) change trajectories identified using LCGM. This LCGM method identified 5 SBP trajectories and association analyses identified a genome-wide significant locus, near ATOX1 (p = 1.0 × 10⁻⁸). Finally, a third VC-based model using WGS and simulated SBP phenotypes that constrained the β coefficient for a genetic variant across each time point was calculated and compared to an unconstrained approach. This constrained VC approach demonstrated increased power for WGS variants of moderate effect, but when larger genetic effects were present, averaging across time points was as effective. CONCLUSION In this paper, we summarize 3 GAW19 contributions applying novel statistical methods and testing previously proposed techniques under alternative conditions for longitudinal genetic association. We conclude that these approaches when appropriately applied have the potential to: (a) increase statistical power; (b) decrease trait heterogeneity and standard error;
Affiliation(s)
- Yen-Feng Chiu
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan, ROC.
- Anne E Justice
  - Department of Epidemiology, University of North Carolina, Chapel Hill, NC, 27514, USA.
- Phillip E Melton
  - Centre for Genetic Origins of Health and Disease, University of Western Australia, Perth, WA, Australia.
27
Chien LC, Hsu FC, Bowden DW, Chiu YF. Generalization of Rare Variant Association Tests for Longitudinal Family Studies. Genet Epidemiol 2016; 40:101-12. [PMID: 26783077 DOI: 10.1002/gepi.21951] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/12/2015] [Revised: 11/19/2015] [Accepted: 11/19/2015] [Indexed: 11/06/2022]
Abstract
Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree-based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree-based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed-effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found exome variants of POMGNT1 and JAK1 genes were associated with type 2 diabetes.
Affiliation(s)
- Li-Chu Chien: Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Fang-Chi Hsu: Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Donald W Bowden: Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America; Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Yen-Feng Chiu: Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
|
28
|
Sung Y, Feng Z, Subedi S. A genome-wide association study of multiple longitudinal traits with related subjects. Stat (Int Stat Inst) 2016; 5:22-44. [PMID: 27134745 DOI: 10.1002/sta4.102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Pleiotropy is the phenomenon in which a single gene exerts multiple correlated phenotypic effects, often characterized as traits, involving multiple biological systems. We propose a two-stage method to identify pleiotropic effects on multiple longitudinal traits from a family-based data set. The first stage analyzes each longitudinal trait via a three-level mixed-effects model. Random effects at the subject-level and at the family-level measure the subject-specific genetic effects and between-subjects intraclass correlations within families, respectively. The second stage performs a simultaneous association test between a single nucleotide polymorphism and all subject-specific effects for multiple longitudinal traits. This is performed using a quasi-likelihood scoring method in which the correlation structure among related subjects is adjusted. Two simulation studies for the proposed method are undertaken to assess both the type I error control and the power. Furthermore, we demonstrate the utility of the two-stage method in identifying pleiotropic genes or loci by analyzing the Genetic Analysis Workshop 16 Problem 2 cohort data drawn from the Framingham Heart Study and illustrate an example of the kind of complexity in data that can be handled by the proposed approach. We establish that our two-stage method can identify pleiotropic effects whilst accommodating varying data types in the model.
Affiliation(s)
- Yubin Sung: Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
- Zeny Feng: Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
- Sanjeena Subedi: Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
|
29
|
Yi G, Shen M, Yuan J, Sun C, Duan Z, Qu L, Dou T, Ma M, Lu J, Guo J, Chen S, Qu L, Wang K, Yang N. Genome-wide association study dissects genetic architecture underlying longitudinal egg weights in chickens. BMC Genomics 2015; 16:746. [PMID: 26438435 PMCID: PMC4595193 DOI: 10.1186/s12864-015-1945-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 12/11/2014] [Accepted: 09/22/2015] [Indexed: 11/22/2022]
Abstract
BACKGROUND: As a major economic trait in chickens, egg weight (EW) receives widespread interest in breeding, production and consumption. However, limited information is available on the genetic architecture underlying longitudinal trends in EW. Herein, we measured EWs at nine time points from the onset of laying to 60 weeks of age, and conducted comprehensive genome-wide association studies (GWAS) in 1,534 F2 hens derived from reciprocal crosses between White Leghorn and Dongxiang chickens.
RESULTS: Egg weights at all ages except the first egg weight (FEW) exhibited high SNP-based heritability estimates (0.47-0.60). Strong pair-wise genetic correlations (0.77-1.00) were found among all EWs. Nine separate univariate genome-wide screens suggested 73 signals showing significant associations with longitudinal EWs. After multivariate and conditional analyses, four variants on three chromosomes retained independent contributions. The minor alleles at two loci exerted consistent and positive substitution effects on EWs, and those at the other two exerted negative effects. The four loci together accounted for 3.84% of the phenotypic variance for FEW and 7.29-11.06% for EWs from 32 to 60 weeks of age. We obtained five candidate genes, of which NCAPG harbors a non-synonymous SNP (rs14491030) causing a valine-to-alanine amino-acid substitution. Genome partitioning analysis indicated a strong linear correlation between the variance explained by each chromosome and its length, which provided evidence that EW follows a highly polygenic mode of inheritance.
CONCLUSIONS: Identification of significant genetic causes that together influence EWs at different ages will greatly advance our understanding of the genetic basis behind longitudinal EWs, and would help to illuminate future breeding directions for selecting the desired egg size.
Affiliation(s)
- Guoqiang Yi: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Manman Shen: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Jingwei Yuan: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Congjiao Sun: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Zhongyi Duan: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Liang Qu: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Taocun Dou: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Meng Ma: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Jian Lu: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Jun Guo: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Sirui Chen: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lujiang Qu: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Kehua Wang: Jiangsu Institute of Poultry Science, Yangzhou, Jiangsu, 225125, China
- Ning Yang: Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
|
30
|
He Z, Zhang M, Lee S, Smith JA, Guo X, Palmas W, Kardia SLR, Diez Roux AV, Mukherjee B. Set-based tests for genetic association in longitudinal studies. Biometrics 2015; 71:606-15. [PMID: 25854837 DOI: 10.1111/biom.12310] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/01/2014] [Revised: 01/01/2015] [Accepted: 02/01/2015] [Indexed: 11/30/2022]
Abstract
Genetic association studies with longitudinal markers of chronic diseases (e.g., blood pressure, body mass index) provide a valuable opportunity to explore how genetic variants affect traits over time by utilizing the full trajectory of longitudinal outcomes. Since these traits are likely influenced by the joint effect of multiple variants in a gene, a joint analysis of these variants considering linkage disequilibrium (LD) may help to explain additional phenotypic variation. In this article, we propose a longitudinal genetic random field model (LGRF), to test the association between a phenotype measured repeatedly during the course of an observational study and a set of genetic variants. Generalized score type tests are developed, which we show are robust to misspecification of within-subject correlation, a feature that is desirable for longitudinal analysis. In addition, a joint test incorporating gene-time interaction is further proposed. Computational advancement is made for scalable implementation of the proposed methods in large-scale genome-wide association studies (GWAS). The proposed methods are evaluated through extensive simulation studies and illustrated using data from the Multi-Ethnic Study of Atherosclerosis (MESA). Our simulation results indicate substantial gain in power using LGRF when compared with two commonly used existing alternatives: (i) single marker tests using longitudinal outcome and (ii) existing gene-based tests using the average value of repeated measurements as the outcome.
Affiliation(s)
- Zihuai He: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Min Zhang: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Seunggeun Lee: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Jennifer A Smith: Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, U.S.A
- Xiuqing Guo: Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, U.S.A
- Walter Palmas: Department of Medicine, Columbia University, New York, New York, U.S.A
- Sharon L R Kardia: Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, U.S.A
- Ana V Diez Roux: Department of Epidemiology, Drexel University, Philadelphia, U.S.A
- Bhramar Mukherjee: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A
|
31
|
Brodt A, Botzman M, David E, Gat-Viks I. Dissecting dynamic genetic variation that controls temporal gene response in yeast. PLoS Comput Biol 2014; 10:e1003984. [PMID: 25474467 PMCID: PMC4256076 DOI: 10.1371/journal.pcbi.1003984] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/21/2014] [Accepted: 10/13/2014] [Indexed: 11/18/2022]
Abstract
Inter-individual variation in regulatory circuits controlling gene expression is a powerful source of functional information. The study of associations among genetic variants and gene expression provides important insights about cell circuitry but cannot specify whether and when potential variants dynamically alter their genetic effect during the course of response. Here we develop a computational procedure that captures temporal changes in genetic effects, and apply it to analyze transcription during inhibition of the TOR signaling pathway in segregating yeast cells. We found a high-order coordination of gene modules: sets of genes co-associated with the same genetic variant and sharing a common temporal genetic effect pattern. The temporal genetic effects of some modules represented a single state-transitioning pattern; for example, at 10-30 minutes following stimulation, genetic effects in the phosphate utilization module attained a characteristic transition to a new steady state. In contrast, another module showed an impulse pattern of genetic effects; for example, in the poor nitrogen sources utilization module, a spike up of a genetic effect at 10-20 minutes following stimulation reflected inter-individual variation in the timing (rather than magnitude) of response. Our analysis suggests that the same mechanism typically leads to both inter-individual variation and the temporal genetic effect pattern in a module. Our methodology provides a quantitative genetic approach to studying the molecular mechanisms that shape dynamic changes in transcriptional responses.
Affiliation(s)
- Avital Brodt: Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, Israel
- Maya Botzman: Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, Israel
- Eyal David: Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, Israel
- Irit Gat-Viks: Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, Israel
|
32
|
Wu Z, Hu Y, Melton PE. Longitudinal data analysis for genetic studies in the whole-genome sequencing era. Genet Epidemiol 2014; 38 Suppl 1:S74-80. [PMID: 25112193 DOI: 10.1002/gepi.21829] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]
Abstract
The analysis of whole-genome sequence (WGS) data using longitudinal phenotypes offers a potentially rich resource for the examination of the genetic variants and their covariates that affect complex phenotypes over time. We summarize eight contributions to the Genetic Analysis Workshop 18, which applied a diverse array of statistical genetic methods to analyze WGS data in combination with data from genome-wide association studies (GWAS) from up to four different time points on blood pressure phenotypes. The common goal of these analyses was to develop and apply appropriate methods that utilize longitudinal repeated measures to potentially increase the analytic efficiency of WGS and GWAS data. These diverse methods can be grouped into two categories, based on the way they model dependence structures: (1) linear mixed-effects (LME) models, where the random effect terms in the linear models are used to capture the dependence structures; and (2) variance-components models, where the dependence structures are constructed directly based on multiple components of variance-covariance matrices for the multivariate Gaussian responses. Despite the heterogeneous nature of these analytical methods, the group came to the following conclusions: (1) the use of repeat measurements can gain power to identify variants associated with the phenotype; (2) the inclusion of family data may correct genotyping errors and allow for more accurate detection of rare variants than using unrelated individuals only; and (3) fitting mixed-effects and variance-components models for longitudinal data presents computational challenges. The challenges and computational burden demanded by WGS data were addressed in the eight contributions.
Affiliation(s)
- Zheyang Wu: Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, Massachusetts, United States of America
|
33
|
Wang W, Feng Z, Bull SB, Wang Z. A 2-step strategy for detecting pleiotropic effects on multiple longitudinal traits. Front Genet 2014; 5:357. [PMID: 25368629 PMCID: PMC4202779 DOI: 10.3389/fgene.2014.00357] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/30/2014] [Accepted: 09/25/2014] [Indexed: 12/13/2022]
Abstract
Genetic pleiotropy refers to the situation in which a single gene influences multiple traits, and it is therefore considered a major factor underlying genetic correlation among traits. To identify pleiotropy, an important focus in genome-wide association studies (GWAS) is on finding genetic variants that are simultaneously associated with multiple traits. On the other hand, longitudinal designs are often employed in many complex disease studies, so that traits are measured repeatedly over time within the same subject. Performing genetic association analysis simultaneously on multiple longitudinal traits for detecting pleiotropic effects is interesting but challenging. In this paper, we propose a 2-step method for simultaneously testing the genetic association with multiple longitudinal traits. In the first step, a mixed effects model is used to analyze each longitudinal trait. We focus on estimation of the random effect that accounts for the subject-specific genetic contribution to the trait; fixed effects of other confounding covariates are also estimated. This first step enables separation of the genetic effect from other confounding effects for each subject and for each longitudinal trait. Then in the second step, we perform a simultaneous association test on multiple estimated random effects arising from multiple longitudinal traits. The proposed method can efficiently detect pleiotropic effects on multiple longitudinal traits and can flexibly handle traits of different data types such as quantitative, binary, or count data. We apply this method to analyze the 16th Genetic Analysis Workshop (GAW16) Framingham Heart Study (FHS) data. A simulation study is also conducted to validate this 2-step method and evaluate its performance.
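The two-step idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: step 1 replaces the mixed-effects fit with an ordinary least-squares time trend followed by a shrunken per-subject mean of residuals (the random-intercept predictor when the variance ratio `shrink` is treated as known), and step 2 uses a stand-in statistic, n times the R-squared from regressing genotype on all estimated effects. All function names, parameters, and simulated values are hypothetical.

```python
import numpy as np

def subject_effects(y, time, subject, n_subjects, shrink=1.0):
    """Step 1 (simplified): remove a fixed time trend by OLS, then estimate each
    subject's random intercept as a shrunken mean of that subject's residuals.
    `shrink` plays the role of the residual-to-random-effect variance ratio."""
    X = np.column_stack([np.ones_like(time, dtype=float), time])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sums = np.bincount(subject, weights=resid, minlength=n_subjects)
    counts = np.bincount(subject, minlength=n_subjects)
    return sums / (counts + shrink)

def joint_statistic(genotype, effects):
    """Step 2 (stand-in test): n * R^2 from regressing the centered genotype
    on the centered subject-specific effects of all traits at once."""
    n = len(genotype)
    g = genotype - genotype.mean()
    B = effects - effects.mean(axis=0)
    coef, *_ = np.linalg.lstsq(B, g, rcond=None)
    fitted = B @ coef
    return n * (fitted @ fitted) / (g @ g)

# Hypothetical simulated data: 200 subjects, 4 visits, 2 longitudinal traits.
rng = np.random.default_rng(0)
n, visits = 200, 4
genotype = rng.binomial(2, 0.3, size=n).astype(float)
subject = np.repeat(np.arange(n), visits)
time = np.tile(np.arange(visits, dtype=float), n)

effects = np.empty((n, 2))
for k in range(2):
    b = 0.5 * genotype + rng.normal(size=n)  # subject-specific genetic effect
    y = 1.0 + 0.2 * time + b[subject] + rng.normal(size=n * visits)
    effects[:, k] = subject_effects(y, time, subject, n)

stat = joint_statistic(genotype, effects)
print(f"joint statistic: {stat:.2f}")
```

Because the statistic is n times an R-squared, it lies between 0 and n. In this unrelated-subjects sketch it could be compared against a chi-square reference with one degree of freedom per trait, which is an assumption of the sketch rather than a result from the paper.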
Affiliation(s)
- Weiqiang Wang: Department of Mathematics and Statistics, University of Guelph, Guelph, ON, Canada
- Zeny Feng: Department of Mathematics and Statistics, University of Guelph, Guelph, ON, Canada
- Shelley B Bull: Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Prosserman Centre for Health Research, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Zuoheng Wang: Division of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
34
|
Functional multi-locus QTL mapping of temporal trends in Scots pine wood traits. G3-GENES GENOMES GENETICS 2014; 4:2365-79. [PMID: 25305041 PMCID: PMC4267932 DOI: 10.1534/g3.114.014068] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
Quantitative trait loci (QTL) mapping of wood properties in conifer species has focused on single time point measurements or on trait means based on heterogeneous wood samples (e.g., increment cores), thus ignoring systematic within-tree trends. In this study, functional QTL mapping was performed for a set of important wood properties in increment cores from a 17-yr-old Scots pine (Pinus sylvestris L.) full-sib family with the aim of detecting wood trait QTL for general intercepts (means) and for linear slopes by increasing cambial age. Two multi-locus functional QTL analysis approaches were proposed and their performances were compared on trait datasets comprising 2 to 9 time points, 91 to 455 individual tree measurements, and genotype datasets of amplified fragment length polymorphism (AFLP) and single nucleotide polymorphism (SNP) markers. The first method was a multilevel LASSO analysis whereby trend parameter estimation and QTL mapping were conducted consecutively; the second method was our Bayesian linear mixed model whereby trends and underlying genetic effects were estimated simultaneously. We also compared several different hypothesis testing methods under either the LASSO or the Bayesian framework to perform QTL inference. In total, five and four significant QTL were observed for the intercepts and slopes, respectively, across wood traits such as earlywood percentage, wood density, radial fiber width, and spiral grain angle. Four of these QTL were represented by candidate gene SNPs, thus providing promising targets for future research in QTL mapping and molecular function. Bayesian and LASSO methods both detected similar sets of QTL given datasets that comprised large numbers of individuals.
|
35
|
Eu-ahsunthornwattana J, Miller EN, Fakiola M, Jeronimo SMB, Blackwell JM, Cordell HJ. Comparison of methods to account for relatedness in genome-wide association studies with family-based data. PLoS Genet 2014; 10:e1004445. [PMID: 25033443 PMCID: PMC4102448 DOI: 10.1371/journal.pgen.1004445] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 09/20/2013] [Accepted: 05/02/2014] [Indexed: 11/23/2022]
Abstract
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. 
Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.
Recently, statistical approaches known as linear mixed models (LMMs) have become popular for analysing data from genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it has not always been clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several different LMM approaches (and software implementations) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals. We also compare the LMM results to those obtained using alternative analysis methods. Overall, we find strong concordance between the results from the different LMM approaches and high correlation between the results from LMMs and most alternative approaches. We conclude that LMM approaches perform well in comparison to competing approaches and, in most applications, the precise LMM implementation will not be too important, and can be chosen on the basis of speed or convenience.
Affiliation(s)
- Jakris Eu-ahsunthornwattana: Institute of Genetic Medicine, Newcastle University, International Centre for Life, Newcastle upon Tyne, United Kingdom; Division of Medical Genetics, Department of Internal Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Ratchathevi, Bangkok, Thailand
- E. Nancy Miller: Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom
- Michaela Fakiola: Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom
- Selma M. B. Jeronimo: Department of Biochemistry, Center for Biosciences, Universidade Federal do Rio Grande do Norte, Natal, Brazil
- Jenefer M. Blackwell: Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom; Telethon Institute for Child Health Research, Centre for Child Health Research, The University of Western Australia, Subiaco, Western Australia, Australia
- Heather J. Cordell: Institute of Genetic Medicine, Newcastle University, International Centre for Life, Newcastle upon Tyne, United Kingdom
|
36
|
Wang S, Fang S, Sha Q, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants with longitudinal data. BMC Proc 2014; 8:S91. [PMID: 25519418 PMCID: PMC4143720 DOI: 10.1186/1753-6561-8-s1-s91] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/05/2022]
Abstract
Increasing evidence shows that complex diseases are caused by both common and rare variants. Recently, several statistical methods for detecting associations of rare variants have been developed, including the test for testing the effect of an optimally weighted combination of variants (TOW) developed by our group in 2012. These methodologies consider phenotype measurement at only one time point. Because many sequence data have been developed on population cohorts that contain phenotype measurements at multiple time points, such as the data set provided in the Genetic Analysis Workshop 18 (GAW18), we extend TOW from phenotype measurement at one time point to phenotype measurements at multiple time points. We then apply the newly proposed method to the GAW18 data set and compare the power of the new method with TOW using only one phenotype measurement. The application results show that the newly proposed method jointly modeling phenotype measurements at all time points has increased power over TOW.
Affiliation(s)
- Shuaicheng Wang: Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- Shurong Fang: Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- Qiuying Sha: Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
- Shuanglin Zhang: Department of Mathematical Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI 49931, USA
|
37
|
Abstract
Statistical genetic methods incorporating temporal variation allow for greater understanding of genetic architecture and consistency of biological variation influencing development of complex diseases. This study proposes a bivariate association method jointly testing association of two quantitative phenotypic measures from different time points. Measured genotype association was analyzed for single-nucleotide polymorphisms (SNPs) for systolic blood pressure (SBP) from the first and third visits using 200 simulated Genetic Analysis Workshop 18 (GAW18) replicates. Bivariate association, in which the effect of an SNP on the mean trait values of the two phenotypes is constrained to be equal for both measures and is included as a covariate in the analysis, was compared with a bivariate analysis in which the effect of an SNP was estimated separately for the two measures and univariate association analyses in 9 SNPs that explained greater than 0.001% SBP variance over all 200 GAW18 replicates. The SNP 3_48040283 was significantly associated with SBP in all 200 replicates with the constrained bivariate method providing increased signal over the unconstrained bivariate method. This method improved signal in all 9 SNPs with simulated effects on SBP for nominal significance (p-value <0.05). However, this appears to be determined by the effect size of the SNP on the phenotype. This bivariate association method applied to longitudinal data improves genetic signal for quantitative traits when the effect size of the variant is moderate to large.
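The contrast between the constrained and unconstrained bivariate analyses can be illustrated with a minimal Python sketch. It uses ordinary least squares on unrelated individuals and ignores the correlation between the two measures, so it is a simplification rather than the method used in the study. The function names, effect sizes, and simulated blood-pressure values are all hypothetical.

```python
import numpy as np

def unconstrained_effects(g, y1, y2):
    """Estimate a separate SNP slope for each of the two measures."""
    X = np.column_stack([np.ones_like(g), g])
    b1 = np.linalg.lstsq(X, y1, rcond=None)[0][1]
    b2 = np.linalg.lstsq(X, y2, rcond=None)[0][1]
    return b1, b2

def constrained_effect(g, y1, y2):
    """Estimate one SNP slope shared by both measures, with a separate
    intercept per measure, by stacking the two phenotypes."""
    n = len(g)
    y = np.concatenate([y1, y2])
    X = np.zeros((2 * n, 3))
    X[:n, 0] = 1.0                    # intercept for the first measure
    X[n:, 1] = 1.0                    # intercept for the second measure
    X[:, 2] = np.concatenate([g, g])  # shared SNP effect
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

# Hypothetical simulated data: one SNP and a trait measured at two visits.
rng = np.random.default_rng(1)
n = 500
g = rng.binomial(2, 0.25, size=n).astype(float)
y1 = 120.0 + 2.0 * g + rng.normal(scale=10.0, size=n)
y2 = 125.0 + 2.0 * g + rng.normal(scale=10.0, size=n)

b1, b2 = unconstrained_effects(g, y1, y2)
shared = constrained_effect(g, y1, y2)
print(f"separate slopes: {b1:.3f}, {b2:.3f}; shared slope: {shared:.3f}")
```

Under these simplifications the shared slope is exactly the average of the two separate slopes, because both fits use the same genotype vector. The constrained model spends one parameter on the SNP effect instead of two, which is the intuition for a stronger signal when the true effects at the two visits are similar.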
Affiliation(s)
- Phillip E Melton: Centre for Genetic Origins of Health and Disease, University of Western Australia, Crawley, Australia
- Laura A Almasy: Department of Genetics, Texas Biomedical Research Institute, San Antonio, USA
|
38
|
Zhang Z, Hong Y, Gao J, Xiao S, Ma J, Zhang W, Ren J, Huang L. Genome-wide association study reveals constant and specific loci for hematological traits at three time stages in a White Duroc × Erhualian F2 resource population. PLoS One 2013; 8:e63665. [PMID: 23691082 PMCID: PMC3656948 DOI: 10.1371/journal.pone.0063665] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 01/10/2013] [Accepted: 04/05/2013] [Indexed: 11/18/2022]
Abstract
Hematological traits are important indicators of immune function and have been commonly examined as biomarkers of disease and disease severity in humans. The pig is an ideal biomedical model for human diseases due to its high degree of similarity with human physiological characteristics. Here, we conducted genome-wide association studies (GWAS) for 18 hematological traits at three growth stages (days 18, 46 and 240) in a White Duroc × Erhualian F2 intercross. In total, we identified 38 genome-wide significant regions containing 185 genome-wide significant SNPs by single-marker GWAS or LONG-GWAS. The significant regions are distributed on pig chromosomes (SSC) 1, 4, 5, 7, 8, 10, 11, 12, 13, 17 and 18, and most of the significant SNPs reside on SSC7 and SSC8. Of the 38 significant regions, 7 show constant effects on hematological traits across all life stages, and 6 regions have time-specific effects on the measured traits at early or late stages. The most prominent locus is the genomic region between 32.36 and 84.49 Mb on SSC8 that is associated with multiple erythroid traits. The KIT gene in this region appears to be a promising candidate gene. The findings improve our understanding of the genetic architecture of hematological traits in pigs. Further investigations are warranted to characterize the responsible gene(s) and causal variant(s), especially for the major loci on SSC7 and SSC8.
Affiliation(s)
- Zhiyan Zhang: Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, Jiangxi Agricultural University, Nanchang, China
|
39
|
Guo X, Liu D, Wen C, He M, Wang X. Incorporating heterogeneous parent-child environmental effects in biometrical genetic models. Stat Med 2013; 32:3501-8. [DOI: 10.1002/sim.5785] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/06/2012] [Revised: 12/11/2012] [Accepted: 02/17/2013] [Indexed: 11/11/2022]
Affiliation(s)
- Xiaobo Guo: Department of Statistics, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou GD 510275, China
- Dungang Liu: Department of Biostatistics, Yale University School of Public Health, New Haven CT 06520, U.S.A
- Canhong Wen: Department of Statistics, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou GD 510275, China
- Mingguang He: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou GD 510080, China
- Xueqin Wang: Department of Statistics, School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou GD 510275, China; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou GD 510080, China; Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou GD 510080, China
|