1
Burrows K, Heiskala A, Bradfield JP, Balkhiyarova Z, Ning L, Boissel M, Chan YM, Froguel P, Bonnefond A, Hakonarson H, Alves AC, Lawlor DA, Kaakinen M, Järvelin MR, Grant SF, Tilling K, Prokopenko I, Sebert S, Canouil M, Warrington NM. A framework for conducting time-varying genome-wide association studies: An application to body mass index across childhood in six multiethnic cohorts. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.13.24304263. [PMID: 38559031 PMCID: PMC10980110 DOI: 10.1101/2024.03.13.24304263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Genetic effects on changes in human traits over time are understudied and may have important pathophysiological impact. We propose a framework that enables data quality control, implements mixed models to evaluate trajectories of change in traits, and estimates phenotypes to identify age-varying genetic effects in genome-wide association studies (GWASs). Using childhood body mass index (BMI) as an example, we included 71,336 participants from six cohorts and estimated the slope and area under the BMI curve within four time periods (infancy, early childhood, late childhood and adolescence) for each participant, in addition to the age and BMI at the adiposity peak and the adiposity rebound. GWAS on each of the estimated phenotypes identified 28 genome-wide significant variants at 13 loci across the 12 estimated phenotypes, one of which was novel (in DAOA) and had not been previously associated with childhood or adult BMI. Genetic studies of changes in human traits over time could uncover novel biological mechanisms influencing quantitative traits.
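As a rough illustration of two of the derived phenotypes named in the abstract, the sketch below computes a slope and an area under the curve from one participant's BMI measurements inside a single age window. It is a simplification and not the authors' pipeline: the abstract says these quantities were estimated from fitted mixed models, whereas this sketch works directly on raw measurements. The function names, window bounds, and data values are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (age in years, BMI in kg/m^2)


def in_window(points: List[Point], start: float, end: float) -> List[Point]:
    """Keep measurements whose age lies in [start, end]."""
    return [(age, bmi) for age, bmi in points if start <= age <= end]


def ols_slope(points: List[Point]) -> float:
    """Least-squares slope of BMI on age (BMI units per year)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all measurements share the same age")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def trapezoid_auc(points: List[Point]) -> float:
    """Area under the BMI curve by the trapezoid rule, after sorting by age."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical participant; the window bounds are illustrative only.
measurements = [(0.5, 17.1), (1.5, 16.8), (2.5, 16.2), (3.5, 15.8), (5.0, 15.5), (7.0, 15.9)]
window = in_window(measurements, 1.0, 6.0)
print("slope:", ols_slope(window))
print("AUC:", trapezoid_auc(window))
```

Each derived value could then serve as the outcome in an ordinary per-variant regression, which is the general idea behind running a GWAS on estimated phenotypes.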
Affiliation(s)
- Kimberley Burrows
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Anni Heiskala
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Jonathan P. Bradfield
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Quantinuum Research LLC, Wayne, PA, USA
- Zhanna Balkhiyarova
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
  - Section of Metabolism, Digestion and Reproduction, Department of Medicine, Imperial College London, London, UK
- Lijiao Ning
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Mathilde Boissel
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Yee-Ming Chan
  - Division of Endocrinology, Department of Pediatrics, Boston Children’s Hospital
  - Department of Pediatrics, Harvard Medical School
- Philippe Froguel
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom
- Amelie Bonnefond
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London, United Kingdom
- Hakon Hakonarson
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Deborah A Lawlor
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Marika Kaakinen
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, UK
- Marjo-Riitta Järvelin
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
  - MRC Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, United Kingdom
  - Department of Life Sciences, College of Health and Life Sciences, Brunel University London, London, United Kingdom
- Struan F.A. Grant
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Divisions of Human Genetics and Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kate Tilling
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
- Inga Prokopenko
  - Department of Clinical and Experimental Medicine, School of Biosciences and Medicine, University of Surrey, Guildford, UK
  - People-Centred Artificial Intelligence Institute, University of Surrey, Guildford, UK
- Sylvain Sebert
  - Research Unit of Population Health, University of Oulu, Oulu, Finland
- Mickaël Canouil
  - Univ Lille, INSERM/CNRS UMR1283/8199, EGID, Institut Pasteur de Lille, Lille University Hospital, Lille, France
- Nicole M Warrington
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Frazer Institute, University of Queensland, Brisbane, Australia

2
Abdel-Azim G, Patel P, Li S, Guo S, Black MH. Fast multiple-trait genome-wide association analysis for correlated longitudinal measurements. Sci Rep 2023; 13:20603. [PMID: 37996550 PMCID: PMC10667366 DOI: 10.1038/s41598-023-47555-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023]
Abstract
Large-scale longitudinal biobank data can be leveraged to identify genetic variation contributing to human disease progression and trait trajectories. While methods for genome-wide association studies (GWAS) of multiple correlated traits have been proposed, an efficient multiple-trait approach to model longitudinal phenotypes is not currently available. We developed GAMUT, a genome-wide association approach for multiple longitudinal traits. GAMUT employs a mixed-effects model to fit longitudinal outcomes where a fast algorithm for inversion by recursive partitioning of the random effects submatrix is introduced. To evaluate performance of the algorithms introduced and assess their statistical power and type I error, stochastic simulation was conducted. Consistent with our expectation, power was greater for cross-sectional (CS) than longitudinal (LT) effects, particularly with a diminishing LT/CS ratio. With a minimum minor allele count of 3 within genotype by time categories, observed type I error was roughly equal to theoretical genome-wide significance. Additionally, 28 blood-based biomarkers measured at 2 time points on participants of the UK Biobank were used to compare GAMUT against single-trait standard and longitudinal GWAS (including rate of change). Across all biomarkers, we observed 539 (CS) and 248 (LT) significant independent variants for the GAMUT method, and 513 (CS) and 30 (LT) for single-trait longitudinal GWAS, respectively. Only 37 variants were identified by modeling rates of change using standard GWAS.
Affiliation(s)
- Parth Patel
  - Janssen Res. & Dev. (Johnson & Johnson), Spring House, PA, USA
- Shuwei Li
  - Janssen Res. & Dev. (Johnson & Johnson), Spring House, PA, USA
- Shicheng Guo
  - Janssen Res. & Dev. (Johnson & Johnson), Spring House, PA, USA

3
Ta M, Blauwendraat C, Antar T, Leonard HL, Singleton AB, Nalls MA, Iwaki H. Genome-Wide Meta-Analysis of Cerebrospinal Fluid Biomarkers in Alzheimer's Disease and Parkinson's Disease Cohorts. Mov Disord 2023; 38:1697-1705. [PMID: 37539664 DOI: 10.1002/mds.29511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/14/2023] [Accepted: 05/30/2023] [Indexed: 08/05/2023]
Abstract
BACKGROUND Amyloid-β, phosphorylated tau (p-tau), and total tau (t-tau) in cerebrospinal fluid are established biomarkers for Alzheimer's disease (AD). In other neurodegenerative diseases, such as Parkinson's disease (PD), these biomarkers have also been found to be altered, and the molecular mechanisms responsible for these alterations are still under investigation. Moreover, the interplay between these mechanisms and the diverse underlying disease states remains to be elucidated. OBJECTIVE To investigate genetic contributions to the AD biomarkers and assess the commonality and heterogeneity of the associations per underlying disease status. METHODS We conducted genome-wide association studies (GWASs) for the AD biomarkers on subjects from the Parkinson's Progression Markers Initiative, the Fox Investigation for New Discovery of Biomarkers, and the Alzheimer's Disease Neuroimaging Initiative, and meta-analyzed with the largest AD GWAS. We tested heterogeneity of associations of interest between different disease statuses (AD, PD, and control). RESULTS We observed three GWAS signals: the APOE locus for amyloid-β, the 3q28 locus between GEMC1 and OSTN for p-tau and t-tau, and the 7p22 locus (top hit: rs60871478, an intronic variant for DNAAF5, also known as HEATR2) for p-tau. The 7p22 locus is novel and colocalized with the brain DNAAF5 expression. Although no heterogeneity from underlying disease status was observed for the earlier GWAS signals, some disease risk loci suggested disease-specific associations with these biomarkers. CONCLUSIONS Our study identified a novel association at the intronic region of DNAAF5 associated with increased levels of p-tau across all diseases. We also observed some disease-specific genetic associations with these biomarkers. Published 2023. This article is a U.S. Government work and is in the public domain in the USA.
Affiliation(s)
- Michael Ta
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Data Tecnica International, Washington, District of Columbia, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Cornelis Blauwendraat
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Tarek Antar
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Hampton L Leonard
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Data Tecnica International, Washington, District of Columbia, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Andrew B Singleton
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Mike A Nalls
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Data Tecnica International, Washington, District of Columbia, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Hirotaka Iwaki
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
  - Data Tecnica International, Washington, District of Columbia, USA
  - Center for Alzheimer's and Related Dementias, National Institute of Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA

4
Brossard M, Paterson AD, Espin-Garcia O, Craiu RV, Bull SB. Characterization of direct and/or indirect genetic associations for multiple traits in longitudinal studies of disease progression. Genetics 2023; 225:iyad119. [PMID: 37369448 DOI: 10.1093/genetics/iyad119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/07/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023]
Abstract
When quantitative longitudinal traits are risk factors for disease progression and subject to random biological variation, joint model analysis of time-to-event and longitudinal traits can effectively identify direct and/or indirect genetic association of single nucleotide polymorphisms (SNPs) with time-to-event. We present a joint model that integrates: (1) a multivariate linear mixed model describing trajectories of multiple longitudinal traits as a function of time, SNP effects, and subject-specific random effects and (2) a frailty Cox survival model that depends on SNPs, longitudinal trajectory effects, and subject-specific frailty accounting for dependence among multiple time-to-event traits. Motivated by complex genetic architecture of type 1 diabetes complications (T1DC) observed in the Diabetes Control and Complications Trial (DCCT), we implement a 2-stage approach to inference with bootstrap joint covariance estimation and develop a hypothesis testing procedure to classify direct and/or indirect SNP association with each time-to-event trait. By realistic simulation study, we show that joint modeling of 2 time-to-T1DC (retinopathy and nephropathy) and 2 longitudinal risk factors (HbA1c and systolic blood pressure) reduces estimation bias in genetic effects and improves classification accuracy of direct and/or indirect SNP associations, compared to methods that ignore within-subject risk factor variability and dependence among longitudinal and time-to-event traits. Through DCCT data analysis, we demonstrate feasibility for candidate SNP modeling and quantify effects of sample size and Winner's curse bias on classification for 2 SNPs identified as having indirect associations with time-to-T1DC traits. Joint analysis of multiple longitudinal and multiple time-to-event traits provides insight into complex traits architecture.
Affiliation(s)
- Myriam Brossard
  - Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto M5T 3L9, Ontario, Canada
- Andrew D Paterson
  - Program in Genetics and Genome Biology, Hospital for Sick Children Research Institute, Toronto M5G 1X8, Ontario, Canada
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto M5T 3M7, Ontario, Canada
- Osvaldo Espin-Garcia
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto M5T 3M7, Ontario, Canada
  - Department of Biostatistics, Princess Margaret Cancer Centre, Toronto M5G 2C1, Ontario, Canada
  - Department of Statistical Sciences, University of Toronto, Toronto M5S 3G3, Ontario, Canada
  - Department of Epidemiology and Biostatistics, Western University, London N6A 5C1, Ontario, Canada
- Radu V Craiu
  - Department of Statistical Sciences, University of Toronto, Toronto M5S 3G3, Ontario, Canada
- Shelley B Bull
  - Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto M5T 3L9, Ontario, Canada
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto M5T 3M7, Ontario, Canada

5
Ta M, Blauwendraat C, Antar T, Leonard HL, Singleton AB, Nalls MA, Iwaki H. Genome-wide meta-analysis of CSF biomarkers in Alzheimer's disease and Parkinson's disease cohorts. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.06.13.23291354. [PMID: 37398091 PMCID: PMC10312859 DOI: 10.1101/2023.06.13.23291354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Background Amyloid beta (Aβ), phosphorylated tau (p-tau), and total tau (t-tau) in cerebrospinal fluid are established biomarkers for Alzheimer's disease (AD). In other neurodegenerative diseases, such as Parkinson's disease (PD), these biomarkers have also been found to be altered, and the molecular mechanisms responsible for these alterations are still under investigation. Moreover, the interplay between these mechanisms and the diverse underlying disease states remains to be elucidated. Objectives To investigate genetic contributions to the AD biomarkers and assess the commonality and heterogeneity of the associations per underlying disease status. Methods We conducted GWAS for the AD biomarkers on subjects from the Parkinson's Progression Markers Initiative (PPMI), the Fox Investigation for New Discovery of Biomarkers (BioFIND), and the Alzheimer's Disease Neuroimaging Initiative (ADNI) and meta-analyzed with the largest AD GWAS.[7] We tested heterogeneity of associations of interest between different disease statuses (AD, PD, and control). Results We observed three GWAS signals: the APOE locus for Aβ, the 3q28 locus between GEMC1 and OSTN for p-tau and t-tau, and the 7p22 locus (top hit: rs60871478, an intronic variant for DNAAF5 , also known as HEATR2 ) for p-tau. The 7p22 locus is novel and co-localized with the brain DNAAF5 expression. While no heterogeneity from underlying disease status was observed for the above GWAS signals, some disease risk loci suggested disease specific associations with these biomarkers. Conclusions Our study identified a novel association at the intronic region of DNAAF5 associated with increased levels of p-tau across all diseases. We also observed some disease specific genetic associations with these biomarkers.

6
Ko S, German CA, Jensen A, Shen J, Wang A, Mehrotra DV, Sun YV, Sinsheimer JS, Zhou H, Zhou JJ. GWAS of longitudinal trajectories at biobank scale. Am J Hum Genet 2022; 109:433-445. [PMID: 35196515 PMCID: PMC8948167 DOI: 10.1016/j.ajhg.2022.01.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/10/2021] [Accepted: 01/25/2022] [Indexed: 12/12/2022]
Abstract
Biobanks linked to massive, longitudinal electronic health record (EHR) data make numerous new genetic research questions feasible. One among these is the study of biomarker trajectories. For example, high blood pressure measurements over visits strongly predict stroke onset, and consistently high fasting glucose and HbA1c levels define diabetes. Recent research reveals that not only the mean level of biomarker trajectories but also their fluctuations, or within-subject (WS) variability, are risk factors for many diseases. Glycemic variation, for instance, is recently considered an important clinical metric in diabetes management. It is crucial to identify the genetic factors that shift the mean or alter the WS variability of a biomarker trajectory. Compared to traditional cross-sectional studies, trajectory analysis utilizes more data points and captures a complete picture of the impact of time-varying factors, including medication history and lifestyle. Currently, there are no efficient tools for genome-wide association studies (GWASs) of biomarker trajectories at the biobank scale, even for just mean effects. We propose TrajGWAS, a linear mixed effect model-based method for testing genetic effects that shift the mean or alter the WS variability of a biomarker trajectory. It is scalable to biobank data with 100,000 to 1,000,000 individuals and many longitudinal measurements and robust to distributional assumptions. Simulation studies corroborate that TrajGWAS controls the type I error rate and is powerful. Analysis of eleven biomarkers measured longitudinally and extracted from UK Biobank primary care data for more than 150,000 participants with 1,800,000 observations reveals loci that significantly alter the mean or WS variability.
Affiliation(s)
- Seyoon Ko
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Christopher A. German
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Aubrey Jensen
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Judong Shen
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
- Anran Wang
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
- Devan V. Mehrotra
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
- Yan V. Sun
  - Department of Epidemiology, Emory University, Atlanta, GA 30322, USA
- Janet S. Sinsheimer
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Hua Zhou
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jin J. Zhou (corresponding author)
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ 85721, USA

7
A genome-wide association study of the longitudinal course of executive functions. Transl Psychiatry 2021; 11:386. [PMID: 34247186 PMCID: PMC8272719 DOI: 10.1038/s41398-021-01510-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/15/2021] [Indexed: 01/13/2023]
Abstract
Executive functions are metacognitive capabilities that control and coordinate mental processes. In the transdiagnostic PsyCourse Study, comprising patients of the affective-to-psychotic spectrum and controls, we investigated the genetic basis of the time course of two core executive subfunctions: set-shifting (Trail Making Test, part B (TMT-B)) and updating (Verbal Digit Span backwards) in 1338 genotyped individuals. Time course was assessed with four measurement points, each 6 months apart. Compared to the initial assessment, executive performance improved across diagnostic groups. We performed a genome-wide association study to identify single nucleotide polymorphisms (SNPs) associated with performance change over time by testing for SNP-by-time interactions using linear mixed models. We identified nine genome-wide significant SNPs for TMT-B in strong linkage disequilibrium with each other on chromosome 5. These were associated with decreased performance on the continuous TMT-B score across time. Variant rs150547358 had the lowest P value = 7.2 × 10⁻¹⁰ with effect estimate beta = 1.16 (95% c.i.: 1.11, 1.22). Implementing data of the FOR2107 consortium (1795 individuals), we replicated these findings for the SNP rs150547358 (P value = 0.015), analyzing the difference of the two available measurement points two years apart. In the replication study, rs150547358 exhibited a similar effect estimate beta = 0.85 (95% c.i.: 0.74, 0.97). Our study demonstrates that longitudinally measured phenotypes have the potential to unmask novel associations, adding time as a dimension to the effects of genomics.

8
Liang Y, Li B, Zhang Q, Zhang S, He X, Jiang L, Jin Y. Interaction analyses based on growth parameters of GWAS between Escherichia coli and Staphylococcus aureus. AMB Express 2021; 11:34. [PMID: 33646434 PMCID: PMC7921238 DOI: 10.1186/s13568-021-01192-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/28/2020] [Accepted: 02/09/2021] [Indexed: 01/02/2023]
Abstract
To accurately explore the interaction mechanism between Escherichia coli and Staphylococcus aureus, we designed an ecological experiment to monoculture and co-culture E. coli and S. aureus. We co-cultured 45 strains of E. coli and S. aureus, as well as each species individually to measure growth over 36 h. We implemented a genome wide association study (GWAS) based on growth parameters (λ, R, A and s) to identify significant single nucleotide polymorphisms (SNPs) of the bacteria. Three commonly used growth regression equations, Logistic, Gompertz, and Richards, were used to fit the bacteria growth data of each strain. Then each equation's Akaike's information criterion (AIC) value was calculated as a commonly used information criterion. We used the optimal growth equation to estimate the four parameters above for strains in co-culture. By plotting the estimates for each parameter across two strains, we can visualize how growth parameters respond ecologically to environment stimuli. We verified that different genotypes of bacteria had different growth trajectories, although they were the same species. We reported 85 and 52 significant SNPs that were associated with interaction in E. coli and S. aureus, respectively. Many significant genes might play key roles in interaction, such as yjjW, dnaK, aceE, tatD, ftsA, rclR, ftsK, fepA in E. coli, and scdA, trpD, sdrD, SAOUHSC_01219 in S. aureus. Our study illustrated that there were multiple genes working together to affect bacterial interaction, and laid a solid foundation for the later study of more complex inter-bacterial interaction mechanisms.
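The model-selection step described in the abstract (score each candidate growth equation by AIC and keep the lowest) can be sketched as follows. This is an illustrative sketch, not the authors' code, and it rests on several assumptions: the curves use the Zwietering-style parameterization with asymptote `A`, maximum growth rate `mu`, and lag time `lam`; AIC is computed in its least-squares form `n * ln(RSS / n) + 2k`; parameter values are taken as already estimated (for example by nonlinear least squares); the Richards equation is omitted for brevity; and the data are synthetic.

```python
import math


def logistic(t, A, mu, lam):
    """Logistic growth curve (Zwietering-style parameterization, assumed here)."""
    return A / (1.0 + math.exp(4.0 * mu / A * (lam - t) + 2.0))


def gompertz(t, A, mu, lam):
    """Gompertz growth curve (Zwietering-style parameterization, assumed here)."""
    return A * math.exp(-math.exp(mu * math.e / A * (lam - t) + 1.0))


def aic_least_squares(observed, predicted, n_params):
    """AIC for Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


def rank_models(times, observed, fitted):
    """Return {model name: AIC} for each (curve, params) pair in `fitted`."""
    scores = {}
    for name, (curve, params) in fitted.items():
        predicted = [curve(t, *params) for t in times]
        scores[name] = aic_least_squares(observed, predicted, len(params))
    return scores


# Synthetic 36-hour growth series: a Gompertz curve plus a small alternating offset.
times = [float(h) for h in range(0, 37, 4)]
true_params = (1.0, 0.1, 6.0)  # hypothetical A, mu, lam
observed = [gompertz(t, *true_params) + (0.01 if i % 2 == 0 else -0.01)
            for i, t in enumerate(times)]

# Hypothetical "fitted" parameters for each candidate equation.
fitted = {
    "logistic": (logistic, true_params),
    "gompertz": (gompertz, true_params),
}
scores = rank_models(times, observed, fitted)
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Because both candidates here have the same number of parameters, the ranking reduces to comparing residual sums of squares; the `2k` penalty matters once a model with an extra shape parameter, such as Richards, is added.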

9
Yuan M, Zhu Z, Yang Y, Zhao M, Sasser K, Hamadeh H, Pinheiro J, Xu XS. Efficient algorithms for covariate analysis with dynamic data using nonlinear mixed-effects model. Stat Methods Med Res 2020; 30:233-243. [PMID: 32838650 DOI: 10.1177/0962280220949898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Nonlinear mixed-effects modeling is one of the most popular tools for analyzing repeated measurement data, particularly for applications in the biomedical fields. Multiple integration and nonlinear optimization are the two major challenges for likelihood-based methods in nonlinear mixed-effects modeling. To solve these problems, approaches based on empirical Bayesian estimates have been proposed by breaking the problem into a nonlinear mixed-effects model with no covariates and a linear regression model without random effects. This approach is time-efficient as it involves no covariates in the nonlinear optimization. However, covariate effects based on empirical Bayesian estimates are underestimated and the bias depends on the extent of shrinkage. A marginal correction method has been proposed to correct the bias caused by shrinkage to some extent. However, the marginal approach appears to be suboptimal when testing covariate effects on multiple model parameters, a situation that is often encountered in real-world data analysis. In addition, the marginal approach cannot correct the inaccuracy in the associated p-values. In this paper, we proposed a simultaneous correction method (nSCEBE), which can handle the situation where covariate analysis is performed on multiple model parameters. Simulation studies and real data analysis showed that nSCEBE is accurate and efficient for both effect-size estimation and p-value calculation compared with the existing methods. Importantly, nSCEBE can be >2000 times faster than the standard mixed-effects models, potentially allowing utilization for high-dimension covariate analysis for longitudinal or repeated measured outcomes.
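The dependence of the bias on shrinkage can be seen in the simplest special case. The following is a sketch under assumptions that are not taken from the abstract: a linear random-intercept model with variance components treated as known, a base model fitted without the centered covariate $x_i$, and symbols ($\mu$, $\tau^2$, $\sigma^2$, $n_i$, $\lambda_i$, $\beta$, $u_i$) introduced here for illustration only.

```latex
\begin{align*}
y_{ij} &= \mu + b_i + \varepsilon_{ij}, \qquad
  b_i \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \quad j = 1, \dots, n_i \\
\hat{b}_i &= \lambda_i \, (\bar{y}_i - \mu), \qquad
  \lambda_i = \frac{\tau^2}{\tau^2 + \sigma^2 / n_i} \in (0, 1) \\
\text{if } b_i &= \beta x_i + u_i \text{ with } E[u_i \mid x_i] = 0: \qquad
  E[\hat{b}_i \mid x_i] = \lambda_i \, \beta x_i
\end{align*}
```

With equal $n_i$, regressing $\hat{b}_i$ on $x_i$ therefore recovers roughly $\lambda \beta$ rather than $\beta$, so the underestimation grows as shrinkage $1 - \lambda$ increases. The nonlinear, multi-parameter setting treated in the paper is more involved than this special case.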
Affiliation(s)
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei, China
- Zhi Zhu
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Minghua Zhao
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China

10
Yuan M, Xu XS, Yang Y, Zhou Y, Li Y, Xu J, Pinheiro J. SCEBE: an efficient and scalable algorithm for genome-wide association studies on longitudinal outcomes with mixed-effects modeling. Brief Bioinform 2020; 22:5868073. [PMID: 32634825 DOI: 10.1093/bib/bbaa130] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/31/2020] [Revised: 05/18/2020] [Accepted: 05/28/2020] [Indexed: 11/13/2022]
Abstract
Genome-wide association studies (GWAS) using longitudinal phenotypes collected over time are appealing because of the improvement in power. However, computational burden has been a challenge because of the complex algorithms for modeling the longitudinal data. Approximation methods based on empirical Bayesian estimates (EBEs) from mixed-effects modeling have been developed to expedite the analysis. However, our analysis demonstrated that bias in both association testing and estimation for the existing EBE-based methods remains an issue. We propose an incredibly fast and unbiased method (simultaneous correction for EBE, SCEBE) that can correct the bias in the naive EBE approach and provide unbiased P-values and estimates of effect size. Through application to Alzheimer's Disease Neuroimaging Initiative data with 6 414 695 single nucleotide polymorphisms, we demonstrated that SCEBE can efficiently perform large-scale GWAS with longitudinal outcomes, providing nearly 10 000 times improvement of computational efficiency and shortening the computation time from months to minutes. The SCEBE package and the example datasets are available at https://github.com/Myuan2019/SCEBE.
Affiliation(s)
- Min Yuan
  - Anhui Medical University, Anhui, China
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Yinsheng Zhou
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Yi Li
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Jinfeng Xu
  - Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong
- Jose Pinheiro
  - Janssen Research and Development LLC, Raritan, NJ, USA