1
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. PLANTS (BASEL, SWITZERLAND) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association studies (GWAS) are the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover causal variants in GWAS data. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, which improves the computational efficiency and statistical power of GWAS. Recently, more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods, owing to the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we present the LMM-based methods for GWAS available in the literature. We briefly discuss the different LMM methods, software packages, and open-source applications for GWAS. We then describe the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclusions. This review should help researchers quickly select appropriate LMMs and methods for GWAS data analysis and benefit the scientific community.
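To make the role of the LMM concrete, the following is a minimal NumPy sketch, not the implementation of any particular package, of a per-SNP association test under an LMM whose variance components are assumed to have been estimated beforehand. Rotating the data by the eigenvectors of the kinship matrix K turns the covariance into a diagonal matrix, a trick used by several LMM tools. The function name `lmm_wald_scan` and the simulated data are hypothetical.

```python
import numpy as np


def lmm_wald_scan(y, genotypes, kinship, sigma_g2, sigma_e2):
    """Per-SNP GLS Wald tests under y ~ N(X b, sigma_g2 * K + sigma_e2 * I).

    The variance components are assumed to be estimated beforehand (for
    example once under the null model); this sketch does not estimate them.
    Returns a list of (effect estimate, Wald chi-square statistic) per SNP.
    """
    # Eigendecompose K once: K = U diag(s) U^T. After rotating by U^T the
    # covariance sigma_g2 * K + sigma_e2 * I becomes diagonal.
    s, U = np.linalg.eigh(kinship)
    weights = 1.0 / (sigma_g2 * s + sigma_e2)  # inverse rotated variances
    y_rot = U.T @ y
    ones_rot = U.T @ np.ones(len(y))  # rotated intercept column
    results = []
    for j in range(genotypes.shape[1]):
        X = np.column_stack([ones_rot, U.T @ genotypes[:, j]])
        XtW = X.T * weights  # X^T V^{-1} expressed in the rotated basis
        cov_beta = np.linalg.inv(XtW @ X)  # (X^T V^{-1} X)^{-1}
        beta = cov_beta @ (XtW @ y_rot)
        wald = beta[1] ** 2 / cov_beta[1, 1]  # approx. chi-square(1) under H0
        results.append((float(beta[1]), float(wald)))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.binomial(2, 0.3, size=(100, 20)).astype(float)  # simulated 0/1/2 genotypes
    Z = (G - G.mean(axis=0)) / G.std(axis=0)  # standardized genotypes
    K = Z @ Z.T / Z.shape[1]  # genomic relationship matrix
    y = rng.normal(size=100)
    for beta, wald in lmm_wald_scan(y, G, K, 0.5, 0.5)[:3]:
        print(f"beta={beta:.3f}  wald={wald:.3f}")
```

The single eigendecomposition is reused for every SNP, which is where the computational savings come from. Many tools also build K while leaving out the chromosome being tested; this sketch omits that step.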
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
2
Co-Inheritance of Variation in All-Cause Mortality and Biochemical Risk Factors. Twin Res Hum Genet 2022; 25:107-114. [PMID: 35818962 DOI: 10.1017/thg.2022.25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Biomarkers may be useful endophenotypes for genetic studies if they share genetic sources of variation with the outcome, for example, with all-cause mortality. Australian adult study participants who had reported their parental survival information were included in the study: 14,169 participants had polygenic risk scores (PRS) from genotyping and up to 13,365 had biomarker results. We assessed associations between participants' biomarker results and parental survival, and between biomarker results and eight parental survival PRS at varying p-value cut-offs. Survival in parents was associated with participants' serum bilirubin, C-reactive protein, HDL cholesterol, triglycerides and uric acid, and with LDL cholesterol for participants' fathers but not for their mothers. PRS for all-cause mortality were associated with liver function tests (alkaline phosphatase, butyrylcholinesterase, gamma-glutamyl transferase), metabolic tests (LDL and HDL cholesterol, triglycerides, uric acid), and acute-phase reactants (C-reactive protein, globulins). Association between offspring biomarker results and parental survival demonstrates the existence of familial effects common to both, while associations between biomarker results and PRS for mortality favor at least a partial genetic cause of this covariation. Identification of genetic loci affecting mortality-associated biomarkers offers a route to the identification of additional loci affecting mortality.
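Polygenic risk scores at varying p-value cut-offs are commonly built by thresholding: only SNPs whose GWAS p-value falls below the cut-off contribute, each adding its effect size times the individual's effect-allele dosage. The pure-Python sketch below illustrates that idea under stated assumptions; the data layout and function names are hypothetical, it is not the pipeline used in the study, and real pipelines usually also prune or clump correlated SNPs, which this sketch omits.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical summary statistics: SNP id -> (effect size, GWAS p-value)
SummaryStats = Dict[str, Tuple[float, float]]


def prs_at_threshold(dosages: Dict[str, float], stats: SummaryStats, p_cut: float) -> float:
    """Sum of effect size * allele dosage over SNPs with p-value below p_cut."""
    score = 0.0
    for snp, (beta, p) in stats.items():
        if p < p_cut and snp in dosages:  # SNPs not genotyped in the sample are skipped
            score += beta * dosages[snp]
    return score


def prs_profile(dosages: Dict[str, float], stats: SummaryStats,
                cutoffs: Iterable[float]) -> Dict[float, float]:
    """One score per p-value cut-off, as in a thresholding analysis."""
    return {c: prs_at_threshold(dosages, stats, c) for c in cutoffs}


if __name__ == "__main__":
    stats = {"rs1": (0.2, 1e-8), "rs2": (-0.1, 0.01), "rs3": (0.05, 0.3)}
    person = {"rs1": 2.0, "rs2": 1.0, "rs3": 0.0}  # effect-allele counts
    print(prs_profile(person, stats, [5e-8, 0.05, 1.0]))
```

Each cut-off yields a separate score, so a study reporting several scores can test each one against the outcome of interest.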
3
Salvaña MLO, Lenzi A, Genton MG. Spatio-Temporal Cross-Covariance Functions under the Lagrangian Framework with Multiple Advections. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2078330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Mary Lai O. Salvaña
- Statistics Program, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Amanda Lenzi
- Statistics Program, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Marc G. Genton
- Statistics Program, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
4
Kalka IN, Gavrieli A, Shilo S, Rossman H, Artzi NS, Yacovzada NS, Segal E. Estimating heritability of glycaemic response to metformin using nationwide electronic health records and population-sized pedigree. COMMUNICATIONS MEDICINE 2021; 1:55. [PMID: 35602224 PMCID: PMC9053254 DOI: 10.1038/s43856-021-00058-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2021] [Accepted: 11/09/2021] [Indexed: 11/10/2022]
Abstract
Background Variability of response to medication is a well-known phenomenon, determined by both environmental and genetic factors. Understanding the heritable component of the response to medication is of great interest but challenging for several reasons, including small study cohorts and computational limitations. Methods Here, we study the heritability of variation in the glycaemic response to metformin, the first-line therapeutic agent for type 2 diabetes (T2D), by leveraging 18 years of electronic health record (EHR) data from Israel's largest healthcare service provider, comprising over five million patients of diverse ethnicities and socio-economic backgrounds. Our cohort consists of 80,788 T2D patients treated with metformin, with a cumulative 1,611,591 HbA1c measurements and 4,581,097 metformin prescriptions. We estimate the variance in glycated hemoglobin (HbA1c%) reduction explained by inheritance by constructing a six-generation, population-size pedigree from national registries and linking it to medical health records. Results Using a linear mixed model (LMM)-based framework, a common-practice method for heritability estimation, we calculate a heritability of $h^2 = 12.6\%$ (95% CI, 6.1%–19.1%) for the absolute reduction of HbA1c% after metformin treatment in the entire cohort, $h^2 = 21.0\%$ (95% CI, 7.8%–34.4%) in males and $h^2 = 22.9\%$ (95% CI, 10.0%–35.7%) in females. Results remain unchanged after adjusting for pre-treatment HbA1c% and when analysing the proportional reduction of HbA1c%. Conclusions To the best of our knowledge, our work is the first to estimate the heritability of drug response using solely EHR data combined with a pedigree-based kinship matrix. We demonstrate that while the response to metformin treatment has a heritable component, most of the variation is likely due to other factors, further motivating non-genetic analyses aimed at unraveling metformin's mechanism of action.

Individuals in a population might respond differently to the same medication, and this phenomenon is commonly attributed to either genes or the environment. Here, we studied the familial aspects of the response to metformin, a medication used to treat type 2 diabetes. We combined 18 years of medical records, identifying newly treated patients with type 2 diabetes, with information about how the trait was inherited within their families. We calculated a metric that tells us how well differences in people's genes account for differences in their traits, and show that although differences in the response to metformin are partly explained by the genes that people with type 2 diabetes inherit, most of the variation is not explained by genes. This finding contributes to a better understanding of differences in metformin response and might help inform treatment in the future.

Kalka and Gavrieli et al. assessed the heritability of variation in the glycaemic response to metformin by combining electronic health records from a large cohort of patients with diabetes with pedigree information. The authors show that although the variability in this response has a heritable component, most of it is likely non-genetic.
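The study relies on a pedigree-based kinship matrix inside an LMM. As an illustration only (the authors' own pipeline is not reproduced here), the standard recursive rules for kinship coefficients can be sketched in pure Python. The record layout, the requirement that parents precede their offspring, and the function name `kinship_matrix` are assumptions of this sketch. Twice the kinship coefficient gives the additive relationship matrix that is typically used as the genetic covariance structure in pedigree-based heritability models.

```python
from typing import Dict, List, Optional, Tuple

# Pedigree record: (individual, father or None, mother or None).
# Records must be sorted so that parents appear before their offspring.
Record = Tuple[str, Optional[str], Optional[str]]


def kinship_matrix(pedigree: List[Record]) -> Dict[Tuple[str, str], float]:
    """Kinship coefficients phi(i, j) via the recursive (tabular) method."""
    index = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    phi = [[0.0] * n for _ in range(n)]
    for i, (_, father, mother) in enumerate(pedigree):
        f = index.get(father) if father is not None else None
        m = index.get(mother) if mother is not None else None
        for j in range(i):
            # Kinship with an earlier individual averages over i's two parents;
            # an unknown parent is treated as unrelated (contributes 0).
            val = 0.0
            if f is not None:
                val += 0.5 * phi[f][j]
            if m is not None:
                val += 0.5 * phi[m][j]
            phi[i][j] = phi[j][i] = val
        # Self-kinship is (1 + F_i) / 2, where F_i is the kinship of i's parents.
        parents_kin = phi[f][m] if f is not None and m is not None else 0.0
        phi[i][i] = 0.5 + 0.5 * parents_kin
    names = [ind for ind, _, _ in pedigree]
    return {(a, b): phi[index[a]][index[b]] for a in names for b in names}


if __name__ == "__main__":
    ped = [("gf", None, None), ("gm", None, None), ("dad", "gf", "gm"),
           ("mum", None, None), ("kid1", "dad", "mum"), ("kid2", "dad", "mum")]
    K = kinship_matrix(ped)
    print(K[("kid1", "kid2")], K[("kid1", "gf")])
```

For a pedigree spanning a whole population the dense table would be replaced by sparse or inverse-based constructions; the recursion itself stays the same.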
5
Huang X, Tatonetti N, LaRow K, Delgoffee B, Mayer J, Page D, Hebbring SJ. E-Pedigrees: a large-scale automatic family pedigree prediction application. Bioinformatics 2021; 37:3966-3968. [PMID: 34086863 PMCID: PMC8570807 DOI: 10.1093/bioinformatics/btab419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Revised: 04/30/2021] [Accepted: 06/03/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION The use and functionality of electronic health records (EHRs) have increased rapidly in the past few decades. EHRs are becoming an important repository of patient health information and can capture family data. Pedigree analysis is a longstanding and powerful approach for gaining insight into the genetic and environmental factors underlying human health, but traditional approaches to identifying and recruiting families are low-throughput and labor-intensive. High-throughput methods that construct family pedigrees automatically are therefore needed. RESULTS We developed a stand-alone application, Electronic Pedigrees (E-Pedigrees), which combines two validated family prediction algorithms into a single software package for high-throughput pedigree construction. The platform uses patients' basic demographic information and/or emergency contact data to infer parent-child relationships with high accuracy. Importantly, E-Pedigrees allows users to layer in additional pedigree data when available and provides options for applying different logical rules to improve the accuracy of inferred family relationships. The software is fast and easy to use, is compatible with different EHR data sources, and outputs a standard PED file suitable for multiple downstream analyses. AVAILABILITY AND IMPLEMENTATION The E-Pedigrees application (Python 3.3+) is freely available at: https://github.com/xiayuan-huang/E-pedigrees.
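E-Pedigrees' own code lives in the linked repository and is not reproduced here. As a hypothetical sketch of the final step the abstract describes, turning inferred parent-child relationships into a PED file, the Python below groups connected individuals into families with a union-find structure and writes the first six PED columns. The column conventions ("0" for an unknown parent, -9 for a missing phenotype) follow common PLINK usage and are assumptions about the exact output format; the function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def to_ped_lines(sex: Dict[str, int],
                 trios: List[Tuple[str, Optional[str], Optional[str]]]) -> List[str]:
    """Six-column PED rows: family, individual, father, mother, sex, phenotype.

    `sex` must list every individual (1 = male, 2 = female, 0 = unknown);
    `trios` holds inferred (child, father, mother) links, None if unknown.
    Individuals connected through any parent-child link share a family ID.
    """
    parent = {ind: ind for ind in sex}
    fathers: Dict[str, str] = {}
    mothers: Dict[str, str] = {}
    for child, father, mother in trios:
        for par, store in ((father, fathers), (mother, mothers)):
            if par is not None:
                store[child] = par
                parent[find(parent, child)] = find(parent, par)  # merge the two families
    family_ids: Dict[str, str] = {}
    lines = []
    for ind in sorted(sex):
        root = find(parent, ind)
        fid = family_ids.setdefault(root, f"FAM{len(family_ids) + 1}")
        row = [fid, ind, fathers.get(ind, "0"), mothers.get(ind, "0"), str(sex[ind]), "-9"]
        lines.append(" ".join(row))  # "0" = unknown parent, -9 = missing phenotype
    return lines


if __name__ == "__main__":
    for line in to_ped_lines({"A": 1, "B": 2, "C": 1, "D": 2}, [("C", "A", "B")]):
        print(line)
```

Union-find keeps family assignment close to linear in the number of links, which matters when pedigrees are inferred for an entire health system.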
Affiliation(s)
- Xiayuan Huang
- Department of Biostatistics & Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Nicholas Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Katie LaRow
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Brooke Delgoffee
- Office of Research Computing and Analytics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
- John Mayer
- Office of Research Computing and Analytics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
- David Page
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA
- Scott J Hebbring
- Center for Precision Medicine Research, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
6
Reinsch N, Mayer M, Blunk I. Generalized gametic relationships for flexible analyses of parent-of-origin effects. G3 GENES|GENOMES|GENETICS 2021; 11:6166654. [PMID: 33693544 PMCID: PMC8496240 DOI: 10.1093/g3journal/jkab064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Accepted: 03/08/2021] [Indexed: 11/12/2022]
Abstract
A class of epigenetic inheritance patterns known as genomic imprinting allows alleles to influence the phenotype in a parent-of-origin-specific manner. Various pedigree-based parent-of-origin analyses of quantitative traits have attempted to determine the share of genetic variance attributable to imprinted loci. In general, these methods require four random gametic effects per pedigree member to account for all possible types of imprinting in a mixed model. As a result, the system of equations may become too large to solve using all available data. If only the offspring have records, which is frequently the case for complex pedigrees, only two averaged gametic effects (transmitting abilities) per parent are required (reduced model). However, in some cases the parents also have records. In this study, we therefore explain how employing single gametic effects only for informative (i.e., phenotyped) individuals, and averaged gametic effects otherwise, substantially reduces complexity compared with classical gametic models. The covariance matrix of this mixture of effects is a generalized gametic relationship matrix. This matrix also makes the reduced model much more flexible, because observations from parents can be included. Worked examples illustrate the theory, and a realistic body-mass data set in mice demonstrates its utility. We show how to set up the inverse of the generalized gametic relationship matrix directly from a pedigree, and an open-source program implements the rules. Applying the same principles to phased marker data leads to a genomic version of the generalized gametic relationships.
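For orientation, the building block of such models, the classical (not generalized) gametic relationship matrix, can be sketched with NumPy. Each individual carries a paternal and a maternal gamete, and each transmitted gamete is a copy of one of the parent's two gametes with probability 1/2. This sketch does not implement the paper's generalized matrix or its direct inverse; the indexing scheme and the function names are assumptions.

```python
import numpy as np
from typing import List, Optional, Tuple

# Pedigree as (sire index, dam index) per individual; None = unknown parent.
# Parents must appear before their offspring.
Pedigree = List[Tuple[Optional[int], Optional[int]]]


def gametic_relationship(pedigree: Pedigree) -> np.ndarray:
    """Classical gametic relationship matrix.

    Row/column 2*i is the paternal gamete of individual i, 2*i + 1 the maternal one.
    """
    n = 2 * len(pedigree)
    G = np.zeros((n, n))
    for i, (sire, dam) in enumerate(pedigree):
        for k, par in ((2 * i, sire), (2 * i + 1, dam)):
            if par is not None:
                # The transmitted gamete copies one of the parent's two
                # gametes, each with probability 1/2.
                for j in range(k):
                    G[k, j] = G[j, k] = 0.5 * (G[2 * par, j] + G[2 * par + 1, j])
            G[k, k] = 1.0
    return G


def additive_from_gametic(G: np.ndarray) -> np.ndarray:
    """Numerator relationship A = 0.5 * T G T^T, T summing each individual's two gametes."""
    T = np.kron(np.eye(G.shape[0] // 2), np.ones((1, 2)))
    return 0.5 * T @ G @ T.T


if __name__ == "__main__":
    # Two unrelated founders (0, 1) and their two full-sib offspring (2, 3).
    G = gametic_relationship([(None, None), (None, None), (0, 1), (0, 1)])
    print(additive_from_gametic(G))
```

In this parametrization the entry between an individual's two gametes equals its inbreeding coefficient F, so the recovered A has diagonal 1 + F. Parent-of-origin models keep the gametic level separate instead of collapsing it into A.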
Affiliation(s)
- Norbert Reinsch
- Institute of Genetics and Biometry, Leibniz-Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
- Manfred Mayer
- Institute of Genetics and Biometry, Leibniz-Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
- Inga Blunk
- Institute of Genetics and Biometry, Leibniz-Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
7
Xu T, Qi GA, Zhu J, Xu HM, Chen GB. Subsampling Technique to Estimate Variance Component for UK-Biobank Traits. Front Genet 2021; 12:612045. [PMID: 33747041 PMCID: PMC7978110 DOI: 10.3389/fgene.2021.612045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Accepted: 01/18/2021] [Indexed: 11/13/2022]
Abstract
Estimating heritability has long been an important question in statistical genetics. Owing to its clear mathematical properties, the modified Haseman–Elston regression has served as a bridge that connects various parallel heritability estimation methods and drives their development. As sample sizes grow, estimating heritability for biobank-scale data poses a challenge for statistical computation; in particular, calculating the genetic relationship matrix K is computationally demanding. Using the Haseman–Elston framework, in this study we explicitly analyzed the mathematical structure of the key term tr(KᵀK), the trace of a higher-order term of the genetic relationship matrix, which is involved in the estimation procedure. We proposed two estimators that estimate tr(KᵀK) with greatly reduced sampling variance compared with the existing method at the same computational complexity. We applied this method to 81 traits in UK Biobank data and compared the chromosome-wise partitioned heritability with the whole-genome heritability, which also serves as an approach for testing polygenicity.
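The role of tr(KᵀK) can be made concrete with a small NumPy sketch. It uses Hutchinson's randomized trace estimator, a swapped-in technique and not the subsampling estimators proposed by Xu et al., to approximate tr(KᵀK) for K = XXᵀ/m without forming the n × n matrix. It then plugs the trace into a method-of-moments system of the kind the Haseman–Elston framework connects to, E[yᵀKy] = σ²_g tr(KᵀK) + σ²_e tr(K) and E[yᵀy] = σ²_g tr(K) + σ²_e n. All function names, the standardization of X and y, and the omission of covariates are assumptions.

```python
import numpy as np


def trace_ktk_exact(X: np.ndarray) -> float:
    """tr(K^T K) for K = X X^T / m, computed by forming K (O(n^2 m))."""
    K = X @ X.T / X.shape[1]
    return float(np.sum(K * K))  # squared Frobenius norm equals tr(K^T K)


def trace_ktk_hutchinson(X: np.ndarray, n_probes: int, seed: int = 0) -> float:
    """Randomized estimate of tr(K^T K) that never forms the n x n matrix K."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Z = rng.choice([-1.0, 1.0], size=(n, n_probes))  # Rademacher probes, E[z z^T] = I
    KZ = X @ (X.T @ Z) / m  # K z for every probe, O(n m) work per probe
    return float(np.mean(np.sum(KZ * KZ, axis=0)))  # average of ||K z||^2


def he_regression(y: np.ndarray, X: np.ndarray, tr_kk: float) -> np.ndarray:
    """Moment estimates (sigma_g2, sigma_e2) for a standardized phenotype."""
    n, m = X.shape
    y = (y - y.mean()) / y.std()
    Ky = X @ (X.T @ y) / m
    tr_k = float(np.sum(X * X)) / m  # tr(K) = ||X||_F^2 / m
    lhs = np.array([[tr_kk, tr_k], [tr_k, float(n)]])
    rhs = np.array([y @ Ky, y @ y])
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.binomial(2, 0.4, size=(200, 500)).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)  # standardized genotypes
    approx = trace_ktk_hutchinson(X, n_probes=500)
    print(trace_ktk_exact(X), approx)
    print(he_regression(rng.normal(size=200), X, approx))
```

The randomized estimator trades exactness for cost that grows linearly in n per probe; the paper's contribution is reducing the sampling variance of such estimates at a fixed computational budget.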
Affiliation(s)
- Ting Xu
- Department of Mathematics, Zhejiang University, Hangzhou, China
- Guo-An Qi
- Department of Agricultural and Biotechnology, Zhejiang University, Hangzhou, China
- Jun Zhu
- Department of Agricultural and Biotechnology, Zhejiang University, Hangzhou, China
- Hai-Ming Xu
- Department of Agricultural and Biotechnology, Zhejiang University, Hangzhou, China
- Guo-Bo Chen
- Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Clinical Research Institute, Hangzhou, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
8
Blunk I, Thomsen H, Reinsch N, Mayer M, Försti A, Sundquist J, Sundquist K, Hemminki K. Genomic imprinting analyses identify maternal effects as a cause of phenotypic variability in type 1 diabetes and rheumatoid arthritis. Sci Rep 2020; 10:11562. [PMID: 32665606 PMCID: PMC7360775 DOI: 10.1038/s41598-020-68212-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/11/2020] [Accepted: 06/18/2020] [Indexed: 02/08/2023]
Abstract
Imprinted genes, which give rise to parent-of-origin effects (POEs), have been hypothesised to affect type 1 diabetes (T1D) and rheumatoid arthritis (RA). However, maternal effects may also play a role. With a mixed model able to consider all kinds of POEs simultaneously, the importance of POEs for the development of T1D and RA was investigated in a variance components analysis. The analysis was based on Swedish population-scale pedigree data. Imprinting variances were not significant (P = 0.18 for T1D and P = 0.26 for RA). The maternal environmental variance, explaining up to 19.00% (± 2.00%) and 15.00% (± 6.00%) of the phenotypic variance, respectively, was significant for T1D (P = 1.60 × 10⁻²⁴) and for RA (P = 0.02). For the first time, the existence of maternal genetic effects on RA was indicated, contributing up to 16.00% (± 3.00%) of the total variance. Environmental factors such as the socioeconomic index, the number of offspring, and birth year, as well as their interactions with sex, showed large effects.
Affiliation(s)
- Inga Blunk
- Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196, Dummerstorf, Germany.
- Hauke Thomsen
- Division of Molecular Genetic Epidemiology, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- GeneWerk GmbH, Heidelberg, Germany
- Norbert Reinsch
- Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196, Dummerstorf, Germany
- Manfred Mayer
- Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196, Dummerstorf, Germany
- Asta Försti
- Division of Molecular Genetic Epidemiology, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Center for Primary Health Care Research, Lund University, Malmö, Sweden
- Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany
- Jan Sundquist
- Center for Primary Health Care Research, Lund University, Malmö, Sweden
- Department of Family Medicine and Community Health, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, USA
- Center for Community-Based Healthcare Research and Education (CoHRE), Department of Functional Pathology, School of Medicine, Shimane University, Izumo, Japan
- Kristina Sundquist
- Center for Primary Health Care Research, Lund University, Malmö, Sweden
- Department of Family Medicine and Community Health, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, USA
- Center for Community-Based Healthcare Research and Education (CoHRE), Department of Functional Pathology, School of Medicine, Shimane University, Izumo, Japan
- Kari Hemminki
- Division of Molecular Genetic Epidemiology, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Center for Primary Health Care Research, Lund University, Malmö, Sweden
- Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, Pilsen, Czech Republic
9
Elliott LT. Kinship Solutions for Partially Observed Multiphenotype Data. J Comput Biol 2020; 27:1461-1470. [PMID: 32159382 PMCID: PMC7482112 DOI: 10.1089/cmb.2019.0440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Current methods for multivariate analysis of phenotypes in genome-wide association studies often require genetic similarity matrices to be inverted or decomposed. This can be a computational bottleneck when many phenotypes are present, each with a different missingness pattern. A common approach in this case is to decompose, separately for each phenotype, the subset of the kinship matrix corresponding to the samples observed for that phenotype. We provide a new method for decomposing these kinship matrices that can reduce the computational complexity by an order of magnitude by propagating low-rank modifications along a tree spanning the phenotypes. We demonstrate that our method provides speed improvements of around 40% under reasonable conditions.
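The low-rank idea can be illustrated with a simpler, related operation. The NumPy sketch below is not Elliott's tree-based decomposition algorithm; it only shows the block-inverse identity by which deleting one sample from a positive-definite kinship matrix changes its inverse through a rank-one correction, so the inverse for a phenotype's observed samples can be derived from the full inverse instead of being recomputed from scratch. The function names and the use of explicit inverses rather than decompositions are assumptions of the sketch.

```python
import numpy as np


def inverse_without_index(A_inv: np.ndarray, k: int) -> np.ndarray:
    """Inverse of A with row/column k deleted, given B = A^{-1}, in O(n^2).

    Uses (A_{-k,-k})^{-1} = B_{-k,-k} - B_{-k,k} B_{k,-k} / B_{kk},
    a rank-one correction of a submatrix of B.
    """
    keep = np.arange(A_inv.shape[0]) != k
    B_sub = A_inv[np.ix_(keep, keep)]
    return B_sub - np.outer(A_inv[keep, k], A_inv[k, keep]) / A_inv[k, k]


def inverse_for_observed(A_inv: np.ndarray, missing) -> np.ndarray:
    """Remove the samples missing for one phenotype, one rank-one step at a time."""
    B = A_inv
    for s in sorted(set(missing), reverse=True):  # descending keeps lower indices valid
        B = inverse_without_index(B, s)
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 60))
    K = X @ X.T / 60 + 0.1 * np.eye(30)  # positive-definite stand-in for a kinship matrix
    K_inv = np.linalg.inv(K)
    keep = [i for i in range(30) if i not in (3, 17)]
    print(np.allclose(inverse_for_observed(K_inv, [3, 17]),
                      np.linalg.inv(K[np.ix_(keep, keep)])))
```

Each deletion costs O(n²) instead of the O(n³) of a fresh inversion, which is the kind of saving that motivates sharing work between phenotypes with overlapping observed samples.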
Affiliation(s)
- Lloyd T Elliott
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada