1
Stadler M, Zhao SS, Bowes J. A review of the advances in understanding the genetic basis of spondylarthritis and emerging clinical benefit. Best Pract Res Clin Rheumatol 2024:101982. [PMID: 39223061 DOI: 10.1016/j.berh.2024.101982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 07/12/2024] [Accepted: 07/17/2024] [Indexed: 09/04/2024]
Abstract
Spondyloarthropathies (SpA), including ankylosing spondylitis (AS) and psoriatic arthritis (PsA), have been shown to have a substantial genetic predisposition based on heritability estimates derived from family studies and genome-wide association studies (GWAS). GWAS have uncovered numerous genetic loci associated with susceptibility to SpA, with significant associations to human leukocyte antigen (HLA) genes, which are major genetic risk factors for both AS and PsA. Specific loci differentiating PsA from cutaneous-only psoriasis have been identified, though these remain limited. Further research with larger sample sizes is necessary to identify more PsA-specific genetic markers. Current research focuses on translating these genetic insights into clinical applications. For example, polygenic risk scores are showing promise for the classification of disease risk and diagnosis, and future research should focus on refining these risk assessment tools to improve clinical outcomes for individuals with SpA. Addressing these challenges will help integrate genetic testing into patient care and impact clinical practice.
Affiliation(s)
- Michael Stadler
- The Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- Sizheng Steven Zhao
- The Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
- John Bowes
- The Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK.
2
Wang X, Zhang Z, Ding Y, Chen T, Mucci L, Albanes D, Landi MT, Caporaso NE, Lam S, Tardon A, Chen C, Bojesen SE, Johansson M, Risch A, Bickeböller H, Wichmann HE, Rennert G, Arnold S, Brennan P, McKay JD, Field JK, Shete SS, Le Marchand L, Liu G, Andrew AS, Kiemeney LA, Zienolddiny-Narui S, Behndig A, Johansson M, Cox A, Lazarus P, Schabath MB, Aldrich MC, Hung RJ, Amos CI, Lin X, Christiani DC. Impact of individual level uncertainty of lung cancer polygenic risk score (PRS) on risk stratification. Genome Med 2024; 16:22. [PMID: 38317189 PMCID: PMC10840262 DOI: 10.1186/s13073-024-01298-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 01/26/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND Although the polygenic risk score (PRS) has emerged as a promising tool for predicting cancer risk from genome-wide association studies (GWAS), the individual-level accuracy of lung cancer PRS and the extent of its impact on subsequent clinical applications remain largely unexplored. METHODS Lung cancer PRSs and confidence/credible interval (CI) were constructed using two statistical approaches for each individual: (1) the weighted sum of 16 GWAS-derived significant SNP loci and the CI through the bootstrapping method (PRS-16-CV) and (2) LDpred2 and the CI through posterior sampling (PRS-Bayes), among 17,166 lung cancer cases and 12,894 controls with European ancestry from the International Lung Cancer Consortium. Individuals were classified into different genetic risk subgroups based on the relationship between their own PRS mean/PRS CI and the population level threshold. RESULTS Considerable variances in PRS point estimates at the individual level were observed for both methods, with an average standard deviation (s.d.) of 0.12 for PRS-16-CV and a much larger s.d. of 0.88 for PRS-Bayes. Using PRS-16-CV, only 25.0% of individuals with PRS point estimates in the lowest decile of PRS and 16.8% in the highest decile have their entire 95% CI fully contained in the lowest and highest decile, respectively, while PRS-Bayes was unable to find any eligible individuals. Only 19% of the individuals were concordantly identified as having high genetic risk (> 90th percentile) using the two PRS estimators. An increased relative risk of lung cancer comparing the highest PRS percentile to the lowest was observed when taking the CI into account (OR = 2.73, 95% CI: 2.12-3.50, P-value = 4.13 × 10⁻¹⁵) compared to using PRS-16-CV mean (OR = 2.23, 95% CI: 1.99-2.49, P-value = 5.70 × 10⁻⁴⁶). Improved risk prediction performance with higher AUC was consistently observed in individuals identified by PRS-16-CV CI, and the best performance was achieved by incorporating age, gender, and detailed smoking pack-years (AUC: 0.73, 95% CI = 0.72-0.74). CONCLUSIONS Lung cancer PRS estimates using different methods have modest correlations at the individual level, highlighting the importance of considering individual-level uncertainty when evaluating the practical utility of PRS.
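The idea of classifying an individual by the whole PRS interval rather than by the point estimate can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy effect sizes, and the threshold are hypothetical, and the interval is obtained by redrawing each effect size from a normal distribution around its estimate, which is an assumed stand-in for the bootstrap named in the abstract.

```python
import random

def prs(dosages, betas):
    """Weighted sum of risk-allele dosages (0, 1 or 2) and per-SNP effect sizes."""
    return sum(d * b for d, b in zip(dosages, betas))

def prs_interval(dosages, betas, ses, n_draws=2000, level=0.95, seed=0):
    """Approximate individual-level interval for the PRS.

    Assumption: each effect estimate is normal with the given standard
    error; this parametric redraw only stands in for a real bootstrap.
    """
    rng = random.Random(seed)
    draws = sorted(
        prs(dosages, [rng.gauss(b, s) for b, s in zip(betas, ses)])
        for _ in range(n_draws)
    )
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = int((1 + level) / 2 * n_draws) - 1
    return draws[lo_idx], draws[hi_idx]

def classify(point, interval, threshold):
    """Label an individual relative to a population-level high-risk threshold."""
    lo, _hi = interval
    if lo > threshold:
        return "high (interval entirely above threshold)"
    if point > threshold:
        return "high by point estimate only"
    return "not high"

# Hypothetical individual with three SNPs (toy values, not from the study).
dosages = [2, 1, 0]
betas = [0.10, 0.25, 0.05]   # log odds ratios
ses = [0.02, 0.05, 0.03]     # standard errors of the effect estimates
point = prs(dosages, betas)
interval = prs_interval(dosages, betas, ses)
print(point, interval, classify(point, interval, threshold=0.40))
```

Under this scheme an individual whose point estimate exceeds the threshold but whose interval straddles it is flagged separately, which mirrors the distinction the abstract draws between PRS mean and PRS CI.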
Affiliation(s)
- Xinan Wang
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Harvard University, 667 Huntington Ave, Boston, MA, 02115, USA
- Ziwei Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, USA
- Tony Chen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Lorelei Mucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Demetrios Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Neil E Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen Lam
- Department of Medicine, British Columbia Cancer Agency, University of British Columbia, Vancouver, Canada
- Adonina Tardon
- Faculty of Medicine, University of Oviedo and CIBERESP, Oviedo, Spain
- Chu Chen
- Department of Epidemiology, University of Washington School of Public Health, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stig E Bojesen
- Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Copenhagen, Denmark
- Mattias Johansson
- Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC/WHO), Lyon, France
- Angela Risch
- Department of Biosciences and Medical Biology, Allergy-Cancer-BioNano Research Centre, University of Salzburg, and Cancer Cluster Salzburg, Salzburg, Austria
- Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center, Georg August University Göttingen, Göttingen, Germany
- H-Erich Wichmann
- Institute of Medical Informatics, Biometry and Epidemiology, Ludwig Maximilians University, Munich, Germany
- Gadi Rennert
- Clalit National Cancer Control Center, Carmel Medical Center and Technion Faculty of Medicine, Carmel, Haifa, Israel
- Susanne Arnold
- Markey Cancer Center, University of Kentucky, Lexington, KY, USA
- Paul Brennan
- Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC/WHO), Lyon, France
- James D McKay
- Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC/WHO), Lyon, France
- John K Field
- Department of Molecular and Clinical Cancer Medicine, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
- Sanjay S Shete
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
- Geoffrey Liu
- Princess Margaret Cancer Centre, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Angeline S Andrew
- Department of Epidemiology, Department of Community and Family Medicine, Dartmouth Geisel School of Medicine, Hanover, NH, USA
- Lambertus A Kiemeney
- Department for Health Evidence, Department of Urology, Radboud University Medical Center, Nijmegen, The Netherlands
- Annelie Behndig
- Department of Public Health and Clinical Medicine, Umeå University, Umeå, Sweden
- Angie Cox
- Department of Oncology and Metabolism, The Medical School, University of Sheffield, Sheffield, UK
- Philip Lazarus
- Department of Pharmaceutical Sciences, College of Pharmacy, Washington State University, Spokane, WA, USA
- Matthew B Schabath
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, USA
- Melinda C Aldrich
- Department of Medicine, Department of Biomedical Informatics and Department of Thoracic Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Rayjean J Hung
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Christopher I Amos
- Institute for Clinical and Translational Research, Department of Medicine, Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- David C Christiani
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Harvard University, 667 Huntington Ave, Boston, MA, 02115, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA.
3
Tanaka R, Wu D, Li X, Tibbs-Cortes LE, Wood JC, Magallanes-Lundback M, Bornowski N, Hamilton JP, Vaillancourt B, Li X, Deason NT, Schoenbaum GR, Buell CR, DellaPenna D, Yu J, Gore MA. Leveraging prior biological knowledge improves prediction of tocochromanols in maize grain. The Plant Genome 2023; 16:e20276. [PMID: 36321716 DOI: 10.1002/tpg2.20276] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With an essential role in human health, tocochromanols are mostly obtained by consuming seed oils; however, the vitamin E content of the most abundant tocochromanols in maize (Zea mays L.) grain is low. Several large-effect genes with cis-acting variants affecting messenger RNA (mRNA) expression are mostly responsible for tocochromanol variation in maize grain, with other relevant associated quantitative trait loci (QTL) yet to be fully resolved. Leveraging existing genomic and transcriptomic information for maize inbreds could improve prediction when selecting for higher vitamin E content. Here, we first evaluated a multikernel genomic best linear unbiased prediction (MK-GBLUP) approach for modeling known QTL in the prediction of nine tocochromanol grain phenotypes (12-21 QTL per trait) within and between two panels of 1,462 and 242 maize inbred lines. On average, MK-GBLUP models improved predictive abilities by 7.0-13.6% when compared with GBLUP. In a second approach with a subset of 545 lines from the larger panel, the highest average improvement in predictive ability relative to GBLUP was achieved with a multi-trait GBLUP model (15.4%) that had a tocochromanol phenotype and transcript abundances in developing grain for a few large-effect candidate causal genes (1-3 genes per trait) as multiple response variables. Taken together, our study illustrates the enhancement of prediction models when informed by existing biological knowledge pertaining to QTL and candidate causal genes.
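A multikernel GBLUP fit of the kind described above can be sketched in a few lines of plain Python. This is a simplified illustration and not the authors' implementation: the helper names are hypothetical, the variance components are assumed to be known (in practice they would be estimated, for example by REML), and the overall mean is taken as the simple phenotype average. One kernel would typically be built from markers at known QTL and a second from the remaining markers.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def relationship(markers):
    """Linear kernel ZZ'/m from a lines-by-markers matrix after column centering."""
    n, m = len(markers), len(markers[0])
    means = [sum(col) / n for col in zip(*markers)]
    z = [[x - mu for x, mu in zip(row, means)] for row in markers]
    zt = [list(col) for col in zip(*z)]
    return [[v / m for v in row] for row in matmul(z, zt)]

def mk_gblup(y, kernels, variances, sigma_e2):
    """Fitted values mean + K V^-1 (y - mean).

    K is the variance-weighted sum of the kernels and V = K + sigma_e2 * I.
    Assumption: variance components are supplied, not estimated here.
    """
    n = len(y)
    mean = sum(y) / n
    k = [[sum(v * g[i][j] for v, g in zip(variances, kernels)) for j in range(n)]
         for i in range(n)]
    v = [[k[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(v, [yi - mean for yi in y])
    return [mean + sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]

# Toy data: three lines, a QTL kernel and a background kernel (hypothetical markers).
g_qtl = relationship([[0, 2], [2, 0], [1, 1]])
g_rest = relationship([[0, 0, 2], [2, 2, 0], [0, 2, 2]])
print(mk_gblup([1.0, 2.0, 3.0], [g_qtl, g_rest], variances=[0.6, 0.4], sigma_e2=1.0))
```

Setting the QTL kernel's variance to zero reduces the sketch to an ordinary single-kernel GBLUP, which is the baseline the abstract compares against.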
Affiliation(s)
- Ryokei Tanaka
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Di Wu
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Xiaowei Li
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
- Joshua C Wood
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Nolan Bornowski
- Dep. of Plant Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- John P Hamilton
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Brieanne Vaillancourt
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Xianran Li
- USDA ARS, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA, 99164, USA
- Nicholas T Deason
- Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- C Robin Buell
- Institute for Plant Breeding, Genetics & Genomics, Center for Applied Genetic Technologies, Dep. of Crop & Soil Sciences, Univ. of Georgia, Athens, GA, 30602, USA
- Dean DellaPenna
- Dep. of Biochemistry and Molecular Biology, Michigan State Univ., East Lansing, MI, 48824, USA
- Jianming Yu
- Dep. of Agronomy, Iowa State Univ., Ames, IA, 50011, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell Univ., Ithaca, NY, 14853, USA
4
Vaskimo LM, Gomon G, Naamane N, Cordell HJ, Pratt A, Knevel R. The Application of Genetic Risk Scores in Rheumatic Diseases: A Perspective. Genes (Basel) 2023; 14:2167. [PMID: 38136989 PMCID: PMC10743278 DOI: 10.3390/genes14122167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
Modest effect sizes have limited the clinical applicability of genetic associations with rheumatic diseases. Genetic risk scores (GRSs) have emerged as a promising solution to translate genetics into useful tools. In this review, we provide an overview of the recent literature on GRSs in rheumatic diseases. We describe six categories for which GRSs are used: (a) disease (outcome) prediction, (b) genetic commonalities between diseases, (c) disease differentiation, (d) interplay between genetics and environmental factors, (e) heritability and transferability, and (f) detecting causal relationships between traits. In our review of the literature, we identified current lacunas and opportunities for future work. First, the shortage of non-European genetic data restricts the application of many GRSs to European populations. Next, many GRSs are tested in settings enriched for cases that limit the transferability to real life. If intended for clinical application, GRSs are ideally tested in the relevant setting. Finally, there is much to elucidate regarding the co-occurrence of clinical traits to identify shared causal paths and elucidate relationships between the diseases. GRSs are useful instruments for this. Overall, the ever-continuing research on GRSs gives a hopeful outlook into the future of GRSs and indicates significant progress in their potential applications.
Affiliation(s)
- Lotta M. Vaskimo
- Department of Rheumatology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
- Georgy Gomon
- Department of Rheumatology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
- Najib Naamane
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne NE2 4AX, UK
- Heather J. Cordell
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne NE2 4AX, UK
- Arthur Pratt
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Department of Rheumatology, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne NE7 7DN, UK
- Rachel Knevel
- Department of Rheumatology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
5
Rohde PD, Fourie Sørensen I, Sørensen P. Expanded utility of the R package, qgg, with applications within genomic medicine. Bioinformatics 2023; 39:btad656. [PMID: 37882742 PMCID: PMC10627350 DOI: 10.1093/bioinformatics/btad656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 09/17/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
SUMMARY Here, we present an expanded utility of the R package qgg for genetic analyses of complex traits and diseases. One of the major updates of the package is that it now includes Bayesian linear regression modeling procedures, which provide a unified framework for mapping of genetic variants, estimation of heritability and genomic prediction from either individual level data or from genome-wide association study summary data. With this release, the qgg package now provides a wealth of the commonly used methods in analysis of complex traits and diseases, without the need to switch between software and data formats. AVAILABILITY AND IMPLEMENTATION The methodologies are implemented in the publicly available R software package, qgg, using fast and memory efficient algorithms in C++, and the package is available on CRAN or as a developer version at our GitHub page (https://github.com/psoerensen/qgg). Notes on the implemented statistical genetic models, tutorials and example scripts are available at our GitHub page https://psoerensen.github.io/qgg/.
Affiliation(s)
- Palle Duun Rohde
- Genomic Medicine, Department of Health Science and Technology, Aalborg University, 9260 Gistrup, Denmark
- Izel Fourie Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus, Denmark
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus, Denmark
6
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high dimensionality and complex inter/intra-relationships have posed significant analytical challenges. RESULTS We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of sparsity regression and a linear mixed model with multiple random effects. It can resemble the shape of the true effect size distributions and accounts for non-linear effects, including interactions, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and the prediction analyses on the positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform existing methods in predicting the risk of complex traits. AVAILABILITY AND IMPLEMENTATION The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Jixiang Ma
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Kaixin Yang
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Yalu Wen
- Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
7
Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023; 14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023] Open
Abstract
Motivation: Family-based study design is one of the popular designs used in genetic research, and the whole-genome sequencing data obtained from family-based studies offer many unique features for risk prediction studies. They can not only provide a more comprehensive view of many complex diseases, but also utilize information in the design to further improve the prediction accuracy. While promising, existing analytical methods often ignore the information embedded in the study design and overlook the predictive effects of rare variants, leading to a prediction model with sub-optimal performance. Results: We proposed a Bayesian linear mixed model for the prediction analysis of sequencing data obtained from family-based studies. Our method can not only capture predictive effects from both common and rare variants, but also easily accommodate various disease model assumptions. It uses information embedded in the study design to form surrogates, where the predictive effects from unmeasured/unknown genetic and environmental risk factors can be modelled. Through extensive simulation studies and the analysis of sequencing data obtained from the Michigan State University Twin Registry study, we have demonstrated that the proposed method outperforms commonly adopted techniques. Availability: An R package is available at https://github.com/yhai943/FBLMM.
Affiliation(s)
- Yang Hai
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Wenxuan Zhao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Qingyu Meng
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Long Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yalu Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
8
Ul Islam Z, Baneen U, Khaliq T, Nurulain SM, Muneer Z, Hussain S. Association analysis of miRNA-146a and miRNA-499 polymorphisms with rheumatoid arthritis: a case-control and trio-family study. Clin Exp Med 2023; 23:1667-1675. [PMID: 36303006 DOI: 10.1007/s10238-022-00916-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/28/2022] [Accepted: 10/08/2022] [Indexed: 11/03/2022]
Abstract
Single nucleotide polymorphisms are known to alter the expression and processing of miRNAs, leading to a variety of diseases including rheumatoid arthritis (RA). However, the association of miRNA-146a and miRNA-499 polymorphisms with RA remains disputed. The goal of this study was to assess the association of polymorphisms at miRNA-146a and miRNA-499 with the pathogenesis of RA in patients originating from Pakistan. First, 1,100 subjects, comprising 550 RA patients and 550 healthy controls, were investigated in the case-control analysis. Lipids and C-reactive protein were measured spectrophotometrically, whereas interleukin-1 receptor associated kinase-1 and TNF-receptor associated factor-6 values were quantified by an enzyme-linked immunosorbent assay. Second, heritability of susceptible alleles was tested in 70 trio-families. The miRNA-146a rs2910164 and miRNA-499 rs3746444 polymorphisms were genotyped using the polymerase chain reaction followed by restriction digestion. A significant association of miRNA-146a and miRNA-499 genotypes with RA was observed (P < 0.05 for each). The miRNA-146a rs2910164 G (OR = 1.4, P < 0.05) and miRNA-499 rs3746444 C (OR = 1.6, P < 0.0001) alleles were significantly associated with RA in comparison with controls. In addition, the transmission analysis revealed a significant (P < 0.05) inheritance of the rs2910164 G and rs3746444 C alleles from parents to affected offspring. The current research concludes that miRNA-146a (rs2910164; C > G) and miRNA-499 (rs3746444; T > C) polymorphisms are linked to RA in the population studied. Furthermore, it was demonstrated for the first time in our high-risk cohort that the rs2910164 G and rs3746444 C alleles were strongly related to familial RA.
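The two kinds of statistic reported above, an allelic odds ratio from case-control counts and a transmission test in trios, can be sketched as follows. The allele counts are hypothetical, since the abstract reports only the resulting ORs and P-values, and the transmission analysis is assumed here to be the classic transmission disequilibrium test (TDT); the abstract does not name the test that was used.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and Wald 95% CI from a 2x2 table of allele counts."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lo = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lo, hi)

def tdt(transmitted, untransmitted):
    """TDT statistic (b - c)^2 / (b + c) and its chi-square (1 df) P-value.

    Counts are transmissions of the risk allele from heterozygous parents
    to affected offspring versus non-transmissions.
    """
    stat = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square, 1 df
    return stat, p_value

# Hypothetical counts, for illustration only.
print(allelic_odds_ratio(60, 40, 50, 50))  # odds ratio is 1.5 for these counts
print(tdt(30, 10))                         # statistic is 10.0 for these counts
```

A case-control odds ratio can be inflated by population stratification, whereas the TDT conditions on parental genotypes, which is one common reason for pairing the two designs.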
Affiliation(s)
- Zia Ul Islam
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad, 45550, Pakistan
- Umul Baneen
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad, 45550, Pakistan
- Taqdees Khaliq
- Department of Rheumatology, Federal Government Polyclinic Hospital, 44 Luqman Hakeem Road G/6, Islamabad, 46000, Pakistan
- Syed Muhammad Nurulain
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad, 45550, Pakistan
- Zahid Muneer
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad, 45550, Pakistan
- Sabir Hussain
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad, 45550, Pakistan.
9
Fang Y, Wang D, Xiao L, Quan M, Qi W, Song F, Zhou J, Liu X, Qin S, Du Q, Liu Q, El-Kassaby YA, Zhang D. Allelic variation in transcription factor PtoWRKY68 contributes to drought tolerance in Populus. Plant Physiology 2023; 193:736-755. [PMID: 37247391 PMCID: PMC10469405 DOI: 10.1093/plphys/kiad315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/21/2023] [Accepted: 04/30/2023] [Indexed: 05/31/2023]
Abstract
Drought stress limits woody species productivity and influences tree distribution. However, dissecting the molecular mechanisms that underpin drought responses in forest trees can be challenging due to trait complexity. Here, using a panel of 300 Chinese white poplar (Populus tomentosa) accessions collected from different geographical climatic regions in China, we performed a genome-wide association study (GWAS) on seven drought-related traits and identified PtoWRKY68 as a candidate gene involved in the response to drought stress. A 12-bp insertion and/or deletion and three nonsynonymous variants in the PtoWRKY68 coding sequence categorized natural populations of P. tomentosa into two haplotype groups, PtoWRKY68hap1 and PtoWRKY68hap2. The allelic variation in these two PtoWRKY68 haplotypes conferred differential transcriptional regulatory activities and binding to the promoters of downstream abscisic acid (ABA) efflux and signaling genes. Overexpression of PtoWRKY68hap1 and PtoWRKY68hap2 in Arabidopsis (Arabidopsis thaliana) improved the drought tolerance of two transgenic lines and increased ABA content by 42.7% and 14.3% compared to wild-type plants, respectively. Notably, PtoWRKY68hap1 (associated with drought tolerance) is ubiquitous in accessions in water-deficient environments, whereas the drought-sensitive allele PtoWRKY68hap2 is widely distributed in well-watered regions, consistent with the trends in local precipitation, suggesting that these alleles correspond to geographical adaptation in Populus. Moreover, quantitative trait loci analysis and an electrophoretic mobility shift assay showed that SHORT VEGETATIVE PHASE (PtoSVP.3) positively regulates the expression of PtoWRKY68 under drought stress. We propose a drought tolerance regulatory module in which PtoWRKY68 modulates ABA signaling and accumulation, providing insight into the genetic basis of drought tolerance in trees. Our findings will facilitate molecular breeding to improve the drought tolerance of forest trees.
Affiliation(s)
- Yuanyuan Fang
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Dan Wang
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Liang Xiao
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Mingyang Quan
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Weina Qi
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| | - Fangyuan Song
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| | - Jiaxuan Zhou
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| | - Xin Liu
- Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100093, People’s Republic of China
| | - Shitong Qin
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| | - Qingzhang Du
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| | - Qing Liu
- The Institute of Agriculture and Food Research, CSIRO Agriculture and Food, Black Mountain, Canberra ACT 2601, Australia
| | - Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, Forest Sciences Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Deqiang Zhang
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, People’s Republic of China
| |
|
10
|
Badré A, Pan C. Explainable multi-task learning improves the parallel estimation of polygenic risk scores for many diseases through shared genetic basis. PLoS Comput Biol 2023; 19:e1011211. [PMID: 37418352 DOI: 10.1371/journal.pcbi.1011211] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/15/2023] [Accepted: 05/23/2023] [Indexed: 07/09/2023] Open
Abstract
Many complex diseases share common genetic determinants and are comorbid in a population. We hypothesized that the co-occurrences of diseases and their overlapping genetic etiology can be exploited to simultaneously improve multiple diseases' polygenic risk scores (PRS). This hypothesis was tested using a multi-task learning (MTL) approach based on an explainable neural network architecture. We found that parallel estimations of the PRS for 17 prevalent cancers in a pan-cancer MTL model were generally more accurate than independent estimations for individual cancers in comparable single-task learning (STL) models. Such performance improvement conferred by positive transfer learning was also observed consistently for 60 prevalent non-cancer diseases in a pan-disease MTL model. Interpretation of the MTL models revealed significant genetic correlations between the important sets of single nucleotide polymorphisms used by the neural network for PRS estimation. This suggested a well-connected network of diseases with shared genetic basis.
Affiliation(s)
- Adrien Badré
- School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Chongle Pan
- School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America
- Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, United States of America
| |
|
11
|
Wang X, Li W, Feng X, Li J, Liu GE, Fang L, Yu Y. Harnessing male germline epigenomics for the genetic improvement in cattle. J Anim Sci Biotechnol 2023; 14:76. [PMID: 37277852 PMCID: PMC10242889 DOI: 10.1186/s40104-023-00874-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/29/2022] [Accepted: 04/02/2023] [Indexed: 06/07/2023] Open
Abstract
Sperm is essential for successful artificial insemination in dairy cattle, and its quality can be influenced by both epigenetic modification and epigenetic inheritance. Bovine germline differentiation is characterized by epigenetic reprogramming, while intergenerational and transgenerational epigenetic inheritance can influence offspring development through the transmission of epigenetic features via the germline. Therefore, selecting bulls with superior sperm quality for production and fertility traits requires a better understanding of the epigenetic mechanisms and more accurate identification of epigenetic biomarkers. We comprehensively review current progress in studies of the bovine sperm epigenome, in terms of both resources and biological discovery, to provide perspectives on how to harness this information for genetic improvement in the cattle breeding industry.
Affiliation(s)
- Xiao Wang
- Laboratory of Animal Genetics and Breeding, Ministry of Agriculture and Rural Affairs of China, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Konge Larsen ApS, Kongens Lyngby, 2800, Denmark
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, 250100, China
| | - Wenlong Li
- Laboratory of Animal Genetics and Breeding, Ministry of Agriculture and Rural Affairs of China, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Xia Feng
- Laboratory of Animal Genetics and Breeding, Ministry of Agriculture and Rural Affairs of China, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
| | - Jianbing Li
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, 250100, China
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Henry A. Wallace Beltsville Agricultural Research Center, USDA, Beltsville, MD, 20705, USA
| | - Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, 8000, Denmark.
| | - Ying Yu
- Laboratory of Animal Genetics and Breeding, Ministry of Agriculture and Rural Affairs of China, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
| |
|
12
|
Zhuang Z, Wu J, Qiu Y, Ruan D, Ding R, Xu C, Zhou S, Zhang Y, Liu Y, Ma F, Yang J, Sun Y, Zheng E, Yang M, Cai G, Yang J, Wu Z. Improving the accuracy of genomic prediction for meat quality traits using whole genome sequence data in pigs. J Anim Sci Biotechnol 2023; 14:67. [PMID: 37161604 PMCID: PMC10170792 DOI: 10.1186/s40104-023-00863-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Accepted: 03/05/2023] [Indexed: 05/11/2023] Open
Abstract
BACKGROUND Pork quality can directly affect customer purchase tendency, and meat quality traits have become valuable in modern pork production. However, genetic improvement has been slow due to high phenotyping costs. In this study, whole genome sequence (WGS) data were used to evaluate the prediction accuracy of genomic best linear unbiased prediction (GBLUP) for meat quality in large-scale crossbred commercial pigs. RESULTS We produced WGS data (18,695,907 SNPs and 2,106,902 INDELs passing quality control) from 1,469 sequenced Duroc × (Landrace × Yorkshire) pigs and developed a reference panel for genomic prediction of meat quality traits, including meat color score, marbling score, L* (lightness), a* (redness), and b* (yellowness). The prediction accuracy was defined as the Pearson correlation coefficient between adjusted phenotypes and genomic estimated breeding values in the validation population. Using different marker density panels derived from WGS data, accuracy differed substantially among meat quality traits, ranging from 0.08 to 0.47. Results showed that MultiBLUP outperformed GBLUP, yielding accuracy increases ranging from 17.39% to 75%. We optimized the marker density and found that medium- and high-density marker panels are beneficial for the estimation of heritability for meat quality. Moreover, we conducted genotype imputation from the 50K chip to the WGS level in the same population and found that the average concordance rate exceeded 95%, with r² = 0.81. CONCLUSIONS Overall, estimation of heritability for meat quality traits can benefit from the use of WGS data. This study showed the superiority of using WGS data in genomic prediction to genetically improve pork quality.
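The accuracy measure described in this abstract (the Pearson correlation between adjusted phenotypes and genomic estimated breeding values in a validation set) can be illustrated with a minimal sketch. The function names `pearson_r` and `relative_gain_percent` and all sample numbers are hypothetical and not taken from the study. The second helper rests on the assumption that the reported accuracy increases are relative gains over a baseline accuracy, which the abstract does not state explicitly.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved accuracy, in percent (assumed definition)."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical validation-set values, for illustration only
adjusted_phenotypes = [3.1, 2.8, 3.6, 3.0, 3.3]
gebv = [0.10, -0.05, 0.20, 0.02, 0.07]
accuracy = pearson_r(adjusted_phenotypes, gebv)
print(f"prediction accuracy: {accuracy:.2f}")
```

Under the assumed definition, comparing two models reduces to calling `relative_gain_percent` with the accuracy of each model on the same validation set.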
Affiliation(s)
- Zhanwei Zhuang
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Jie Wu
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Yibin Qiu
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Donglin Ruan
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Rongrong Ding
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Cineng Xu
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Shenping Zhou
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Yuling Zhang
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Yiyi Liu
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Fucai Ma
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Jifei Yang
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Ying Sun
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Enqin Zheng
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Ming Yang
- College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
| | - Gengyuan Cai
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China
| | - Jie Yang
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China.
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China.
| | - Zhenfang Wu
- College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, 510642, China.
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, Guangzhou, 510642, China.
- Yunfu Subcenter of Guangdong Laboratory for Lingnan Modern Agriculture, Yunfu, 527400, China.
| |
|
13
|
Hou X, Ma B, Liu M, Zhao Y, Chai B, Pan J, Wang P, Li D, Liu S, Song F. The transcriptional risk scores for kidney renal clear cell carcinoma using XGBoost and multiple omics data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:11676-11687. [PMID: 37501415 DOI: 10.3934/mbe.2023519] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
Most kidney cancers are kidney renal clear cell carcinoma (KIRC), a major cause of cancer-related deaths. A polygenic risk score (PRS) is a weighted linear combination of phenotype-related alleles on the genome that can be used to assess KIRC risk. However, standalone SNP data as input to the PRS model may not provide satisfactory results. Therefore, transcriptional risk scores (TRS) based on multi-omics data and machine learning models were proposed to assess the risk of KIRC. First, we collected four types of multi-omics data (DNA methylation, miRNA, mRNA and lncRNA) of KIRC patients from the TCGA database. Subsequently, a novel TRS method utilizing multiple omics data and an XGBoost model was developed. Finally, we performed prevalence analysis and prognosis prediction to evaluate the utility of the TRS generated by our method. Our TRS method exhibited better predictive performance than the linear models and other machine learning models. Furthermore, the prediction accuracy of the combined TRS model was higher than that of the single-omics TRS models. The Kaplan-Meier (KM) curves showed that TRS was a valid prognostic indicator for cancer staging. Our proposed method extended the current definition of TRS from standalone SNP data to multi-omics data and was superior to the linear models and other machine learning models, which may provide a useful tool for diagnostic and prognostic prediction of KIRC.
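The description of a PRS as a weighted linear combination of alleles can be made concrete with a short sketch. It assumes allele dosages coded as 0, 1, or 2 copies of the effect allele and per-variant weights such as GWAS effect sizes. The function name `polygenic_risk_score` and the numbers are hypothetical illustrations, not the model or data used in the cited study.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages: sum over variants of weight * dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual: copies of the effect allele at four variants
dosages = [2, 1, 0, 1]
# Hypothetical per-variant weights (for example, log odds ratios from a GWAS)
weights = [0.5, 0.25, -0.125, 1.0]

score = polygenic_risk_score(dosages, weights)
print(score)  # 2.25
```

In practice the raw score is usually standardized against a reference population before it is interpreted, a step this sketch leaves out.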
Affiliation(s)
- Xiaoyu Hou
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Ming Liu
- Physical Department of Science and Technology, Dalian University, Dalian 116622, China
| | - Yuxuan Zhao
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Bingjie Chai
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Jianqiao Pan
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Pengcheng Wang
- Department of Mechanical Engineering, University of Houston, Houston 77204, USA
| | - Di Li
- Department of Neuro Intervention, Dalian Medical University affiliated Dalian Municipal Central Hospital, Dalian 116033, China
| | - Shuxin Liu
- Department of Nephrology, Dalian Medical University affiliated Dalian Municipal Central Hospital, Dalian 116033, China
| | - Fengju Song
- Department of Epidemiology and Biostatistics, Key Laboratory of Molecular Cancer Epidemiology, Tianjin, National Clinical Research Center of Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300060, China
| |
|
14
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 02/10/2023] [Indexed: 06/06/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts the genetic risk of an individual towards targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect sizes, making it more precise for individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account SNPs/alleles that have low effect sizes but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications in investigating the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining high prediction power with PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data, for clinical evaluation of PRSs of interest, and for predicting rare diseases.
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
| | - Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
| |
|
15
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/10/2023] [Indexed: 11/23/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts the genetic risk of an individual towards targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect sizes, making it more precise for individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account SNPs/alleles that have low effect sizes but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications in investigating the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining high prediction power with PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data, for clinical evaluation of PRSs of interest, and for predicting rare diseases.
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
| | - Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
| |
|
16
|
Liu P, Bu C, Chen P, El-Kassaby YA, Zhang D, Song Y. Enhanced genome-wide association reveals the role of YABBY11-NGATHA-LIKE1 in leaf serration development of Populus. PLANT PHYSIOLOGY 2023; 191:1702-1718. [PMID: 36535002 PMCID: PMC10022644 DOI: 10.1093/plphys/kiac585] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Leaf margins are complex plant morphological features that contribute to leaf shape diversity, which affects plant structure, yield, and adaptation. Although several leaf margin regulators have been identified to date, the genetic basis of their natural variation has not been fully elucidated. In this study, we profiled two distinct leaf morphology types (serrated and smooth) using the persistent homology mathematical framework (PHMF) in two poplar species (Populus tomentosa and Populus simonii, respectively). A combined genome-wide association study (GWAS) and expression quantitative trait nucleotide (eQTN) mapping were applied to create a leaf morphology control module using data from P. tomentosa and P. simonii populations. Natural variation in leaf margins was associated with YABBY11 (YAB11) transcript abundance in poplar. In P. tomentosa, PtoYAB11 carries a premature stop codon (PtoYAB11PSC), resulting in the loss of its positive regulation of NGATHA-LIKE1 (PtoNGAL-1) and RIBULOSE BISPHOSPHATE CARBOXYLASE LARGE SUBUNIT (PtoRBCL). Overexpression of PtoYAB11PSC promoted serrated leaf margins, enlarged leaves, enhanced photosynthesis, and increased biomass. Overexpression of PsiYAB11 in P. tomentosa promoted smooth leaf margins, higher stomatal density, and greater light damage repair ability. In poplar, YAB11-NGAL1 is sensitive to environmental conditions, acts as a positive regulator of leaf margin serration, and may also link environmental signaling to leaf morphological plasticity.
Affiliation(s)
- Peng Liu
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
| | - Chenhao Bu
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
| | - Panfei Chen
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
| | - Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Deqiang Zhang
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
| | - Yuepeng Song
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35, Qinghua East Road, Beijing 100083, P.R. China
| |
|
17
|
Faucon A, Samaroo J, Ge T, Davis LK, Cox NJ, Tao R, Shuey MM. Improving the computation efficiency of polygenic risk score modeling: faster in Julia. Life Sci Alliance 2022; 5:e202201382. [PMID: 35851544 PMCID: PMC9297586 DOI: 10.26508/lsa.202201382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/24/2022]
Abstract
To enable large-scale application of polygenic risk scores (PRSs) in a computationally efficient manner, we translate a widely used PRS construction method, PRS–continuous shrinkage (PRS-CS), to the Julia programming language as PRS.jl. On nine different traits with varying genetic architectures, we demonstrate that PRS.jl maintains accuracy of prediction while decreasing the average runtime by 5.5×. Additional programmatic modifications improve usability and robustness. This freely available software substantially improves workflow and democratizes usage of PRSs by lowering the computational burden of the PRS–continuous shrinkage method.
Affiliation(s)
- Annika Faucon
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Julian Samaroo
- JuliaLab, Massachusetts Institute of Technology, Boston, MA, USA
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Lea K Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Megan M Shuey
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
18
|
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction. Bioinformatics 2022; 38:5222-5228. [PMID: 36205617 DOI: 10.1093/bioinformatics/btac659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/27/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Linear mixed models (LMMs) have long been the method of choice for risk prediction analysis on high-dimensional data. However, it remains computationally challenging to simultaneously model a large number of variants, which may be noise or may have predictive effects of complex forms. RESULTS In this work, we have developed a penalized LMM with generalized method of moments (pLMMGMM) estimators for prediction analysis. pLMMGMM is built within the LMM framework, where random effects are used to model the joint predictive effects from all variants within a region. Unlike existing methods that focus on linear relationships and use empirical criteria for variable screening, pLMMGMM can efficiently detect regions that harbor genetic variants with both linear and non-linear predictive effects. In addition, unlike existing LMMs that can only handle a very limited number of random effects, pLMMGMM is much less computationally demanding. It can jointly consider a large number of regions and accurately detect those that are predictive. Through theoretical investigations, we have shown that our method has selection consistency and asymptotic normality. Through extensive simulations and the analysis of PET-imaging outcomes, we have demonstrated that pLMMGMM outperforms existing models and can accurately detect regions that harbor risk factors with various forms of predictive effects. AVAILABILITY AND IMPLEMENTATION The R-package is available at https://github.com/XiaQiong/GMMLasso. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaqiong Wang
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
| | - Yalu Wen
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
| |
|
19
|
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
The genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover causal variants in GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, resulting in increased computational efficiency and statistical power in GWAS. Recently, more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we describe the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclusions. This review should help researchers quickly select appropriate LMM models and methods for GWAS data analysis.
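For orientation, the kind of model the review above refers to can be written in a common textbook form. This is a sketch under stated assumptions rather than the notation of the review itself: the symbols (phenotype vector y, fixed-effect design matrix X including the tested variant, polygenic random effect g, kinship or genomic relationship matrix K, and variance components σ_g² and σ_e²) are introduced here for illustration only.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},\\
\mathbf{g} &\sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^{2}\mathbf{K}\right), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^{2}\mathbf{I}\right)
\end{aligned}
```

In this form, the random effect g absorbs correlation between individuals that arises from relatedness and population structure, which is how an LMM limits confounding when the fixed effect of a single variant is tested.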
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | | | - Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
| |
|
20
|
Namjou B, Lape M, Malolepsza E, DeVore SB, Weirauch MT, Dikilitas O, Jarvik GP, Kiryluk K, Kullo IJ, Liu C, Luo Y, Satterfield BA, Smoller JW, Walunas TL, Connolly J, Sleiman P, Mersha TB, Mentch FD, Hakonarson H, Prows CA, Biagini JM, Khurana Hershey GK, Martin LJ, Kottyan L. Multiancestral polygenic risk score for pediatric asthma. J Allergy Clin Immunol 2022; 150:1086-1096. [PMID: 35595084 PMCID: PMC9643615 DOI: 10.1016/j.jaci.2022.03.035] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/14/2021] [Revised: 03/07/2022] [Accepted: 03/29/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND Asthma is the most common chronic condition in children and the third leading cause of hospitalization in pediatrics. The genome-wide association study catalog reports 140 studies with genome-wide significance. A polygenic risk score (PRS) with predictive value across ancestries has not been evaluated for this important trait. OBJECTIVES This study aimed to train and validate a PRS relying on genetic determinants for asthma to provide predictions for disease occurrence in pediatric cohorts of diverse ancestries. METHODS This study applied a Bayesian regression framework method using the Trans-National Asthma Genetic Consortium genome-wide association study summary statistics to derive a multiancestral PRS, used one Electronic Medical Records and Genomics (eMERGE) cohort as a training set, used a second independent eMERGE cohort to validate the score, and used the UK Biobank data to replicate the findings. A phenome-wide association study was performed using the PRS to identify shared genetic etiology with other phenotypes. RESULTS The multiancestral asthma PRS was associated with asthma in the 2 pediatric validation datasets. Overall, the multiancestral asthma PRS has an area under the curve (AUC) of 0.70 (95% CI, 0.69-0.72) in the pediatric validation 1 and AUC of 0.66 (0.65-0.66) in the pediatric validation 2 datasets. We found significant discrimination across pediatric subcohorts of European (AUC, 95% CI, 0.60 and 0.66), African (AUC, 95% CI, 0.61 and 0.66), admixed American (AUC, 0.64 and 0.70), Southeast Asian (AUC, 0.65), and East Asian (AUC, 0.73) ancestry. Pediatric participants with the top 5% PRS had 2.80- to 5.82-fold higher odds of asthma compared to the bottom 5% across the training, validation 1, and validation 2 cohorts when adjusted for ancestry. Phenome-wide association study analysis confirmed the strong association of the identified PRS with asthma (odds ratio, 2.71, P_FDR = 3.71 × 10^-65) and related phenotypes.
CONCLUSIONS A multiancestral PRS for asthma based on Bayesian posterior genomic effect sizes identifies increased odds of pediatric asthma.
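The two evaluation quantities reported in the abstract above, the AUC and the odds of disease in the top versus bottom 5% of the score distribution, can be illustrated with a small sketch. This is not the study's analysis code: the function names are hypothetical, the odds ratio is unadjusted (the study adjusted for ancestry), and the 0.5 continuity correction is an assumption added here only to avoid division by zero in tiny examples.

```python
def auc(scores, is_case):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Equals the probability that a randomly chosen case has a higher score
    than a randomly chosen control (ties count as one half).
    """
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in cases:
        for n in controls:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def odds_ratio_top_vs_bottom(scores, is_case, fraction=0.05):
    """Unadjusted odds ratio comparing the top and bottom `fraction` of scores."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    n = len(scores)
    k = max(1, int(n * fraction))  # group size, at least one individual
    order = sorted(range(n), key=lambda i: scores[i])
    bottom, top = order[:k], order[-k:]

    def odds(indices):
        n_cases = sum(1 for i in indices if is_case[i])
        n_controls = len(indices) - n_cases
        # Illustrative 0.5 continuity correction (an assumption, see lead-in).
        return (n_cases + 0.5) / (n_controls + 0.5)

    return odds(top) / odds(bottom)
```

In practice, an adjusted estimate of the kind the abstract describes would come from a regression model with ancestry covariates rather than from raw counts as above.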
Affiliation(s)
- Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
| | - Michael Lape
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Edyta Malolepsza
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02142
| | - Stanley B. DeVore
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Asthma Research, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Matthew T. Weirauch
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Ozan Dikilitas
- Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota 55905
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota 55905
| | - Gail P. Jarvik
- Departments of Medicine (Division of Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, Washington 98195
| | - Krzysztof Kiryluk
- Department of Medicine, Division of Nephrology, College of Physicians and Surgeons, Columbia University, New York, New York 10032
| | - Iftikhar J. Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota 55905
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, New York 10032
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611
| | | | - Jordan W. Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts 02115
| | - Theresa L. Walunas
- Division of General Internal Medicine and Geriatrics, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611
| | - John Connolly
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Department of Pediatrics, Philadelphia, Pennsylvania 19104
| | - Patrick Sleiman
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Department of Pediatrics, Philadelphia, Pennsylvania 19104
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104
| | - Tesfaye B. Mersha
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Asthma Research, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Frank D Mentch
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Department of Pediatrics, Philadelphia, Pennsylvania 19104
| | - Hakon Hakonarson
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Department of Pediatrics, Philadelphia, Pennsylvania 19104
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104
| | - Cynthia A. Prows
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Department of Patient Services, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Jocelyn M. Biagini
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Asthma Research, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Gurjit K. Khurana Hershey
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Asthma Research, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Division of Allergy & Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Lisa J. Martin
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - Leah Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, Ohio 45229
- Division of Allergy & Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio 45229
| | - The eMERGE Network
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892
| |
|
21
|
Construction and evaluation of a polygenic hazard score for prognostic assessment in localized gastric cancer. Fundamental Research 2022. [DOI: 10.1016/j.fmre.2022.09.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
22
|
Zhuang Y, Wolford BN, Nam K, Bi W, Zhou W, Willer CJ, Mukherjee B, Lee S. Incorporating family disease history and controlling case-control imbalance for population-based genetic association studies. Bioinformatics 2022; 38:4337-4343. [PMID: 35876838 PMCID: PMC9477535 DOI: 10.1093/bioinformatics/btac459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Revised: 05/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION In the genome-wide association analysis of population-based biobanks, most diseases have low prevalence, which results in low detection power. One approach to tackle the problem is using family disease history, yet existing methods are unable to address type I error inflation induced by increased correlation of phenotypes among closely related samples, as well as unbalanced phenotypic distribution. RESULTS We propose a new method for genetic association test with family disease history, mixed-model-based Test with Adjusted Phenotype and Empirical saddlepoint approximation, which controls for increased phenotype correlation by adopting a two-variance-component mixed model, accounts for case-control imbalance by using empirical saddlepoint approximation, and is flexible to incorporate any existing adjusted phenotypes, such as phenotypes from the LT-FH method. We show through simulation studies and analysis of UK Biobank data of white British samples and the Korean Genome and Epidemiology Study of Korean samples that the proposed method is robust and yields better calibration compared to existing methods while gaining power for detection of variant-phenotype associations. AVAILABILITY AND IMPLEMENTATION The summary statistics and code generated in this study are available at https://github.com/styvon/TAPE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yongwen Zhuang
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Brooke N Wolford
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Kisung Nam
- Graduate School of Data Science, Seoul National University, Seoul, Korea
| | - Wenjian Bi
- Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
| | - Wei Zhou
- Massachusetts General Hospital, Broad Institute, Boston, MA, USA
| | - Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine, Division of Cardiology, University of Michigan Medical School, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Michigan Institute of Data Science, University of Michigan, Ann Arbor, MI, USA
| | - Seunggeun Lee
- Graduate School of Data Science, Seoul National University, Seoul, Korea
| |
|
23
|
Tian P, Chan TH, Wang YF, Yang W, Yin G, Zhang YD. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front Genet 2022; 13:906965. [PMID: 36061179 PMCID: PMC9438789 DOI: 10.3389/fgene.2022.906965] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/29/2022] [Accepted: 06/27/2022] [Indexed: 11/28/2022]
Abstract
Polygenic risk scores (PRS) leverage the genetic contribution of an individual’s genotype to a complex trait by estimating disease risk. Traditional PRS prediction methods have been developed predominantly for the European population. The accuracy of PRS prediction in non-European populations is diminished owing to the much smaller sample sizes of genome-wide association studies (GWAS). In this article, we introduce a novel method to construct PRS for non-European populations, abbreviated as TL-Multi, which uses a transfer learning framework to learn useful knowledge from the European population and correct the bias for non-European populations. We consider non-European GWAS data as the target data and European GWAS data as the informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve the learning accuracy of the target data while preserving efficiency and accuracy. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict the risk of systemic lupus erythematosus (SLE) in the Asian population and the risk of asthma in the Indian population by borrowing information from the European population. TL-Multi achieved better prediction accuracy than the competing methods, including Lassosum and meta-analysis, in both simulations and real applications.
Affiliation(s)
- Peixin Tian
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Tsai Hor Chan
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yong-Fei Wang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Wanling Yang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yan Dora Zhang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Yan Dora Zhang,
| |
|
24
|
Hao X, Liang A, Plastow G, Zhang C, Wang Z, Liu J, Salzano A, Gasparrini B, Campanile G, Zhang S, Yang L. An Integrative Genomic Prediction Approach for Predicting Buffalo Milk Traits by Incorporating Related Cattle QTLs. Genes (Basel) 2022; 13:1430. [PMID: 36011341 PMCID: PMC9408041 DOI: 10.3390/genes13081430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/16/2022]
Abstract
Background: The 90K Axiom Buffalo SNP Array is expected to improve and speed up various genomic analyses for the buffalo (Bubalus bubalis). Genomic prediction is an effective approach in animal breeding to improve selection and reduce costs. As buffalo genome research is lagging behind that of the cow and production records are also limited, genomic prediction performance will be relatively poor. To improve the genomic prediction in buffalo, we introduced a new approach (pGBLUP) for genomic prediction of six buffalo milk traits by incorporating QTL information from the cattle milk traits in order to help improve the prediction performance for buffalo. Results: In simulations, the pGBLUP could outperform BayesR and the GBLUP if the prior biological information (i.e., the known causal loci) was appropriate; otherwise, it performed slightly worse than BayesR and equal to or better than the GBLUP. In real data, the heritability of the buffalo genomic region corresponding to the cattle milk trait QTLs was enriched (fold of enrichment > 1) in four buffalo milk traits (FY270, MY270, PY270, and PM) when the EBV was used as the response variable. The DEBV as the response variable yielded more reliable genomic predictions than the traditional EBV, as has been shown by previous research. The performance of the three approaches (GBLUP, BayesR, and pGBLUP) did not vary greatly in this study, probably due to the limited sample size, incomplete prior biological information, and less artificial selection in buffalo. Conclusions: To our knowledge, this study is the first to apply genomic prediction to buffalo by incorporating prior biological information. The genomic prediction of buffalo traits can be further improved with a larger sample size, higher-density SNP chips, and more precise prior biological information.
Affiliation(s)
- Xingjie Hao
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Correspondence: (X.H.); (L.Y.)
| | - Aixin Liang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Graham Plastow
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
| | - Chunyan Zhang
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
| | - Zhiquan Wang
- Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
| | - Jiajia Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Angela Salzano
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
| | - Bianca Gasparrini
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
| | - Giuseppe Campanile
- Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
| | - Shujun Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
| | - Liguo Yang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Correspondence: (X.H.); (L.Y.)
| |
|
25
|
Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores. Annu Rev Biomed Data Sci 2022; 5:293-320. [PMID: 35576555 PMCID: PMC9828290 DOI: 10.1146/annurev-biodatasci-111721-074830] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Indexed: 01/11/2023]
Abstract
Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
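The aggregation step described in the abstract above, combining information across many variants into one score, is commonly computed as a weighted sum of risk-allele dosages. The sketch below assumes that form; the function name, the variant identifiers, and the effect-size weights are all hypothetical and not taken from the review.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: dict mapping variant ID -> effect-allele count (0, 1, or 2,
             or a fractional imputed dosage).
    weights: dict mapping variant ID -> per-allele effect size, e.g. a
             log odds ratio from GWAS summary statistics.
    Variants missing from `weights` contribute nothing to the score.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical example values, for illustration only.
example_weights = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.05}
example_dosages = {"rs1": 2, "rs2": 1, "rs4": 2}  # rs4 has no weight
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Because the weights are usually estimated in one ancestry group, the same sum can perform poorly in another, which is the generalizability problem the review addresses.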
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| |
|
26
|
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Brief Bioinform 2022; 23:bbac193. [PMID: 35649346 PMCID: PMC9310531 DOI: 10.1093/bib/bbac193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022]
Abstract
With the advances in high-throughput biotechnologies, high-dimensional multi-layer omics data become increasingly available. They can provide both confirmatory and complementary information to disease risk and thus have offered unprecedented opportunities for risk prediction studies. However, the high-dimensionality and complex inter/intra-relationships among multi-omics data have brought tremendous analytical challenges. Here we present a computationally efficient penalized linear mixed model with generalized method of moments estimator (MpLMMGMM) for the prediction analysis on multi-omics data. Our method extends the widely used linear mixed model proposed for genomic risk predictions to model multi-omics data, where kernel functions are used to capture various types of predictive effects from different layers of omics data and penalty terms are introduced to reduce the impact of noise. Compared with existing penalized linear mixed models, the proposed method adopts the generalized method of moments estimator and it is much more computationally efficient. Through extensive simulation studies and the analysis of positron emission tomography imaging outcomes, we have demonstrated that MpLMMGMM can simultaneously consider a large number of variables and efficiently select those that are predictive from the corresponding omics layers. It can capture both linear and nonlinear predictive effects and achieves better prediction performance than competing methods.
Affiliation(s)
- Xiaqiong Wang
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Yalu Wen
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
27
Liu L, Meng Q, Weng C, Lu Q, Wang T, Wen Y. Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data. PLoS Comput Biol 2022; 18:e1010328. [PMID: 35839250 PMCID: PMC9328574 DOI: 10.1371/journal.pcbi.1010328] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/16/2021] [Revised: 07/27/2022] [Accepted: 06/27/2022] [Indexed: 11/19/2022]
Abstract
Building an accurate disease risk prediction model is an essential step in the modern quest for precision medicine. While high-dimensional genomic data provide valuable resources for investigating disease risk, their large amount of noise and the complex relationships between predictors and outcomes bring tremendous analytical challenges. Deep learning models are the state-of-the-art method for many prediction tasks and are a promising framework for the analysis of genomic data. However, deep learning models generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their applications. In this work, we have developed a deep neural network (DNN) based prediction modeling framework. We first propose a group-wise feature importance score for feature selection, with which genes harboring genetic variants with both linear and non-linear effects are efficiently detected. We then design an explainable transfer-learning based DNN method, which can directly incorporate information from feature selection and accurately capture complex predictive effects. The proposed DNN framework is biologically interpretable, as it is built on the selected predictive genes. It is also computationally efficient and can be applied to genome-wide data. Through extensive simulations and real data analyses, we demonstrate that our proposed method can not only efficiently detect predictive features but also accurately predict disease risk, as compared to many existing methods.
Accurate disease risk prediction is an essential step towards precision medicine. Deep learning models have achieved state-of-the-art performance for many prediction tasks. However, they generally suffer from the curse of dimensionality and a lack of biological interpretability, both of which have greatly limited their application to the prediction analysis of whole-genome sequencing data. We present here an explainable deep transfer learning model for the analysis of high-dimensional genomic data. Our proposed method can detect predictive genes that harbor genetic variants with both linear and non-linear effects via the proposed group-wise feature importance score. It can also efficiently and accurately model disease risk based on the detected predictive genes using the proposed transfer-learning based network architecture. Our proposed method is built at the gene level and is thus much more biologically interpretable. It is also computationally efficient and can be applied to whole-exome sequencing data that have millions of potential predictors. Through both simulation studies and the analysis of whole-exome data obtained from the Alzheimer's Disease Neuroimaging Initiative, we demonstrate that our method can efficiently detect predictive genes and has better prediction performance than many existing methods.
Affiliation(s)
- Long Liu
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi, China
- Qingyu Meng
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi, China
- Cherry Weng
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Qing Lu
  - Department of Biostatistics, University of Florida, Gainesville, Florida, United States of America
- Tong Wang
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi, China
  - * E-mail: (TW); (YW)
- Yalu Wen
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi, China
  - Department of Statistics, University of Auckland, Auckland, New Zealand
  - * E-mail: (TW); (YW)
28
Moscovich A, Rosset S. On the cross-validation bias due to unsupervised preprocessing. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Amit Moscovich
  - Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
- Saharon Rosset
  - Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
29
Seal S, Datta A, Basu S. Efficient estimation of SNP heritability using Gaussian predictive process in large scale cohort studies. PLoS Genet 2022; 18:e1010151. [PMID: 35442943 PMCID: PMC9060362 DOI: 10.1371/journal.pgen.1010151] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/03/2021] [Revised: 05/02/2022] [Accepted: 03/16/2022] [Indexed: 12/15/2022]
Abstract
With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using a linear mixed model (LMM). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging because of its high-dimensional linear algebraic operations. In this paper, we propose a new method named PredLMM that approximates the aforementioned LMM, motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
Affiliation(s)
- Souvik Seal
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Abhirup Datta
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Saonli Basu
  - Department of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
  - * E-mail:
30
Liu YH, Zhang M, Scheuring CF, Cilkiz M, Sze SH, Smith CW, Murray SC, Xu W, Zhang HB. Accurate prediction of complex traits for individuals and offspring from parents using a simple, rapid, and efficient method for gene-based breeding in cotton and maize. Plant Sci 2022; 316:111153. [PMID: 35151437 DOI: 10.1016/j.plantsci.2021.111153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 12/11/2021] [Indexed: 06/14/2023]
Abstract
Accurate, simple, rapid, and inexpensive prediction of complex traits controlled by numerous genes is paramount to enhanced plant breeding, animal breeding, and human medicine. Here we report a novel method that enables accurate, simple, and rapid prediction of complex traits of individuals, or of offspring from parents, based on the number of favorable alleles (NFAs) of the genes controlling the target traits. The NFAs of 226 cotton fiber length (GFL) genes and nine maize hybrid grain yield related (ZmF1GY) genes were used directly to predict cotton fiber lengths of individual plants and maize grain yields of F1 hybrids from parents, respectively, with prediction model-based methods as controls. The NFAs of the 226 GFL genes predicted cotton fiber lengths at an accuracy of 0.85, on par with the model-based methods and outperforming genomic prediction by 82%-170%. The NFAs of the nine ZmF1GY genes predicted grain yields of maize hybrids from parents at an accuracy of 0.80, outperforming genomic prediction by 67%. Moreover, the prediction accuracies for these traits were consistent across years, environments, and eco-agricultural systems. Importantly, accurate prediction of these traits directly from the NFAs of the genes allows breeding to be performed in a greenhouse, in a phytotron, or off-season, without the model training and validation steps that are essential and costly for model-based genomic or genic prediction. Therefore, this new method dramatically outperforms the current model-based genomic methods used for phenotype prediction and streamlines the breeding process, thus promising to substantially enhance current plant and animal breeding.
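The NFA idea in this abstract, scoring an individual by how many favorable alleles it carries across the trait genes, can be illustrated with a minimal Python sketch. The function names, the 0/1/2 dosage coding, and the `favorable` indicator vector are assumptions made for illustration; the abstract does not say how the authors encode genotypes or compute accuracy, so the Pearson correlation used here is only one plausible accuracy measure.

```python
import numpy as np


def count_favorable_alleles(genotypes, favorable):
    """Return the number of favorable alleles (NFA) per individual.

    genotypes: (n_individuals, n_genes) array, dosage (0, 1 or 2) of allele 1.
    favorable: (n_genes,) array, 1 if allele 1 is the favorable allele,
               0 if the other allele is favorable (hypothetical coding).
    """
    genotypes = np.asarray(genotypes)
    favorable = np.asarray(favorable)
    # Where allele 1 is unfavorable, the favorable dosage is 2 minus the dosage.
    favorable_dosage = np.where(favorable == 1, genotypes, 2 - genotypes)
    return favorable_dosage.sum(axis=1)


def prediction_accuracy(nfa, phenotype):
    """Pearson correlation between NFA and the observed trait (assumed metric)."""
    return float(np.corrcoef(nfa, phenotype)[0, 1])


if __name__ == "__main__":
    toy_genotypes = np.array([[0, 2], [1, 1], [2, 0]])
    toy_favorable = np.array([1, 0])
    nfa = count_favorable_alleles(toy_genotypes, toy_favorable)
    print(nfa, prediction_accuracy(nfa, np.array([1.0, 2.0, 3.0])))
```

No model is fitted in this sketch: the score is a plain count, which is what allows the approach described in the abstract to skip training and validation.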
Affiliation(s)
- Yun-Hua Liu
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Meiping Zhang
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Chantel F Scheuring
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Mustafa Cilkiz
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Sing-Hoi Sze
  - Department of Computer Science and Engineering and Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77843, USA
- C Wayne Smith
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Seth C Murray
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Wenwei Xu
  - Texas A&M AgriLife Research, Lubbock, TX 79403, USA
- Hong-Bin Zhang
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
31
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022]
Abstract
Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal the functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWAS for prioritization of causal genes remains elusive. We herein propose a novel method termed Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV), which models the effects of cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justify MTV from two additional statistical perspectives: mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV has several advantageous features, including the handling of direct effects of cis-SNPs on phenotypes, a powerful likelihood ratio test for assessing the joint effects of cis-SNPs and genetically regulated gene expression (GReX), two useful quantities that measure the relative genetic contributions of GReX and cis-SNPs to phenotypic variance, and a computationally efficient parameter-expanded expectation-maximization algorithm. With extensive simulations, we found that MTV correctly controlled the type I error in joint evaluation of the total genetic effect and was more powerful in discovering true association signals across various scenarios than existing methods. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that had otherwise been missed by existing methods. We also revealed that a small but substantial fraction of phenotypic variation was mediated by GReX. Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
Affiliation(s)
- Ting Wang
  - Department of Biostatistics at Xuzhou Medical University, China
- Jiahao Qiao
  - Department of Biostatistics at Xuzhou Medical University, China
- Shuo Zhang
  - Department of Biostatistics at Xuzhou Medical University, China
- Yongyue Wei
  - Department of Biostatistics at Nanjing Medical University, China
- Ping Zeng
  - Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China
32
Ding Y, Hou K, Burch KS, Lapinska S, Privé F, Vilhjálmsson B, Sankararaman S, Pasaniuc B. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat Genet 2022; 54:30-39. [PMID: 34931067 PMCID: PMC8758557 DOI: 10.1038/s41588-021-00961-5] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 11/28/2020] [Accepted: 09/29/2021] [Indexed: 01/05/2023]
Abstract
Although the cohort-level accuracy of polygenic risk scores (PRSs), which are estimates of genetic value at the individual level, has been widely assessed, uncertainty in PRSs remains underexplored. In the present study, we show that Bayesian PRS methods can estimate the variance of an individual's PRS and can yield well-calibrated credible intervals via posterior sampling. For 13 real traits in the UK Biobank (n = 291,273 unrelated 'white British'), we observe large variances in individual PRS estimates, which impact the interpretation of PRS-based stratification; averaging across traits, only 0.8% (s.d. = 1.6%) of individuals with PRS point estimates in the top decile have corresponding 95% credible intervals fully contained in the top decile. We provide an analytical estimator for the expectation of individual PRS variance as a function of SNP heritability, number of causal SNPs and sample size. Our results showcase the importance of incorporating uncertainty in individual PRS estimates into subsequent analyses.
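The stratification check described in this abstract (how many individuals with a top-decile point estimate also have a credible interval lying entirely in the top decile) can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes posterior draws of SNP effect sizes are already available from some Bayesian PRS method, and the function and argument names are hypothetical.

```python
import numpy as np


def prs_credible_intervals(genotypes, beta_samples, level=0.95):
    """Posterior-mean PRS and equal-tailed credible interval per individual.

    genotypes:    (n_individuals, n_snps) allele dosages.
    beta_samples: (n_draws, n_snps) posterior draws of SNP effects
                  (assumed to come from an external Bayesian PRS method).
    """
    prs_draws = genotypes @ beta_samples.T        # (n_individuals, n_draws)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(prs_draws, tail, axis=1)
    upper = np.quantile(prs_draws, 1.0 - tail, axis=1)
    point = prs_draws.mean(axis=1)
    return point, lower, upper


def fraction_confidently_top_decile(point, lower):
    """Among individuals whose point estimate is in the top decile, return the
    share whose interval lower bound also clears the top-decile threshold."""
    threshold = np.quantile(point, 0.9)
    in_top = point >= threshold
    confident = in_top & (lower >= threshold)
    return confident.sum() / in_top.sum()
```

Because the top decile has no upper limit, an interval is fully contained in it exactly when its lower bound is at or above the 90th-percentile threshold, which is why only `lower` enters the second function.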
Affiliation(s)
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Kathryn S Burch
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Sandra Lapinska
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
- Florian Privé
  - Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Bjarni Vilhjálmsson
  - Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Sriram Sankararaman
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
33
Schendel D, Laursen TM, Albiñana C, Vilhjalmsson B, Ladd-Acosta C, Fallin MD, Benke K, Lee B, Grove J, Kalkbrenner A, Ejlskov L, Hougaard D, Bybjerg-Grauholm J, Bækvad-Hansen M, Børglum AD, Werge T, Nordentoft M, Mortensen PB, Agerbo E. Evaluating the interrelations between the autism polygenic score and psychiatric family history in risk for autism. Autism Res 2022; 15:171-182. [PMID: 34664785 PMCID: PMC11289736 DOI: 10.1002/aur.2629] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/22/2021] [Revised: 09/19/2021] [Accepted: 10/01/2021] [Indexed: 01/25/2023]
Abstract
Psychiatric family history and a high autism polygenic risk score (PRS) have each been linked to autism spectrum disorder (ASD) risk. The study aimed to consider psychiatric family history and individual autism genetic liability (PRS) simultaneously in autism risk. We performed a case-control study of all singleton births in Denmark, May 1981-December 2005, who were in Denmark at their first birthday and had a known mother. Cases were diagnosed with ASD before 2013, and controls comprised a random sample of 30,000 births without ASD; persons with non-Denmark-born parents, missing ASD PRS, or non-European ancestry were excluded. Adjusted odds ratios (aOR) were estimated for ASD by PRS decile and by psychiatric history in parents or full siblings (8 mutually exclusive categories) using logistic regression. Adjusted ASD PRS z-score least-squares means were estimated by psychiatric family history category. ASD risk (11,339 ASD cases; 20,175 controls) from ASD PRS was not substantially altered after accounting for psychiatric family history (e.g., ASD PRS 10th decile aOR: 2.35 (95% CI 2.11-2.63) before vs 2.11 (95% CI 1.91-2.40) after adjustment), nor was risk from psychiatric family history after accounting for ASD PRS (e.g., ASD family history aOR: 6.73 (95% CI 5.89-7.68) before vs 6.32 (95% CI 5.53-7.22) after adjustment). ASD risk from ASD PRS varied slightly by psychiatric family history. While ASD risk from psychiatric family history was not accounted for by ASD PRS and vice versa, risk overlap between the two factors will likely increase as measures of genetic risk improve. The two factors are best viewed as complementary measures of family-based autism risk.
LAY SUMMARY: Autism risk from a history of mental disorders in the immediate family was not explained by a measure of individual genetic risk (autism polygenic risk score) and vice versa. That is, genetic risk did not appear to overlap family history risk. As genetic measures for autism improve, the overlap in autism risk from family history versus genetic factors will likely increase, but further study may be needed to fully determine the components of risk and how they are inter-related between these key family factors. Meanwhile, the two factors may be best viewed as complementary measures of autism family-based risk.
Affiliation(s)
- Diana Schendel
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Department of Public Health, Aarhus University, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
  - A. J. Drexel Autism Institute, Drexel University, Philadelphia, USA
- Thomas Munk Laursen
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
- Clara Albiñana
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
- Bjarni Vilhjalmsson
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
- Christine Ladd-Acosta
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- M Danielle Fallin
  - Wendy Klag Center for Autism and Developmental Disabilities, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kelly Benke
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Brian Lee
  - A. J. Drexel Autism Institute, Drexel University, Philadelphia, USA
  - Drexel University Dornsife School of Public Health, Philadelphia, USA
  - Department of Public Health Sciences, Karolinska Institutet, Stockholm, Sweden
- Jakob Grove
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Centre for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine – Human Genetics, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Centre for Genomics and Personalized Medicine, Aarhus, Denmark
- Amy Kalkbrenner
  - University of Wisconsin Milwaukee, Joseph J Zilber School of Public Health, Milwaukee, WI
- Linda Ejlskov
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
- David Hougaard
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Jonas Bybjerg-Grauholm
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Marie Bækvad-Hansen
  - Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Anders D Børglum
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Centre for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, Denmark
  - Department of Biomedicine – Human Genetics, Aarhus University, Aarhus, Denmark
  - Centre for Genomics and Personalized Medicine, Aarhus, Denmark
- Thomas Werge
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Merete Nordentoft
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - Mental Health Center Copenhagen, Copenhagen, Denmark
- Preben Bo Mortensen
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
  - Centre for Integrated Register-based Research (CIRRAU), Aarhus University, Aarhus, Denmark
- Esben Agerbo
  - The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
  - National Centre for Register-Based Research (NCRR), Department of Economics and Business, Aarhus University, Aarhus, Denmark
  - Centre for Integrated Register-based Research (CIRRAU), Aarhus University, Aarhus, Denmark
34
Duan J, Zhang J, Liu L, Wen Y. A guidance of model selection for genomic prediction based on linear mixed models for complex traits. Front Genet 2022; 13:1017380. [PMID: 36276959 PMCID: PMC9581223 DOI: 10.3389/fgene.2022.1017380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 09/20/2022] [Indexed: 11/27/2022]
Abstract
Brain imaging outcomes are important for Alzheimer's disease (AD) detection, and their prediction based on both genetic and demographic risk factors can facilitate the ongoing prevention and treatment of AD. Existing studies have identified numerous SNPs significantly associated with AD. However, how to make the best use of them for prediction analyses remains unknown. In this research, we first explored the relationship between genetic architecture and the prediction accuracy of linear mixed models by visualizing Manhattan plots generated from data obtained from the Wellcome Trust Case Control Consortium, and then constructed prediction models for eleven AD-related brain imaging outcomes using data from the United Kingdom Biobank and Alzheimer's Disease Neuroimaging Initiative studies. We found that simple Manhattan plots can be informative for the selection of prediction models. For traits that do not exhibit any significant signals on the Manhattan plots, the simple genomic best linear unbiased prediction (gBLUP) model is recommended because of its robust and accurate prediction performance as well as its computational efficiency. For diseases and traits that show spiked signals on the Manhattan plots, latent Dirichlet process regression is preferred, as it can flexibly accommodate both the oligogenic and omnigenic models. For the prediction of AD-related traits, the Manhattan plots suggest their polygenic nature, and gBLUP achieved robust performance for all these traits. We found that for these AD-related traits, genetic factors themselves only explain a very small proportion of the heritability, and the well-known AD risk factors can substantially improve the prediction model.
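For readers unfamiliar with gBLUP, the model recommended in this abstract for traits without spiked Manhattan-plot signals, the following Python sketch shows the standard textbook form: a genomic relationship matrix built from standardized genotypes, followed by a ridge-type solve for the genetic values of held-out individuals. It is a simplified illustration, not the authors' code. In particular, the heritability `h2` is passed in as a known value (an assumption made here for brevity; in practice the variance components are usually estimated, for example by REML), and all function names are hypothetical.

```python
import numpy as np


def genomic_relationship(genotypes):
    """Genomic relationship matrix K = Z Z^T / m from column-standardized
    genotypes (n_individuals x m SNPs)."""
    z = genotypes - genotypes.mean(axis=0)
    sd = z.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against monomorphic SNPs
    z = z / sd
    return z @ z.T / z.shape[1]


def gblup_predict(K, y_train, train_idx, test_idx, h2=0.5):
    """Predict phenotypes of test individuals under y = mu + g + e,
    with g ~ N(0, sigma_g^2 K) and e ~ N(0, sigma_e^2 I).

    h2 is an assumed heritability; lam = sigma_e^2 / sigma_g^2.
    """
    lam = (1.0 - h2) / h2
    mu = y_train.mean()
    K_tt = K[np.ix_(train_idx, train_idx)]   # train x train block
    K_st = K[np.ix_(test_idx, train_idx)]    # test x train block
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + K_st @ alpha
```

A single linear solve on an n x n matrix is all that is needed once `K` is formed, which is consistent with the computational efficiency the abstract attributes to gBLUP.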
Affiliation(s)
- Jiefang Duan
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Jiayu Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Long Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Yalu Wen
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
  - Department of Statistics, University of Auckland, Auckland, New Zealand
35
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
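The multiple linear regression framing mentioned in this abstract can be made concrete with a short Python sketch: a polygenic score is a weighted sum of allele dosages, and methods differ mainly in how the weights are estimated. The ridge estimator below is just one illustrative member of that family, chosen here for simplicity; it is not singled out by the review, and the function names and the penalty `lam` are assumptions.

```python
import numpy as np


def polygenic_score(genotypes, effect_sizes):
    """PGS_i = sum_j x_ij * beta_j for each individual i."""
    return np.asarray(genotypes) @ np.asarray(effect_sizes)


def ridge_effects(X, y, lam):
    """Jointly estimate SNP effects in y = X beta + e with an L2 penalty.

    X: (n, m) genotype matrix, y: (n,) phenotype, lam: ridge penalty.
    Columns of X and y are mean-centered so no intercept is needed.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    m = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(m), Xc.T @ yc)
```

Swapping the penalty or prior on the effect sizes (for example sparse or mixture priors) changes the estimator but leaves the scoring step `polygenic_score` unchanged.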
Affiliation(s)
- Ying Ma
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
36
Wang Z, Cheng H. Single-Trait and Multiple-Trait Genomic Prediction From Multi-Class Bayesian Alphabet Models Using Biological Information. Front Genet 2021; 12:717457. [PMID: 34707638 PMCID: PMC8542848 DOI: 10.3389/fgene.2021.717457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 08/23/2021] [Indexed: 11/13/2022]
Abstract
Genomic prediction has been widely used in multiple areas and various genomic prediction methods have been developed. The majority of these methods, however, focus on statistical properties and ignore the abundant useful biological information like genome annotation or previously discovered causal variants. Therefore, to improve prediction performance, several methods have been developed to incorporate biological information into genomic prediction, mostly in single-trait analysis. A commonly used method to incorporate biological information is allocating molecular markers into different classes based on the biological information and assigning separate priors to molecular markers in different classes. It has been shown that such methods can achieve higher prediction accuracy than conventional methods in some circumstances. However, these methods mainly focus on single-trait analysis, and available priors of these methods are limited. Thus, in both single-trait and multiple-trait analysis, we propose the multi-class Bayesian Alphabet methods, in which multiple Bayesian Alphabet priors, including RR-BLUP, BayesA, BayesB, BayesCπ, and Bayesian LASSO, can be used for markers allocated to different classes. The superior performance of the multi-class Bayesian Alphabet in genomic prediction is demonstrated using both real and simulated data. The software tool JWAS offers open-source routines to perform these analyses.
Affiliation(s)
- Zigui Wang
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
- Hao Cheng
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
37
Márquez-Luna C, Gazal S, Loh PR, Kim SS, Furlotte N, Auton A, Price AL. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun 2021; 12:6052. [PMID: 34663819 PMCID: PMC8523709 DOI: 10.1038/s41467-021-25171-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 07/26/2019] [Accepted: 07/16/2021] [Indexed: 12/23/2022]
Abstract
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
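How a per-variant prior changes an analytically computed posterior mean can be shown with a small sketch. It is not LDpred-funct itself: it assumes standardized genotypes and phenotype, ignores LD entirely, and treats the per-SNP prior variances as given, whereas the method described above fits them from functional annotations, accounts for LD, and adds a cross-validation step. Under these assumptions the marginal estimate satisfies beta_hat_j ~ N(beta_j, 1/N) with prior beta_j ~ N(0, s_j), and the posterior mean is the usual normal-normal shrinkage. The function name and all numeric values are hypothetical.

```python
import numpy as np

def posterior_mean_no_ld(beta_marginal, prior_var, n_train):
    """Per-SNP normal-normal posterior mean, assuming no LD.

    beta_hat_j ~ N(beta_j, 1/n_train) and beta_j ~ N(0, prior_var_j) give
    E[beta_j | beta_hat_j] = n*s_j / (n*s_j + 1) * beta_hat_j.
    """
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    shrink = n_train * prior_var / (n_train * prior_var + 1.0)
    return shrink * beta_marginal

# Two SNPs with the same marginal estimate but different (hypothetical) priors,
# for example one in a heritability-enriched annotation and one outside it.
beta_marginal = np.array([0.010, 0.010, -0.004])
prior_var = np.array([1e-5, 1e-7, 1e-6])
posterior = posterior_mean_no_ld(beta_marginal, prior_var, n_train=100_000)
```

The first two SNPs share a marginal estimate, yet the one with the larger prior variance keeps half of it while the other is shrunk almost to zero. This is the sense in which a functional prior reweights variants.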
Affiliation(s)
- Carla Márquez-Luna
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Steven Gazal
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Samuel S Kim
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | | | - Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
38
|
Zhao Z, Yi Y, Song J, Wu Y, Zhong X, Lin Y, Hohman TJ, Fletcher J, Lu Q. PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics. Genome Biol 2021; 22:257. [PMID: 34488838 PMCID: PMC8419981 DOI: 10.1186/s13059-021-02479-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/01/2021] [Accepted: 08/25/2021] [Indexed: 12/20/2022]
Abstract
Polygenic risk scores (PRSs) have wide applications in human genetics research, but often include tuning parameters which are difficult to optimize in practice due to limited access to individual-level data. Here, we introduce PUMAS, a novel method to fine-tune PRS models using summary statistics from genome-wide association studies (GWASs). Through extensive simulations, external validations, and analysis of 65 traits, we demonstrate that PUMAS can perform various model-tuning procedures using GWAS summary statistics and effectively benchmark and optimize PRS models under diverse genetic architecture. Furthermore, we show that fine-tuned PRSs will significantly improve statistical power in downstream association analysis.
Affiliation(s)
- Zijie Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
| | - Yanyao Yi
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
| | - Jie Song
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
| | - Yuchang Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
| | | | - Yupei Lin
- University of Wisconsin-Madison, Madison, WI USA
| | - Timothy J. Hohman
- Vanderbilt Memory and Alzheimer’s Center, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN USA
| | - Jason Fletcher
- La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI USA
- Department of Sociology, University of Wisconsin-Madison, Madison, WI USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI USA
| | - Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53703 USA
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI USA
| |
|
39
|
Song Y, Chen P, Xuan A, Bu C, Liu P, Ingvarsson PK, El-Kassaby YA, Zhang D. Integration of genome wide association studies and co-expression networks reveal roles of PtoWRKY 42-PtoUGT76C1-1 in trans-zeatin metabolism and cytokinin sensitivity in poplar. New Phytol 2021; 231:1462-1477. [PMID: 33999454 DOI: 10.1111/nph.17469] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Accepted: 05/07/2021] [Indexed: 06/12/2023]
Abstract
Cytokinins are important for in vitro shoot regeneration in plants. Cytokinin N-glucosides are produced via an irreversible glycosylation pathway, which regulates the endogenous cytokinin content. Although cytokinin N-glucoside pathways have been uncovered in higher plants, no regulator has been identified to date. We performed a metabolome genome-wide association study (mGWAS), weighted gene co-expression network analysis (WGCNA), and expression quantitative trait nucleotide (eQTN) mappings to build a core triple genetic network (mGWAS-gene expression-phenotype) for the trans-zeatin N-glucoside (ZNG) metabolite using data from 435 unrelated Populus tomentosa individuals. Variation of the ZNG level in poplar is attributed to the differential transcription of PtoWRKY42, a member of WRKY multigene family group IIb. Functional analysis revealed that PtoWRKY42 negatively regulated ZNG accumulation by binding directly to the W-box of the UDP-glycosyltransferase 76C1-1 (PtoUGT76C1-1) promoter. Also, PtoWRKY42 was strongly induced by leaf senescence, 6-BA, wounding, and salt stress, resulting in a reduced ZNG level. We identified PtoWRKY42, a negative regulator of cytokinin N-glucosides, which contributes to the natural variation in ZNG level and mediates ZNG accumulation by directly modulating the key glycosyltransferase gene PtoUGT76C1-1.
Affiliation(s)
- Yuepeng Song
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| | - Panfei Chen
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| | - Anran Xuan
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| | - Chenhao Bu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| | - Peng Liu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| | - Pär K Ingvarsson
- Department of Plant Biology, Linnean Center for Plant Biology, Swedish University of Agricultural Sciences, Box 7080, Uppsala, SE-750 07, Sweden
| | - Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Deqiang Zhang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, College of Biological Sciences and Technology, Beijing Forestry University, No. 35 Qinghua East Road, Beijing, 100083, China
| |
|
40
|
Quan M, Liu X, Du Q, Xiao L, Lu W, Fang Y, Li P, Ji L, Zhang D. Genome-wide association studies reveal the coordinated regulatory networks underlying photosynthesis and wood formation in Populus. J Exp Bot 2021; 72:5372-5389. [PMID: 33733665 DOI: 10.1093/jxb/erab122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2020] [Accepted: 03/16/2021] [Indexed: 06/12/2023]
Abstract
Photosynthesis and wood formation underlie the ability of trees to provide renewable resources and perform ecological functions; however, the genetic basis and regulatory pathways coordinating these two linked processes remain unclear. Here, we used a systems genetics strategy, integrating genome-wide association studies, transcriptomic analyses, and transgenic experiments, to investigate the genetic architecture of photosynthesis and wood properties among 435 unrelated individuals of Populus tomentosa, and unravel the coordinated regulatory networks resulting in two trait categories. We detected 222 significant single-nucleotide polymorphisms, annotated to 177 candidate genes, for 10 traits of photosynthesis and wood properties. Epistasis analysis uncovered 74 epistatic interactions for these phenotypes. Strikingly, we deciphered the coordinated regulation patterns of pleiotropic genes underlying phenotypic variations for two trait categories. Furthermore, expression quantitative trait nucleotide mapping and coexpression analysis were integrated to unravel the potential transcriptional regulatory networks of candidate genes coordinating photosynthesis and wood properties. Finally, heterologous expression of two pleiotropic genes, PtoMYB62 and PtoMYB80, in Arabidopsis thaliana demonstrated that they control regulatory networks balancing photosynthesis and stem secondary cell wall components, respectively. Our study provides insights into the regulatory mechanisms coordinating photosynthesis and wood formation in poplar, and should facilitate genetic breeding in trees via molecular design.
Affiliation(s)
- Mingyang Quan
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, P. R. China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Xin Liu
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Qingzhang Du
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, P. R. China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Liang Xiao
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Wenjie Lu
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Yuanyuan Fang
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Peng Li
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Li Ji
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, P. R. China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| | - Deqiang Zhang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, P. R. China
- National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
- Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China
| |
|
41
|
Zhang Q, Privé F, Vilhjálmsson B, Speed D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat Commun 2021; 12:4192. [PMID: 34234142 PMCID: PMC8263809 DOI: 10.1038/s41467-021-24485-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 03/24/2021] [Accepted: 06/17/2021] [Indexed: 02/06/2023]
Abstract
Most existing tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a suboptimal model for how heritability is distributed across the genome. Therefore, we develop prediction tools that allow the user to specify the heritability model. We compare individual-level data prediction tools using 14 UK Biobank phenotypes; our new tool LDAK-Bolt-Predict outperforms the existing tools Lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes. We compare summary statistic prediction tools using 225 UK Biobank phenotypes; our new tool LDAK-BayesR-SS outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes. When we improve the heritability model, the proportion of phenotypic variance explained increases by on average 14%, which is equivalent to increasing the sample size by a quarter.
Affiliation(s)
- Qianqian Zhang
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
| | - Florian Privé
- National Center for Register-Based Research (NCRR), Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
| | - Bjarni Vilhjálmsson
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark
- National Center for Register-Based Research (NCRR), Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
| | - Doug Speed
- Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark.
- Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- Aarhus Institute of Advanced Studies (AIAS), Aarhus University, Aarhus, Denmark.
| |
|
42
|
Albiñana C, Grove J, McGrath JJ, Agerbo E, Wray NR, Bulik CM, Nordentoft M, Hougaard DM, Werge T, Børglum AD, Mortensen PB, Privé F, Vilhjálmsson BJ. Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction. Am J Hum Genet 2021; 108:1001-1011. [PMID: 33964208 PMCID: PMC8206385 DOI: 10.1016/j.ajhg.2021.04.014] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/04/2021] [Accepted: 04/20/2021] [Indexed: 12/12/2022]
Abstract
The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.
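The meta-PRS idea, a linear combination of PRSs, can be sketched in a few lines. The following is an illustrative assumption of how such a combination might be fitted, using ordinary least squares on a training split. It is not the authors' pipeline, and the function names, simulated data, and noise levels are hypothetical.

```python
import numpy as np

def fit_meta_prs(prs_matrix, phenotype):
    """Least-squares weights (intercept first) for combining PRS columns."""
    prs_matrix = np.asarray(prs_matrix, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones(len(prs_matrix)), prs_matrix])
    weights, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return weights

def apply_meta_prs(prs_matrix, weights):
    """Combined score: intercept plus weighted sum of the individual PRSs."""
    prs_matrix = np.asarray(prs_matrix, dtype=float)
    return weights[0] + prs_matrix @ weights[1:]

# Simulated example: two noisy scores for the same underlying genetic value,
# standing in for a PRS from external summary statistics and one from
# individual-level data.
rng = np.random.default_rng(1)
g = rng.normal(size=2000)
prs_external = g + rng.normal(scale=1.0, size=2000)
prs_internal = g + rng.normal(scale=1.5, size=2000)
phenotype = g + rng.normal(scale=1.0, size=2000)
prs = np.column_stack([prs_external, prs_internal])

# Fit the weights on one half and score the other half to avoid overfitting.
train, test = slice(0, 1000), slice(1000, 2000)
weights = fit_meta_prs(prs[train], phenotype[train])
combined = apply_meta_prs(prs[test], weights)
```

For a case-control trait, logistic regression would be the more natural choice for estimating the weights; least squares is used here only to keep the sketch short.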
Affiliation(s)
- Clara Albiñana
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark.
| | - Jakob Grove
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000 Aarhus C, Denmark; Center for Genomics and Personalized Medicine, CGPM, Aarhus University, 8000 Aarhus C, Denmark; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark
| | - John J McGrath
- National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark; Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD 4076, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
| | - Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
| | - Naomi R Wray
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia; Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
| | - Cynthia M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 171 77 Stockholm, Sweden; Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| | - Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Copenhagen University Hospital, Mental Health Centre Copenhagen Mental Health Services in the Capital Region of Denmark, 2100 Copenhagen Ø, Denmark; Department of Clinical Medicine, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, 2300 Copenhagen S, Denmark
| | - Thomas Werge
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Institute of Biological Psychiatry, MHC Sct. Hans, Mental Health Services Copenhagen, 4000 Roskilde, Denmark; Department of Clinical Medicine, University of Copenhagen, 2200 Copenhagen N, Denmark; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
| | - Anders D Børglum
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000 Aarhus C, Denmark; Center for Genomics and Personalized Medicine, CGPM, Aarhus University, 8000 Aarhus C, Denmark
| | - Preben Bo Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
| | - Florian Privé
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark
| | - Bjarni J Vilhjálmsson
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark.
| |
|
43
|
Arouisse B, Theeuwen TPJM, van Eeuwijk FA, Kruijer W. Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes. Front Genet 2021; 12:667358. [PMID: 34108993 PMCID: PMC8181460 DOI: 10.3389/fgene.2021.667358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 11/17/2022]
Abstract
In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the important question whether these additional or “secondary” traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is however infeasible when secondary traits are not measured on the test set, and cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set. 
For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
Affiliation(s)
- Bader Arouisse
- Biometris, Wageningen University and Research, Wageningen, Netherlands
| | - Tom P J M Theeuwen
- Laboratory of Genetics, Wageningen University and Research, Wageningen, Netherlands
| | | | - Willem Kruijer
- Biometris, Wageningen University and Research, Wageningen, Netherlands
| |
|
44
|
Williams CJ, Li Z, Harvey N, Lea RA, Gurd BJ, Bonafiglia JT, Papadimitriou I, Jacques M, Croci I, Stensvold D, Wisloff U, Taylor JL, Gajanand T, Cox ER, Ramos JS, Fassett RG, Little JP, Francois ME, Hearon CM, Sarma S, Janssen SLJE, Van Craenenbroeck EM, Beckers P, Cornelissen VA, Howden EJ, Keating SE, Yan X, Bishop DJ, Bye A, Haupt LM, Griffiths LR, Ashton KJ, Brown MA, Torquati L, Eynon N, Coombes JS. Genome wide association study of response to interval and continuous exercise training: the Predict-HIIT study. J Biomed Sci 2021; 28:37. [PMID: 33985508 PMCID: PMC8117553 DOI: 10.1186/s12929-021-00733-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/14/2021] [Accepted: 05/05/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Low cardiorespiratory fitness (V̇O2peak) is highly associated with chronic disease and mortality from all causes. Whilst exercise training is recommended in health guidelines to improve V̇O2peak, there is considerable inter-individual variability in the V̇O2peak response to the same dose of exercise. Understanding how genetic factors contribute to V̇O2peak training response may improve personalisation of exercise programs. The aim of this study was to identify genetic variants that are associated with the magnitude of V̇O2peak response following exercise training. METHODS: Participant change in objectively measured V̇O2peak from 18 different interventions was obtained from a multi-centre study (Predict-HIIT). A genome-wide association study was completed (n = 507), and a polygenic predictor score (PPS) was developed using alleles from single nucleotide polymorphisms (SNPs) significantly associated (P < 1 × 10⁻⁵) with the magnitude of V̇O2peak response. Findings were tested in an independent validation study (n = 39) and compared to previous research. RESULTS: No variants at the genome-wide significance level were found after adjusting for key covariates (baseline V̇O2peak, individual study, principal components which were significantly associated with the trait). A Quantile-Quantile plot indicates there was minor inflation in the study. Twelve novel loci showed a trend of association with V̇O2peak response that reached suggestive significance (P < 1 × 10⁻⁵). The strongest association was found near the membrane associated guanylate kinase, WW and PDZ domain containing 2 (MAGI2) gene (rs6959961, P = 2.61 × 10⁻⁷). A PPS created from the 12 lead SNPs was unable to predict V̇O2peak response in a tenfold cross validation, or in an independent (n = 39) validation study (P > 0.1). Significant correlations were found for beta coefficients of variants in the Predict-HIIT (P < 1 × 10⁻⁴) and the validation study (P < × 10⁻⁶), indicating that general effects of the loci exist, and that with a higher statistical power, more significant genetic associations may become apparent. CONCLUSIONS: Ongoing research and validation of current and previous findings is needed to determine if genetics does play a large role in V̇O2peak response variance, and whether genomic predictors for V̇O2peak response trainability can inform evidence-based clinical practice. Trial registration: Australian New Zealand Clinical Trials Registry (ANZCTR), Trial Id: ACTRN12618000501246, Date Registered: 06/04/2018, http://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=374601&isReview=true
Affiliation(s)
- Camilla J Williams
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Zhixiu Li
- Translational Genomics Group, Institute of Health and Biomedical Innovation, Woolloongabba, Brisbane, QLD, Australia
| | - Nicholas Harvey
- Faculty of Health Sciences and Medicine, Bond University, Robina, QLD, Australia; Queensland University of Technology (QUT), Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Kelvin Grove, Brisbane, QLD, Australia
| | - Rodney A Lea
- Queensland University of Technology (QUT), Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Kelvin Grove, Brisbane, QLD, Australia
| | - Brendon J Gurd
- School of Kinesiology and Health Studies, Queen's University, Kingston, ON, Canada
| | - Jacob T Bonafiglia
- School of Kinesiology and Health Studies, Queen's University, Kingston, ON, Canada
| | - Ioannis Papadimitriou
- Institute for Health and Sport (iHeS), Victoria University, Melbourne, VIC, Australia
| | - Macsue Jacques
- Institute for Health and Sport (iHeS), Victoria University, Melbourne, VIC, Australia
| | - Ilaria Croci
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia; Cardiac Exercise Research Group (CERG), Department of Circulation and Medical Imaging, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway; Department of Sport, Movement and Health, University of Basel, Basel, Switzerland
| | - Dorthe Stensvold
- Cardiac Exercise Research Group (CERG), Department of Circulation and Medical Imaging, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
| | - Ulrik Wisloff
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia; Cardiac Exercise Research Group (CERG), Department of Circulation and Medical Imaging, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
| | - Jenna L Taylor
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Trishan Gajanand
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Emily R Cox
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Joyce S Ramos
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia; Caring Futures Institute, SHAPE Research Centre, Exercise Science and Clinical Exercise Physiology, College of Nursing and Health Sciences, Flinders University, Adelaide, SA, Australia
| | - Robert G Fassett
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Jonathan P Little
- School of Health and Exercise Sciences, University of British Columbia, Kelowna, BC, Canada
| | - Monique E Francois
- School of Health and Exercise Sciences, University of British Columbia, Kelowna, BC, Canada
| | - Christopher M Hearon
- Internal Medicine, Institute for Exercise and Environmental Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Satyam Sarma
- Internal Medicine, Institute for Exercise and Environmental Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Sylvan L J E Janssen
- Internal Medicine, Institute for Exercise and Environmental Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Physiology, Radboud University Medical Center, Nijmegen, Netherlands
| | | | - Paul Beckers
- Department of Cardiology, Antwerp University Hospital, Antwerp, Belgium
| | - Véronique A Cornelissen
- Department of Rehabilitation Sciences - Research Group for Rehabilitation in Internal Disorders, Catholic University of Leuven, Leuven, Belgium
| | - Erin J Howden
- Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Shelley E Keating
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia
| | - Xu Yan
- Institute for Health and Sport (iHeS), Victoria University, Melbourne, VIC, Australia; Australia Institute for Musculoskeletal Sciences (AIMSS), Melbourne, VIC, Australia
| | - David J Bishop
- Institute for Health and Sport (iHeS), Victoria University, Melbourne, VIC, Australia; School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
| | - Anja Bye
- Cardiac Exercise Research Group (CERG), Department of Circulation and Medical Imaging, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway; Department of Cardiology, St. Olavs Hospital, Trondheim, Norway
| | - Larisa M Haupt
- Queensland University of Technology (QUT), Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Kelvin Grove, Brisbane, QLD, Australia
| | - Lyn R Griffiths
- Queensland University of Technology (QUT), Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Kelvin Grove, Brisbane, QLD, Australia
| | - Kevin J Ashton
- Faculty of Health Sciences and Medicine, Bond University, Robina, QLD, Australia
| | - Matthew A Brown
- Guy's and St Thomas' NHS Foundation Trust and King's College London, London, UK
| | - Luciana Torquati
- Department of Sport and Health Sciences, University of Exeter, Exeter, UK
| | - Nir Eynon
- Institute for Health and Sport (iHeS), Victoria University, Melbourne, VIC, Australia
| | - Jeff S Coombes
- Centre for Research on Exercise, Physical Activity and Health, School of Human Movement and Nutrition Sciences, University of Queensland, St. Lucia, Brisbane, QLD, Australia.
|
45
|
Rice BR, Lipka AE. Diversifying maize genomic selection models. Molecular Breeding: New Strategies in Plant Improvement 2021; 41:33. [PMID: 37309328 PMCID: PMC10236107 DOI: 10.1007/s11032-021-01221-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2020] [Accepted: 03/07/2021] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is one of the most powerful tools available for maize breeding. Its use of genome-wide marker data to estimate breeding values translates to increased genetic gains with fewer breeding cycles. In this review, we cover the history of GS and highlight particular milestones during its adaptation to maize breeding. We discuss how GS can be applied to developing superior maize inbreds and hybrids. Additionally, we characterize refinements in GS models that could enable the encapsulation of non-additive genetic effects, genotype by environment interactions, and multiple levels of the biological hierarchy, all of which could ultimately result in more accurate predictions of breeding values. Finally, we suggest the stages in a maize breeding program where it would be beneficial to apply GS. Given the sophistication of the high-throughput phenotypic, genotypic, and other -omic level data currently available to the maize community, now is the time to explore the implications of their incorporation into GS models and thus ensure that genetic gains are being achieved as quickly and efficiently as possible.
Affiliation(s)
- Brian R. Rice
- Department of Crop Sciences, University of Illinois, Urbana, IL USA
|
46
|
Rohde PD, Kristensen TN, Sarup P, Muñoz J, Malmendal A. Prediction of complex phenotypes using the Drosophila melanogaster metabolome. Heredity (Edinb) 2021; 126:717-732. [PMID: 33510469 PMCID: PMC8102504 DOI: 10.1038/s41437-021-00404-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Revised: 01/04/2021] [Accepted: 01/04/2021] [Indexed: 01/30/2023]
Abstract
Understanding the genotype-phenotype map and how variation at different levels of biological organization is associated are central topics in modern biology. Rapid developments in sequencing technologies and other molecular omic tools enable researchers to obtain detailed information on variation at the DNA level and on intermediate endophenotypes, such as RNA, proteins and metabolites. This can facilitate our understanding of the link between genotypes and molecular and functional organismal phenotypes. Here, we use the Drosophila melanogaster Genetic Reference Panel and nuclear magnetic resonance (NMR) metabolomics to investigate the ability of the metabolome to predict organismal phenotypes. We performed NMR metabolomics on four replicate pools of male flies from each of 170 different isogenic lines. First, our results show that metabolite profiles are variable among the investigated lines and that this variation is highly heritable. Second, we identify genes associated with metabolome variation. Third, using the metabolome gave better prediction accuracies than genomic information for four of five quantitative traits analyzed. Our comprehensive characterization of population-scale diversity of metabolomes and its genetic basis illustrates that metabolites have large potential as predictors of organismal phenotypes. This finding is of great importance, e.g., in human medicine, evolutionary biology and animal and plant breeding.
Affiliation(s)
- Palle Duun Rohde
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.
| | - Torsten Nygaard Kristensen
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Department of Animal Science, Aarhus University, Tjele, Denmark
| | - Pernille Sarup
- Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
- Nordic Seed A/S, Odder, Denmark
| | - Joaquin Muñoz
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
| | - Anders Malmendal
- Department of Science and Environment, Roskilde University, Roskilde, Denmark.
|
47
|
Li Z, Wu X, Leo PJ, De Guzman E, Akkoc N, Breban M, Macfarlane GJ, Mahmoudi M, Marzo-Ortega H, Anderson LK, Wheeler L, Chou CT, Harrison AA, Stebbings S, Jones GT, Bang SY, Wang G, Jamshidi A, Farhadi E, Song J, Lin L, Li M, Wei JCC, Martin NG, Wright MJ, Lee M, Wang Y, Zhan J, Zhang JS, Wang X, Jin ZB, Weisman MH, Gensler LS, Ward MM, Rahbar MH, Diekman L, Kim TH, Reveille JD, Wordsworth BP, Xu H, Brown MA. Polygenic Risk Scores have high diagnostic capacity in ankylosing spondylitis. Ann Rheum Dis 2021; 80:1168-1174. [PMID: 34161253 PMCID: PMC8364478 DOI: 10.1136/annrheumdis-2020-219446] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 11/04/2020] [Revised: 03/23/2021] [Accepted: 03/29/2021] [Indexed: 12/26/2022]
Abstract
Objective We sought to test the hypothesis that Polygenic Risk Scores (PRSs) have strong capacity to discriminate cases of ankylosing spondylitis (AS) from healthy controls and individuals in the community with chronic back pain. Methods PRSs were developed and validated in individuals of European and East Asian ethnicity, using data from genome-wide association studies in 15 585 AS cases and 20 452 controls. The discriminatory values of PRSs in these populations were compared with other widely used diagnostic tests, including C-reactive protein (CRP), HLA-B27 and sacroiliac MRI. Results In people of European descent, PRS had high discriminatory capacity, with an area under the curve (AUC) in receiver operating characteristic analysis of 0.924. This was significantly better than for HLA-B27 testing alone (AUC=0.869), MRI (AUC=0.885) or CRP (AUC=0.700). PRS developed and validated in individuals of East Asian descent performed similarly (AUC=0.948). Assuming a prior probability of AS of 10%, such as in patients with chronic back pain under 45 years of age, PRS provides higher positive predictive values for 35% of patients and higher negative predictive values for 67.5% of patients than HLA-B27 testing alone. For PRS, in people of European descent, the maximum positive predictive value was 78.2% and negative predictive value was 100%, whereas for HLA-B27, these values were 51.9% and 97.9%, respectively. Conclusions PRS have higher discriminatory capacity for AS than CRP, sacroiliac MRI or HLA-B27 status alone. For optimal performance, PRS should be developed for use in the specific ethnic groups to which they are to be applied.
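The predictive values quoted in this abstract depend on the assumed prior probability of disease. The sketch below shows the standard Bayes' rule calculation that links sensitivity, specificity and prior probability to positive and negative predictive values. Only the 10% prior comes from the abstract; the sensitivity and specificity are hypothetical placeholders, not the study's figures, and the function name is an assumption for illustration.

```python
import math


def predictive_values(sensitivity, specificity, prior):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prior", prior)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")

    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    true_neg = specificity * (1.0 - prior)
    false_neg = (1.0 - sensitivity) * prior

    # A predictive value is undefined when no one receives that result.
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else math.nan
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else math.nan
    return ppv, npv


# Hypothetical test characteristics; only the 10% prior is from the abstract.
ppv, npv = predictive_values(sensitivity=0.9, specificity=0.9, prior=0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With a low prior, even a fairly accurate test yields a modest positive predictive value, which is why the abstract fixes the prior before comparing PRS with HLA-B27 testing.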
Affiliation(s)
- Zhixiu Li
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Xin Wu
- Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, Shanghai, China
| | - Paul J Leo
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Erika De Guzman
- Australian Translational Genomics Centre, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Nurullah Akkoc
- Department of Internal Medicine, Division of Rheumatology, School of Medicine, Manisa Celal Bayar University, Manisa, Turkey
| | - Maxime Breban
- UMR 1173, Inserm, University of Versailles Saint-Quentin, Montigny-le-Bretonneux, France; Service de Rhumatologie, Hôpital Ambroise Paré, Assistance Publique-Hôpitaux de Paris, Boulogne-Billancourt, France; Laboratoire d'Excellence Inflamex, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - Gary J Macfarlane
- Epidemiology Group, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Foresterhill, Aberdeen, UK; Aberdeen Centre for Arthritis and Musculoskeletal Health, University of Aberdeen, Foresterhill, Aberdeen, UK
| | - Mahdi Mahmoudi
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Tehran, Iran (the Islamic Republic of)
| | - Helena Marzo-Ortega
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK; Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, UK
| | - Lisa K Anderson
- Australian Translational Genomics Centre, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Lawrie Wheeler
- Australian Translational Genomics Centre, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Chung-Tei Chou
- Division of Allergy, Immunology, Rheumatology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan; School of Medicine, National Yang-Ming University, Taipei, Taiwan
| | - Andrew A Harrison
- Department of Medicine, University of Otago Wellington, Wellington, New Zealand
| | - Simon Stebbings
- Department of Medicine, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand
| | - Gareth T Jones
- Epidemiology Group, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Foresterhill, Aberdeen, UK; Aberdeen Centre for Arthritis and Musculoskeletal Health, University of Aberdeen, Foresterhill, Aberdeen, UK
| | - So-Young Bang
- Hanyang University Hospital for Rheumatic Diseases, Hanyang University, Seoul, Korea (the Republic of)
| | - Geng Wang
- University of Queensland Diamantina Institute, University of Queensland, Brisbane, Queensland, Australia
| | - Ahmadreza Jamshidi
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Tehran, Iran (the Islamic Republic of)
| | - Elham Farhadi
- Rheumatology Research Center, Tehran University of Medical Sciences, Tehran, Tehran, Iran (the Islamic Republic of)
| | - Jing Song
- Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, Shanghai, China
| | - Li Lin
- Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, Shanghai, China
| | - Mengmeng Li
- Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, Shanghai, China
| | - James Cheng-Chung Wei
- Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan; Department of Medicine, Chung Shan Medical University, Taichung, Taiwan; Graduate Institute of Integrated Medicine, China Medical University, Taichung, Taiwan
| | - Nicholas G Martin
- QIMR Berghofer Medical Research Institute, Herston, Queensland, Australia
| | - Margaret J Wright
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - MinJae Lee
- Population & Data Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA
| | - Yuqin Wang
- State Key Laboratory of Optometry, Ophthalmology, and Vision Science, Affiliated Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Jian Zhan
- Institute for Glycomics, Griffith University, Nathan, Queensland, Australia
| | - Jin-San Zhang
- Center for Precision Medicine, First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China; Institute of Life Sciences, Wenzhou University, Wenzhou, Zhejiang, China
| | - Xiaobing Wang
- Rheumatology Department, First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
| | - Zi-Bing Jin
- Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing Ophthalmology & Visual Sciences Key Lab, Beijing, Beijing, China
| | - Michael H Weisman
- Department of Medicine/Rheumatology, Cedars-Sinai Medical Center, Los Angeles, California, USA
| | - Lianne S Gensler
- Division of Medicine/Rheumatology, University of California San Francisco, San Francisco, California, USA
| | - Michael M Ward
- Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland, USA
| | - Mohammad Hossein Rahbar
- Internal Medicine, The University of Texas Health Science Center at Houston John P and Katherine G McGovern Medical School, Houston, Texas, USA
| | - Laura Diekman
- Department of Internal Medicine, Division of Rheumatology, McGovern Medical School at The University of Texas Health Science Center, Houston, Texas, USA
| | - Tae-Hwan Kim
- Hanyang University Hospital for Rheumatic Diseases, Hanyang University, Seoul, Korea (the Republic of)
| | - John D Reveille
- Department of Internal Medicine, Division of Rheumatology, McGovern Medical School at The University of Texas Health Science Center, Houston, Texas, USA
| | - Bryan Paul Wordsworth
- NIHR Oxford Musculoskeletal Biomedical Research Unit, Botnar Research Centre, University of Oxford, Oxford, Oxfordshire, UK
| | - Huji Xu
- Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, Shanghai, China; School of Clinical Medicine, Tsinghua University, Beijing, Beijing, China; Peking-Tsinghua Center for Life Sciences, Tsinghua University, Beijing, China
| | - Matthew A Brown
- Center for Precision Medicine, First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China; NIHR Biomedical Research Centre at Guy's and Saint Thomas' NHS Foundation Trust and King's College London, London, UK
|
48
|
Hai Y, Wen Y. A Bayesian linear mixed model for prediction of complex traits. Bioinformatics 2021; 36:5415-5423. [PMID: 33331865 PMCID: PMC8016495 DOI: 10.1093/bioinformatics/btaa1023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/26/2020] [Revised: 11/24/2020] [Accepted: 11/27/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate disease risk prediction is essential for precision medicine. Existing models assume that diseases are caused either by groups of predictors with small-to-moderate effects or by a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. RESULTS We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of sparse regression and a linear mixed model with multiple random effects. The parameters in BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can resemble the shape of the true effect size distributions, captures the predictive effects from both common and rare variants, and is robust against various disease models. Through extensive simulations and the application to a whole-genome sequencing dataset obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that BLMM has better prediction performance than existing methods and can detect variables and/or genetic regions that are predictive. AVAILABILITY AND IMPLEMENTATION The R-package is available at https://github.com/yhai943/BLMM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Hai
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
| | - Yalu Wen
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
49
|
Campbell MT, Hu H, Yeats TH, Brzozowski LJ, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Improving Genomic Prediction for Seed Quality Traits in Oat (Avena sativa L.) Using Trait-Specific Relationship Matrices. Front Genet 2021; 12:643733. [PMID: 33868378 PMCID: PMC8044359 DOI: 10.3389/fgene.2021.643733] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/18/2020] [Accepted: 03/04/2021] [Indexed: 11/13/2022]
Abstract
The observable phenotype is the manifestation of information that is passed along different organization levels (transcriptional, translational, and metabolic) of a biological system. The widespread use of various omic technologies (RNA-sequencing, metabolomics, etc.) has provided plant geneticists and breeders with a wealth of information on pertinent intermediate molecular processes that may help explain variation in conventional traits such as yield, seed quality, and fitness, among others. A major challenge is effectively using these data to help predict the genetic merit of new, unobserved individuals for conventional agronomic traits. Trait-specific genomic relationship matrices (TGRMs) model the relationships between individuals using genome-wide markers (SNPs) and place greater emphasis on markers that are most relevant to the trait compared to conventional genomic relationship matrices. Given that these approaches define relationships based on putative causal loci, it is expected that these approaches should improve predictions for related traits. In this study we evaluated the use of TGRMs to accommodate information on intermediate molecular phenotypes (referred to as endophenotypes) and to predict an agronomic trait, total lipid content, in oat seed. Nine fatty acids were quantified in a panel of 336 oat lines. Marker effects were estimated for each endophenotype, and were used to construct TGRMs. A multikernel TGRM model (MK-TGRM-BLUP) was used to predict total seed lipid content in an independent panel of 210 oat lines. The MK-TGRM-BLUP approach significantly improved predictions for total lipid content when compared to a conventional genomic BLUP (gBLUP) approach.
Given that the MK-TGRM-BLUP approach leverages information on the nine fatty acids to predict genetic values for total lipid content in unobserved individuals, we compared the MK-TGRM-BLUP approach to a multi-trait gBLUP (MT-gBLUP) approach that jointly fits phenotypes for fatty acids and total lipid content. The MK-TGRM-BLUP approach significantly outperformed MT-gBLUP. Collectively, these results highlight the utility of using TGRM to accommodate information on endophenotypes and improve genomic prediction for a conventional agronomic trait.
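To make the idea of a trait-specific relationship matrix more concrete, the sketch below builds a genomic relationship matrix from centred marker dosages, with optional per-marker weights that emphasise some markers over others. This is an assumption-laden illustration, not the construction used in the paper: the function name, the use of arbitrary non-negative weights (for example, squared marker effects from an endophenotype), and the scaling to an average diagonal of 1 are all choices made here for clarity.

```python
def relationship_matrix(genotypes, weights=None):
    """Genomic relationship matrix from a lines-by-markers dosage table.

    genotypes: list of rows (one per line), each a list of dosages (0, 1, 2).
    weights:   optional non-negative per-marker weights; None means equal
               weights, which gives a conventional (unweighted) matrix.
    """
    n, m = len(genotypes), len(genotypes[0])
    if weights is None:
        weights = [1.0] * m
    if len(weights) != m:
        raise ValueError("one weight is required per marker")

    # Centre each marker on its mean dosage across lines.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Weighted cross-products between every pair of lines.
    raw = [[sum(weights[j] * centred[a][j] * centred[b][j] for j in range(m))
            for b in range(n)] for a in range(n)]

    # Scale so the average diagonal element is 1 (one common convention).
    mean_diag = sum(raw[i][i] for i in range(n)) / n
    if mean_diag == 0:
        raise ValueError("no weighted marker variation to build a matrix from")
    return [[value / mean_diag for value in row] for row in raw]


# Toy panel: 4 lines, 2 markers (made-up dosages).
toy_genotypes = [[0, 0], [2, 0], [0, 2], [2, 2]]
plain = relationship_matrix(toy_genotypes)
# Hypothetical weights that put all emphasis on the first marker.
weighted = relationship_matrix(toy_genotypes, weights=[1.0, 0.0])
print(plain[0][2], weighted[0][2])
```

In the toy panel, lines 0 and 2 are unrelated under equal weights but look identical once only the first marker counts, which is the sense in which weighting changes who is considered similar for a given trait.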
Affiliation(s)
- Malachy T. Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Trevor H. Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Lauren J. Brzozowski
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Melanie Caffe-Treml
- Seed Technology Lab 113, Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, United States
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, United States
| | - Kevin P. Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, United States
| | - Mark E. Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Michael A. Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY, United States
|
50
|
Campbell MT, Hu H, Yeats TH, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Translating insights from the seed metabolome into improved prediction for lipid-composition traits in oat (Avena sativa L.). Genetics 2021; 217:iyaa043. [PMID: 33789350 PMCID: PMC8045723 DOI: 10.1093/genetics/iyaa043] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/29/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022]
Abstract
Oat (Avena sativa L.) seed is a rich resource of beneficial lipids, soluble fiber, protein, and antioxidants, and is considered a healthful food for humans. Little is known regarding the genetic controllers of variation for these compounds in oat seed. We characterized natural variation in the mature seed metabolome using untargeted metabolomics on 367 diverse lines and leveraged this information to improve prediction for seed quality traits. We used a latent factor approach to define unobserved variables that may drive covariance among metabolites. One hundred latent factors were identified, of which 21% were enriched for compounds associated with lipid metabolism. Through a combination of whole-genome regression and association mapping, we show that latent factors that generate covariance for many metabolites tend to have a complex genetic architecture. Nonetheless, we recovered significant associations for 23% of the latent factors. These associations were used to inform a multi-kernel genomic prediction model, which was used to predict seed lipid and protein traits in two independent studies. Predictions for 8 of the 12 traits were significantly improved compared to genomic best linear unbiased prediction when this prediction model was informed using associations from lipid-enriched factors. This study provides new insights into variation in the oat seed metabolome and provides genomic resources for breeders to improve selection for health-promoting seed quality traits. More broadly, we outline an approach to distill high-dimensional "omics" data to a set of biologically meaningful variables and translate inferences on these data into improved breeding decisions.
Affiliation(s)
- Malachy T Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Trevor H Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Melanie Caffe-Treml
- Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD 57007, USA
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Kevin P Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Mark E Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Michael A Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY 14853, USA
|