1
Abdi H, Alipour H, Bernousi I, Jafarzadeh J, Rodrigues PC. Identification of novel putative alleles related to important agronomic traits of wheat using robust strategies in GWAS. Sci Rep 2023; 13:9927. [PMID: 37336905 DOI: 10.1038/s41598-023-36134-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 05/30/2023] [Indexed: 06/21/2023]
Abstract
Principal component analysis (PCA) is widely used in genetics studies. In this study, the roles of classical PCA (cPCA) and robust PCA (rPCA) were evaluated explicitly in genome-wide association studies (GWAS). We evaluated 294 wheat genotypes under well-watered and rain-fed conditions, focusing on spike traits. First, we showed that some phenotypic and genotypic observations could be outliers based on cPCA and different rPCA algorithms (Proj, Grid, Hubert, and Locantore). Hubert's method provided a better approach to identifying outliers, which helped to understand the nature of these samples. These outliers caused the estimated heritability of traits to deviate from the actual value. Then, we performed GWAS with 36,000 single nucleotide polymorphisms (SNPs) based on the traditional approach and two robust strategies. In the conventional approach, using the first three components of cPCA as population structure, 184 and 139 marker-trait associations (MTAs) were identified for five traits in well-watered and rain-fed environments, respectively. In the first robust strategy, in which rPCA was used as population structure in GWAS, the Hubert and Grid methods identified new MTAs, especially for yield and spike weight on chromosomes 7A and 6B. In the second strategy, we followed the classical and robust principal component-based GWAS, where the first two PCs obtained from phenotypic variables were used instead of traits. In this latter strategy, despite the similarity between the methods, some new MTAs were identified that can be considered pleiotropic. Hubert's method provided a better linear combination of traits because it had the most MTAs in common with the traditional approach. Newly identified SNPs, including rs19833 (5B) and rs48316 (2B), were annotated with important genes with vital biological processes and molecular functions. The approaches presented in this study can reduce the misleading GWAS results caused by the adverse effect of outlier observations.
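The abstract's point that outliers can distort classical summaries can be illustrated on a single trait. The sketch below is not one of the rPCA algorithms named above (Proj, Grid, Hubert, Locantore); it is a much simpler univariate stand-in that compares a mean/SD z-score with a median/MAD z-score. The trait values and the cutoff of 3 are hypothetical.

```python
from statistics import mean, median, stdev


def classical_z(values):
    """z-scores based on the mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def robust_z(values):
    """z-scores based on the median and the scaled MAD.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    when the bulk of the data is roughly normal.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def flag(z_scores, cutoff=3.0):
    """Indices of observations whose |z| exceeds the cutoff (assumed value)."""
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]


# Hypothetical trait scores; the last two values are gross outliers.
scores = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 2.1, 9.5, 10.2]

classical_flags = flag(classical_z(scores))
robust_flags = flag(robust_z(scores))

print("classical flags:", classical_flags)
print("robust flags:   ", robust_flags)
```

In this toy data the two extreme values inflate the mean and standard deviation enough that neither is flagged by the classical score, while the median/MAD score flags both. This masking effect is the general motivation for robust alternatives.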
Affiliation(s)
- Hossein Abdi
  - Department of Plant Production and Genetics, Faculty of Agriculture, Urmia University, Urmia, Iran
- Hadi Alipour
  - Department of Plant Production and Genetics, Faculty of Agriculture, Urmia University, Urmia, Iran
- Iraj Bernousi
  - Department of Plant Production and Genetics, Faculty of Agriculture, Urmia University, Urmia, Iran
- Jafar Jafarzadeh
  - Dryland Agricultural Research Institute (DARI), Agriculture Research, Education and Extension Organization (AREEO), Maragheh, Iran
2
Ren W, Liang Z, He S, Xiao J. Hybrid of Restricted and Penalized Maximum Likelihood Method for Efficient Genome-Wide Association Study. Genes (Basel) 2020; 11:1286. [PMID: 33138126 PMCID: PMC7692801 DOI: 10.3390/genes11111286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2020] [Revised: 10/26/2020] [Accepted: 10/27/2020] [Indexed: 11/16/2022]
Abstract
In genome-wide association studies, linear mixed models (LMMs) have been widely used to explore the molecular mechanisms of complex traits. However, typical association approaches suffer from several important drawbacks: estimating variance components in LMMs with large numbers of individuals is computationally slow, and the single-locus model handles complex confounding poorly and loses statistical power. To address these issues, we propose an efficient two-stage method based on a hybrid of restricted and penalized maximum likelihood, named HRePML. First, we performed restricted maximum likelihood (REML) on a single-locus LMM to remove unrelated markers, using spectral decomposition of the covariance matrix to estimate variance components quickly. Second, we carried out penalized maximum likelihood (PML) on a multi-locus LMM for markers with reasonably large effects. To validate the effectiveness of HRePML, we conducted a series of simulation studies and real data analyses. Our method always had the highest average statistical power compared with the multi-locus mixed-model (MLMM), fixed and random model circulating probability unification (FarmCPU), and genome-wide efficient mixed model association (GEMMA). More importantly, HRePML can estimate marker effects more accurately. HRePML also identified 41 previously reported genes associated with development traits in Arabidopsis, which is more than were detected by the other methods.
Affiliation(s)
- Wenlong Ren
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Zhikai Liang
  - Plant and Microbial Biology Department, University of Minnesota, Saint Paul, MN 55108, USA
- Shu He
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
- Jing Xiao
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, Nantong 226019, China
  - Correspondence:
3
Mengist MF, Grace MH, Xiong J, Kay CD, Bassil N, Hummer K, Ferruzzi MG, Lila MA, Iorizzo M. Diversity in Metabolites and Fruit Quality Traits in Blueberry Enables Ploidy and Species Differentiation and Establishes a Strategy for Future Genetic Studies. Frontiers in Plant Science 2020; 11:370. [PMID: 32318085 PMCID: PMC7147330 DOI: 10.3389/fpls.2020.00370] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 12/20/2019] [Accepted: 03/16/2020] [Indexed: 05/30/2023]
Abstract
Blueberry is well recognized as a rich source of health-promoting phytochemicals such as flavonoids and phenolic acids. Multiple studies in blueberry and other crops have indicated that flavonoids and phenolic acids function as bioactive compounds in the human body, promoting multiple health effects. Despite their importance, information is limited about the levels of variation in bioactive compounds within and between ploidy levels and species, and their association with fruit quality traits. Such information is crucial to define a strategy to study the genetic mechanisms controlling these traits and to select for these traits in blueberry breeding programs. Here we evaluated 33 health-related phytochemicals belonging to four major groups of flavonoids and phenolic acids across 128 blueberry accessions over two years, together with fruit quality traits, including fruit weight, titratable acidity, total soluble acids and pH. Highly significant variation between accessions and between years, and a highly significant accession-by-year interaction, were identified for most of the traits. Cluster analysis grouped phytochemicals by their functional structure (e.g., anthocyanins, flavanols, flavonols, and phenolic acids). Multivariate analysis of the traits resulted in separation of diploid, tetraploid and hexaploid accessions. Broad-sense heritability of the traits, estimated in 100 tetraploid accessions, ranged from 20 to 90%, with most traits revealing moderate to high broad-sense heritability (H2 > 40%), suggesting that strong genetic factors control these traits. Fruit size can be estimated as a proxy of fruit weight or volume and vice versa, and it was negatively correlated with the content of most of the phytochemicals evaluated here. However, size-independent variation for anthocyanin content and profile (e.g., acylated vs. non-acylated anthocyanin) exists in the tetraploid accessions and can be explored to identify other factors, such as genes related to the biosynthetic pathway, that control this trait. This result also suggests that metabolite concentrations and fruit size can, to a certain degree, be improved simultaneously in breeding programs. Overall, the results of this study provide a framework to uncover the genetic basis of bioactive compounds and fruit quality traits and will be useful to advance blueberry breeding programs focusing on integrating these traits.
Affiliation(s)
- Molla F Mengist
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Mary H Grace
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Jia Xiong
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Colin D Kay
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Nahla Bassil
  - USDA-ARS-National Clonal Germplasm Repository, Corvallis, OR, United States
- Kim Hummer
  - USDA-ARS-National Clonal Germplasm Repository, Corvallis, OR, United States
- Mario G Ferruzzi
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Mary Ann Lila
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
- Massimo Iorizzo
  - Plants for Human Health Institute, North Carolina State University, NCRC, Kannapolis, NC, United States
  - Department of Horticultural Science, North Carolina State University, Raleigh, NC, United States
4
Lourenço VM, Ogutu JO, Piepho HP. Robust estimation of heritability and predictive accuracy in plant breeding: evaluation using simulation and empirical data. BMC Genomics 2020; 21:43. [PMID: 31937245 PMCID: PMC6958597 DOI: 10.1186/s12864-019-6429-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/06/2018] [Accepted: 12/24/2019] [Indexed: 12/02/2022]
Abstract
BACKGROUND: Genomic prediction (GP) is used in animal and plant breeding to help identify the best genotypes for selection. One of the most important measures of the effectiveness and reliability of GP in plant breeding is predictive accuracy. An accurate estimate of this measure is thus central to GP. Moreover, regression models are the models of choice for analyzing field trial data in plant breeding. However, models that use the classical likelihood typically perform poorly, often resulting in biased parameter estimates, when their underlying assumptions are violated. This typically happens when data are contaminated with outliers. These biases often translate into inaccurate estimates of heritability and predictive accuracy, compromising the performance of GP. Since phenotypic data are susceptible to contamination, improving the methods for estimating heritability and predictive accuracy can enhance the performance of GP. Robust statistical methods provide an intuitively appealing and theoretically well-justified framework for overcoming some of the drawbacks of classical regression, most notably the departure from the normality assumption. We compare the performance of robust and classical approaches to two recently published methods for estimating heritability and predictive accuracy of GP, using simulation of several plausible scenarios of random and block data contamination with outliers, and commercial maize and rye breeding datasets.
RESULTS: The robust approach generally performed as well as or better than the classical approach in phenotypic data analysis and in estimating the predictive accuracy of heritability and genomic prediction under both the random and block contamination scenarios. Notably, it consistently outperformed the classical approach under the random contamination scenario. Analyses of the empirical maize and rye datasets further reinforce the stability and reliability of the robust approach in the presence of outliers or missing data.
CONCLUSIONS: The proposed robust approach enhances the predictive accuracy of heritability and genomic prediction by minimizing the deleterious effects of outliers for a broad range of simulation scenarios and empirical breeding datasets. Accordingly, plant breeders should seriously consider regularly using the robust approach alongside the classical approach, and increasing the number of replicates to three or more, to further enhance the accuracy of the robust approach.
Affiliation(s)
- Vanda Milheiro Lourenço
  - Department of Mathematics, Faculty of Sciences and Technology - NOVA University of Lisbon, Caparica, 2829-516 Portugal
  - Centro de Matemática e Aplicações (CMA), Caparica, 2829-516 Portugal
- Joseph Ochieng Ogutu
  - Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Stuttgart, Fruwirthstrasse 23, 70599 Germany
- Hans-Peter Piepho
  - Institute of Crop Science, Biostatistics Unit, University of Hohenheim, Stuttgart, Fruwirthstrasse 23, 70599 Germany
5
Schmidt P, Hartung J, Bennewitz J, Piepho HP. Heritability in Plant Breeding on a Genotype-Difference Basis. Genetics 2019; 212:991-1008. [PMID: 31248886 PMCID: PMC6707473 DOI: 10.1534/genetics.119.302134] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 03/18/2019] [Accepted: 06/17/2019] [Indexed: 11/18/2022]
Abstract
In plant breeding, heritability is often calculated (i) as a measure of precision of trials and/or (ii) to compute the response to selection. It is usually estimated on an entry-mean basis, since the phenotype is usually an aggregated value, as genotypes are replicated in trials, which stands in contrast with animal breeding and human genetics. When this was first proposed, assumptions such as balanced data and independent genotypic effects were made that are often violated in modern plant breeding trials/analyses. Due to this, multiple alternative methods have been proposed, aiming to generalize heritability on an entry-mean basis. In this study, we propose an extension of the concept for heritability on an entry-mean to an entry-difference basis, which allows for more detailed insight and is more meaningful in the context of selection in plant breeding, because the correlation among entry means can be accounted for. We show that under certain circumstances our method reduces to other popular generalized methods for heritability estimation on an entry-mean basis. The approach is exemplified via four examples that show different levels of complexity, where we compare six methods for heritability estimation on an entry-mean basis to our approach (example codes: https://github.com/PaulSchmidtGit/Heritability). Results suggest that heritability on an entry-difference basis is a well-suited alternative for obtaining an overall heritability estimate, and in addition provides one heritability per genotype as well as one per difference between genotypes.
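For orientation, the classical entry-mean heritability that this abstract takes as its starting point is commonly written, for a balanced single-environment trial with r replicates, as H² = σ²g / (σ²g + σ²e / r). The sketch below implements only that textbook formula, not the entry-difference method proposed in the paper; the function name and the variance components are hypothetical.

```python
def entry_mean_heritability(var_g, var_e, n_reps):
    """Classical entry-mean heritability for a balanced trial.

    var_g  : genotypic variance component
    var_e  : residual (error) variance component
    n_reps : number of replicates per genotype

    Assumes balanced data and independent genotypic effects, i.e. the
    assumptions the abstract notes are often violated in practice.
    """
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    if n_reps < 1:
        raise ValueError("n_reps must be at least 1")
    phenotypic_var = var_g + var_e / n_reps  # variance of an entry mean
    if phenotypic_var == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return var_g / phenotypic_var


# Hypothetical variance components: more replicates shrink the error term.
h2_by_reps = {r: entry_mean_heritability(2.0, 6.0, r) for r in (1, 3, 6)}
for r, h2 in h2_by_reps.items():
    print(f"replicates={r}: H2={h2:.3f}")
```

With these illustrative values the estimate rises from 0.25 with one replicate to 0.50 with three, because the residual variance of an entry mean is divided by the number of replicates.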
Affiliation(s)
- Paul Schmidt
  - Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, 70599, Germany
- Jens Hartung
  - Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, 70599, Germany
- Jörn Bennewitz
  - Institute of Animal Science, University of Hohenheim, Stuttgart, 70599, Germany
- Hans-Peter Piepho
  - Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, 70599, Germany
6
Montesinos-López A, Montesinos-López OA, Villa-Diharce ER, Gianola D, Crossa J. A robust Bayesian genome-based median regression model. TAG. Theoretical and Applied Genetics. Theoretische und angewandte Genetik 2019; 132:1587-1606. [PMID: 30747261 DOI: 10.1007/s00122-019-03303-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2018] [Accepted: 02/02/2019] [Indexed: 06/09/2023]
Abstract
Current genome-enabled prediction models assume normally distributed errors, which makes them sensitive to outliers. We propose a model with errors assumed to follow a Laplace distribution to deal better with outliers. Current genome-enabled prediction models use regressions that fit the expected value (mean) of a response variable with errors assumed to be normally distributed, and such models are often sensitive to outliers, either genetic or environmental. For this reason, we propose a robust Bayesian genome median regression (BGMR) model that fits regressions to the medians of a distribution, with errors assumed to follow a Laplace distribution to deal better with outliers. The BGMR model was evaluated under a Bayesian framework with Markov Chain Monte Carlo sampling using a location-scale mixture representation of the Laplace distribution. The BGMR was implemented with two simulated and two real genomic data sets, and we compared its prediction performance with that of a conventional genomic best linear unbiased prediction (GBLUP) model and the Laplace maximum a posteriori (LMAP) method. The prediction accuracies of BGMR were higher than those of the GBLUP and LMAP methods when there were outliers. The BGMR model could be useful to breeders who need to predict and select genotypes based on data with unknown outliers.
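The link between Laplace errors and the median can be seen in the simplest intercept-only case. For a fixed scale, the Laplace negative log-likelihood is proportional to the sum of absolute deviations, which the sample median minimizes; the normal counterpart is the sum of squared deviations, which the mean minimizes. The sketch below illustrates only this location case, not the BGMR model or its MCMC sampler; the data are hypothetical.

```python
from statistics import mean, median


def abs_loss(y, mu):
    """Sum of absolute deviations (Laplace negative log-likelihood, up to scale)."""
    return sum(abs(v - mu) for v in y)


def sq_loss(y, mu):
    """Sum of squared deviations (normal negative log-likelihood, up to scale)."""
    return sum((v - mu) ** 2 for v in y)


# Hypothetical phenotypes: five clean records plus one gross outlier.
clean = [4.8, 5.1, 5.0, 4.9, 5.2]
contaminated = clean + [50.0]

mean_fit = mean(contaminated)      # minimizes sq_loss; pulled toward the outlier
median_fit = median(contaminated)  # minimizes abs_loss; stays near the bulk

print(f"mean fit:   {mean_fit:.2f}")
print(f"median fit: {median_fit:.2f}")

# Check on a coarse grid that no candidate location beats the median under
# absolute loss, nor the mean under squared loss.
grid = [i / 10 for i in range(0, 501)]
best_abs = min(abs_loss(contaminated, g) for g in grid)
best_sq = min(sq_loss(contaminated, g) for g in grid)
print(abs_loss(contaminated, median_fit) <= best_abs + 1e-9)
print(sq_loss(contaminated, mean_fit) <= best_sq + 1e-9)
```

A single extreme record moves the mean-based fit far from the bulk of the data while the median-based fit barely changes, which is the behaviour the abstract relies on when it replaces normal errors with Laplace errors.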
Affiliation(s)
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, JAL, Mexico
- Enrique R Villa-Diharce
  - Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), 36240, Guanajuato, Mexico
- Daniel Gianola
  - Departments of Animal Sciences, Dairy Science, and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- José Crossa
  - Biometrics and Statistics Unit and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600, Mexico, DF, Mexico