1
Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has advanced significantly over the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of the key factors that have influenced GP in plant breeding during this period. We examined the pivotal roles of training population size and genetic diversity, and of their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size: we explored its benefits and the diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, the level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high GP accuracy (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area, and current accuracies remain far from optimal for many traits. We hypothesize that ultra-large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics, and proteomics), coupled with deep-learning algorithms, could overcome current limitations and achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
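The abstract measures GP accuracy as Pearson's correlation between predicted and observed values, which reaches one only when the predictions are a perfect increasing linear function of the observations. A minimal pure-Python sketch of that metric follows; the function name `prediction_accuracy` is our own illustration, not code from the review.

```python
import math
from typing import Sequence


def prediction_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson's correlation between predicted genomic values and observed
    phenotypes of a validation set."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Predictions that are a shifted, rescaled copy of the phenotypes still score 1.
print(prediction_accuracy([1.0, 2.0, 3.0], [12.0, 14.0, 16.0]))  # prints 1.0
```

Because Pearson's correlation is unchanged by shifting or rescaling the predictions, it does not detect prediction bias, which is why bias is often reported as a separate statistic.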
2
Development of a novel machine learning-based weighted modeling approach to incorporate Salmonella enterica heterogeneity on a genetic scale in a dose-response modeling framework. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2023; 43:440-450. [PMID: 35413139 DOI: 10.1111/risa.13924] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Estimating microbial dose-response is an important aspect of food safety risk assessment. In recent years, there has been considerable interest in advancing these models, potentially by incorporating gene expression data. The aim of this study was to develop a novel machine learning model that weights the expression of Salmonella genes that could be associated with illness, given exposure, in hosts. Here, an elastic net-based weighted Poisson regression method was proposed to identify Salmonella enterica genes that could be significantly associated with the illness response, irrespective of serovar. The best-fit elastic net model, obtained by 10-fold cross-validation, identified 33 gene expression-dose interaction terms that added to the predictive ability of the model. Of these, nine genes associated with Salmonella metabolism and virulence were found to be significant by the best-fit Poisson regression model (p < 0.05). This method could improve or redefine dose-response relationships for illness based on the relative proportions of significant genes in a microbial genetic dataset, which would help refine endpoint and risk estimates.
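An elastic-net penalised, weighted Poisson regression of the kind described above can be sketched with proximal gradient descent. This is a minimal pure-Python illustration, not the authors' implementation: the covariates, the case weights `w`, the fixed step size, and the function names are assumptions, and the 10-fold cross-validation the study used to choose the penalty `lam` is not shown.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |b|, the lasso part of the elastic-net penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_enet_poisson(X: Sequence[Sequence[float]], y: Sequence[float],
                     w: Sequence[float], lam: float, alpha: float,
                     step: float = 0.01, iters: int = 5000) -> List[float]:
    """Weighted Poisson regression log(mu_i) = b0 + sum_j b_j * X[i][j] with the
    elastic-net penalty lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2) on
    b1..bp (the intercept b0 is unpenalised). Returns [b0, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    b = [0.0] * (p + 1)
    for _ in range(iters):
        # Gradient of the weighted mean negative Poisson log-likelihood.
        grad = [0.0] * (p + 1)
        for xi, yi, wi in zip(X, y, w):
            mu = math.exp(b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi)))
            r = wi * (mu - yi) / n
            grad[0] += r
            for j in range(p):
                grad[j + 1] += r * xi[j]
        b[0] -= step * grad[0]
        for j in range(1, p + 1):
            # Gradient step on the smooth part (likelihood + ridge term),
            # followed by soft-thresholding for the lasso term.
            z = b[j] - step * (grad[j] + lam * (1 - alpha) * b[j])
            b[j] = soft_threshold(z, step * lam * alpha)
    return b


# Hypothetical toy data: one covariate with mean response exp(0.5 + 1.0 * x).
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [math.exp(0.5 + x[0]) for x in X]
w = [1.0] * len(X)
print(fit_enet_poisson(X, y, w, lam=0.0, alpha=0.5))    # close to [0.5, 1.0]
print(fit_enet_poisson(X, y, w, lam=100.0, alpha=0.5))  # slope shrunk to exactly 0.0
```

The soft-thresholding step is what lets the lasso component set coefficients exactly to zero, which is how an elastic net selects a subset of terms such as the gene expression-dose interactions mentioned above.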
3
The influence of genetic structure on phenotypic diversity in the Australian mango (Mangifera indica) gene pool. Sci Rep 2022; 12:20614. [PMID: 36450793 PMCID: PMC9712640 DOI: 10.1038/s41598-022-24800-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 11/21/2022] [Indexed: 12/11/2022] [Open access]
Abstract
Genomic selection is a promising breeding technique for tree crops to accelerate the development of new cultivars. However, factors such as genetic structure can create spurious associations between genotype and phenotype due to the shared history of populations with different trait values. Genetic structure can therefore reduce the accuracy of the genotype-to-phenotype map, a fundamental requirement of genomic selection models. Here, we employed 272 single nucleotide polymorphisms from 208 Mangifera indica accessions to explore whether the genetic structure of the Australian mango gene pool explained variation in trunk circumference, fruit blush colour, and blush intensity. Multiple population genetic analyses indicate the presence of four genetic clusters and show that the most genetically differentiated cluster contains accessions imported from Southeast Asia (mainly from Thailand). We find that genetic structure was strongly associated with all three traits (trunk circumference, fruit blush colour, and blush intensity) in M. indica. This suggests that the history of these accessions could drive spurious associations between loci and key mango phenotypes in the Australian mango gene pool. Incorporating such genetic structure into models of genotype-phenotype association can improve the accuracy of genomic selection, which can assist the future development of new cultivars.
4
Multi-environment genomic prediction for soluble solids content in peach (Prunus persica). FRONTIERS IN PLANT SCIENCE 2022; 13:960449. [PMID: 36275520 PMCID: PMC9583944 DOI: 10.3389/fpls.2022.960449] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/03/2022] [Accepted: 08/01/2022] [Indexed: 06/16/2023]
Abstract
Genotype-by-environment interaction (G × E) is a common phenomenon influencing genetic improvement in plants, and a good understanding of this phenomenon is important for breeding and cultivar deployment strategies. However, there is little information on G × E in horticultural tree crops, mostly due to evaluation costs, leading to a focus on the development and deployment of locally adapted germplasm. Using sweetness (measured as soluble solids content, SSC) in peach/nectarine assessed at four trials from three US peach-breeding programs as a case study, we evaluated the hypotheses that (i) complex data from multiple breeding programs can be connected using GBLUP models to improve the knowledge of G × E for breeding and deployment and (ii) accounting for a known large-effect quantitative trait locus (QTL) improves the prediction accuracy. Following a structured strategy using univariate and multivariate models containing additive and dominance genomic effects on SSC, a model that included a previously detected QTL and background genomic effects fitted significantly better than a genome-wide model with completely anonymous markers. Estimates of individual narrow-sense and broad-sense heritability for SSC were high (0.57-0.73 and 0.66-0.80, respectively), with 19-32% of total genomic variance explained by the QTL. Genome-wide dominance effects and QTL effects were stable across environments. Significant G × E was detected for background genomic effects, mostly due to the low correlation of these effects across seasons within a particular trial. The expected prediction accuracy, estimated from the linear model, was higher than the realised prediction accuracy estimated by cross-validation, suggesting that these two parameters measure different qualities of the prediction models.
While prediction accuracy was improved in some cases by combining data across trials, particularly when phenotypic data for untested individuals were available from other trials, this improvement was not consistent. This study confirms that complex data can be combined into a single analysis using GBLUP methods to improve understanding of G × E and also incorporate known QTL effects. In addition, the study generated baseline information to account for population structure in genomic prediction models in horticultural crop improvement.
5
Use of Milk Infrared Spectral Data as Environmental Covariates in Genomic Prediction Models for Production Traits in Canadian Holstein. Animals (Basel) 2022; 12:1189. [PMID: 35565615 PMCID: PMC9099576 DOI: 10.3390/ani12091189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 05/03/2022] [Accepted: 05/04/2022] [Indexed: 12/04/2022] [Open access]
Abstract
The purpose of this study was to provide a procedure for including milk spectral information in genomic prediction models. Spectral data were treated as a set of covariates in addition to genomic covariates. Milk yield and somatic cell score were the traits investigated. Cross-validation was used to distinguish between predicting the performance of new individuals in known environments, known individuals in new environments, and new individuals in new environments. We found an advantage of including spectral data as environmental covariates when genomic predictions had to be extrapolated to new environments. This held for observed and, even more so, for unobserved families (genotypes). Overall, prediction accuracy was higher for milk yield than for somatic cell score. Fourier-transformed infrared spectral data can be used as a source of information for calculating the 'environmental coordinates' of a given farm at a given time, extrapolating predictions to new environments. This procedure could serve as an example of integrating genomic and phenomic data, and could help in using spectral data for traits with poor predictability at the phenotypic level, such as disease incidence and behavior traits. The strength of the model is its ability to couple genomic with high-throughput phenomic information.
6
A genome-wide association study of reproduction traits in four pig populations with different genetic backgrounds. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2019; 33:1400-1410. [PMID: 32054232 PMCID: PMC7468174 DOI: 10.5713/ajas.19.0411] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/26/2019] [Accepted: 09/03/2019] [Indexed: 01/04/2023]
Abstract
Objective A genome-wide association study (GWAS) and two GWAS-based meta-analyses were performed to explore the genetic mechanism underlying variation in pig number born alive (NBA) and total number born (TNB). Methods Single-trait GWAS and two meta-analyses (single-trait meta-analysis and multi-trait meta-analysis) were applied to NBA and TNB in 3,121 Yorkshires from 4 populations: three American Yorkshire populations (n = 2,247) and one British Yorkshire population (n = 874). Results Single-trait GWAS identified no significantly associated single nucleotide polymorphisms (SNPs). Using single-trait and multi-trait meta-analyses of the within-population results, 11 significant loci associated with the target traits were identified. Spindlin 1, vascular endothelial growth factor A, forkhead box Q1, msh homeobox 1, and LHFPL tetraspan subfamily member 3 are five functionally plausible candidate genes for NBA and TNB. Compared with single-population GWAS, single-trait meta-analysis can improve the power to detect SNPs by integrating information from multiple populations. The multi-trait analysis reduced the power to detect trait-specific loci but enhanced the power to identify loci common to both traits. Conclusion Our findings identified novel candidate genes for NBA and TNB in pigs that remain to be validated. The study also showed that including multiple populations with different genetic backgrounds enlarges the population size and that meta-analysis increases the power of GWAS.
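The abstract does not state how the single-trait meta-analysis combined the populations. One common choice is an inverse-variance weighted fixed-effect model, sketched below under that assumption; `fixed_effect_meta` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis of one SNP effect
    estimated separately in several populations.
    Returns (pooled effect, pooled standard error, z statistic, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(wt * b for wt, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Two hypothetical populations, each estimating an effect of 0.2 (SE 0.1).
print(fixed_effect_meta([0.2, 0.2], [0.1, 0.1]))
```

Each population alone gives z = 2, whereas the pooled estimate gives z of about 2.83, which illustrates how combining populations can raise detection power.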
7
WhoGEM: an admixture-based prediction machine accurately predicts quantitative functional traits in plants. Genome Biol 2019; 20:106. [PMID: 31138283 PMCID: PMC6537182 DOI: 10.1186/s13059-019-1697-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/16/2018] [Accepted: 04/23/2019] [Indexed: 12/13/2022] [Open access]
Abstract
The explosive growth of genomic data provides an opportunity to make greater use of sequence variation for phenotype prediction. We have developed a prediction machine for quantitative phenotypes (WhoGEM) that overcomes some of the bottlenecks limiting current methods. We demonstrated its performance by predicting quantitative disease resistance and quantitative functional traits in the wild model plant species Medicago truncatula, using geographical locations as covariates for admixture analysis. The method's prediction reliability equals or exceeds that of all existing algorithms for quantitative phenotype prediction. WhoGEM analysis produces evidence that variation in genome admixture proportions explains most of the phenotypic variation for quantitative phenotypes.
8
Deflated preconditioned conjugate gradient method for solving single-step BLUP models efficiently. Genet Sel Evol 2018; 50:51. [PMID: 30390656 PMCID: PMC6215606 DOI: 10.1186/s12711-018-0429-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/16/2018] [Accepted: 10/25/2018] [Indexed: 11/30/2022] [Open access]
Abstract
Background The single-step single nucleotide polymorphism best linear unbiased prediction (ssSNPBLUP) method, like single-step genomic BLUP (ssGBLUP), simultaneously analyses phenotypic, pedigree, and genomic information of genotyped and non-genotyped animals. In contrast to ssGBLUP, ssSNPBLUP fits SNP effects explicitly as random effects. Similarly, principal components associated with the genomic information can be fitted explicitly as random effects in a single-step principal component BLUP (ssPCBLUP) model to remove noise from the genomic information. Single-step genomic BLUP is solved efficiently using the preconditioned conjugate gradient (PCG) method. Unfortunately, convergence issues have been reported when solving ssSNPBLUP with PCG. Poor convergence may be linked to large spectral condition numbers of the preconditioned coefficient matrices of ssSNPBLUP. These condition numbers, and thus convergence, could be improved through the deflated PCG (DPCG) method, a two-level PCG method for ill-conditioned linear systems. Therefore, the first aim of this study was to compare the properties of the preconditioned coefficient matrices of ssGBLUP and ssSNPBLUP, and to document the convergence patterns obtained with the PCG method. The second aim was to implement and test the efficiency of a DPCG method for solving ssSNPBLUP and ssPCBLUP. Results For two dairy cattle datasets, the smallest eigenvalues obtained for ssSNPBLUP (ssPCBLUP) and ssGBLUP, both solved with the PCG method, were similar. However, the largest eigenvalues obtained for ssSNPBLUP and ssPCBLUP were larger than those for ssGBLUP, which resulted in larger condition numbers and slow convergence for both systems solved by the PCG method. Different implementations of the DPCG method led to smaller condition numbers and faster convergence for ssSNPBLUP and ssPCBLUP by deflating the largest unfavourable eigenvalues.
Conclusions The poor convergence of ssSNPBLUP and ssPCBLUP when solved by the PCG method is related to larger eigenvalues and larger condition numbers in comparison with ssGBLUP. These convergence issues were solved with a DPCG method that annihilates the effect of the largest unfavourable eigenvalues of the preconditioned coefficient matrices of ssSNPBLUP and ssPCBLUP on the convergence of the PCG method. This resulted in a convergence pattern at least similar to that of ssGBLUP.
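Both PCG and DPCG iteratively solve mixed-model equations Ax = b with a symmetric positive-definite coefficient matrix A. The sketch below shows only plain PCG with a diagonal (Jacobi) preconditioner, which is an assumption about the preconditioner; the deflation step that removes the influence of the largest unfavourable eigenvalues is not implemented, and the small matrix is a hypothetical example rather than a real ssSNPBLUP system.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcg(A: Matrix, b: Sequence[float], tol: float = 1e-10,
        max_iter: int = 1000) -> Tuple[List[float], int]:
    """Preconditioned conjugate gradient for symmetric positive-definite A with a
    diagonal (Jacobi) preconditioner M = diag(A). Stops when the relative
    residual ||b - Ax|| / ||b|| falls below tol. Returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    r = list(b)                                  # residual b - A x at x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual M^-1 r
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        step = rz / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b)
print(x, iterations)
```

The number of iterations PCG needs grows with the condition number of the preconditioned matrix, which is why the larger extreme eigenvalues reported for ssSNPBLUP and ssPCBLUP translate into slow convergence.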
9
Incorporating Prior Knowledge of Principal Components in Genomic Prediction. Front Genet 2018; 9:289. [PMID: 30116258 PMCID: PMC6082966 DOI: 10.3389/fgene.2018.00289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/15/2017] [Accepted: 07/11/2018] [Indexed: 12/05/2022] [Open access]
Abstract
Genomic prediction using a large number of markers is challenging due to the curse of dimensionality and to multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to address these problems, such as principal component analysis (PCA), which is commonly used to reduce the dimension of predictor variables by generating orthogonal variables. Usually, the knowledge from PCA is incorporated into genomic prediction by assuming either equal variances for the PCs or variances proportional to the eigenvalues; both approaches treat the variances as fixed. Here, three prior distributions (normal, scaled-t, and double exponential) were assumed for PC effects in a Bayesian framework with a subset of PCs. These developed PCR models (dPCRm) were compared with routine genomic prediction models (RGPM), i.e., ridge and Bayesian ridge regression, BayesA, BayesB, and PC regression with a subset of PCs but with PC variances predefined as proportional to the eigenvalues (PCR-Eigen). The methods were compared by simulating a single trait with a heritability of 0.25 on a genome consisting of 3,000 SNPs on three chromosomes, with 15, 60, or 105 QTL. After 500 generations of random mating as the historical population, a population was isolated and mated for another 15 generations. Generations 8 and 9 of the recent population were used as the reference population and the next six generations as validation populations. The accuracy and bias of predictions were evaluated within the reference population and within each validation population. Within the reference population, across QTL numbers, the accuracies of dPCRm were similar to those of RGPM (0.536 to 0.664 vs. 0.542 to 0.671) and higher than those of PCR-Eigen (0.504 to 0.641). Across the validation populations, accuracies declined from 0.633 to 0.310, 0.639 to 0.313, and 0.617 to 0.298 using dPCRm, RGPM, and PCR-Eigen, respectively.
Prediction biases of dPCRm and RGPM were similar and always much smaller than those of PCR-Eigen. In conclusion, treating PC variances as random variables via prior specification yielded higher accuracy than PCR-Eigen and the same accuracy as RGPM, while using fewer predictors.
10
Genomic prediction accuracy for switchgrass traits related to bioenergy within differentiated populations. BMC PLANT BIOLOGY 2018; 18:142. [PMID: 29986667 PMCID: PMC6038187 DOI: 10.1186/s12870-018-1360-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/26/2018] [Accepted: 07/02/2018] [Indexed: 05/16/2023]
Abstract
BACKGROUND Switchgrass breeders need to increase the rate of genetic gain in many bioenergy-related traits to create improved cultivars that are higher yielding and have optimal biomass composition. One way to achieve this is through genomic selection. However, trait heritabilities and prediction accuracies must be determined to establish whether efficient selection is possible. RESULTS Using five distinct switchgrass populations comprising three lowland, one upland, and one hybrid accession, we investigated the accuracy of genomic predictions under different cross-validation strategies and prediction methods. Individuals were genotyped by genotyping-by-sequencing (GBS), and kin-BLUP, partial least squares, sparse partial least squares, and BayesB were used to predict yield, morphological, and NIRS-based compositional data collected in 2012-2013 from a replicated field trial in Nebraska. Population structure was assessed by F-statistics, which ranged from 0.3952 between lowland and upland accessions to 0.0131 among lowland accessions. Prediction accuracy ranged from 0.57 and 0.52 for cell wall soluble glucose and fructose, respectively, to non-significant values for traits with low repeatability. Ratios of heritability across populations to heritability within populations ranged from 15 to 0.6. CONCLUSIONS Accuracy was significantly affected by both cross-validation strategy and trait. Accounting for population structure with a cross-validation strategy constrained by accession resulted in accuracies that were 69% lower than the apparent accuracies from unconstrained cross-validation. Less accurate genomic selection is anticipated when most of the phenotypic variation lies between populations, as with spring regreening and yield.
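Unconstrained and accession-constrained cross-validation differ only in how individuals are assigned to training and test sets. A minimal sketch of both schemes follows, assuming leave-one-accession-out as the constrained scheme (the study's exact fold layout is not given in the abstract); the accession labels are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def random_kfold(n: int, k: int, seed: int = 0) -> List[Split]:
    """Unconstrained k-fold cross-validation: individuals are assigned to folds
    at random, so members of one accession can fall on both sides of a split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[f::k]) for f in range(k)]
    splits = []
    for f in range(k):
        test = folds[f]
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, test))
    return splits


def leave_one_group_out(groups: Sequence[str]) -> List[Split]:
    """Cross-validation constrained by accession: all individuals of one
    accession are held out together, so no accession appears in both sets."""
    members: Dict[str, List[int]] = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    return [([i for i in range(len(groups)) if groups[i] != g], test)
            for g, test in members.items()]


accessions = ["lowland1", "lowland1", "lowland2", "upland", "upland", "hybrid"]
for train, test in leave_one_group_out(accessions):
    print("train", train, "test", test)
```

With random folds, close relatives of a test individual usually remain in the training set, which inflates the apparent accuracy; holding out whole accessions removes that advantage, consistent with the lower constrained accuracies reported above.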
11
Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front Genet 2018; 9:195. [PMID: 29951082 PMCID: PMC6008589 DOI: 10.3389/fgene.2018.00195] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/03/2018] [Accepted: 05/14/2018] [Indexed: 12/05/2022] [Open access]
Abstract
A widely used method for prediction of complex traits in animal and plant breeding is “genomic best linear unbiased prediction” (GBLUP). In a quantitative genetics setting, BLUP is a linear regression of phenotypes on a pedigree or genomic relationship matrix, depending on the type of input information available. Normality of the distributions of random effects and model residuals is not required for BLUP, but a Gaussian assumption is made implicitly. A potential downside is that Gaussian linear regressions are sensitive to outliers, whether genetic or environmental in origin. We present robust alternatives to BLUP that are simple to implement (relative to a fully Bayesian analysis), using a linear model with t or Laplace residual distributions instead of a Gaussian one, and evaluate the methods with milk yield records on Italian Brown Swiss cattle, grain yield data on inbred wheat lines, and three traits measured on accessions of Arabidopsis thaliana. The methods do not use Markov chain Monte Carlo sampling, and model hyper-parameters, viewed here as regularization “knobs,” are tuned via cross-validation. Uncertainty of predictions is evaluated by bootstrapping or by random reconstruction of training and testing sets. The best predictions were often those obtained with the robust methods (e.g., for test-day milk yield in cows, and for flowering time and FRIGIDA expression in Arabidopsis). The results are encouraging and stimulate further investigation and generalization.
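To illustrate why t-distributed residuals are robust, the sketch below fits a single-covariate linear regression with Student-t residuals by iteratively reweighted least squares (the EM algorithm with fixed degrees of freedom). It is a simplified stand-in, not the authors' method: there is no relationship matrix and no regularization, and `nu = 4` and the toy data are arbitrary assumptions.

```python
from typing import Sequence, Tuple


def weighted_line(x: Sequence[float], y: Sequence[float],
                  w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope of y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def t_regression(x: Sequence[float], y: Sequence[float], nu: float = 4.0,
                 iters: int = 200) -> Tuple[float, float]:
    """Linear regression with Student-t residuals (nu degrees of freedom, fixed),
    fitted by EM / iteratively reweighted least squares, starting from OLS."""
    n = len(x)
    a, b = weighted_line(x, y, [1.0] * n)
    r = [yi - a - b * xi for xi, yi in zip(x, y)]
    sigma2 = max(sum(ri ** 2 for ri in r) / n, 1e-12)
    for _ in range(iters):
        # E-step: observations with large standardized residuals get small weights.
        w = [(nu + 1.0) / (nu + ri ** 2 / sigma2) for ri in r]
        # M-step: weighted least squares for (a, b), then update the scale.
        a, b = weighted_line(x, y, w)
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        sigma2 = max(sum(wi * ri ** 2 for wi, ri in zip(w, r)) / n, 1e-12)
    return a, b


# Toy data: y = 2x plus small noise, with one outlying record at x = 9.
x = [float(i) for i in range(10)]
y = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[9] += 30.0
print("OLS slope:", weighted_line(x, y, [1.0] * len(x))[1])
print("t-residual slope:", t_regression(x, y)[1])
```

The single outlier pulls the ordinary least-squares slope well above 2, while the t-residual fit assigns it a tiny weight and recovers a slope close to 2.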
12
A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity (Edinb) 2017; 120:356-368. [PMID: 29238077 PMCID: PMC5842222 DOI: 10.1038/s41437-017-0023-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/18/2017] [Revised: 10/13/2017] [Accepted: 10/23/2017] [Indexed: 12/15/2022] [Open access]
Abstract
Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant, and human genetics, as well as ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP markers that are generally weighted equally in their contributions. Proposed methods to handle dependence between SNPs include “thinning” the marker set by linkage disequilibrium (LD) pruning, haplotype-tagging of SNPs, and LD-weighting of SNP contributions. For improved estimation, we propose a new conceptual framework for the genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated by comparison with mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA, and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice, and cattle genotypes) and real (maize, rice, and mice) datasets. Despite the computational difficulties, our results suggest that the proposed method can improve the accuracy of SNP-heritability estimates in datasets with high LD.
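For reference, the VanRaden genomic relationship matrix used as a comparator above weights every SNP equally. A minimal pure-Python sketch of VanRaden's first method follows, assuming genotypes coded 0/1/2 and allele frequencies estimated from the sample itself; the proposed Mahalanobis LD-correction is not implemented here.

```python
from typing import List, Sequence


def vanraden_grm(M: Sequence[Sequence[int]]) -> List[List[float]]:
    """VanRaden (2008) method-1 genomic relationship matrix
    G = Z Z' / (2 * sum_j p_j * (1 - p_j)), where M holds genotypes coded 0/1/2
    (rows = individuals, columns = SNPs), p_j is the allele frequency of SNP j
    estimated from M, and Z = M - 2p. Every SNP receives the same weight."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(a * c for a, c in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


print(vanraden_grm([[0, 2], [2, 0]]))  # [[2.0, -2.0], [-2.0, 2.0]]
```

Because each SNP contributes equally to Z Z', blocks of SNPs in strong LD effectively count several times, which is the redundancy that LD-pruning, LD-weighting, and the proposed LD-correction try to address.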
13
Genomic variance estimates: With or without disequilibrium covariances? J Anim Breed Genet 2017; 134:232-241. [PMID: 28508483 DOI: 10.1111/jbg.12268] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/07/2016] [Accepted: 02/13/2017] [Indexed: 01/31/2023]
Abstract
Whole-genome regression methods are often used to estimate genomic heritability: the proportion of phenotypic variance that can be explained by regression on marker genotypes. Recently, there has been an intensive debate on whether and how to account for the contribution of linkage disequilibrium (LD) to genomic variance. Here, we investigate two methods for genomic variance estimation that differ in their ability to account for LD. By analysing flowering time in a data set of 1,057 fully sequenced Arabidopsis lines with strong evidence for diversifying selection, we observed a large contribution of covariances between quantitative trait loci (QTL) to the genomic variance. The classical estimate of genomic variance, which ignores covariances, underestimated the genomic variance in the data. The second method accounts for LD explicitly and yields genomic variance estimates that, when added to the error variance estimates, match the sample variance of the phenotypes. This method also allows the covariance between sets of markers to be estimated when partitioning the genome into subunits. Large covariance estimates between the five Arabidopsis chromosomes indicated that population structure in the data also led to strong LD between physically unlinked QTL. By consecutively removing population structure from the phenotypic variance using principal component analysis, we show how population structure affects the magnitude of the LD contribution and the genomic variance estimates obtained with the two methods.
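The two estimators contrasted above can be illustrated on toy data. The sketch assumes known additive effects `beta` and uses sample genotype variances in place of the Hardy-Weinberg term 2p(1 - p); it illustrates the idea rather than reproducing the estimation procedure used in the study.

```python
from typing import Sequence, Tuple


def variance(values: Sequence[float]) -> float:
    """Population variance (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def genomic_variances(X: Sequence[Sequence[float]],
                      beta: Sequence[float]) -> Tuple[float, float]:
    """X: genotypes (individuals x loci); beta: additive effect of each locus.
    Returns (variance ignoring covariances, variance including covariances).
    The first is sum_j beta_j^2 * Var(x_j); the second is Var(g) for the genomic
    values g_i = sum_j x_ij * beta_j, which additionally contains the LD terms
    2 * sum_{j<k} beta_j * beta_k * Cov(x_j, x_k)."""
    without_cov = sum(b ** 2 * variance([row[j] for row in X])
                      for j, b in enumerate(beta))
    g = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return without_cov, variance(g)


print(genomic_variances([[0, 0], [2, 2], [0, 0], [2, 2]], [1.0, 1.0]))  # (2.0, 4.0)
print(genomic_variances([[0, 2], [2, 0], [0, 2], [2, 0]], [1.0, 1.0]))  # (2.0, 0.0)
```

With two loci in complete positive LD, the variance of the genomic values is double the covariance-free estimate; with complete negative LD the genomic values do not vary at all. Ignoring covariances can therefore either understate or overstate the genomic variance.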
14
Genome-wide association study in an F2 Duroc x Pietrain resource population for economically important meat quality and carcass traits. J Anim Sci 2017; 95:545-558. [PMID: 28380601 DOI: 10.2527/jas.2016.1003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022] [Open access]
Abstract
Meat quality is essential for consumer acceptance, ultimately affects pork production profitability, and is subject to genetic control. The objective of this study was to map genomic regions associated with economically important meat quality and carcass traits. We performed a genome-wide association (GWA) analysis to map regions associated with 38 meat quality and carcass traits recorded for 948 F2 pigs from the Michigan State University Duroc × Pietrain resource population. The F0, F1, and 336 F2 pigs were genotyped with the Illumina Porcine SNP60 BeadChip, while the remaining F2 pigs were genotyped with the GeneSeek Genomic Profiler for Porcine Low Density (LD) chip and imputed with high accuracy (0.97). Altogether, the genomic dataset comprised 1,019 animals and 44,911 SNPs. A Gaussian linear mixed model was fitted to estimate the breeding values and variance components, and a linear transformation was applied to estimate the marker effects and variances. The type I error rate was controlled at a false discovery rate (FDR) of 5%. Seven putative QTL found in this study had previously been reported in other studies. Two novel QTL associated with tenderness (TEN) were located on SSC3 (135.6-137.5 Mb; FDR < 0.03) and SSC5 (67.3-69.1 Mb; FDR < 0.02). The QTL region identified on SSC15 includes the protein kinase AMP-activated γ3-subunit gene, which has been associated with 24-h pH (pH24), drip loss (DL), and cook yield (CY). Novel candidate genes were also identified for TEN in the region on SSC5 (A Kinase (PRKA) Anchor Protein 3) and for tenth-rib backfat thickness (BF10) on SSC1 (Carnitine O-Acetyltransferase). The association of gene polymorphisms with pork quality traits has been reported for several pig populations. However, the chip used contains no SNPs for this gene, so we genotyped the animals for 2 non-synonymous variants ( and ).
We then performed a GWA conditioning on the genotype of both SNPs, and was associated with pH24, DL, protein content (PRO), and CY ( < 0.004), and T30N with juiciness, TEN, shear force, pH24, PRO, and CY ( < 0.04). Finally, we performed a GWA conditioning on the genotype of the SNP peak detected in this study, and T30N remained associated only with PRO ( < 0.02). Therefore, in this study we identified 2 novel QTL regions, suggest 2 novel candidate genes, and conclude that other SNPs in PRKAG3 or nearby gene(s) explain the observed associations on SSC15 in this population.
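The abstract controls the type I error rate at an FDR of 5% without naming the procedure. The Benjamini-Hochberg step-up procedure is a widely used way to do this, and a minimal sketch is shown below under that assumption; the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: returns, for each test, whether it
    is declared significant while controlling the false discovery rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant


print(benjamini_hochberg([0.001, 0.03, 0.035, 0.04]))  # [True, True, True, True]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))     # [True, False, False, False]
```

In the first example, 0.03 exceeds its own threshold (0.025), but it is still declared significant because a larger p-value further down the ranking passes its threshold; this step-up behaviour distinguishes the procedure from testing each p-value separately.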
15
Genomic prediction ability for yield-related traits in German winter barley elite material. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2017; 130:1669-1683. [PMID: 28534096 DOI: 10.1007/s00122-017-2917-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 12/11/2016] [Accepted: 05/04/2017] [Indexed: 05/25/2023]
Abstract
Genomic prediction was evaluated in German winter barley breeding lines. In this material, prediction ability is strongly influenced by population structure, and the main determinant of prediction ability is the close genetic relatedness of the breeding material. To ensure breeding progress under changing environmental conditions, the implementation and evaluation of new breeding methods is of crucial importance. Modern breeding approaches such as genomic selection may significantly accelerate breeding progress. We assessed the potential of genomic prediction in a training population of 750 genotypes, consisting of multiple six-rowed winter barley (Hordeum vulgare L.) elite material families and old cultivars, which reflect the breeding history of barley in Germany. Crosses of parents selected from the training set were used to create a set of doubled-haploid families consisting of 750 genotypes. These were used to confirm prediction ability estimates obtained by cross-validation within the training set, using 11 different genomic prediction models. Population structure was inferred with dimensionality reduction methods such as discriminant analysis of principal components, and its influence on prediction ability was investigated. In addition to the size of the training set, marker density is of crucial importance for genomic prediction. We used genome-wide linkage disequilibrium and persistence of linkage phase as indicators to estimate that 11,203 evenly spaced markers are required to capture all QTL effects. Although a 9k SNP array does not contain a sufficient number of polymorphic markers for long-term genomic selection, we obtained fairly high prediction accuracies, ranging from 0.31 to 0.71 for the traits earing, hectoliter weight, spikes per square meter, thousand kernel weight, and yield, and show that they result from the close genetic relatedness of the material.
Our work contributes to designing long-term genetic prediction programs for barley breeding.
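The abstract above derives the required marker density from genome-wide linkage disequilibrium (LD). As a minimal sketch of the basic quantity involved, the Python below computes LD between two SNPs as the squared Pearson correlation of their 0/1/2 genotype codes. This genotype-based r² is an assumption made for illustration (it avoids needing phased haplotypes); the study's own LD estimator and its marker-number calculation are not reproduced, and the function name and data are hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """LD between two biallelic SNPs as r^2 of their allele-count codes (0, 1, 2).

    Works on unphased genotypes, so it approximates haplotype-based r^2.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) < 2:
        raise ValueError("need two genotype vectors of equal length (>= 2 lines)")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information; report 0 here
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical genotypes of six lines at two neighbouring markers.
    marker_1 = [0, 2, 2, 0, 1, 2]
    marker_2 = [0, 2, 1, 0, 1, 2]
    print(round(genotype_r2(marker_1, marker_2), 3))
```

Computing r² for many marker pairs and plotting it against the distance between them gives the LD-decay curve on which marker-density estimates of this kind typically rest.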
16
GBS-Based Genomic Selection for Pea Grain Yield under Severe Terminal Drought. THE PLANT GENOME 2017; 10. [PMID: 28724076 DOI: 10.3835/plantgenome2016.07.0072] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/22/2016] [Accepted: 01/23/2017] [Indexed: 05/18/2023]
Abstract
Terminal drought is the main stress that limits pea (Pisum sativum L.) grain yield in Mediterranean-climate regions. This study provides an unprecedented assessment of the predictive ability of genomic selection (GS) for grain yield under severe terminal drought using genotyping-by-sequencing (GBS) data. Additional aims were to assess GS predictive ability for different GBS data quality filters and GS models, to compare intrapopulation with interpopulation GS predictive ability, and to perform genome-wide association studies (GWAS). The yield and onset of flowering of 315 lines from three recombinant inbred line (RIL) populations derived from connected crosses between three elite cultivars were assessed under a field rainout shelter. We defined an adjusted yield, which is associated with intrinsic drought tolerance, as the yield deviation from the value expected as a function of onset of flowering (which correlated negatively with grain yield). Total polymorphic markers ranged from approximately 100 (minimum of eight reads per locus, maximum 10% missing genotype data) to over 7500 (minimum of four reads, maximum 50% missing rate). The best predictions were provided by Bayesian Lasso (BL) or ridge regression best linear unbiased prediction (rrBLUP) models rather than support vector regression (SVR) models, with at least 400-500 markers. Intrapopulation GS predictive ability exceeded 0.5 for yield and onset of flowering in all populations and approached 0.4 for the adjusted yield of a population with high trait variation. Genomic selection was preferable to phenotypic selection in terms of predicted yield gains. Interpopulation GS predictive ability varied widely depending on the pair of populations. GWAS revealed extensive colocalization of markers associated with high yield and early flowering and suggested that they are concentrated in a few genomic regions.
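The adjusted yield defined above is a residual: observed yield minus the yield expected from onset of flowering. A minimal sketch, assuming the expectation is a straight-line least-squares fit (the abstract does not state the functional form used); the function name and data are hypothetical.

```python
def adjusted_yield(yields, flowering_onset):
    """Yield deviation of each line from the least-squares line yield = a + b * onset.

    Positive values mean more yield than expected from the line's flowering date,
    the quantity the study associates with intrinsic drought tolerance.
    """
    n = len(yields)
    if n != len(flowering_onset) or n < 3:
        raise ValueError("need matching yield and flowering vectors (>= 3 lines)")
    mean_x = sum(flowering_onset) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in flowering_onset)
    if sxx == 0:
        raise ValueError("flowering onset does not vary; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(flowering_onset, yields))
    slope = sxy / sxx  # a negative slope matches "later flowering, lower yield"
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(flowering_onset, yields)]


if __name__ == "__main__":
    # Hypothetical lines: days to flowering and grain yield (t/ha).
    onset = [98, 101, 104, 107, 110]
    grain = [3.1, 2.9, 2.9, 2.3, 2.0]
    for day, residual in zip(onset, adjusted_yield(grain, onset)):
        print(day, round(residual, 3))
```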
17
Genome-enabled predictions for fruit weight and quality from repeated records in European peach progenies. BMC Genomics 2017; 18:432. [PMID: 28583089 PMCID: PMC5460546 DOI: 10.1186/s12864-017-3781-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 01/27/2017] [Accepted: 05/10/2017] [Indexed: 11/16/2022]
Abstract
Background: Highly polygenic traits such as fruit weight, sugar content and acidity strongly influence the agroeconomic value of peach varieties. Genomic selection (GS) can accelerate gains in peach yield and quality if its predictions are more accurate than phenotypic selection. The available IPSC 9K SNP array V1 allows standardized and highly reliable genotyping, preparing the ground for GS in peach. Results: A repeatability model (multiple records per individual plant) for genome-enabled predictions in eleven European peach populations is presented. The analysis included 1147 individuals derived from both commercial and non-commercial peach or peach-related accessions. The traits considered were average fruit weight (FW), sugar content (SC) and titratable acidity (TA). Plants were genotyped with the 9K IPSC array, grown in three countries (France, Italy, Spain) and phenotyped for 3–5 years. An analysis of the imputation accuracy of missing genotypic data was conducted using the software Beagle, showing that two of the eleven populations were highly sensitive to increasing levels of missing data. The regression model produced, for each trait and each population, estimates of heritability (FW: 0.35, SC: 0.48, TA: 0.53, on average) and repeatability (FW: 0.56, SC: 0.63, TA: 0.62, on average). Predictive ability was estimated in a five-fold cross-validation scheme within each population as the correlation between true and predicted phenotypes. Results differed by population and trait, but predictive abilities were generally high (FW: 0.60, SC: 0.72, TA: 0.65, on average). Conclusions: This study assessed the feasibility of genomic selection in peach for highly polygenic traits linked to yield and fruit quality. The accuracy of imputing missing genotypes was as high as 96%, and the genomic predictive ability was on average 0.65, but could be as high as 0.84 for fruit weight or 0.83 for titratable acidity.
The estimated repeatability may prove very useful in the management of the typical long cycles involved in peach production. Altogether, these results are very promising for the application of genomic selection to peach breeding programmes.
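The predictive-ability estimate described above (five-fold cross-validation, correlation between observed and predicted phenotypes) can be sketched as follows. Assumptions not taken from the study: a ridge-regression marker model stands in for the paper's repeatability model, each individual has one simulated record rather than repeated measures, the shrinkage parameter `lam` is fixed rather than estimated, and per-fold correlations are averaged. All function names and the simulated data are hypothetical.

```python
import random
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def ridge_fit(x, y, lam):
    """Centre markers, then solve (X'X + lam I) beta = X'(y - mean y)."""
    n, p = len(x), len(x[0])
    centres = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - centres[j] for j in range(p)] for row in x]
    mean_y = sum(y) / n
    lhs = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(r[i] * (yi - mean_y) for r, yi in zip(xc, y)) for i in range(p)]
    return mean_y, centres, solve(lhs, rhs)


def cv_predictive_ability(x, y, k=5, lam=1.0, seed=1):
    """Average over k folds of corr(observed, predicted) in the held-out fold."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    scores = []
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        mean_y, centres, beta = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        pred = [mean_y + sum(b * (g - c) for b, g, c in zip(beta, x[i], centres)) for i in test]
        scores.append(pearson([y[i] for i in test], pred))
    return sum(scores) / k


def simulate(n=100, p=20, noise_sd=1.5, seed=7):
    """Hypothetical data: 0/1/2 genotypes, normal marker effects, normal noise."""
    rng = random.Random(seed)
    x = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(p)]
    y = [sum(e * g for e, g in zip(effects, row)) + rng.gauss(0.0, noise_sd) for row in x]
    return x, y


if __name__ == "__main__":
    genotypes, phenotypes = simulate()
    print(round(cv_predictive_ability(genotypes, phenotypes), 3))
```

Whether to average per-fold correlations or to pool all held-out predictions into one correlation is a design choice; studies differ, and the abstract does not say which was used.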
18
Genotype by environment (climate) interaction improves genomic prediction for production traits in US Holstein cattle. J Dairy Sci 2017; 100:2042-2056. [PMID: 28109596 DOI: 10.3168/jds.2016-11543] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 06/01/2016] [Accepted: 11/04/2016] [Indexed: 01/27/2023]
Abstract
Genotype by environment interaction (G × E) in dairy cattle production traits has been shown to exist, but current genetic evaluation methods do not take this component into account. As several environmental descriptors (e.g., climate, farming system) are known to vary within the United States, not accounting for G × E could lead to reranking of bulls and loss in genetic gain. Using test-day records on milk yield, somatic cell score, fat percentage, and protein percentage from all over the United States, we computed within-herd-year-season daughter yield deviations for 1,087 Holstein bulls and regressed them on genetic and environmental information to estimate variance components and to assess prediction accuracy. Genomic information was obtained from a 50k SNP marker panel. Environmental effect inputs included herd (160 levels), geographical region (7 levels), geographical location (2 variables), climate information (7 variables), and management conditions of the herds (16 total variables divided into 4 subgroups). For each set of environmental descriptors, environmental, genomic, and G × E components were sequentially fitted. Variance component estimates confirmed the presence of G × E on milk yield, with its effect being larger than the main genetic effect and the environmental effect for some models. Conversely, G × E was moderate for somatic cell score and small for milk composition. Genotype by environment interaction, when included, partially eroded the genomic effect (as compared with the models where G × E was not included), suggesting that the genomic variance could at least in part be attributed to G × E not appropriately accounted for. Model predictive ability was assessed using 3 cross-validation schemes (new bulls, incomplete progeny test, and new environmental conditions), and performance was compared with a reference model including only the main genomic effect.
In each scenario, at least 1 of the models including G × E performed better than the reference model, although no single best-performing model with a consistent set of environmental descriptors could be identified across scenarios. In general, the methodology used is promising for accounting for G × E in genomic predictions, but challenges remain in identifying a unique set of covariates capable of describing the entire variety of environments.
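One common way to give a G × E component a covariance structure, used in reaction-norm models, is the element-wise (Hadamard) product of a genomic relationship matrix and an environmental similarity matrix built from covariates such as the climate and management variables listed above. The sketch below assumes that parameterization, a VanRaden-style genomic relationship matrix, and standardized covariates; the abstract does not confirm that the study used exactly this form, and all names and data are hypothetical.

```python
from math import sqrt


def genomic_relationship(genotypes):
    """VanRaden-style G = Z Z' / (2 * sum p(1 - p)), with Z = genotypes - 2p."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def environmental_similarity(covariates):
    """E = W W' / q, where each of the q covariates is standardized (mean 0, SD 1)."""
    n, q = len(covariates), len(covariates[0])
    w = [[0.0] * q for _ in range(n)]
    for j in range(q):
        col = [row[j] for row in covariates]
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))  # assumes the covariate varies
        for i in range(n):
            w[i][j] = (col[i] - mean) / sd
    return [[sum(a * b for a, b in zip(wi, wk)) / q for wk in w] for wi in w]


def gxe_covariance(g, bull_of_record, e):
    """K[i][j] = G[bull_i][bull_j] * E[i][j]: Hadamard product over records i, j."""
    n = len(bull_of_record)
    return [[g[bull_of_record[i]][bull_of_record[j]] * e[i][j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical data: 3 genotyped bulls, 5 records with (temperature, humidity).
    bulls = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 2]]
    record_bull = [0, 0, 1, 2, 1]
    climate = [[25.0, 60.0], [30.0, 55.0], [22.0, 70.0], [28.0, 65.0], [35.0, 50.0]]
    kernel = gxe_covariance(genomic_relationship(bulls), record_bull,
                            environmental_similarity(climate))
    for row in kernel:
        print([round(v, 3) for v in row])
```

Because both factors are positive semidefinite, their Hadamard product is too (Schur product theorem), so the result is a valid covariance matrix for a random G × E effect.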
19
Genes, behavior, and behavior genetics. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2016; 8. [PMID: 27906529 DOI: 10.1002/wcs.1405] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/16/2016] [Revised: 06/16/2016] [Accepted: 06/20/2016] [Indexed: 12/27/2022]
Abstract
According to the 'first law' of behavior genetics, 'All human behavioral traits are heritable.' Accepting the validity of this first law and employing statistical methods, researchers within psychology, sociology, political science, economics, and business claim to have demonstrated that all the behaviors studied by their disciplines are heritable, no matter how culturally specific these behaviors appear to be. Further, in many cases they claim to have identified specific genes that play a role in those behaviors. The validity of behavior genetics as a discipline depends upon the validity of the research methods used to justify such claims. It also depends, foundationally, upon certain key assumptions concerning the relationship between genotype (one's specific DNA sequences) and phenotype (any and all observable traits or characteristics). In this article, I examine, and find serious flaws with, both the methodologies of behavior genetics and the underlying assumptions concerning the genotype-phenotype relationship. WIREs Cogn Sci 2017, 8:e1405. doi: 10.1002/wcs.1405
20
Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space. G3-GENES GENOMES GENETICS 2016; 6:3733-3747. [PMID: 27672112 PMCID: PMC5100872 DOI: 10.1534/g3.116.035410] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training and validation set. A random sampling protocol of genotypes from the calibration set will lead to low-quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets, alongside the identification of corresponding genomic prediction models, for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to higher predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method.
For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel.
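The uniform-coverage idea above can be illustrated with greedy farthest-point (maximin) selection, which repeatedly adds the candidate genotype farthest from those already chosen, so that the training set spreads across the genetic space instead of piling up in the largest subpopulation. This is a stand-in chosen for brevity, not the authors' uniform or stratified-uniform sampling procedure. It assumes genotypes are already summarized as coordinates (for example, scores on the first principal components of the marker matrix); the names and data are hypothetical.

```python
from math import dist


def maximin_training_set(coords, size, start=0):
    """Return indices of `size` candidates chosen by greedy farthest-point sampling."""
    if not 0 < size <= len(coords):
        raise ValueError("size must lie between 1 and the number of candidates")
    chosen = [start]
    picked = {start}
    # distance from each candidate to its nearest already-chosen genotype
    nearest = [dist(c, coords[start]) for c in coords]
    while len(chosen) < size:
        nxt = max((i for i in range(len(coords)) if i not in picked),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        picked.add(nxt)
        nearest = [min(d, dist(c, coords[nxt])) for d, c in zip(nearest, coords)]
    return chosen


if __name__ == "__main__":
    # Hypothetical PC1/PC2 scores: a large cluster and a small, distinct one.
    big = [(0.1 * i, 0.05 * (i % 3)) for i in range(10)]
    small = [(8.0, 8.0), (8.3, 7.9)]
    print(maximin_training_set(big + small, size=4))
```

A random draw of four from these twelve candidates would often miss the small cluster entirely; the maximin rule reaches it with its second pick.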
22
An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular. Genet Sel Evol 2016; 48:80. [PMID: 27788669 PMCID: PMC5082134 DOI: 10.1186/s12711-016-0260-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/23/2016] [Accepted: 10/20/2016] [Indexed: 01/08/2023]
Abstract
Background: The mixed linear model employed for genomic best linear unbiased prediction (GBLUP) includes the breeding value for each animal as a random effect that has a mean of zero and a covariance matrix proportional to the genomic relationship matrix ($\mathbf{G}_{gg}$), where the inverse of $\mathbf{G}_{gg}$ is required to set up the usual mixed model equations (MME). When only some animals have genomic information, genomic predictions can be obtained by an extension known as single-step GBLUP, where the covariance matrix of breeding values is constructed by combining the pedigree-based additive relationship matrix with $\mathbf{G}_{gg}$. The inverse of the combined relationship matrix can be obtained efficiently, provided $\mathbf{G}_{gg}$ can be inverted. In some livestock species, however, the number $N_g$ of animals with genomic information exceeds the number of marker covariates used to compute $\mathbf{G}_{gg}$, and this results in a singular $\mathbf{G}_{gg}$. For such a case, an efficient and exact method to obtain GBLUP and single-step GBLUP is presented here. Results: Exact methods are already available to obtain GBLUP when $\mathbf{G}_{gg}$ is singular, but these require working with large dense matrices. Another approach is to modify $\mathbf{G}_{gg}$ to make it nonsingular by adding a small value to all its diagonals or regressing it towards the pedigree-based relationship matrix. This, however, results in the inverse of $\mathbf{G}_{gg}$ being dense and difficult to compute as $N_g$ grows. The approach presented here recognizes that the number $r$ of linearly independent genomic breeding values cannot exceed the number of marker covariates, and the mixed linear model used here for genomic prediction only fits these $r$ linearly independent breeding values as random effects. Conclusions: The exact method presented here was compared to Apy-GBLUP and to Apy single-step GBLUP, both of which are approximate methods that use a modified $\mathbf{G}_{gg}$ that has a sparse inverse which can be computed efficiently. In a small numerical example, predictions from the exact approach and Apy were almost identical, but the MME from Apy had a condition number about 1000 times larger than that from the exact approach, indicating ill-conditioning of the MME from Apy. The practical application of exact SSGBLUP is not more difficult than implementation of Apy.
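Why $\mathbf{G}_{gg}$ becomes singular, and why that need not block prediction, can be checked numerically. The sketch below is not the paper's exact method; it illustrates a related, standard identity. When $\mathbf{G}_{gg} = \mathbf{Z}\mathbf{Z}'/k$ is built from a centred marker matrix $\mathbf{Z}$ with fewer marker columns than animals, its rank is at most the number of markers. Yet GBLUP written as $\hat{\mathbf{g}} = \mathbf{G}_{gg}(\mathbf{G}_{gg} + \lambda\mathbf{I})^{-1}\mathbf{y}_c$ never inverts $\mathbf{G}_{gg}$, and it equals the marker-effect (SNP-BLUP) form $\mathbf{Z}(\mathbf{Z}'\mathbf{Z} + k\lambda\mathbf{I})^{-1}\mathbf{Z}'\mathbf{y}_c$, whose system has only one equation per marker. Assumptions for brevity: the overall mean is replaced by the sample mean, $k$ and $\lambda$ are fixed constants, and the function names, genotypes and phenotypes are hypothetical.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a x = b (a square, nonsingular) by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rank(a, tol=1e-9):
    """Numerical rank via row reduction with partial pivoting."""
    m = [list(row) for row in a]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def gblup_two_ways(genotypes, y, lam, k):
    """Return (rank of G, GBLUP via G, SNP-BLUP via marker effects) for centred y."""
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    zt = transpose(z)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    # Animal form: g_hat = G (G + lam I)^-1 y_c; G itself is never inverted.
    g = [[v / k for v in row] for row in matmul(z, zt)]
    lhs_g = [[g[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    w = solve(lhs_g, yc)
    g_hat = [sum(gij * wj for gij, wj in zip(row, w)) for row in g]
    # Marker form: g_hat = Z (Z'Z + k lam I)^-1 Z' y_c; only a p x p system.
    ztz = matmul(zt, z)
    lhs_m = [[ztz[i][j] + (k * lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    effects = solve(lhs_m, [sum(zij * yi for zij, yi in zip(col, yc)) for col in zt])
    snp_hat = [sum(zij * e for zij, e in zip(row, effects)) for row in z]
    return rank(g), g_hat, snp_hat


if __name__ == "__main__":
    genos = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1], [0, 0, 2], [2, 1, 0]]
    pheno = [5.1, 4.3, 6.0, 5.5, 4.8, 5.2]
    r, g_hat, snp_hat = gblup_two_ways(genos, pheno, lam=1.5, k=3.0)
    print("animals:", len(genos), "markers:", len(genos[0]), "rank of G:", r)
    print([round(v, 4) for v in g_hat])
    print([round(v, 4) for v in snp_hat])
```

With six animals and three markers, $\mathbf{G}_{gg}$ has rank at most three, so it cannot be inverted, while both prediction routes agree to rounding error.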
23
Genome-Wide Association Studies with a Genomic Relationship Matrix: A Case Study with Wheat and Arabidopsis. G3-GENES GENOMES GENETICS 2016; 6:3241-3256. [PMID: 27520956 PMCID: PMC5068945 DOI: 10.1534/g3.116.034256] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/02/2022]
Abstract
Standard genome-wide association studies (GWAS) scan for relationships between each of p molecular markers and a continuously distributed target trait. Typically, a marker-based matrix of genomic similarities among individuals (G) is constructed, to account more properly for the covariance structure in the linear regression model used. We show that the generalized least-squares estimator of the regression of phenotype on one or on m markers is invariant with respect to whether or not the marker(s) tested is(are) used for building G, provided variance components are unaffected by exclusion of such marker(s) from G. The result is arrived at by using a matrix expression such that one can find many inverses of genomic relationship, or of phenotypic covariance matrices, stemming from removing markers tested as fixed, but carrying out a single inversion. When eigenvectors of the genomic relationship matrix are used as regressors with fixed regression coefficients, e.g., to account for population stratification, their removal from G does matter. Removal of eigenvectors from G can have a noticeable effect on estimates of genomic and residual variances, so caution is needed. Concepts were illustrated using genomic data on 599 wheat inbred lines, with grain yield as target trait, and on close to 200 Arabidopsis thaliana accessions.
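The invariance result above can be checked numerically. In the sketch below, a marker is tested as a fixed effect by generalized least squares, $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}$ with $\mathbf{V} = \sigma_g^2\mathbf{G} + \sigma_e^2\mathbf{I}$, once with the tested marker included in $\mathbf{G}$ and once with it excluded. Following the proviso in the abstract, the variance components and the scaling constant of $\mathbf{G}$ are held fixed; the function names, genotypes, phenotypes and parameter values are hypothetical. Including the marker only adds a multiple of $\mathbf{z}\mathbf{z}'$ to $\mathbf{V}$, where the centred marker $\mathbf{z}$ lies in the column space of $\mathbf{X}$, which is why the estimate does not change.

```python
def solve(a, b):
    """Solve a x = b (a square, nonsingular) by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def covariance(marker_cols, var_g, var_e, k):
    """V = var_g * (sum over centred markers of z z') / k + var_e * I."""
    n = len(marker_cols[0])
    v = [[var_e if i == j else 0.0 for j in range(n)] for i in range(n)]
    for z in marker_cols:
        for i in range(n):
            for j in range(n):
                v[i][j] += var_g * z[i] * z[j] / k
    return v


def gls(x_cols, y, v):
    """beta = (X' V^-1 X)^-1 X' V^-1 y, with X supplied as a list of columns."""
    vinv_x = [solve(v, col) for col in x_cols]
    vinv_y = solve(v, y)
    xtvx = [[sum(a * b for a, b in zip(ci, cj)) for cj in vinv_x] for ci in x_cols]
    xtvy = [sum(a * b for a, b in zip(ci, vinv_y)) for ci in x_cols]
    return solve(xtvx, xtvy)


def marker_estimates(genotypes, y, tested, var_g=1.0, var_e=2.0, k=3.0):
    """GLS (intercept, marker effect) with the tested marker inside G and left out of G."""
    n, p = len(genotypes), len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centred = [[g - m for g in c] for c, m in zip(cols, means)]
    x_cols = [[1.0] * n, [float(g) for g in cols[tested]]]
    v_in = covariance(centred, var_g, var_e, k)
    v_out = covariance([c for j, c in enumerate(centred) if j != tested], var_g, var_e, k)
    return gls(x_cols, y, v_in), gls(x_cols, y, v_out)


if __name__ == "__main__":
    genos = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1], [0, 0, 2], [2, 1, 0]]
    pheno = [5.1, 4.3, 6.0, 5.5, 4.8, 5.2]
    with_marker, without_marker = marker_estimates(genos, pheno, tested=0)
    print([round(b, 6) for b in with_marker], [round(b, 6) for b in without_marker])
```

If the scaling constant of $\mathbf{G}$ were recomputed after dropping the marker, or the variance components re-estimated, the two estimates would generally differ, which is the caveat the abstract attaches to the result.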
24
The 'heritability' of domestication and its functional partitioning in the pig. Heredity (Edinb) 2016; 118:160-168. [PMID: 27649617 DOI: 10.1038/hdy.2016.78] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/02/2015] [Revised: 07/04/2016] [Accepted: 07/04/2016] [Indexed: 11/08/2022]
Abstract
We propose to estimate the proportion of variance explained by regression on genome-wide markers (or genomic heritability) when wild/domestic status is considered the phenotype of interest. This approach differs from the standard Fst in that it can accommodate genetic similarity between individuals in a general form. We apply this strategy to complete genome data from 47 wild and domestic pigs from Asia and Europe. When we partitioned the total genomic variance into components associated with subsets of single nucleotide polymorphisms (SNPs) defined in terms of their annotation, we found that potentially deleterious non-synonymous mutations (9566 SNPs) explained as much genetic variance as the whole set of 25 million SNPs. This suggests that domestication may have affected protein sequence to a larger extent than regulatory or other kinds of mutations. A pathway-guided analysis revealed ovarian steroidogenesis and leptin signaling as highly relevant in domestication. The genomic regression approach proposed in this study revealed molecular processes not apparent through typical differentiation statistics. We propose that at least some of these processes are likely new discoveries because domestication is a dynamic process of genetic selection, which may not be completely characterized by a static metric like Fst. Nevertheless, and despite some particularly influential mutation types or pathways, our analyses tend to rule out a simplistic genetic basis for the domestication process: neither a single pathway nor a unique set of SNPs can explain the process as a whole.
25
Abstract
Pork quality plays an important role in the meat processing industry. Thus, different methodologies have been implemented to elucidate the genetic architecture of traits affecting meat quality. One of the most common and widely used approaches is to perform genome-wide association (GWA) studies. However, a limitation of many GWA studies in animal breeding is their limited power due to small sample sizes in animal populations. One alternative is to implement a meta-analysis of GWA (MA-GWA) combining results from independent association studies. The objective of this study was to identify significant genomic regions associated with meat quality traits by performing MA-GWA for 8 different traits in 3 independent pig populations. Results from MA-GWA were used to search for genes possibly associated with the set of evaluated traits. Data from 3 pig data sets (U.S. Meat Animal Research Center, commercial, and Michigan State University Pig Resource Population) were used. An MA was implemented by combining Z-scores derived for each SNP in every population and then weighting them using the inverse of the estimated variance of SNP effects. A search for annotated genes retrieved genes previously reported as candidates for shear force (calpain-1 catalytic subunit [CAPN1] and calpastatin [CAST]), as well as for ultimate pH, purge loss, and cook loss (protein kinase, AMP-activated, γ 3 noncatalytic subunit [PRKAG3]). In addition, novel candidate genes were identified for intramuscular fat and cook loss (acyl-CoA synthetase family member 3 mitochondrial [ACSF3]) and for the objective measure of muscle redness, CIE a* (glycogen synthase 1, muscle [GYS1] and ferritin, light polypeptide [FTL]). Thus, implementation of MA-GWA allowed integration of results for economically relevant traits and identified novel genes to be tested as candidates for meat quality traits in pig populations.
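The combination step described above, in which per-population SNP scores are merged with weights, can be sketched with a weighted Z-score (Stouffer-type) statistic, $Z = \sum_i w_i Z_i / \sqrt{\sum_i w_i^2}$, which is standard normal under the null hypothesis when the populations are independent. Assumptions: the weights are supplied by the caller (per the abstract, derived from the inverse of the estimated variance of SNP effects), alleles are already aligned across populations, and the function name and numbers are hypothetical. This is not the authors' distributed script.

```python
from math import erfc, sqrt


def weighted_z_meta(z_scores, weights):
    """Combine signed per-population Z-scores for one SNP.

    Returns (combined Z, two-sided p-value). Signs matter: effects in opposite
    directions partly cancel, so alleles must be aligned across populations first.
    """
    if len(z_scores) != len(weights) or not z_scores:
        raise ValueError("need one weight per Z-score")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # P(|N(0,1)| >= |z|)
    return z, p_two_sided


if __name__ == "__main__":
    # Hypothetical SNP: Z-scores from three pig populations and inverse-variance weights.
    z_by_population = [2.1, 1.4, 2.8]
    inv_var_weights = [1 / 0.04, 1 / 0.09, 1 / 0.02]
    print(weighted_z_meta(z_by_population, inv_var_weights))
```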
26
Genomic Prediction of Gene Bank Wheat Landraces. G3 (BETHESDA, MD.) 2016; 6:1819-34. [PMID: 27172218 PMCID: PMC4938637 DOI: 10.1534/g3.116.029637] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 01/21/2016] [Accepted: 04/15/2016] [Indexed: 12/30/2022]
Abstract
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits days to heading (DTH) and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models included genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% compared with the accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The prediction core set gave accuracy similar to that of the diversity core set for the Mexican collection, but slightly lower for the Iranian collection. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, greater than that obtained without the G × E term. For Iranian landraces, accuracy was 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
27
Covariance Association Test (CVAT) Identifies Genetic Markers Associated with Schizophrenia in Functionally Associated Biological Processes. Genetics 2016; 203:1901-13. [PMID: 27317683 DOI: 10.1534/genetics.116.189498] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/19/2016] [Accepted: 06/09/2016] [Indexed: 12/12/2022]
Abstract
Schizophrenia is a psychiatric disorder with large personal and social costs, and understanding its genetic etiology is important. Such knowledge can be obtained by testing the association between a disease phenotype and individual genetic markers; however, such single-marker methods have limited power to detect genetic markers with small effects. Instead, aggregating genetic markers based on biological information might increase the power to identify sets of genetic markers of etiological significance. Several set test methods have been proposed; here we propose a new set test derived from genomic best linear unbiased prediction (GBLUP), the covariance association test (CVAT). We compared the performance of CVAT with other commonly used set tests. The comparison was conducted using a simulated study population having the same genetic parameters as schizophrenia. We found that CVAT was among the top performers. When extending CVAT to use a mixture of SNP effects, we found an increase in power to detect the causal sets. Applying the methods to a Danish schizophrenia case-control data set, we found genomic evidence for association of schizophrenia with vitamin A metabolism and immunological responses, both of which have previously been implicated in schizophrenia based on experimental and observational studies.
28
Accuracy of heritability estimations in presence of hidden population stratification. Sci Rep 2016; 6:26471. [PMID: 27220488 PMCID: PMC4879529 DOI: 10.1038/srep26471] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/09/2016] [Accepted: 04/29/2016] [Indexed: 01/05/2023]
Abstract
The heritability of a trait is the proportion of its variance explained by genetic factors; it has historically been estimated using familial data. However, new methods have appeared for estimating heritability using genome-wide data from unrelated individuals. A drawback of this strategy is that population stratification can bias the estimates. Indeed, an environmental factor associated with the phenotype may differ among population subgroups. Because such a factor is associated both with the phenotype and with the genetic variation in the population, it acts as a confounder. A common solution consists of adjusting for the first principal components (PCs) of the genomic data. We study this procedure on simulated data and on 6000 individuals from the Three-City Study. We analyse the geographical coordinates of the birth cities, which are not genetically determined but whose heritability should be overestimated because of population stratification. We also analyse various anthropometric traits. The procedure fails to correct the bias in the heritability estimates of the geographical coordinates. The heritability estimates of the anthropometric traits are affected by the inclusion of the first PC but not by the following PCs, contrary to the geographical coordinates. We recommend caution with heritability estimates obtained from a large population.
29
Genomic prediction of unordered categorical traits: an application to subpopulation assignment in German Warmblood horses. Genet Sel Evol 2016; 48:13. [PMID: 26867647 PMCID: PMC4751658 DOI: 10.1186/s12711-016-0192-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/26/2015] [Accepted: 01/29/2016] [Indexed: 12/23/2022]
Abstract
BACKGROUND Categorical traits without ordinal representation of classes do not qualify for threshold models. Alternatively, the multinomial problem can be assessed by a sequence of independent binary contrasts using schemes such as one-vs-all or one-vs-one. Class probabilities can then be obtained by normalization or pairwise coupling strategies. We assessed the predictive ability of whole-genome regression models and support vector machines for the classification of horses into four German Warmblood breeds. RESULTS Prediction accuracies of leave-one-out cross-validation were high and ranged from 0.75 to 0.97, depending on the binary classifier and the breeds included in the training set. An analysis of the population structure using eigenvectors of the genomic relationship matrix revealed clustering of individuals beyond the given breed labels. Admixture between two breeds became apparent; it had a substantial impact on the prediction accuracies between those two breeds and also influenced the contrasts between other breeds. CONCLUSIONS Genomic prediction of unordered categorical traits was successfully applied to subpopulation assignment of German Warmblood horses. The applied methodology is a straightforward extension of existing binary threshold models for genomic prediction.
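The two routes from binary contrasts to class probabilities can be illustrated as follows. This is a sketch under stated assumptions. The abstract does not say which coupling rule was used, so the example shows one simple closed-form rule, in which p_i is proportional to 1 / (Σ_{j≠i} 1/r_ij − (K − 2)). Here `r[i, j]` is assumed to be the one-vs-one classifier's probability of class i, given that the individual belongs to class i or j, and to lie in (0, 1]. The function names are hypothetical.

```python
import numpy as np


def one_vs_all_normalize(q):
    """Rescale one-vs-all scores q[k] = P(class k vs rest) to sum to 1."""
    q = np.asarray(q, dtype=float)
    return q / q.sum()


def pairwise_coupling(r):
    """Combine one-vs-one probabilities r[i, j] = P(i | i or j), all in (0, 1],
    into class probabilities with a closed-form coupling rule."""
    r = np.asarray(r, dtype=float)
    k = r.shape[0]
    p = np.empty(k)
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Each 1/r[i, j] >= 1, so the denominator is at least 1 (scores stay positive)
        p[i] = 1.0 / (np.sum(1.0 / r[i, others]) - (k - 2))
    return p / p.sum()
```

When the pairwise probabilities are mutually consistent, r_ij = p_i / (p_i + p_j), the rule recovers p exactly. With real classifiers the inputs are noisy, and the normalization step absorbs the inconsistency.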
30
Abstract
Genome-wide association (GWA) studies based on GBLUP models are common practice in animal breeding. However, the effect sizes detected by GWA tests are small, so larger sample sizes are needed to increase the power to detect rare variants. Because sample size is difficult to increase in animal populations, one alternative is to implement a meta-analysis (MA) that combines information and results from independent GWA studies. Although this methodology has been used widely in human genetics, its implementation in animal breeding has been limited. Thus, we present methods to implement an MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases the power to detect associations compared with population-level GWA, while allowing population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that it does not require access to the genotype data needed for a joint analysis. Scripts implementing this approach are distributed; because they consider the sign of each association as well as its strength, they account for heterogeneity in the association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative for summarizing results from multiple genomic studies, avoiding restrictions related to genotype data sharing, the definition of fixed effects and differing measurement scales of the evaluated traits.
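A minimal sketch of combining per-study SNP effects follows. It uses generic fixed-effect inverse-variance weights, which is an assumption: the paper derives its weights from GBLUP-based genomic evaluations, and those formulas are not reproduced here. Signed effects are kept, so studies in which a SNP is in opposite association phase with a QTL partly cancel rather than add up. `ivw_meta` is a hypothetical name.

```python
import numpy as np


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis.

    betas, ses: arrays of shape (studies,) or (studies, snps) holding signed
    SNP effects and their standard errors. Returns (beta, se, z) per SNP.
    """
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2   # precision weights
    beta = np.sum(w * b, axis=0) / np.sum(w, axis=0)
    se = np.sqrt(1.0 / np.sum(w, axis=0))
    return beta, se, beta / se
```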
31
Inexpensive Computation of the Inverse of the Genomic Relationship Matrix in Populations with Small Effective Population Size. Genetics 2015; 202:401-9. [PMID: 26584903 PMCID: PMC4788224 DOI: 10.1534/genetics.115.182089] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 08/18/2015] [Accepted: 11/15/2015] [Indexed: 11/18/2022]
Abstract
Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called "algorithm for proven and young" (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
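The core/noncore recursion leads to an inverse with a simple block structure. With P = G_nc G_cc^{-1} and m_i = g_ii − g_ic G_cc^{-1} g_ci, the inverse is G_APY^{-1} = [[G_cc^{-1} + P'M^{-1}P, −P'M^{-1}], [−M^{-1}P, M^{-1}]], where M = diag(m_i). This block form is how APY is commonly written, since noncore individuals are treated as conditionally independent given the core. The sketch below builds the matrix densely for clarity. It assumes `G` is positive definite and that the caller supplies the core indices; a real implementation would never form the full n × n matrix.

```python
import numpy as np


def apy_inverse(G, core):
    """Dense APY approximation of G^{-1} for a given set of core indices."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gnc = G[np.ix_(noncore, core)]
    P = Gnc @ Gcc_inv                      # recursion coefficients on core animals
    # Diagonal conditional variances m_i = g_ii - g_ic Gcc^{-1} g_ci
    m = np.diag(G)[noncore] - np.einsum("ij,ij->i", P, Gnc)
    Minv = 1.0 / m
    Ginv = np.zeros_like(G, dtype=float)
    Ginv[np.ix_(core, core)] = Gcc_inv + P.T @ (Minv[:, None] * P)
    Ginv[np.ix_(core, noncore)] = -P.T * Minv
    Ginv[np.ix_(noncore, core)] = -(Minv[:, None] * P)
    Ginv[np.ix_(noncore, noncore)] = np.diag(Minv)
    return Ginv
```

With a single noncore individual, M equals the exact Schur complement, so the result coincides with `np.linalg.inv(G)`. The approximation enters as more individuals become noncore, because the conditional covariances among them are dropped.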
32
Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions. JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2015; 20:467-490. [PMID: 26660276 PMCID: PMC4666286 DOI: 10.1007/s13253-015-0222-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/19/2015] [Accepted: 09/16/2015] [Indexed: 11/22/2022]
Abstract
Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and Whole-Genome Regression (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used with both variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model yields trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best-performing model or performed close to it.
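Marker effects can be decomposed into a main effect plus group-specific deviations by using an augmented design matrix. This sketch assumes a marker matrix `X`, integer group labels 0..K−1, and a single ridge penalty shared by main and interaction effects. The shared penalty is a simplification: the model in the abstract can use separate variance components and Bayesian variable selection. All function names are hypothetical.

```python
import numpy as np


def interaction_design(X, group, n_groups):
    """Stack [X, X*1{group=0}, ..., X*1{group=K-1}] so each marker has a main
    effect plus one deviation per sub-population."""
    blocks = [X]
    for k in range(n_groups):
        blocks.append(X * (group == k)[:, None])  # rows outside group k are zero
    return np.hstack(blocks)


def ridge_fit(W, y, lam):
    """Shrinkage estimate (W'W + lam*I)^-1 W'(y - mean(y))."""
    yc = y - y.mean()
    return np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]), W.T @ yc)


def group_marker_effects(b, n_markers, k):
    """Effect of every marker in group k: main effect + group-k deviation."""
    main = b[:n_markers]
    dev = b[n_markers * (k + 1): n_markers * (k + 2)]
    return main + dev
```

Heavily shrunken deviations indicate marker effects that are largely shared across sub-populations. Large deviations indicate effects that are modified by structure.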
33
Schizophrenia: A critical view on genetic effects. PSYCHOSIS-PSYCHOLOGICAL SOCIAL AND INTEGRATIVE APPROACHES 2015. [DOI: 10.1080/17522439.2015.1081269] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/17/2022]
34
Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction. Genet Epidemiol 2015; 39:427-38. [PMID: 25995153 PMCID: PMC4734143 DOI: 10.1002/gepi.21906] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/25/2014] [Revised: 03/19/2015] [Accepted: 04/07/2015] [Indexed: 01/14/2023]
Abstract
Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample sizes from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (0.0138 to 0.0232; P < 10⁻¹⁶) when ancestry was explicitly modeled, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
35
Iron and hepcidin as risk factors in atherosclerosis: what do the genes say? BMC Genet 2015; 16:79. [PMID: 26159428 PMCID: PMC4498499 DOI: 10.1186/s12863-015-0246-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/13/2015] [Accepted: 06/30/2015] [Indexed: 01/05/2023]
Abstract
Background Previous reports suggested a role for iron and hepcidin in atherosclerosis. Here, we evaluated the causality of these associations from a genetic perspective via (i) a Mendelian randomization (MR) approach, (ii) study of the association of atherosclerosis-related single nucleotide polymorphisms (SNPs) with iron and hepcidin, and (iii) estimation of genomic correlations between hepcidin, iron and atherosclerosis. Results Analyses were performed in a general population sample. Iron parameters (serum iron, serum ferritin, total iron-binding capacity and transferrin saturation), serum hepcidin and genome-wide SNP data were available for N = 1,819; non-invasive measurements of atherosclerosis (NIMA), i.e., presence of plaque, intima media thickness and ankle-brachial index (ABI), were available for N = 549. For the MR, we used 12 iron-related SNPs that were previously identified in a genome-wide association meta-analysis on iron status, and assessed associations of individual SNPs and quartiles of a multi-SNP score with NIMA. Quartile 4 versus quartile 1 of the multi-SNP score showed associations consistent with the hypothesized direction of effect for all NIMA in women, indicating that increased body iron status is a risk factor for atherosclerosis in women. We observed no single-SNP associations that fit the hypothesized directions of effect between iron and NIMA, except for rs651007, which was associated with decreased ferritin concentration and decreased atherosclerosis risk. Two of six NIMA-related SNPs showed association with the hepcidin/ferritin ratio, suggesting that an increased hepcidin/ferritin ratio increases atherosclerosis risk. Genomic correlations were close to zero, except for hepcidin and ferritin with ABI at rest [−0.27 (SE 0.34) and −0.22 (SE 0.35), respectively] and ABI after exercise [−0.29 (SE 0.34) and −0.30 (SE 0.35), respectively]. The negative sign indicates an increased atherosclerosis risk with increased hepcidin and ferritin concentrations.
Conclusions Our results suggest a potential causal role for hepcidin and ferritin in atherosclerosis, and may indicate that iron status is causally related to atherosclerosis in women.
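The multi-SNP score and quartile contrast used in the Mendelian randomization analysis can be sketched as below. The abstract does not say how the score was built, so two things are assumptions here. The first is the weights, for example per-allele effects on iron status from the discovery meta-analysis. The second is the 0/1/2 dosage coding. The function names are hypothetical.

```python
import numpy as np


def allele_score(dosages, weights):
    """Weighted multi-SNP score: sum over SNPs of weight * effect-allele dosage."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def score_quartiles(score):
    """Quartile (1-4) of each individual's score; ties at a cut go upward."""
    cuts = np.quantile(score, [0.25, 0.5, 0.75])
    return np.searchsorted(cuts, score, side="right") + 1


def q4_vs_q1_difference(score, outcome):
    """Mean outcome in the top score quartile minus that in the bottom one."""
    q = score_quartiles(score)
    outcome = np.asarray(outcome, dtype=float)
    return outcome[q == 4].mean() - outcome[q == 1].mean()
```

In the study, the outcomes would be the NIMA measures, analysed separately by sex.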
36
Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models. Genetics 2015; 201:323-37. [PMID: 26122758 DOI: 10.1534/genetics.115.177394] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 04/14/2015] [Accepted: 06/25/2015] [Indexed: 01/27/2023]
Abstract
Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding, where population structure can have different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and on the trait. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious and more powerful model, as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP.
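The three GBLUP variants differ mainly in the genetic covariance assumed between subpopulations. In this sketch of MG-GBLUP, each individual belongs to one group, and cov(g_i, g_j) = Σ_g[group_i, group_j] · G_ij. Two simplifications apply. The variance components (`Sigma_g`, `sigma_e2`) are assumed known. Within-group centring of y stands in for estimating group means as fixed effects. `mg_gblup_predict` is a hypothetical name.

```python
import numpy as np


def mg_gblup_predict(y, G, group, Sigma_g, sigma_e2):
    """Multigroup GBLUP with cov(g_i, g_j) = Sigma_g[group_i, group_j] * G_ij.
    Returns predicted genetic values for all (phenotyped) individuals."""
    group = np.asarray(group)
    y = np.asarray(y, dtype=float).copy()
    for k in np.unique(group):                 # centre y within each group
        y[group == k] -= y[group == k].mean()
    K = Sigma_g[np.ix_(group, group)] * G      # genetic covariance matrix
    V = K + sigma_e2 * np.eye(len(y))          # phenotypic covariance
    return K @ np.linalg.solve(V, y)
```

Setting all entries of `Sigma_g` equal essentially corresponds to ignoring structure, as in A-GBLUP. A diagonal `Sigma_g` decouples the groups, as in W-GBLUP. Intermediate correlations borrow information across subpopulations in proportion to their estimated genetic similarity.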
37
A critical assessment of the equal-environment assumption of the twin method for schizophrenia. Front Psychiatry 2015; 6:62. [PMID: 25972816 PMCID: PMC4411885 DOI: 10.3389/fpsyt.2015.00062] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/16/2014] [Accepted: 04/09/2015] [Indexed: 12/27/2022]
Abstract
The classical twin method (CTM) is central to the view that schizophrenia is ~80% heritable. The CTM rests on the equal-environment assumption (EEA) that identical and fraternal twin pairs experience equivalent trait-relevant environmental exposures. The EEA has not been directly tested for schizophrenia with measures of child social adversity, which is particularly etiologically relevant to the disorder. However, if child social adversity is more similar in identical than fraternal pairs in the general twin population, the EEA is unlikely to be valid for schizophrenia, a question which we tested in this study. Using results from prior twin studies, we tested if intraclass correlations for the following five categories of child social adversity are larger in identical than fraternal twins: bullying, sexual abuse, physical maltreatment, emotional neglect and abuse, and general trauma. Eleven relevant studies that encompassed 9119 twin pairs provided 24 comparisons of intraclass correlations, which we grouped into the five social exposure categories. Fisher's z-test revealed significantly higher correlations in identical than fraternal pairs for each exposure category (z ≥ 3.53, p < 0.001). The difference remained consistent across gender, study site (country), sample size, whether psychometric instruments were used, whether interviewing was proximate or distant to the exposures, and whether informants were twins or third persons. Combined with other evidence that the differential intraclass correlation for child social adversity cannot be explained by evocative gene-environment covariation, our results indicate that the CTM does not provide any valid indication of genomic effects in schizophrenia.
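Fisher's z-test for comparing two independent correlations can be sketched as follows, with n the number of twin pairs. The standard-error term 1/(n − 3) is the usual one for Pearson correlations. The abstract does not say whether the study used this form or a variant tailored to intraclass correlations, so treat it as an assumption. `fisher_z_compare` is a hypothetical name.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Fisher z test of H0: rho1 == rho2 for two independent correlations.
    Returns the z statistic and the one-sided p value for rho1 > rho2."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # variance-stabilising transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return z, p_one_sided
```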
38
Comparison of breeding value prediction for two traits in a Nellore-Angus crossbred population using different Bayesian modeling methodologies. Genet Mol Biol 2014; 37:631-7. [PMID: 25505837 PMCID: PMC4261962 DOI: 10.1590/s1415-47572014005000021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/27/2014] [Accepted: 07/23/2014] [Indexed: 12/29/2022]
Abstract
The objectives of this study were to 1) compare four models for breeding value prediction using genomic or pedigree information and 2) evaluate the impact of fixed effects that account for family structure. Comparisons were made in a Nellore-Angus population comprising F2, F3 and half-siblings to embryo transfer F2 calves with records for overall temperament at weaning (TEMP; n = 769) and Warner-Bratzler shear force (WBSF; n = 387). After quality control, 34,913 whole-genome SNP markers remained. Bayesian methods employed were BayesB (π̃ = 0.995 or 0.997 for WBSF or TEMP, respectively) and BayesC (π = 0 and π̃), where π̃ is the ideal proportion of markers not included. Direct genomic values (DGV) from single-trait Bayesian analyses were compared to conventional pedigree-based animal model breeding values. Numerically, BayesC procedures (using π̃) had the highest accuracy of all models for WBSF and TEMP (ρ̂gĝ = 0.843 and 0.923, respectively), but BayesB had the least bias (regression of performance on prediction closest to 1, β̂y,x = 2.886 and 1.755, respectively). Accounting for family structure decreased accuracy and increased bias in prediction of DGV, indicating a detrimental impact when used in these prediction methods that simultaneously fit many markers.
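The sketch below shows one common way to compute the two reported quantities: the predictive correlation and the regression of performance on prediction. `accuracy_and_bias` is a hypothetical helper. The abstract's ρ̂gĝ may have been computed against true or deregressed breeding values rather than raw phenotypes.

```python
import numpy as np


def accuracy_and_bias(y, pred):
    """Return (correlation of y with pred, slope of y regressed on pred).
    A slope of 1 means the predictions are on the right scale (unbiased)."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(y, pred)[0, 1]
    slope = np.cov(y, pred)[0, 1] / np.var(pred, ddof=1)
    return r, slope
```

A slope above 1, as reported for all models here, means the predictions are too compressed relative to the observed differences.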
39
Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet 2014; 95:535-52. [PMID: 25439723 PMCID: PMC4225595 DOI: 10.1016/j.ajhg.2014.10.004] [Citation(s) in RCA: 407] [Impact Index Per Article: 40.7] [Received: 07/28/2014] [Accepted: 10/02/2014] [Indexed: 10/25/2022]
Abstract
Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h²_g) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that, in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h²_g from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h²_g from genotyped SNPs (1.6× enrichment; p = 1.0 × 10⁻⁴). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h²_g despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
40
A general unified framework to assess the sampling variance of heritability estimates using pedigree or marker-based relationships. Genetics 2014; 199:223-32. [PMID: 25361897 PMCID: PMC4286686 DOI: 10.1534/genetics.114.171017] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 11/18/2022]
Abstract
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N², where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N.
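The 1/N² scaling for population samples can be illustrated numerically. The sketch assumes the approximation var(ĥ²) ≈ 2 / (N² · var(A_ij)), where var(A_ij) is the variance of off-diagonal relationships. Check the constant, and the small-h² conditions under which it holds, against the paper before relying on it. The value 2 × 10⁻⁵ below is purely illustrative of the low relationship variance in samples of unrelated people. The function names are hypothetical.

```python
import math


def h2_sampling_variance(n, var_rel):
    """Assumed approximation for a population sample:
    var(h2_hat) ~ 2 / (N^2 * var(A_ij)), var(A_ij) = off-diagonal variance."""
    return 2.0 / (n ** 2 * var_rel)


def h2_standard_error(n, var_rel):
    return math.sqrt(h2_sampling_variance(n, var_rel))


# Illustrative: 10,000 unrelated individuals with var(A_ij) = 2e-5
print(round(h2_standard_error(10_000, 2e-5), 4))  # prints 0.0316
```

Doubling N quarters the sampling variance here. In a pedigree design, the variance scales as 1/N, so doubling N would only halve it.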
41
Estimation of heritability of different outcomes for genetic studies of TNFi response in patients with rheumatoid arthritis. Ann Rheum Dis 2014; 74:2183-7. [PMID: 25114059 DOI: 10.1136/annrheumdis-2014-205541] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/13/2014] [Accepted: 07/20/2014] [Indexed: 01/03/2023]
Abstract
OBJECTIVES Pharmacogenetic studies of tumour necrosis factor inhibitor (TNFi) response in patients with rheumatoid arthritis (RA) have largely relied on changes in complex disease scores, such as the disease activity score 28 (DAS28), as a measure of treatment response. The genetic architecture of such a complex score is expected to be heterogeneous and not very suitable for pharmacogenetic studies. We aimed to select the optimal phenotype for TNFi response using heritability estimates. METHODS Using two linear mixed-modelling approaches (Bayz and GCTA), we estimated heritability, together with genomic and environmental correlations, for the TNFi drug-response phenotype ΔDAS28 and its separate components: Δ swollen joint count (SJC), Δ tender joint count (TJC), Δ erythrocyte sedimentation rate (ESR) and Δ visual-analogue scale of general health (VAS-GH). For this, we used genome-wide single nucleotide polymorphism (SNP) data from 878 TNFi-treated Dutch patients with RA. Furthermore, a multivariate genome-wide association study (GWAS) approach was implemented, analysing separate DAS28 components simultaneously. RESULTS The highest heritability estimates were found for ΔSJC (h²_g = 0.76 with Bayz and 0.87 with GCTA) and ΔTJC (0.62 and 0.82, respectively); lower heritability was found for ΔDAS28 (0.59 and 0.71), while estimates for ΔESR and ΔVAS-GH were near or equal to zero. The highest genomic correlation was observed between ΔSJC and ΔTJC (0.49), and the highest environmental correlation was seen between ΔTJC and ΔVAS-GH (0.62). The multivariate GWAS did not generate an excess of low p values compared with a univariate analysis of ΔDAS28. CONCLUSIONS Our results indicate that multiple SNPs together explain a substantial portion of the variation in change in joint counts in TNFi-treated patients with RA. In conclusion, of the outcomes studied, the joint counts are most suitable for TNFi pharmacogenetics in RA.
42
Whole-genome analyses of lung function, height and smoking. Ann Hum Genet 2014; 78:452-67. [PMID: 25081033 DOI: 10.1111/ahg.12078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/07/2014] [Accepted: 05/15/2014] [Indexed: 11/29/2022]
Abstract
A joint analysis of FEV1 (forced expiratory volume in one second) and height is reported using novel methodology, as well as a single-trait analysis of smoking status. A first goal of the study was to incorporate dense genetic marker information in a random regression (Bayesian) model to quantify the relative contributions of genomic and environmental factors to the relationship between FEV1 and height. Smoking status was analysed using a probit random regression model, and a second goal of the study was to estimate the genomic heritability of smoking status. The estimated genomic heritabilities of height and FEV1 are 0.47 and 0.30, respectively. The estimates of the genomic and environmental correlations between height and FEV1 are 0.78 and 0.34, respectively. The posterior mean of the genomic heritability of smoking status is 0.14, which provides evidence for the presence of genetic factors associated with the trait. Under the data augmentation strategy introduced, the joint posterior distribution of FEV1 and height factorises into two independent posterior distributions. This simplifies programming and results in excellent numerical behaviour. The approach can be readily extended to the joint analysis of an arbitrary number of traits. Details are shown in an Appendix.
43
Abstract
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis.
44
Poly-omic prediction of complex traits: OmicKriging. Genet Epidemiol 2014; 38:402-15. [PMID: 24799323 DOI: 10.1002/gepi.21808] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 11/26/2013] [Revised: 03/11/2014] [Accepted: 03/12/2014] [Indexed: 12/23/2022]
Abstract
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. We provide an R package to implement OmicKriging (http://www.scandb.org/newinterface/tools/OmicKriging.html).
45
Abstract
Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models.
46
Conditions for the validity of SNP-based heritability estimation. Hum Genet 2014; 133:1011-22. [PMID: 24744256 DOI: 10.1007/s00439-014-1441-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/09/2014] [Accepted: 03/28/2014] [Indexed: 01/05/2023]
Abstract
The heritability of a trait (h²) is the proportion of its population variance caused by genetic differences, and estimates of this parameter are important for interpreting the results of genome-wide association studies (GWAS). In recent years, researchers have adopted a novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals. The quantity estimated by this method is purported to be the contribution to heritability that could in principle be recovered from association studies employing the given panel of SNPs (h²_SNP). Thus far, the validity of this approach has mostly been tested empirically. Here, we provide a mathematical explication and show that the method should remain a robust means of obtaining h²_SNP under circumstances wider than those under which it has so far been derived.
47
Conditions for the validity of SNP-based heritability estimation. Hum Genet 2014. [DOI: 10.1007/s00439-014-1441-5]
48
Abstract
We examined whether the predictive ability of genomic best linear unbiased prediction (GBLUP) could be improved via a resampling method used in machine learning: bootstrap aggregating (“bagging”). In theory, bagging can be useful when the predictor has large variance or when the number of markers is much larger than the sample size, preventing effective regularization. After a brief review of GBLUP, bagging was adapted to the GBLUP context, both at the level of the genetic signal and at the level of marker effects. The performance of bagging was evaluated in four simulated case studies that included known or unknown quantitative trait loci. It was also applied to real data on grain yield in wheat planted in four environments. A metric aimed at quantifying candidate-specific cross-validation uncertainty was proposed and assessed; as expected, model-derived theoretical reliabilities bore no relationship to cross-validation accuracy. Bagging was found to improve the predictive performance of GBLUP and make it more robust against over-fitting. Seemingly, 25–50 bootstrap samples were enough to attain reasonable predictions as well as stable measures of individual predictive mean squared errors.
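Bagging at the level of marker effects can be sketched as follows. The sketch assumes marker-based ridge regression (RR-BLUP) as a stand-in for GBLUP; the two give the same predictions when marker effects are i.i.d. and the shrinkage parameter `lam` equals the residual-to-marker variance ratio. The names `ridge_fit`, `ridge_predict` and `bagged_predict`, the fixed `lam`, and the default of 50 bootstrap samples (the top of the 25–50 range reported above) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression on markers with an unpenalized intercept (RR-BLUP form)."""
    xbar = X.mean(axis=0)
    Xc = X - xbar
    mu = y.mean()
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu))
    return mu, xbar, beta


def ridge_predict(model, Xnew):
    mu, xbar, beta = model
    return mu + (Xnew - xbar) @ beta


def bagged_predict(X, y, Xnew, lam, n_boot=50, seed=0):
    """Average ridge predictions over bootstrap resamples of the training rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_boot, Xnew.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # sample training individuals with replacement
        preds[b] = ridge_predict(ridge_fit(X[idx], y[idx], lam), Xnew)
    # Bagged prediction = mean over fits; the between-fit variance is a simple
    # per-candidate spread, not necessarily the uncertainty metric proposed above.
    return preds.mean(axis=0), preds.var(axis=0)
```

Averaging over resampled fits mainly reduces the variance component of prediction error. This is why bagging is expected to help most when the number of markers greatly exceeds the sample size.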
49
The impact of population structure on genomic prediction in stratified populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2014; 127:749-62. [PMID: 24452438 DOI: 10.1007/s00122-013-2255-x] [Received: 07/08/2013] [Accepted: 12/14/2013]
Abstract
Impacts of population structure on the evaluation of genomic heritability and prediction were investigated and quantified using high-density markers in diverse panels of rice and maize. Population structure is an important factor affecting the estimation of genomic heritability and the assessment of genomic prediction in stratified populations. In this study, our first objective was to assess the effects of population structure on estimates of genomic heritability using the rice and maize diversity panels. Results indicate that population structure explained 33% and 7.5% of genomic heritability in rice and maize, respectively, depending on the trait, with the remaining heritability explained by within-subpopulation variation. Estimates of within-subpopulation heritability were higher than those derived from quantitative trait loci identified in genome-wide association studies, suggesting a 65% improvement in genetic gains. The second objective was to evaluate the effects of population structure on genomic prediction using cross-validation experiments. When population structure existed in both training and validation sets, correcting for it led to a significant decrease in genomic prediction accuracy. In contrast, when prediction was limited to a specific subpopulation, population structure had little effect on accuracy, and within-subpopulation genetic variance dominated predictions. Finally, the effects of genomic heritability on genomic prediction were investigated. Genomic prediction accuracy increased with genomic heritability in both training and validation sets, with the former showing a slightly greater impact. In summary, our results suggest that the contribution of population structure to genomic prediction varies with the prediction strategy and is also affected by the genetic architecture of traits and populations.
In practical breeding, these conclusions may help breeders better understand and use different genetic resources in genomic prediction.
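Why the contribution of structure depends on the prediction strategy can be shown with a toy calculation. This is a hypothetical illustration, not the authors' cross-validation design. Accuracy is computed as a Pearson correlation, once pooled across subpopulations and once within each subpopulation. When predictions capture only subpopulation means, the pooled value can be high while every within-subpopulation value stays near zero.

```python
import numpy as np


def prediction_accuracy(pred, obs, groups):
    """Pearson accuracy pooled over all individuals and within each subpopulation."""
    pooled = float(np.corrcoef(pred, obs)[0, 1])
    within = {
        g: float(np.corrcoef(pred[groups == g], obs[groups == g])[0, 1])
        for g in np.unique(groups)
    }
    return pooled, within
```

For example, take two subpopulations whose means differ by 10 units, and predictions that know only the subpopulation mean plus independent noise. The pooled correlation is then close to 25/26 ≈ 0.96, whereas each within-subpopulation correlation hovers around zero.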
50
Author reply to A commentary on Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 2014; 14:894. [PMID: 24240515 DOI: 10.1038/nrg3457-c2]