1
Liu Z, Turkmen AS, Lin S. Bayesian LASSO for population stratification correction in rare haplotype association studies. Stat Appl Genet Mol Biol 2024; 23:sagmb-2022-0034. [PMID: 38235525 PMCID: PMC10794901 DOI: 10.1515/sagmb-2022-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 12/19/2023] [Indexed: 01/19/2024]
Abstract
Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
Affiliation(s)
- Zilu Liu
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
2
Xu C, Wang X, Lim J, Xiao G, Xie Y. RCRdiff: A fully integrated Bayesian method for differential expression analysis using raw NanoString nCounter data. Stat Med 2022; 41:665-680. [PMID: 34773277 PMCID: PMC8795478 DOI: 10.1002/sim.9250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2021] [Revised: 08/23/2021] [Accepted: 10/16/2021] [Indexed: 11/05/2022]
Abstract
The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as its remarkable applicability to ubiquitous formalin-fixed paraffin-embedded (FFPE) tissue samples. Based on RCRnorm, developed for normalizing NanoString nCounter data, and Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (e.g., normal and cancer). Unlike existing methods that often require normalization to be performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non-DE gene patterns. Doing so avoids the efficiency loss caused by ignoring estimation uncertainty from the normalization step in a sequential approach and thus can offer more reliable statistical inference. We also propose clustering-based strategies for DE gene selection, which do not require any external dataset and are free of any arbitrary cutoff. Empirical evidence of the attractiveness of RCRdiff is demonstrated via extensive simulation and data examples.
Affiliation(s)
- Can Xu
  - Department of Statistical Science, Southern Methodist University, Texas, USA
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, Texas, USA. Correspondence: Xinlei Wang, Department of Statistical Science, Southern Methodist University, Dallas, TX 75275.
- Johan Lim
  - Department of Statistics, Seoul National University, Seoul, Korea
- Guanghua Xiao
  - Department of Population & Data Sciences and Department of Bioinformatics, University of Texas Southwestern Medical Center, Texas, USA
- Yang Xie
  - Department of Population & Data Sciences and Department of Bioinformatics, University of Texas Southwestern Medical Center, Texas, USA
3
Datta AS, Lin S, Biswas S. A Family-Based Rare Haplotype Association Method for Quantitative Traits. Hum Hered 2019; 83:175-195. [PMID: 30799419 DOI: 10.1159/000493543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/05/2018] [Accepted: 09/07/2018] [Indexed: 12/15/2022]
Abstract
BACKGROUND: The variants identified in genome-wide association studies account for only a small fraction of disease heritability. A key to this "missing heritability" is believed to be rare variants. Specifically, we focus on rare haplotype variants (rHTVs). The existing methods for detecting rHTVs are mostly population-based, and as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect. METHODS: We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS: We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare it with a standard family-based haplotype association test implemented in FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION: FamQBL can help uncover rHTVs associated with quantitative traits.
Affiliation(s)
- Ananda S Datta
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
4
Carreño LOD, da Conceição Pessoa M, Espigolan R, Takada L, Bresolin T, Cavani L, Baldi F, Carvalheiro R, de Albuquerque LG, da Fonseca R. Genome Association Study for Visual Scores in Nellore Cattle Measured at Weaning. BMC Genomics 2019; 20:150. [PMID: 30786866 PMCID: PMC6381746 DOI: 10.1186/s12864-019-5520-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/17/2017] [Accepted: 02/07/2019] [Indexed: 01/04/2023]
Abstract
Background: Genome-wide association studies (GWAS) are utilized in cattle to identify regions or genetic variants associated with phenotypes of interest, and thus, to identify design strategies that allow for the increase of the frequency of favorable alleles. Visual scores are important traits of cattle production in Brazil because they are utilized as selection criteria, helping to choose more harmonious animals. Despite its importance, there are still no studies on the genome association for these traits. This study aimed to identify genome regions associated with the traits of conformation, precocity and muscling, based on a visual score measured at weaning. Results: Bayesian approaches with BayesC and Bayesian LASSO were utilized with 2873 phenotypes of Nellore cattle for a GWAS. The animals were genotyped with Illumina BovineHD BeadChip, and a total of 309,865 SNPs were utilized after quality control. In the analyses, phenotype and deregressed breeding values were utilized as dependent variables; a threshold model was utilized for the former and a linear model for the latter. The association criterion was the percentage of genetic variance explained by SNPs found in 1 Mb-long windows. The Bayesian approach BayesC was better adjusted to the data because it could explain a larger phenotypic variance for both dependent variables. Conclusions: There were no large effects for the visual scores, indicating that they have a polygenic nature; however, regions in chromosomes 1, 3, 5, 7, 14, 15, 16, 19, 20 and 23 were identified and explained a large part of the genetic variance.
Affiliation(s)
- Luis Orlando Duitama Carreño
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Matilde da Conceição Pessoa
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Rafael Espigolan
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Luciana Takada
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Tiago Bresolin
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Ligia Cavani
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Fernando Baldi
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Roberto Carvalheiro
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Lucia Galvão de Albuquerque
  - Animal Science Department, School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, São Paulo, Brazil
- Ricardo da Fonseca
  - Animal Science Department, São Paulo State University (Unesp), Dracena, São Paulo, Brazil
6
Chen ZQ, Baison J, Pan J, Karlsson B, Andersson B, Westin J, García-Gil MR, Wu HX. Accuracy of genomic selection for growth and wood quality traits in two control-pollinated progeny trials using exome capture as the genotyping platform in Norway spruce. BMC Genomics 2018; 19:946. [PMID: 30563448 PMCID: PMC6299659 DOI: 10.1186/s12864-018-5256-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 03/19/2018] [Accepted: 11/16/2018] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Genomic selection (GS) can increase genetic gain by reducing the length of the breeding cycle in forest trees. Here we genotyped 1370 control-pollinated progeny trees from 128 full-sib families in Norway spruce (Picea abies (L.) Karst.), using exome capture as the genotyping platform. We used 116,765 high-quality SNPs to develop genomic prediction models for tree height and wood quality traits. We assessed the impact of different genomic prediction methods, genotype-by-environment interaction (G × E), genetic composition, size of the training and validation set, relatedness, and number of SNPs on accuracy and predictive ability (PA) of GS. RESULTS: Using the G matrix slightly altered heritability estimates relative to the pedigree-based method. GS accuracies were about 11-14% lower than those based on pedigree-based selection. The efficiency of GS per year varied from 1.71 to 1.78, compared to that of the pedigree-based model if breeding cycle length was halved using GS. Height GS accuracy decreased to more than 30% while using one site as training for GS prediction and using this model to predict the second site, indicating that G × E for tree height should be accommodated in model fitting. Using a half-sib family structure instead of a full-sib structure led to a significant reduction in GS accuracy and PA. The full-sib family structure needed only 750 markers to reach similar accuracy and PA, as compared to 100,000 markers required for the half-sib family, indicating that maintaining the high relatedness in the model improves accuracy and PA. Using 4000-8000 markers in the full-sib family structure was sufficient to obtain GS model accuracy and PA for tree height and wood quality traits almost equivalent to those obtained with all markers. CONCLUSIONS: The study indicates that GS would be efficient in reducing the generation time of the breeding cycle in conifer tree breeding programs that require long-term progeny testing. A sufficient number of trees within family (16 for growth and 12 for wood quality traits) and a sufficient number of SNPs (8000) are required for GS with full-sib family relationships. GS methods had little impact on GS efficiency for growth and wood quality traits. The GS model should incorporate the G × E effect when a strong G × E is detected.
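As a rough check on the efficiency figures quoted in this abstract, the sketch below assumes that per-year efficiency is the ratio of GS accuracy to pedigree-based accuracy multiplied by the ratio of breeding cycle lengths. That definition and the function name `gs_efficiency_per_year` are illustrative assumptions, not taken from the article.

```python
def gs_efficiency_per_year(accuracy_ratio: float, cycle_length_ratio: float) -> float:
    """Relative gain per year of GS versus pedigree-based selection.

    Assumed definition (not from the article):
    (GS accuracy / pedigree accuracy) * (pedigree cycle length / GS cycle length).
    """
    return accuracy_ratio * cycle_length_ratio


# Accuracies about 11-14% lower than pedigree-based selection give
# accuracy ratios of roughly 0.89 and 0.86.
# Halving the breeding cycle gives a cycle-length ratio of 2.0.
low = gs_efficiency_per_year(0.86, 2.0)
high = gs_efficiency_per_year(0.89, 2.0)
print(f"{low:.2f} to {high:.2f}")  # prints 1.72 to 1.78
```

Under these assumptions the result is close to the reported range of 1.71 to 1.78; the small gap at the lower end is what one would expect from the rounding of the "11-14%" figure.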
Affiliation(s)
- Zhi-Qiang Chen
  - Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden
- John Baison
  - Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden
- Jin Pan
  - Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden
- Bo Karlsson
  - Skogforsk, Ekebo 2250, SE-268 90 Svalöv, Sweden
- María Rosario García-Gil
  - Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden
- Harry X. Wu
  - Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden
  - CSIRO NRCA, Black Mountain Laboratory, Canberra, ACT 2601, Australia
7
Xu M, Zhong F, Bruno RS, Ballard KD, Zhang J, Zhu J. Comparative Metabolomics Elucidates Postprandial Metabolic Modifications in Plasma of Obese Individuals with Metabolic Syndrome. J Proteome Res 2018; 17:2850-2860. [PMID: 29975061 DOI: 10.1021/acs.jproteome.8b00315] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Although higher intakes of dairy milk are associated with a lower risk of metabolic syndrome (MetS), the underlying protective mechanism remains unclear. This study investigated the dynamic metabolic profile shift following the ingestion of low-fat milk or an isocaloric volume of rice milk in obese individuals with MetS. In a randomized, double-blind, crossover study, postprandial plasma samples (n = 266) were collected from 19 MetS participants. Plasma samples were analyzed by a targeted metabolomics platform which specifically detects 117 metabolites from 25 metabolic pathways. The comprehensive time-course metabolic profiling in MetS participants indicated that the postprandial metabolic profiles distinguish low-fat milk and rice milk consumption in a time-dependent manner. Metabolic biomarkers, such as orotate, leucine/isoleucine and adenine, showed significantly different trends for the two test beverages. Bayesian statistics identified 12 metabolites associated with clinical characteristics of postprandial vascular endothelial function, such as flow-mediated dilation (FMD), postprandial plasma markers of oxidative stress and NO status. Furthermore, metabolic pathway analysis based on these metabolite data indicated the potential utility of metabolomics to provide mechanistic insights into dietary interventions to regulate postprandial metabolic excursions.
Affiliation(s)
- Mengyang Xu
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States
- Fanyi Zhong
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States
- Richard S Bruno
  - Human Nutrition Program, The Ohio State University, Columbus, Ohio 43210, United States
- Kevin D Ballard
  - Department of Kinesiology and Health, Miami University, Oxford, Ohio 45056, United States
- Jing Zhang
  - Department of Statistics, Miami University, Oxford, Ohio 45056, United States
- Jiangjiang Zhu
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, United States
8
Tan B, Grattapaglia D, Martins GS, Ferreira KZ, Sundberg B, Ingvarsson PK. Evaluating the accuracy of genomic prediction of growth and wood traits in two Eucalyptus species and their F1 hybrids. BMC Plant Biol 2017; 17:110. [PMID: 28662679 PMCID: PMC5492818 DOI: 10.1186/s12870-017-1059-6] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/13/2016] [Accepted: 06/15/2017] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Genomic prediction is a genomics-assisted breeding methodology that can increase genetic gains by accelerating the breeding cycle and potentially improving the accuracy of breeding values. In this study, we use 41,304 informative SNPs genotyped in a Eucalyptus breeding population involving 90 E. grandis and 78 E. urophylla parents and their 949 F1 hybrids to develop genomic prediction models for eight phenotypic traits: basic density and pulp yield, and circumference at breast height, height and tree volume scored at ages three and six years. We assessed the impact of different genomic prediction methods, the composition and size of the training and validation set, and the number and genomic location of SNPs on the predictive ability (PA). RESULTS: Heritabilities estimated using the realized genomic relationship matrix (GRM) were considerably higher than estimates based on the expected pedigree, mainly due to inconsistencies in the expected pedigree that were readily corrected by the GRM. Moreover, the GRM more precisely captures Mendelian sampling among related individuals, such that the genetic covariance was based on the true proportion of the genome shared between individuals. PA improved considerably when increasing the size of the training set and by enhancing relatedness to the validation set. Prediction models trained on pure species parents could not predict well in F1 hybrids, indicating that model training has to be carried out in hybrid populations if one is to predict in hybrid selection candidates. The different genomic prediction methods provided similar results for all traits; therefore, either GBLUP or rrBLUP represents a better compromise between computational time and prediction efficiency. Only slight improvement was observed in PA when more than 5000 SNPs were used for all traits. Using SNPs in intergenic regions provided slightly better PA than using SNPs sampled exclusively in genic regions. CONCLUSIONS: The size and composition of the training set and the number of SNPs used are the two most important factors for model prediction, compared to the statistical methods and the genomic location of SNPs. Furthermore, training the prediction model based on pure parental species provides only limited ability to predict traits in interspecific hybrids. Our results provide additional promising perspectives for the implementation of genomic prediction in Eucalyptus breeding programs by the selection of interspecific hybrids.
Affiliation(s)
- Biyue Tan
  - Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, Umeå, SE-90187 Sweden
  - Biomaterials Division, Stora Enso AB, Nacka, SE-13104 Sweden
- Dario Grattapaglia
  - EMBRAPA Genetic Resources and Biotechnology – EPqB, Brasilia, DF 70770-910 Brazil
  - Universidade Católica de Brasília - SGAN, 916 modulo B, Brasilia, DF 70790-160 Brazil
- Björn Sundberg
  - Biomaterials Division, Stora Enso AB, Nacka, SE-13104 Sweden
- Pär K. Ingvarsson
  - Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, Umeå, SE-90187 Sweden
  - Present address: Department of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, SE-75007 Sweden
9
López de Maturana E, Picornell A, Masson-Lecomte A, Kogevinas M, Márquez M, Carrato A, Tardón A, Lloreta J, García-Closas M, Silverman D, Rothman N, Chanock S, Real FX, Goddard ME, Malats N. Prediction of non-muscle invasive bladder cancer outcomes assessed by innovative multimarker prognostic models. BMC Cancer 2016; 16:351. [PMID: 27259534 PMCID: PMC4893282 DOI: 10.1186/s12885-016-2361-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/03/2015] [Accepted: 05/12/2016] [Indexed: 01/28/2023]
Abstract
Background: We adapted Bayesian statistical learning strategies to the prognosis field to investigate whether genome-wide common SNPs improve the prediction ability of clinico-pathological prognosticators, and applied them to non-muscle invasive bladder cancer (NMIBC) patients. Methods: Adapted Bayesian sequential threshold models in combination with LASSO were applied to account for the time-to-event and censored nature of the data. We studied 822 NMIBC patients followed up for >10 years. The study outcomes were time-to-first-recurrence and time-to-progression. The predictive ability of the models including up to 171,304 SNPs and/or 6 clinico-pathological prognosticators was evaluated using AUC-ROC and the determination coefficient. Results: Clinico-pathological prognosticators explained a larger proportion of the time-to-first-recurrence (3.1%) and time-to-progression (5.4%) phenotypic variances than SNPs (1% and 0.01%, respectively). Adding SNPs to the clinico-pathological-parameters model slightly improved the prediction of time-to-first-recurrence (up to 4%). The prediction of time-to-progression using both clinico-pathological prognosticators and SNPs did not improve. Heritability (ĥ²) of both outcomes was <1% in NMIBC. Conclusions: We adapted a Bayesian statistical learning method to deal with a large number of parameters in prognostic studies. Common SNPs showed a limited role in predicting NMIBC outcomes, yielding a very low heritability for both outcomes. We report for the first time a heritability estimate for a disease outcome. Our method can be extended to other disease models. Electronic supplementary material: The online version of this article (doi:10.1186/s12885-016-2361-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- E López de Maturana
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- A Picornell
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- A Masson-Lecomte
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- M Kogevinas
  - Centre for Research in Environmental Epidemiology (CREAL), Parc de Salut Mar, Barcelona, Spain
  - CIBERESP, Madrid, Spain
- M Márquez
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- A Carrato
  - Servicio de Oncología, Hospital Universitario Ramon y Cajal, Madrid, and Servicio de Oncología, Hospital Universitario de Elche, Elche, Spain
- A Tardón
  - Department of Preventive Medicine, Universidad de Oviedo, Oviedo, Spain
  - CIBERESP, Madrid, Spain
- J Lloreta
  - Parc de Salut Mar and Departament of Pathology, Hospital del Mar - IMAS, Barcelona, Spain
- M García-Closas
  - Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- D Silverman
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- N Rothman
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- S Chanock
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Department of Health and Human Services, Bethesda, Maryland, USA
- F X Real
  - Epithelial Carcinogenesis Group, Spanish National Cancer Research Centre (CNIO), Madrid, and Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- M E Goddard
  - Biosciences Research Division, Department of Environment and Primary Industries, Agribio, and Department of Food and Agricultural Systems, University of Melbourne, Melbourne, Australia
- N Malats
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro, 3, 28029, Madrid, Spain
10
Niemi J, Mittman E, Landau W, Nettleton D. Empirical Bayes analysis of RNA-seq data for detection of gene expression heterosis. J Agric Biol Environ Stat 2015; 20:614-628. [PMID: 27147815 DOI: 10.1007/s13253-015-0230-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023]
Abstract
An important type of heterosis, known as hybrid vigor, refers to the enhancements in the phenotype of hybrid progeny relative to their inbred parents. Although hybrid vigor is extensively utilized in agriculture, its molecular basis is still largely unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers are measuring transcript abundance levels of thousands of genes in parental inbred lines and their hybrid offspring using RNA sequencing (RNA-seq) technology. The resulting data allow researchers to search for evidence of gene expression heterosis as one potential molecular mechanism underlying heterosis of agriculturally important traits. The null hypotheses of greatest interest in testing for gene expression heterosis are composite null hypotheses that are difficult to test with standard statistical approaches for RNA-seq analysis. To address these shortcomings, we develop a hierarchical negative binomial model and draw inferences using a computationally tractable empirical Bayes approach. We demonstrate improvements over alternative methods via a simulation study based on a maize experiment and then analyze that maize experiment with our newly proposed methodology. This article has supplementary material online.
Affiliation(s)
- Jarad Niemi
- Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
| | - Eric Mittman
- Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
| | - Will Landau
- Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
| | - Dan Nettleton
- Department of Statistics, Iowa State University, Ames, Iowa, U.S.A
| |
|
11
|
Hidalgo AM, Lopes PS, Paixão DM, Silva FF, Bastiaansen JWM, Paiva SR, Faria DA, Guimarães SEF. Fine mapping and single nucleotide polymorphism effects estimation on pig chromosomes 1, 4, 7, 8, 17 and X. Genet Mol Biol 2014; 36:511-9. [PMID: 24385854 PMCID: PMC3873182 DOI: 10.1590/s1415-47572013000400009] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/11/2013] [Accepted: 08/26/2013] [Indexed: 11/21/2022]
Abstract
Fine mapping of quantitative trait loci (QTL) from previous linkage studies was performed on pig chromosomes 1, 4, 7, 8, 17, and X, which were known to harbor QTL. Traits were divided into growth performance, carcass, internal organs, cut yields, and meat quality. Fifty families were used from an F2 population produced by crossing local Brazilian Piau boars with commercial sows. The linkage map consisted of 237 SNP and 37 microsatellite markers covering 866 centimorgans. QTL were identified by regression interval mapping using GridQTL. Individual marker effects were estimated by Bayesian LASSO regression using R. In total, 32 QTL affecting the evaluated traits were detected along the chromosomes studied. Seven of the QTL were known from previous studies using our F2 population, and 25 novel QTL resulted from the increased marker coverage. Six of the seven QTL that were significant at the 5% genome-wide level had SNPs within their confidence interval whose effects were among the 5% largest effects. The combined use of microsatellites along with SNP markers increased the saturation of the genome map and led to smaller confidence intervals of the QTL. The results showed that the tested models yield similar improvements in QTL mapping accuracy.
Affiliation(s)
- André M Hidalgo
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Paulo S Lopes
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Débora M Paixão
- Departamento de Zootecnia, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Fabyano F Silva
- Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - John W M Bastiaansen
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands
| | - Samuel R Paiva
- Embrapa Recursos Genéticos e Biotecnologia, Brasília, DF, Brazil
| | - Danielle A Faria
- Embrapa Recursos Genéticos e Biotecnologia, Brasília, DF, Brazil
| | | |
|
13
|
Charmet G, Storlie E, Oury FX, Laurent V, Beghin D, Chevarin L, Lapierre A, Perretant MR, Rolland B, Heumez E, Duchalais L, Goudemand E, Bordes J, Robert O. Genome-wide prediction of three important traits in bread wheat. Mol Breed 2014; 34:1843-1852. [PMID: 26316839 PMCID: PMC4544631 DOI: 10.1007/s11032-014-0143-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/04/2014] [Accepted: 06/30/2014] [Indexed: 05/19/2023]
Abstract
Five genomic prediction models were applied to three wheat agronomic traits (grain yield, heading date and grain test weight) in three breeding populations, each comprising about 350 doubled haploid or recombinant inbred lines evaluated in three locations during a 3-year period. The prediction accuracy, measured as the correlation between genomic estimated breeding value and observed trait, was in the range of previously published values for yield (r = 0.2-0.5), a trait with relatively low heritability. Accuracies for heading date and test weight, with relatively high heritabilities, were about 0.70. There was no improvement of prediction accuracy when two or three breeding populations were merged into one for a larger training set (e.g., for yield r ranged between 0.11 and 0.40 in the respective populations and between 0.18 and 0.35 in the merged populations). Cross-population prediction, in which one population was used as the training set and another as the validation set, resulted in no prediction accuracy. This lack of cross-population prediction accuracy cannot be explained by a lower level of relatedness between populations, as measured by shared SNP similarity, since this similarity was only slightly lower between than within populations. Simulation studies confirm that cross-prediction accuracy decreases as the proportion of shared QTLs decreases, which can be expected from a higher level of QTL × environment interactions.
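The accuracy measure described in this abstract (correlation between genomic estimated breeding value and observed trait) and the effect of training and validating on populations that share no QTL can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a plain ridge regression as a stand-in for the five (unnamed) prediction models, and the genotype coding, marker count, QTL positions, noise level and penalty `lam` are all hypothetical values chosen for the example, not taken from the study.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Ridge-regression marker effects (an assumed stand-in model).

    Solves (Xc'Xc + lam * I) b = Xc'(y - mean(y)) on column-centered genotypes.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def prediction_accuracy(gebv, observed):
    """Pearson correlation between predicted breeding values and phenotypes."""
    return float(np.corrcoef(gebv, observed)[0, 1])


def simulate_population(rng, n_lines, qtl_effects, noise_sd):
    """Hypothetical population: genotypes coded 0/1/2, additive trait plus noise."""
    X = rng.integers(0, 3, size=(n_lines, qtl_effects.size)).astype(float)
    y = X @ qtl_effects + rng.normal(0.0, noise_sd, size=n_lines)
    return X, y


rng = np.random.default_rng(0)
n_markers = 200

# Population A: QTL on markers 0..19. Population B: QTL on markers 20..39,
# so the two populations share no QTL (an extreme, assumed scenario).
effects_a = np.zeros(n_markers)
effects_a[:20] = rng.normal(0.0, 1.0, size=20)
effects_b = np.zeros(n_markers)
effects_b[20:40] = rng.normal(0.0, 1.0, size=20)

X_train, y_train = simulate_population(rng, 350, effects_a, noise_sd=2.0)
X_within, y_within = simulate_population(rng, 350, effects_a, noise_sd=2.0)
X_cross, y_cross = simulate_population(rng, 350, effects_b, noise_sd=2.0)

# Train on population A, then validate within A and across to B.
b_hat = ridge_marker_effects(X_train, y_train, lam=50.0)
acc_within = prediction_accuracy(X_within @ b_hat, y_within)
acc_cross = prediction_accuracy(X_cross @ b_hat, y_cross)

print(f"within-population accuracy: {acc_within:.2f}")
print(f"cross-population accuracy:  {acc_cross:.2f}")
```

In this toy setting the cross-population accuracy is expected to sit near zero because the marker effects learned in one population carry no information about the QTL of the other. Raising the share of common QTL between `effects_a` and `effects_b` is one way to explore the trend the abstract reports.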
Affiliation(s)
- Gilles Charmet
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Eric Storlie
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
- Colorado State University, Fort Collins, CO 80523 USA
| | - François Xavier Oury
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Valérie Laurent
- Bioplante-Florimond Desprez, BP41, 59242 Cappelle en Pévèle, France
| | - Denis Beghin
- Bioplante-Florimond Desprez, BP41, 59242 Cappelle en Pévèle, France
| | - Laetitia Chevarin
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Annie Lapierre
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Marie Reine Perretant
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Bernard Rolland
- INRA-APBV, Domaine de la Motte, BP 35327, 35653 Le Rheu Cedex, France
| | - Emmanuel Heumez
- INRA UE Lille, 2 chaussée Brunehaut, Estrées-Mons, BP 50136, 80203 Peronne Cedex, France
| | - Laure Duchalais
- Bioplante-R2n, 60 rue Léon Beauchamp, 59930 La Chapelle d’Armentières, France
| | - Ellen Goudemand
- Bioplante-Florimond Desprez, BP41, 59242 Cappelle en Pévèle, France
| | - Jacques Bordes
- UMR GDEC, INRA-Université Clermont II, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex, France
| | - Olivier Robert
- Bioplante-Florimond Desprez, BP41, 59242 Cappelle en Pévèle, France
| |
|
14
|
Silva FFE, de Resende MDV, Rocha GS, Duarte DAS, Lopes PS, Brustolini OJB, Thus S, Viana JMS, Guimarães SEF. Genomic growth curves of an outbred pig population. Genet Mol Biol 2013; 36:520-7. [PMID: 24385855 PMCID: PMC3873183 DOI: 10.1590/s1415-47572013005000042] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/21/2012] [Accepted: 07/07/2013] [Indexed: 11/23/2022]
Abstract
In the current post-genomic era, the genetic basis of pig growth can be understood by assessing SNP marker effects and genomic breeding values (GEBV), using estimates of growth curve parameters as phenotypes. Although various statistical methods, such as random regression (RR-BLUP) and Bayesian LASSO (BL), have been applied to genomic selection (GS), none of these has yet been used in a growth curve approach. In this work, we compared the accuracies of RR-BLUP and BL using empirical weight-age data from an outbred F2 (Brazilian Piau × commercial) population. The phenotypes were the parameter estimates from a nonlinear logistic regression model, and the halothane gene was considered as a marker for evaluating the assumptions of the GS methods in relation to the genetic variation explained by each locus. BL yielded more accurate values for all of the phenotypes evaluated and was used to estimate SNP effects and GEBV vectors. The latter allowed the construction of genomic growth curves, which showed substantial genetic discrimination among animals in the final growth phase. The SNP effect estimates allowed identification of the most relevant markers for each phenotype, the positions of which were coincident with reported QTL regions for growth traits.
Affiliation(s)
| | | | | | - Darlene Ana S Duarte
- Departamento de Estatística, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Departamento de Ciência Animal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Paulo Sávio Lopes
- Departamento de Ciência Animal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Otávio J B Brustolini
- Instituto de Biotecnologia Aplicada à Agropecuária, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Sander Thus
- Department of Animal Sciences, Wageningen University, Wageningen, Netherlands
| | - José Marcelo S Viana
- Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Simone E F Guimarães
- Departamento de Ciência Animal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| |
|