1
St-Pierre J, Oualkacha K, Rai Bhatnagar S. Hierarchical selection of genetic and gene by environment interaction effects in high-dimensional mixed models. Stat Methods Med Res 2025; 34:180-198. [PMID: 39659138 PMCID: PMC11800719 DOI: 10.1177/09622802241293768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024]
Abstract
Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models have been proposed for hierarchical selection of gene by environment interaction effects, where a gene-environment interaction effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows the inclusion of random effects to account for population structure, subject relatedness and shared environmental exposure. In this article, we develop a unified approach based on regularized penalized quasi-likelihood estimation to perform hierarchical selection of gene-environment interaction effects in sparse regularized mixed models. We compare the selection and prediction accuracy of our proposed model with existing methods through simulations under the presence of population structure and shared environmental exposure. We show that for all simulation scenarios, including an additional random effect to account for the shared environmental exposure reduces the false positive rate and false discovery rate of our proposed method for selection of both gene-environment interaction and main effects. Using the F1 score as a balanced measure of the false discovery rate and true positive rate, we further show that in the hierarchical simulation scenarios, our method outperforms other methods for retrieving important gene-environment interaction effects. Finally, we apply our method to real data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, and find that our method retrieves previously reported significant loci.
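The abstract uses the F1 score as a balanced measure of the false discovery rate (FDR) and true positive rate (TPR). As a minimal sketch, assuming precision is taken as 1 - FDR and recall as TPR, the score for a set of selected effects could be computed as follows. The function names `f1_from_rates` and `selection_f1` are hypothetical and are not taken from the article's software.

```python
def f1_from_rates(tpr: float, fdr: float) -> float:
    """Harmonic mean of precision (1 - FDR) and recall (TPR)."""
    precision = 1.0 - fdr
    if precision + tpr == 0.0:
        return 0.0  # nothing correct was selected
    return 2.0 * precision * tpr / (precision + tpr)


def selection_f1(selected: set, causal: set) -> float:
    """F1 score of a selected set of effects against the truly causal set."""
    true_pos = len(selected & causal)
    # FDR: share of selected effects that are not causal (0 if nothing selected)
    fdr = 1.0 - true_pos / len(selected) if selected else 0.0
    # TPR: share of causal effects that were selected (0 if no causal effects)
    tpr = true_pos / len(causal) if causal else 0.0
    return f1_from_rates(tpr, fdr)
```

For example, selecting effects {1, 2, 3, 4} when the causal effects are {1, 2, 5} gives FDR = 0.5 and TPR = 2/3, so F1 = 4/7.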
Affiliation(s)
- Julien St-Pierre
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
  - Département de Mathématiques, Faculté des Sciences, Université du Québec à Montréal, Montreal, QC, Canada
- Sahir Rai Bhatnagar
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
2
Xiao H, Hu L, Tan Q, Jia J, Xie P, Li J, Wang M. Transcriptional profiles reveal histologic origin and prognosis across 33 The Cancer Genome Atlas tumor types. Transl Cancer Res 2023; 12:2764-2780. [PMID: 37969389 PMCID: PMC10643977 DOI: 10.21037/tcr-23-234] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2023] [Accepted: 08/18/2023] [Indexed: 11/17/2023]
Abstract
Background: In recent years, with the development of transcriptome sequencing, the molecular characteristics of tumors have gradually been revealed. Because of the complexity of the tumor transcriptome, there is a need for molecular signatures that can be used to evaluate the tissue origin and cell stemness of tumors in order to promote their diagnosis and treatment.
Methods: Tumor tissue-specific gene sets (TTSGs) consisting of 200 genes were selected using RNA expression data of 9,875 patients from 33 tumor types. t-distributed Stochastic Neighbor Embedding (t-SNE) was used for dimensionality reduction and visualization of TTSGs in each tumor type. To evaluate oncogenic dedifferentiation and loss of cell stemness, the Euclidean distance from each sample to a human embryo single-cell RNA-seq dataset (GSE36552), based on the TTSGs, was calculated as the TTSGs index, indicating the dissimilarity between tumors and embryo. The TTSGs index was evaluated for prognosis in each tumor type. Two published signature indexes, the mRNA signature index (mRNAsi) and CIBERSORT, were compared to assess the correlation of the TTSGs index with cell stemness and the immune microenvironment. Finally, differences in prognosis, immune microenvironment and radiotherapy outcomes were compared between patients with high and low TTSGs index.
Results: In this study, all 33 tumor types in The Cancer Genome Atlas (TCGA) were embedded into isolated clusters by t-SNE, which was confirmed by the k-nearest neighbors (kNN) algorithm. Clusters of squamous-cell carcinoma were adjacent to each other, revealing a similar histologic origin. Basal-like breast cancer was separated from the luminal and HER-2-amplified subtypes and was close to squamous-cell carcinoma. The TTSGs index was related to overall survival outcomes in cancers derived from liver, thyroid, brain, cervix and kidney. There was a positive correlation between mRNAsi and the TTSGs index in pan-kidney and pan-neuronal cancers. Furthermore, cell fractions of M2 macrophages and total leukocytes increased in the group with higher TTSGs index. Patients with higher TTSGs index had longer overall survival time and less radiation therapy resistance compared to patients with lower TTSGs index.
Conclusions: The signature of TTSGs is related to tumor expression features that distinguish tumors of different histologic origin using t-SNE. The signature also relates to the prognosis of certain kinds of tumors.
Affiliation(s)
- Hui Xiao
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
- Liang Hu
  - Central Laboratory, Longgang District Maternity & Child Healthcare Hospital of Shenzhen City, Shenzhen, China
- Qi Tan
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
- Jinping Jia
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
- Ping Xie
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
- Junai Li
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
- Minghua Wang
  - Department of Pathology, The Second Affiliated Hospital, School of Medicine, The Chinese University of Hong Kong, Shenzhen & Longgang District People’s Hospital of Shenzhen, Shenzhen, China
3
Wang Q, Jiang S, Li T, Qiu Z, Yan J, Fu R, Ma C, Wang X, Jiang S, Cheng Q. G2P Provides an Integrative Environment for Multi-model genomic selection analysis to improve genotype-to-phenotype prediction. Frontiers in Plant Science 2023; 14:1207139. [PMID: 37600179 PMCID: PMC10437076 DOI: 10.3389/fpls.2023.1207139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 07/21/2023] [Indexed: 08/22/2023]
Abstract
Genotype-to-phenotype (G2P) prediction has become a mainstream paradigm to facilitate genomic selection (GS)-assisted breeding in the seed industry. Many methods have been introduced for building GS models, but their prediction precision may vary depending on species and specific traits. Therefore, evaluation of multiple models and selection of the appropriate one is crucial to effective GS analysis. Here, we present the G2P container developed for the Singularity platform, which contains a library of 16 state-of-the-art GS models and 13 evaluation metrics. G2P works as an integrative environment offering comprehensive, unbiased evaluation analyses of the 16 GS models, which may be run in parallel on high-performance computing clusters. Based on the evaluation outcome, G2P performs auto-ensemble algorithms that not only can automatically select the most precise models but also can integrate prediction results from multiple models. This functionality should further improve the precision of G2P prediction. Another noteworthy function is the refinement design of the training set, in which G2P optimizes the training set based on the genetic diversity analysis of a studied population. Although the training samples in the optimized set are fewer than in the original set, the prediction precision is almost equivalent to that obtained when using the whole set. This functionality is quite useful in practice, as it reduces the cost of phenotyping when constructing a training population. The G2P container and source code are freely accessible at https://g2p-env.github.io/.
Affiliation(s)
- Qian Wang
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Shan Jiang
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Tong Li
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Zhixu Qiu
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
- Jun Yan
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Ran Fu
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Chuang Ma
  - Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
  - State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
- Xiangfeng Wang
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Shuqin Jiang
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
- Qian Cheng
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
  - National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
4
Difabachew YF, Frisch M, Langstroff AL, Stahl A, Wittkop B, Snowdon RJ, Koch M, Kirchhoff M, Cselényi L, Wolf M, Förster J, Weber S, Okoye UJ, Zenke-Philippi C. Genomic prediction with haplotype blocks in wheat. Frontiers in Plant Science 2023; 14:1168547. [PMID: 37229104 PMCID: PMC10203549 DOI: 10.3389/fpls.2023.1168547] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/17/2023] [Accepted: 04/17/2023] [Indexed: 05/27/2023]
Abstract
Haplotype blocks might carry additional information compared to single SNPs and have therefore been suggested for use as independent variables in genomic prediction. Studies in different species resulted in more accurate predictions than with single SNPs in some traits but not in others. In addition, it remains unclear how the blocks should be built to obtain the greatest prediction accuracies. Our objective was to compare the results of genomic prediction with different types of haplotype blocks to prediction with single SNPs in 11 traits in winter wheat. We built haplotype blocks from marker data from 361 winter wheat lines based on linkage disequilibrium, fixed SNP numbers, fixed lengths in cM and with the R package HaploBlocker. We used these blocks together with data from single-year field trials in a cross-validation study for predictions with RR-BLUP, an alternative method (RMLA) that allows for heterogeneous marker variances, and GBLUP performed with the software GVCHAP. The greatest prediction accuracies for resistance scores for B. graminis, P. triticina, and F. graminearum were obtained with LD-based haplotype blocks, while blocks with fixed marker numbers and fixed lengths in cM resulted in the greatest prediction accuracies for plant height. Prediction accuracies of haplotype blocks built with HaploBlocker were greater than those of the other methods for protein concentration and resistance scores for S. tritici, B. graminis, and P. striiformis. We hypothesize that the trait-dependence is caused by properties of the haplotype blocks that have overlapping and contrasting effects on the prediction accuracy. While they might be able to capture local epistatic effects and to detect ancestral relationships better than single SNPs, prediction accuracy might be reduced by unfavorable characteristics of the design matrices in the models that are due to their multi-allelic nature.
Affiliation(s)
- Matthias Frisch
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
- Anna Luise Langstroff
  - Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
- Andreas Stahl
  - Institute for Resistance Research and Stress Tolerance, Julius Kühn Institute, Quedlinburg, Germany
- Benjamin Wittkop
  - Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
- Rod J. Snowdon
  - Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
- László Cselényi
  - Department of Cereal Breeding, W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
- Markus Wolf
  - German Seed Alliance GmbH, Holtsee, Germany
  - Saaten-Union Biotec GmbH, Leopoldshöhe, Germany
- Sven Weber
  - Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
- Uche Joshua Okoye
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
- Carola Zenke-Philippi
  - Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
5
Wei C, Zeng H, Zhong Z, Cai X, Teng J, Liu Y, Zhao Y, Wu X, Li J, Zhang Z. Integration of non-additive genome-wide association study with a multi-tissue transcriptome analysis of growth and carcass traits in Duroc pigs. Animal 2023; 17:100817. [PMID: 37196577 DOI: 10.1016/j.animal.2023.100817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 04/03/2023] [Accepted: 04/06/2023] [Indexed: 05/19/2023]
Abstract
Growth and carcass traits are of economic importance in pig production, as they affect pork quality and the profitability of finishing pig production. This study used whole-genome and transcriptome sequencing technologies to identify potential candidate genes affecting growth and carcass traits in Duroc pigs. The medium-density (50-60 k) single nucleotide polymorphism (SNP) array genotypes of 4 154 Duroc pigs from three populations were imputed to whole-genome sequence data, yielding 10 463 227 markers on 18 autosomes. The dominance heritabilities estimated for growth and carcass traits ranged from 0.000 ± 0.041 to 0.161 ± 0.054. Using a non-additive genome-wide association study (GWAS), we identified 80 dominance quantitative trait loci for growth and carcass traits at genome-wide significance (false discovery rate < 5%), 15 of which were also detected in our additive GWAS. After fine mapping, 31 candidate genes for dominance GWAS were annotated, and 8 of them were highlighted as previously reported to be associated with growth and development (e.g. SNX14, RELN and ENPP2), autosomal recessive diseases (e.g. AMPH, SNX14, RELN and CACNB4) and immune response (e.g. UNC93B1 and PPM1D). By integrating the lead SNPs with RNA-seq data of 34 pig tissues from the Pig Genotype-Tissue Expression project (https://piggtex.farmgtex.org/), we found that rs691128548, rs333063869, and rs1110730611 have significant dominance effects on the expression of the SNX14, AMPH and UNC93B1 genes, respectively, in pig tissues related to growth and development. Finally, the identified candidate genes were significantly enriched for biological processes involved in cell and organ development, lipid catabolic process and phosphatidylinositol 3-kinase signalling (P < 0.05). These results provide new molecular markers for selection on meat production and quality in pigs, as well as a basis for deciphering the genetic mechanisms of growth and carcass traits.
Affiliation(s)
- Chen Wei
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Haonan Zeng
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Zhanming Zhong
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Xiaodian Cai
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Jingyan Teng
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Yuqiang Liu
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Yunxiang Zhao
  - School of Life Science and Engineering, Foshan University, Foshan 528225, PR China
- Xibo Wu
  - Guangxi Guiken Yongxin Animal Husbandry Group Co. Ltd, Nanning 530000, PR China
- Jiaqi Li
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
- Zhe Zhang
  - National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China
6
Lai X, Cao J, Lin Z. An Accelerated Maximally Split ADMM for a Class of Generalized Ridge Regression. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:958-972. [PMID: 34437070 DOI: 10.1109/tnnls.2021.3104840] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Ridge regression (RR) has been commonly used in machine learning, but is facing computational challenges in big data applications. To meet the challenges, this article develops a highly parallel new algorithm, i.e., an accelerated maximally split alternating direction method of multipliers (A-MS-ADMM), for a class of generalized RR (GRR) that allows different regularization factors for different regression coefficients. Linear convergence of the new algorithm along with its convergence ratio is established. Optimal parameters of the algorithm for the GRR with a particular set of regularization factors are derived, and a selection scheme of the algorithm parameters for the GRR with general regularization factors is also discussed. The new algorithm is then applied in the training of single-layer feedforward neural networks. Experimental results on performance validation on real-world benchmark datasets for regression and classification and comparisons with existing methods demonstrate the fast convergence, low computational complexity, and high parallelism of the new algorithm.
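The generalized ridge regression described above lets each regression coefficient carry its own regularization factor. As a minimal sketch of that objective only, the direct closed-form solution can be written with NumPy as below. This is not the A-MS-ADMM algorithm of the article; the function name `generalized_ridge` and the objective scaling (squared error plus the sum of `lam[j] * beta[j]**2`) are assumptions made for illustration.

```python
import numpy as np


def generalized_ridge(X, y, lam):
    """Solve min_beta ||y - X beta||^2 + sum_j lam[j] * beta[j]^2.

    lam holds one non-negative regularization factor per coefficient.
    Setting the gradient to zero gives (X^T X + diag(lam)) beta = X^T y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    A = X.T @ X + np.diag(lam)  # penalized normal-equations matrix
    return np.linalg.solve(A, X.T @ y)
```

With all factors equal, this reduces to ordinary ridge regression; a very large factor on a single coefficient shrinks that coefficient toward zero while leaving the others lightly penalized. The direct solve costs on the order of the cube of the number of coefficients, which is the kind of cost that motivates parallel iterative solvers for large problems.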
7
Nantongo JS, Potts BM, Klápště J, Graham NJ, Dungey HS, Fitzgerald H, O'Reilly-Wapstra JM. Genomic selection for resistance to mammalian bark stripping and associated chemical compounds in radiata pine. G3 (Bethesda, Md.) 2022; 12:jkac245. [PMID: 36218439 PMCID: PMC9635650 DOI: 10.1093/g3journal/jkac245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/22/2022] [Accepted: 08/29/2022] [Indexed: 07/28/2023]
Abstract
The integration of genomic data into genetic evaluations can facilitate the rapid selection of superior genotypes and accelerate the breeding cycle in trees. In this study, 390 trees from 74 control-pollinated families were genotyped using a 36K Axiom SNP array. A total of 15,624 high-quality SNPs were used to develop genomic prediction models for mammalian bark stripping, tree height, and selected primary and secondary chemical compounds in the bark. Genetic parameters from different genomic prediction methods (single-trait best linear unbiased prediction based on a marker-based relationship matrix [genomic best linear unbiased prediction], multitrait single-step genomic best linear unbiased prediction, which integrated the marker-based and pedigree-based relationship matrices [single-step genomic best linear unbiased prediction], and the single-trait generalized ridge regression) were compared to equivalent single- or multitrait pedigree-based approaches (ABLUP). The influence of the statistical distribution of data on the genetic parameters was assessed. Results indicated that the heritability estimates were increased nearly 2-fold with genomic models compared to the equivalent pedigree-based models. Predictive accuracy of the single-step genomic best linear unbiased prediction was higher than that of ABLUP for most traits. Allowing for heterogeneity in marker effects through the use of generalized ridge regression did not markedly improve predictive ability over genomic best linear unbiased prediction, suggesting that most of the chemical traits are modulated by many genes with small effects. Overall, the traits with low pedigree-based heritability benefited more from genomic models than the traits with high pedigree-based heritability. There was no evidence that data skewness or the presence of outliers affected the genomic or pedigree-based genetic estimates.
Affiliation(s)
- Judith S Nantongo
  - Corresponding author: National Agricultural Research Organization, P.O Box 1752, Mukono, Uganda.
- Brad M Potts
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
  - ARC Training Centre for Forest Value, Hobart, TAS 7001, Australia
- Jaroslav Klápště
  - Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
- Natalie J Graham
  - Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
- Heidi S Dungey
  - Scion (New Zealand Forest Research Institute Ltd.), Rotorua 3046, New Zealand
- Hugh Fitzgerald
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
- Julianne M O'Reilly-Wapstra
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
  - ARC Training Centre for Forest Value, Hobart, TAS 7001, Australia
8
Gu B, Xu A, Huo Z, Deng C, Huang H. Privacy-Preserving Asynchronous Vertical Federated Learning Algorithms for Multiparty Collaborative Learning. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6103-6115. [PMID: 34161243 DOI: 10.1109/tnnls.2021.3072238] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
Privacy-preserving federated learning for vertically partitioned (VP) data has shown promising results as a solution for the emerging multiparty joint modeling application, in which the data holders (such as government branches, private finance, and e-business companies) collaborate throughout the learning process rather than relying on a trusted third party to hold data. However, most of the existing federated learning algorithms for VP data are limited to synchronous computation. To improve efficiency when unbalanced computation/communication resources are common among the parties in the federated learning system, it is essential to develop asynchronous training algorithms for VP data while preserving data privacy. In this article, we propose an asynchronous federated stochastic gradient descent (AFSGD-VP) algorithm and its two variance reduction variants, based on stochastic variance reduced gradient (SVRG) and SAGA, on VP data. Moreover, we provide the convergence analyses of AFSGD-VP and its SVRG and SAGA variants under the condition of strong convexity and without any restrictions on staleness. We also discuss their model privacy, data privacy, computational complexities, and communication costs. To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for VP data with theoretical guarantees. Extensive experimental results on a variety of VP datasets not only verify the theoretical results of AFSGD-VP and its SVRG and SAGA variants but also show that our algorithms have much higher efficiency than the corresponding synchronous algorithms.
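The variance reduction idea behind the SVRG variant mentioned above can be illustrated on a single machine. The sketch below is plain, synchronous SVRG applied to a least-squares problem. It is not AFSGD-VP and involves no vertical partitioning, asynchrony, or privacy mechanism. The function name `svrg_least_squares` and the default step size and epoch count are assumptions chosen for illustration.

```python
import numpy as np


def svrg_least_squares(X, y, step=0.01, epochs=30, seed=0):
    """Plain SVRG for min_w (1 / 2n) * sum_i (x_i . w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Snapshot of the iterate and the full gradient at that snapshot
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n
        for _ in range(n):
            i = rng.integers(n)
            grad_i = X[i] * (X[i] @ w - y[i])            # sample gradient at w
            grad_i_snap = X[i] * (X[i] @ w_snap - y[i])  # same sample at snapshot
            # Variance-reduced stochastic gradient step
            w = w - step * (grad_i - grad_i_snap + full_grad)
    return w
```

The correction term `grad_i - grad_i_snap + full_grad` is an unbiased estimate of the full gradient whose variance shrinks as `w` and `w_snap` approach the optimum, which is what allows a constant step size under strong convexity.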
9
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or deal with high dimension datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
Affiliation(s)
- Monica F. Danilevicz
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
  - School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
  - *Correspondence: David Edwards,
10
Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. Frontiers in Plant Science 2022; 13:800161. [PMID: 35574107 PMCID: PMC9100816 DOI: 10.3389/fpls.2022.800161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still devoted to improving the prediction accuracy of these models, because various genetic factors such as additive, dominance and epistasis effects can influence their prediction accuracy. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks, but incorporates classical elements of LASSO. Our new method is able to account for local epistasis (higher-order interaction between neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN), using the heterogeneous stock mouse and rice field data sets.
Affiliation(s)
- Boby Mathew
  - Bayer CropScience, Monheim am Rhein, Germany
  - Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Andreas Hauptmann
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
  - Department of Computer Science, University College London, London, United Kingdom
- Jens Léon
  - Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
- Mikko J. Sillanpää
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
11
|
Estimating genetic variance contributed by a quantitative trait locus: A random model approach. PLoS Comput Biol 2022; 18:e1009923. [PMID: 35275920 PMCID: PMC8942241 DOI: 10.1371/journal.pcbi.1009923] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 03/23/2022] [Accepted: 02/13/2022] [Indexed: 11/20/2022] Open
Abstract
Detecting quantitative trait loci (QTL) and estimating QTL variances (represented by the squared QTL effects) are two main goals of QTL mapping and genome-wide association studies (GWAS). However, there are issues associated with estimated QTL variances and such issues have not attracted much attention from the QTL mapping community. Estimated QTL variances are usually biased upwards due to estimation being associated with significance tests. The phenomenon is called the Beavis effect. However, estimated variances of QTL without significance tests can also be biased upwards, which cannot be explained by the Beavis effect; rather, this bias is due to the fact that QTL variances are often estimated as the squares of the estimated QTL effects. The parameters are the QTL effects and the estimated QTL variances are obtained by squaring the estimated QTL effects. This square transformation failed to incorporate the errors of estimated QTL effects into the transformation. The consequence is biases in estimated QTL variances. To correct the biases, we can either reformulate the QTL model by treating the QTL effect as random and directly estimate the QTL variance (as a variance component) or adjust the bias by taking into account the error of the estimated QTL effect. A moment method of estimation has been proposed to correct the bias. The method has been validated via Monte Carlo simulation studies. The method has been applied to QTL mapping for the 10-week-body-weight trait from an F2 mouse population. One of the goals of QTL mapping and GWAS is to quantify the size of a QTL, which is measured by the QTL variance or the proportion of trait variance explained by the QTL. The effect of a QTL appears in a linear or linear mixed model as a regression coefficient and defined as a fixed effect. The estimated QTL variance in conventional QTL mapping studies takes the square of the estimated QTL effect. This is a biased estimate of QTL variance. 
An unbiased estimate of the QTL variance should be obtained by (1) treating the QTL effect as random and estimating the variance of the random effect or (2) adjusting the squared estimated QTL effect by the squared estimation error. We proved that the two methods are identical. We further proved that the usual R2 (goodness of fit) in regression analysis is equivalent to the biased QTL heritability while the adjusted R2 is equivalent to the bias corrected QTL heritability.
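The bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' moment method or mixed model: it assumes a single marker in ordinary simple regression, and the function name, genotype coding, sample size and effect size are all hypothetical choices made for illustration. It compares the average of the squared estimated effect with the average of the squared estimated effect minus its squared standard error.

```python
import random
import statistics


def simulate_qtl_variance(true_effect=0.3, n=50, reps=4000, seed=1):
    """Average naive and error-adjusted estimates of the squared QTL effect.

    Hypothetical setup: one marker coded -1/0/1, residual SD of 1,
    effect estimated by simple least-squares regression.
    """
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(reps):
        z = [rng.choice((-1, 0, 1)) for _ in range(n)]
        y = [true_effect * zi + rng.gauss(0, 1) for zi in z]
        zbar, ybar = sum(z) / n, sum(y) / n
        szz = sum((zi - zbar) ** 2 for zi in z)
        szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
        b_hat = szy / szz  # least-squares estimate of the QTL effect
        resid = [yi - ybar - b_hat * (zi - zbar) for zi, yi in zip(z, y)]
        sigma2_hat = sum(r * r for r in resid) / (n - 2)
        var_b_hat = sigma2_hat / szz  # squared standard error of b_hat
        naive.append(b_hat ** 2)  # biased upwards by Var(b_hat)
        corrected.append(b_hat ** 2 - var_b_hat)  # squared effect minus squared error
    return statistics.mean(naive), statistics.mean(corrected)


if __name__ == "__main__":
    naive_mean, corrected_mean = simulate_qtl_variance()
    print("true squared effect:", 0.3 ** 2)
    print("mean naive estimate:", naive_mean)
    print("mean adjusted estimate:", corrected_mean)
```

Under these assumptions the naive average sits above the true squared effect by roughly the sampling variance of the estimated effect, while the adjusted average is close to it. A small sample size is used deliberately so that the bias is visible.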
|
12
|
Shook JM, Lourenco D, Singh AK. PATRIOT: A Pipeline for Tracing Identity-by-Descent for Chromosome Segments to Improve Genomic Prediction in Self-Pollinating Crop Species. FRONTIERS IN PLANT SCIENCE 2021; 12:676269. [PMID: 34737757 PMCID: PMC8562157 DOI: 10.3389/fpls.2021.676269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 09/01/2021] [Indexed: 06/13/2023]
Abstract
The falling cost of genotyping is ushering in wider interest in, and adoption of, genomic prediction and selection in plant breeding programs worldwide. However, improper conflation of historical and recent linkage disequilibrium between markers and genes limits the accuracy of genomic prediction (GP). Multiple ancestors may share a common haplotype surrounding a gene without sharing the same allele of that gene. This prevents parsing out genetic effects associated with the underlying allele of that gene among the set of ancestral haplotypes. We present the "Parental Allele Tracing, Recombination Identification, and Optimal predicTion" (PATRIOT) approach, which utilizes marker data to allow rapid identification of lines carrying specific alleles, increases the accuracy of genomic relatedness and diversity estimates, and improves genomic prediction. Leveraging identity-by-descent relationships, PATRIOT showed an improvement in GP accuracy of 16.6% relative to the traditional rrBLUP method. This approach will help to increase the rate of genetic gain and allow available information to be more effectively utilized within breeding programs.
Affiliation(s)
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
- Asheesh K. Singh
  - Department of Agronomy, Iowa State University, Ames, IA, United States
|
13
|
Zhang J, Liu F, Reif JC, Jiang Y. On the use of GBLUP and its extension for GWAS with additive and epistatic effects. G3-GENES GENOMES GENETICS 2021; 11:6237487. [PMID: 33871030 PMCID: PMC8495923 DOI: 10.1093/g3journal/jkab122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 04/04/2021] [Indexed: 11/29/2022]
Abstract
Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide predictions. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the estimated marker effects in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be produced and that the result of the test was identical to that of a standard GWAS model. Later, a mathematical proof was given for the special case in which there is no fixed covariate in GBLUP. Since then, the new approach has been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP. Thus, it is necessary to confirm the equivalence in the general case. Recently, the concept was generalized to GWAS for epistatic effects and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved the computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects by a theoretical investigation and by empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, which was verified by empirical and simulation studies.
Affiliation(s)
- Jie Zhang
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Fang Liu
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Jochen C Reif
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Yong Jiang
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
|
14
|
Wang D, Tang H, Liu JF, Xu S, Zhang Q, Ning C. Rapid epistatic mixed-model association studies by controlling multiple polygenic effects. Bioinformatics 2021; 36:4833-4837. [PMID: 32614415 DOI: 10.1093/bioinformatics/btaa610] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Revised: 05/21/2020] [Accepted: 06/24/2020] [Indexed: 12/19/2022] Open
Abstract
SUMMARY: We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive by additive epistasis, dominance by dominance epistasis and additive by dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows the examination of all pairwise interactions in a remarkably fast manner, with computing time linear in population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method performs similarly to a simple linear model in terms of computational efficiency. It took less than 40 h for the pairwise analysis of 5000 individuals genotyped with roughly 350 000 SNPs with five threads on an Intel Xeon E5 2.6 GHz CPU. AVAILABILITY AND IMPLEMENTATION: Source codes are freely available at https://github.com/chaoning/GMAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dan Wang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Hui Tang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Jian-Feng Liu
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Shizhong Xu
  - Department of Botany and Plant Science, University of California, Riverside, CA 92521, USA
- Qin Zhang
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
- Chao Ning
  - Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, China
|
15
|
Moretti R, Soglia D, Chessa S, Sartore S, Finocchiaro R, Rasero R, Sacchi P. Identification of SNPs Associated with Somatic Cell Score in Candidate Genes in Italian Holstein Friesian Bulls. Animals (Basel) 2021; 11:366. [PMID: 33535694 PMCID: PMC7912858 DOI: 10.3390/ani11020366] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 01/01/2023] Open
Abstract
Mastitis is an infectious disease affecting the mammary gland, leading to inflammatory reactions and to heavy economic losses due to decreased milk production. One possible way to tackle the antimicrobial resistance issue stemming from antimicrobial therapy is to select animals with a genetic resistance to this disease. Therefore, the aim of this study was to analyze the genetic variability of the SNPs found in candidate genes related to mastitis resistance in Holstein Friesian bulls. Target regions were amplified, sequenced by Next-Generation Sequencing technology on the Illumina® MiSeq, and then analyzed to find correlations with mastitis-related phenotypes in 95 Italian Holstein bulls chosen with the aid of a selective genotyping approach. Of a total of 557 detected mutations, 61 showed different genotype distributions in the tails of the deregressed EBVs for SCS and 15 were identified as significantly associated with the phenotype using two different approaches. The significant SNPs were identified in intergenic or intronic regions of six genes known to be key components of the immune system (namely CXCR1, DCK, NOD2, MBL2, MBL1 and M-SAA3.2). These SNPs could be considered as candidates for future genetic selection for mastitis resistance, although further studies are required to assess their presence in other dairy cattle breeds and their possible negative correlation with other traits.
Affiliation(s)
- Riccardo Moretti
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
- Dominga Soglia
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
- Stefania Chessa
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
- Stefano Sartore
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
- Raffaella Finocchiaro
  - Associazione Nazionale Allevatori Razza Frisona e Jersey Italiana (ANAFIJ), 26100 Cremona, Italy
- Roberto Rasero
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
- Paola Sacchi
  - Department of Veterinary Science, University of Turin, 10095 Turin, Italy
|
16
|
Wang M, Li R, Xu S. Deshrinking ridge regression for genome-wide association studies. Bioinformatics 2021; 36:4154-4162. [PMID: 32379866 DOI: 10.1093/bioinformatics/btaa345] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 04/21/2020] [Accepted: 04/29/2020] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: Genome-wide association studies (GWAS) are still the primary steps toward gene discovery. The urgency is more obvious in the big data era, when GWAS are conducted simultaneously for thousands of traits, e.g. transcriptomic and metabolomic traits. Efficient mixed model association (EMMA) and genome-wide efficient mixed model association (GEMMA) are the widely used methods for GWAS. An algorithm with high computational efficiency is badly needed. It is interesting to note that the test statistics of the ordinary ridge regression (ORR) have the same patterns across the genome as those obtained from the EMMA method. However, ORR has never been used for GWAS due to its severe shrinkage of the estimated effects and the test statistics. RESULTS: We introduce a degree of freedom for each marker effect obtained from ORR and use it to deshrink both the estimated effect and the standard error so that the Wald test of ORR is brought back to the same level as that of EMMA. The new method is called deshrinking ridge regression (DRR). By evaluating the methods under three different model sizes (small, medium and large), we demonstrate that DRR is more generalized for all model sizes than EMMA, which only works for medium and large models. Furthermore, DRR detects all markers simultaneously instead of scanning one marker at a time. As a result, the computational time complexity of DRR is much lower than that of EMMA and about m (number of genetic variants) times lower than that of GEMMA when the sample size is much smaller than the number of markers. CONTACT: shizhong.xu@ucr.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meiyue Wang
  - Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Ruidong Li
  - Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
|
17
|
George AW, Verbyla A, Bowden J. Eagle: multi-locus association mapping on a genome-wide scale made routine. Bioinformatics 2020; 36:1509-1516. [PMID: 31596455 DOI: 10.1093/bioinformatics/btz759] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Revised: 08/19/2019] [Accepted: 10/02/2019] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: We present Eagle, a new method for multi-locus association mapping. The motivation for developing Eagle was to make multi-locus association mapping 'easy' and the method-of-choice. Eagle's strengths are that it (i) is considerably more powerful than single-locus association mapping, (ii) does not suffer from multiple testing issues, (iii) gives results that are immediately interpretable and (iv) has a computational footprint comparable to single-locus association mapping. RESULTS: By conducting a large simulation study, we show that Eagle finds true and avoids false single-nucleotide polymorphism trait associations better than competing single- and multi-locus methods. We also analyze data from a published mouse study. Eagle found over 50% more validated findings than the state-of-the-art single-locus method. AVAILABILITY AND IMPLEMENTATION: Eagle has been implemented as an R package, with a browser-based Graphical User Interface for users less familiar with R. It is freely available via the CRAN website at https://cran.r-project.org. Videos, Quick Start guides, FAQs and Demos are available via the Eagle website http://eagle.r-forge.r-project.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
|
18
|
Ohishi M, Yanagihara H, Fujikoshi Y. A fast algorithm for optimizing ridge parameters in a generalized ridge regression by minimizing a model selection criterion. J Stat Plan Inference 2020. [DOI: 10.1016/j.jspi.2019.04.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
19
|
An B, Gao X, Chang T, Xia J, Wang X, Miao J, Xu L, Zhang L, Chen Y, Li J, Xu S, Gao H. Genome-wide association studies using binned genotypes. Heredity (Edinb) 2019; 124:288-298. [PMID: 31641238 DOI: 10.1038/s41437-019-0279-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Revised: 09/25/2019] [Accepted: 09/26/2019] [Indexed: 01/23/2023] Open
Abstract
Linear mixed models (LMM) that test trait association one marker at a time have been the most popular methods for genome-wide association studies. However, this approach has potential pitfalls: over-conservativeness after Bonferroni correction, disregard of linkage disequilibrium (LD) between neighboring markers, and power reduction due to overfitting SNP effects. Multiple-locus models that can simultaneously estimate and test all markers in the genome are therefore more appropriate. Based on the multiple-locus models, we proposed a bin model that combines markers into bins based on their LD relationships. A bin is treated as a new synthetic marker and we detect the associations between bins and traits. Since the number of bins can be substantially smaller than the number of markers, a penalized multiple regression method can be adopted by fitting all bins in a single model. We developed an innovative method to bin the neighboring markers and used the least absolute shrinkage and selection operator (LASSO) method. We compared BIN-Lasso with SNP-Lasso and Q + K-LMM in a simulation experiment, and showed that the new method is more powerful, with less Type I error, than the other two methods. We also applied the bin model to a Chinese Simmental beef cattle population for a bone weight association study. The new method identified more significant associations than the classical LMM. The bin model is a new dimension reduction technique that takes advantage of biological information (i.e., LD). The new method will be a significant breakthrough in associative genomics in the big data era.
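The idea of collapsing neighboring markers into LD-based bins can be sketched as follows. This is an illustrative assumption, not the binning rule of the paper, which the abstract does not specify: here a marker joins the current bin when its squared correlation with the first marker of that bin reaches a hypothetical threshold, and the synthetic marker is taken to be the per-individual mean genotype of the bin. The function names `bin_markers` and `bin_scores` and the threshold value are invented for this example.

```python
from statistics import mean


def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def bin_markers(genotypes, r2_threshold=0.8):
    """Group adjacent markers into bins (hypothetical rule).

    genotypes: list of marker columns in map order, each a list of
    genotype codes (0/1/2) over individuals.
    Returns a list of bins, each a list of marker indices.
    """
    bins = []
    for j, column in enumerate(genotypes):
        lead = genotypes[bins[-1][0]] if bins else None
        if lead is not None and pearson_r2(lead, column) >= r2_threshold:
            bins[-1].append(j)  # high LD with the bin's first marker
        else:
            bins.append([j])  # start a new bin
    return bins


def bin_scores(genotypes, bins):
    """Synthetic marker per bin: mean genotype of its markers per individual."""
    n = len(genotypes[0])
    return [[mean(genotypes[j][i] for j in b) for i in range(n)] for b in bins]


if __name__ == "__main__":
    toy = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],
        [2, 0, 1, 1, 0, 2],
        [2, 0, 1, 1, 0, 2],
    ]
    toy_bins = bin_markers(toy)
    print(toy_bins)
    print(bin_scores(toy, toy_bins))
```

The resulting bin scores would then serve as the predictors in a penalized regression such as LASSO, in place of the individual SNP genotypes.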
Affiliation(s)
- Bingxing An
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Tianpeng Chang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jiangwei Xia
  - Institute of Basic Medical Science, Westlake Institute for Advanced Study, Hangzhou, China
- Xiaoqiao Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jian Miao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
|
20
|
Abstract
The increasing amount of available biological information on the markers can be used to inform the models applied for genomic selection to improve predictions. The objective of this study was to propose a general model for genomic selection using a link function approach within the hierarchical generalized linear model framework (hglm) that can include external information on the markers. These models can be fitted using the well-established hglm package in R. We also present an R package (CodataGS) to fit these models, which is significantly faster than the hglm package. Simulated data were used to validate the proposed model. We tested categorical, continuous and combination models where the external information on the markers was related to 1) the location of the QTL on the genome with varying degrees of uncertainty, 2) the relationship of the markers with the QTL calculated as the LD between them, and 3) a combination of both. The proposed models showed improved accuracies from 3.8% up to 23.2% compared to the SNP-BLUP method in a simulated population derived from a base population with 100 individuals. Moreover, the proposed categorical model was tested on a dairy cattle dataset for two traits (Milk Yield and Fat Percentage). These results also showed improved accuracy compared to SNP-BLUP, especially for the Fat% trait. The performance of the proposed models depended on the genetic architecture of the trait, as traits that deviate from the infinitesimal model benefited more from the external information. Also, the gain in accuracy depended on the degree of uncertainty of the external information provided to the model. The usefulness of these types of models is expected to increase with time as more accurate information on the markers becomes available.
|
21
|
Soltis NE, Atwell S, Shi G, Fordyce R, Gwinner R, Gao D, Shafi A, Kliebenstein DJ. Interactions of Tomato and Botrytis cinerea Genetic Diversity: Parsing the Contributions of Host Differentiation, Domestication, and Pathogen Variation. THE PLANT CELL 2019; 31:502-519. [PMID: 30647076 PMCID: PMC6447006 DOI: 10.1105/tpc.18.00857] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Revised: 12/18/2018] [Accepted: 01/08/2019] [Indexed: 05/26/2023]
Abstract
Although the impacts of crop domestication on specialist pathogens are well known, less is known about the interaction of crop variation and generalist pathogens. To study how genetic variation within a crop affects plant resistance to generalist pathogens, we infected a collection of wild and domesticated tomato accessions with a genetically diverse population of the generalist pathogen Botrytis cinerea. We quantified variation in lesion size of 97 B. cinerea genotypes (isolates) on six domesticated tomato genotypes (Solanum lycopersicum) and six wild tomato genotypes (Solanum pimpinellifolium). Lesion size was significantly affected by large effects of the host and pathogen's genotype, with a much smaller contribution of domestication. This pathogen collection also enables genome-wide association mapping of B. cinerea. Genome-wide association mapping of the pathogen showed that virulence is highly polygenic and involves a diversity of mechanisms. Breeding against this pathogen would likely require the use of diverse isolates to capture all possible mechanisms. Critically, we identified a subset of B. cinerea genes where allelic variation was linked to altered virulence against wild versus domesticated tomato, as well as loci that could handle both groups. This generalist pathogen already has a large collection of allelic variation that must be considered when designing a breeding program.
Affiliation(s)
- Nicole E Soltis
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
- Susanna Atwell
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
- Gongjun Shi
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
  - Department of Plant Pathology, North Dakota State University, Fargo, North Dakota, 58102
- Rachel Fordyce
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
- Raoni Gwinner
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
  - Department of Agriculture, Universidade Federal de Lavras, Lavras MG, 37200-000, Brazil
- Dihan Gao
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
- Aysha Shafi
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
- Daniel J Kliebenstein
  - Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, California, 95616
  - DynaMo Center of Excellence, University of Copenhagen, Thorvaldsensvej 40, DK-1871, Frederiksberg C, Denmark
|
22
|
Fordyce RF, Soltis NE, Caseys C, Gwinner R, Corwin JA, Atwell S, Copeland D, Feusier J, Subedy A, Eshbaugh R, Kliebenstein DJ. Digital Imaging Combined with Genome-Wide Association Mapping Links Loci to Plant-Pathogen Interaction Traits. PLANT PHYSIOLOGY 2018; 178:1406-1422. [PMID: 30266748 PMCID: PMC6236616 DOI: 10.1104/pp.18.00851] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Accepted: 09/18/2018] [Indexed: 05/04/2023]
Abstract
Plant resistance to generalist pathogens with broad host ranges, such as Botrytis cinerea (Botrytis), is typically quantitative and highly polygenic. Recent studies have begun to elucidate the molecular genetic basis of plant-pathogen interactions using commonly measured traits, including lesion size and/or pathogen biomass. However, with the advent of digital imaging and high-throughput phenomics, there are a large number of additional traits available to study quantitative resistance. In this study, we used high-throughput digital imaging analysis to investigate previously poorly characterized visual traits of plant-pathogen interactions related to disease resistance using the Arabidopsis (Arabidopsis thaliana)/Botrytis pathosystem. From a large collection of visual lesion trait measurements, we focused on color, shape, and size to test how these aspects of the Arabidopsis/Botrytis interaction are genetically related. Through genome-wide association mapping in Arabidopsis, we show that lesion color and shape are genetically separable traits associated with plant disease resistance. Moreover, by employing defined mutants in 23 candidate genes identified from the genome-wide association mapping, we demonstrate links between loci and each of the different plant-pathogen interaction traits. These results expand our understanding of the functional mechanisms driving plant disease resistance.
Affiliation(s)
- Rachel F Fordyce
  - Department of Plant Sciences, University of California, Davis, California 95616
- Nicole E Soltis
  - Department of Plant Sciences, University of California, Davis, California 95616
- Celine Caseys
  - Department of Plant Sciences, University of California, Davis, California 95616
- Raoni Gwinner
  - Department of Plant Sciences, University of California, Davis, California 95616
- Jason A Corwin
  - Department of Plant Sciences, University of California, Davis, California 95616
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado 80309-0334
- Susana Atwell
  - Department of Plant Sciences, University of California, Davis, California 95616
- Daniel Copeland
  - Department of Plant Sciences, University of California, Davis, California 95616
- Julie Feusier
  - Department of Plant Sciences, University of California, Davis, California 95616
- Anushriya Subedy
  - Department of Plant Sciences, University of California, Davis, California 95616
- Robert Eshbaugh
  - Department of Plant Sciences, University of California, Davis, California 95616
- Daniel J Kliebenstein
  - Department of Plant Sciences, University of California, Davis, California 95616
  - DynaMo Center of Excellence, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
|
23
|
Coombes BJ, Basu S, McGue M. A linear mixed model framework for gene-based gene-environment interaction tests in twin studies. Genet Epidemiol 2018; 42:648-663. [PMID: 30203856 DOI: 10.1002/gepi.22150] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2017] [Revised: 04/25/2018] [Accepted: 04/30/2018] [Indexed: 02/03/2023]
Abstract
Interaction between genes and environments (G×E) can be well investigated in families due to the shared genes and environment among family members. However, the majority of the current tests of G×E interaction between a set of variants and an environment are only suitable for studies with unrelated subjects. In this paper, we extend several G×E interaction tests to a linear mixed model framework to study interaction between a set of correlated environments and a candidate gene in families. The correlated environments can either be modeled separately or jointly in one model. We demonstrate theoretically that the tests developed by modeling correlated environments separately are valid and present a computationally fast alternative to detect G×E interaction in families. For either strategy, we propose treating the genetic main effects as random effects to reduce the number of main-effect parameters and thus improve the power to detect interactions. Additionally, we propose a generalization of a test of interaction that adaptively sums the interactions using a sequential algorithm. This generalized set of tests, referred to as the sequential algorithm for the sum of powered score (Seq-SPU) family of tests, can be expressed as a weighted version of the SPU. We find that the adaptive version of our test, Seq-aSPU, can outperform aSPU in cases where the interaction effects are in opposite directions. We applied these methods to the Minnesota Center for Twin and Family Research data set and found one significant gene in interaction with four psychosocial environmental factors affecting alcohol consumption among the twins.
Affiliation(s)
- Brandon J Coombes
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Matt McGue
- Department of Psychology, School of Public Health, University of Minnesota, Minneapolis, Minnesota
|
24
|
Ning C, Wang D, Kang H, Mrode R, Zhou L, Xu S, Liu JF. A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values. Bioinformatics 2018; 34:1817-1825. [PMID: 29342229 PMCID: PMC5972602 DOI: 10.1093/bioinformatics/bty017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/11/2017] [Revised: 01/07/2018] [Accepted: 01/10/2018] [Indexed: 12/16/2022] [Open Access]
Abstract
Motivation: Epistasis provides a feasible way to probe the potential genetic mechanisms of complex traits. However, time-consuming computation hampers the successful detection of interactions in practice, especially when a linear mixed model (LMM) is used to control type I error in the presence of population structure and cryptic relatedness. Results: A rapid epistatic mixed-model association analysis (REMMA) method was developed to overcome this computational limitation. This method first estimates individuals' epistatic effects by an extended genomic best linear unbiased prediction (EG-BLUP) model with additive and epistatic kinship matrices; pairwise interaction effects are then obtained by linear retransformations of individuals' epistatic effects. Simulation studies showed that REMMA could control type I error and increase statistical power in detecting epistatic QTNs in comparison with the existing LMM-based FaST-LMM. We applied REMMA to two real datasets, a mouse dataset and the Wellcome Trust Case Control Consortium (WTCCC) data. Application to the mouse data further confirmed the performance of REMMA in controlling type I error. For the WTCCC data, we found that most epistatic QTNs for type 1 diabetes (T1D) were located in a major histocompatibility complex (MHC) region, from which a large interacting network with 12 hub genes (interacting with ten or more genes) was established. Availability and implementation: Our REMMA method can be freely accessed at https://github.com/chaoning/REMMA. Contact: liujf@cau.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chao Ning
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Dan Wang
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Huimin Kang
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Raphael Mrode
- Animal Biosciences, International Livestock Institute, Nairobi, Kenya
- Lei Zhou
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shizhong Xu
- Department of Botany and Plant Science, University of California, Riverside, CA, USA
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, China
|
25
|
|
26
|
A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity (Edinb) 2017; 120:356-368. [PMID: 29238077 PMCID: PMC5842222 DOI: 10.1038/s41437-017-0023-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/18/2017] [Revised: 10/13/2017] [Accepted: 10/23/2017] [Indexed: 12/15/2022] [Open Access]
Abstract
Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant and human genetics, as well as in ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP markers that are generally weighted equally in their contributions. Proposed methods to handle dependence between SNPs include “thinning” the marker set by linkage disequilibrium (LD)-pruning, the use of haplotype-tagging of SNPs, and LD-weighting of the SNP contributions. For improved estimation, we propose a new conceptual framework for the genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in a linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated and compared to mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice and cattle genotypes) and real (maize, rice and mice) datasets. Despite the computational difficulties, our results suggest that by using the proposed method one can improve the accuracy of SNP-heritability estimates in datasets with high LD.
|
27
|
Thistlethwaite FR, Ratcliffe B, Klápště J, Porth I, Chen C, Stoehr MU, El-Kassaby YA. Genomic prediction accuracies in space and time for height and wood density of Douglas-fir using exome capture as the genotyping platform. BMC Genomics 2017; 18:930. [PMID: 29197325 PMCID: PMC5712148 DOI: 10.1186/s12864-017-4258-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 03/28/2017] [Accepted: 11/01/2017] [Indexed: 11/11/2022] [Open Access]
Abstract
Background: Genomic selection (GS) can offer unprecedented gains, in terms of cost efficiency and generation turnover, to forest tree selective breeding, especially for late-expressing and low-heritability traits. Here, we used 1) exome capture as a genotyping platform for 1372 Douglas-fir trees representing 37 full-sib families growing on three sites in British Columbia, Canada, and 2) estimated breeding values (EBVs) for height growth and wood density, and deregressed estimated breeding values (DEBVs), as phenotypes, representing models with (EBVs) and without (DEBVs) pedigree structure. Ridge regression best linear unbiased predictor (RR-BLUP) and generalized ridge regression (GRR) were used to assess their predictive accuracies over space (within site, cross-sites, multi-site, and multi-site to single site) and time (age-age/trait-trait). Results: The RR-BLUP and GRR models produced similar predictive accuracies across the studied traits. Within-site GS prediction accuracies with models trained on EBVs were high (RR-BLUP: 0.79–0.91 and GRR: 0.80–0.91), and were generally similar to the multi-site (RR-BLUP: 0.83–0.91, GRR: 0.83–0.91) and multi-site to single-site predictive accuracies (RR-BLUP: 0.79–0.92, GRR: 0.79–0.92). Cross-site predictions were surprisingly high, with predictive accuracies within a similar range (RR-BLUP: 0.79–0.92, GRR: 0.78–0.91). Height at 12 years was deemed the earliest acceptable age at which accurate predictions can be made concerning future height (age-age) and wood density (trait-trait). Using DEBVs reduced the accuracies of all cross-validation procedures dramatically, indicating that the models were tracking pedigree (family means) rather than marker-QTL LD. Conclusions: While GS models’ prediction accuracies were high, the main driving force was pedigree tracking rather than LD. It is likely that many more markers are needed to increase the chance of capturing the LD between causal genes and markers.
Affiliation(s)
- Frances R Thistlethwaite
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Blaise Ratcliffe
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Jaroslav Klápště
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada; Scion (New Zealand Forest Research Institute Ltd.), 49 Sala Street, Whakarewarewa, Rotorua, 3046, New Zealand; Department of Genetics and Physiology of Forest Trees, Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague, Kamycka 129, 165 21, Praha 6, Czech Republic
- Ilga Porth
- Département des sciences du bois et de la forêt, Université Laval, QC, Québec, G1V 0A6, Canada
- Charles Chen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078-3035, USA
- Michael U Stoehr
- British Columbia Ministry of Forests, Lands and Natural Resource Operations, Victoria, BC, V8W 9C2, Canada
- Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada
|
28
|
Mazo Lopera MA, Coombes BJ, de Andrade M. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data. Int J Environ Res Public Health 2017; 14:1134. [PMID: 28953253 PMCID: PMC5664635 DOI: 10.3390/ijerph14101134] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/19/2017] [Revised: 09/20/2017] [Accepted: 09/25/2017] [Indexed: 02/07/2023]
Abstract
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
Affiliation(s)
- Mauricio A Mazo Lopera
- School of Statistics, National University of Colombia, Medellín, Antioquia 050022, Colombia
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Brandon J Coombes
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Mariza de Andrade
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
|
29
|
|
30
|
An efficient method to handle the 'large p, small n' problem for genomewide association studies using Haseman-Elston regression. J Genet 2017; 95:847-852. [PMID: 27994183 DOI: 10.1007/s12041-016-0705-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/20/2022]
Abstract
The 'large p, small n' problem in genomewide association studies (GWAS) is an important subject in genetic studies. Many approaches have been proposed for this issue, but none of them successfully combines Haseman-Elston (H-E) regression with sliding-window scan approaches in GWAS. In this article, we extended H-E regression to GWAS, and replaced the original data with different measurements of the phenotype of sib pairs. We also applied a hidden Markov model to infer identity by state. Using subsequent simulation studies, we found that it had higher statistical power than the corresponding single-marker association studies. The advantage of the H-E regression was also sufficient to capture about 48.01% of the quantitative trait loci (QTLs). The results also show that the power decreases as the number of QTLs increases, and that the power of H-E regression is sensitive to heritability.
|
31
|
Angelovici R, Batushansky A, Deason N, Gonzalez-Jorge S, Gore MA, Fait A, DellaPenna D. Network-Guided GWAS Improves Identification of Genes Affecting Free Amino Acids. Plant Physiol 2017; 173:872-886. [PMID: 27872244 PMCID: PMC5210728 DOI: 10.1104/pp.16.01287] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/16/2016] [Accepted: 11/16/2016] [Indexed: 05/18/2023]
Abstract
Amino acids are essential for proper growth and development in plants. Amino acids serve as building blocks for proteins but also are important for responses to stress and the biosynthesis of numerous essential compounds. In seed, the pool of free amino acids (FAAs) also contributes to alternative energy, desiccation, and seed vigor; thus, manipulating FAA levels can significantly impact a seed's nutritional qualities. While genome-wide association studies (GWAS) on branched-chain amino acids have identified some regulatory genes controlling seed FAAs, the genetic regulation of FAA levels, composition, and homeostasis in seeds remains mostly unresolved. Hence, we performed GWAS on 18 FAAs from a 313-ecotype Arabidopsis (Arabidopsis thaliana) association panel. Specifically, GWAS was performed on 98 traits derived from known amino acid metabolic pathways (approach 1) and then on 92 traits generated from an unbiased correlation-based metabolic network analysis (approach 2), and the results were compared. The latter approach facilitated the discovery of additional novel metabolic interactions and single-nucleotide polymorphism-trait associations not identified by the former approach. The most prominent network-guided GWAS signal was for a histidine (His)-related trait in a region containing two genes: a cationic amino acid transporter (CAT4) and a polynucleotide phosphorylase resistant to inhibition with fosmidomycin. A reverse genetics approach confirmed CAT4 to be responsible for the natural variation of His-related traits across the association panel. Given that His is a semiessential amino acid and a potent metal chelator, CAT4 orthologs could be considered as candidate genes for seed quality biofortification in crop plants.
Affiliation(s)
- Ruthie Angelovici, Albert Batushansky, Nicholas Deason, Sabrina Gonzalez-Jorge, Michael A Gore, Aaron Fait, Dean DellaPenna
- Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211 (R.A., A.B.)
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (N.D., S.G.-J., D.D.)
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom (S.G.-J.)
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, New York 14854 (M.A.G.)
- French Associates Institute for Agriculture and Biotechnology of Drylands, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel 84990 (A.F.)
|
32
|
Li W, Liu H, Yang P, Xie W. Supporting Regularized Logistic Regression Privately and Efficiently. PLoS One 2016; 11:e0156479. [PMID: 27271738 PMCID: PMC4894560 DOI: 10.1371/journal.pone.0156479] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/20/2015] [Accepted: 04/18/2016] [Indexed: 12/03/2022] [Open Access]
Abstract
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data on human subjects that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
Affiliation(s)
- Wenfa Li
- Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, 100101, China
- Hongzhe Liu
- Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, 100101, China
- Peng Yang
- Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, 100101, China
- Wei Xie
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, United States of America
|
33
|
Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium. G3 (Bethesda) 2016; 6:1049-62. [PMID: 26869619 PMCID: PMC4825640 DOI: 10.1534/g3.115.024950] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 12/04/2022]
Abstract
Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height, and heading date. Marker data were produced for the families’ parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, the account of linkage disequilibrium among markers, offer valuable opportunities for improving prediction procedures in GS. Some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs.
|
34
|
Kooke R, Kruijer W, Bours R, Becker F, Kuhn A, van de Geest H, Buntjer J, Doeswijk T, Guerra J, Bouwmeester H, Vreugdenhil D, Keurentjes JJB. Genome-Wide Association Mapping and Genomic Prediction Elucidate the Genetic Architecture of Morphological Traits in Arabidopsis. Plant Physiol 2016; 170:2187-203. [PMID: 26869705 PMCID: PMC4825126 DOI: 10.1104/pp.15.00997] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 06/30/2015] [Accepted: 02/11/2016] [Indexed: 05/05/2023]
Abstract
Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified.
Affiliation(s)
- Rik Kooke
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
- Willem Kruijer
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
- Ralph Bours
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
- Frank Becker
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
- André Kuhn
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
- Henri van de Geest
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - Jaap Buntjer
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - Timo Doeswijk
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - José Guerra
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - Harro Bouwmeester
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - Dick Vreugdenhil
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| | - Joost J B Keurentjes
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., R.B., A.K., H.B., D.V.); Laboratory of Genetics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., F.B., J.J.B.K.); Centre for Biosystems Genomics, Wageningen Campus, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (R.K., H.v.d.G., D.V., J.J.B.K); Biometris, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (W.K.); PRI Bioinformatics, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands (H.v.d.G.); and Keygene, Agro Business Park 90, 6708 PW Wageningen, the Netherlands (J.B., T.D., J.G.)
| |
|
35
|
Corwin JA, Copeland D, Feusier J, Subedy A, Eshbaugh R, Palmer C, Maloof J, Kliebenstein DJ. The Quantitative Basis of the Arabidopsis Innate Immune System to Endemic Pathogens Depends on Pathogen Genetics. PLoS Genet 2016; 12:e1005789. [PMID: 26866607 PMCID: PMC4750985 DOI: 10.1371/journal.pgen.1005789] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 05/12/2015] [Accepted: 12/16/2015] [Indexed: 01/19/2023] Open
Abstract
The most established model of the eukaryotic innate immune system is derived from examples of large effect monogenic quantitative resistance to pathogens. However, many host-pathogen interactions involve many genes of small to medium effect and exhibit quantitative resistance. We used the Arabidopsis-Botrytis pathosystem to explore the quantitative genetic architecture underlying host innate immune system in a population of Arabidopsis thaliana. By infecting a diverse panel of Arabidopsis accessions with four phenotypically and genotypically distinct isolates of the fungal necrotroph B. cinerea, we identified a total of 2,982 genes associated with quantitative resistance using lesion area and 3,354 genes associated with camalexin production as measures of the interaction. Most genes were associated with resistance to a specific Botrytis isolate, which demonstrates the influence of pathogen genetic variation in analyzing host quantitative resistance. While known resistance genes, such as receptor-like kinases (RLKs) and nucleotide-binding site leucine-rich repeat proteins (NLRs), were found to be enriched among associated genes, they only account for a small fraction of the total genes associated with quantitative resistance. Using publicly available co-expression data, we condensed the quantitative resistance associated genes into co-expressed gene networks. GO analysis of these networks implicated several biological processes commonly connected to disease resistance, including defense hormone signaling and ROS production, as well as novel processes, such as leaf development. Validation of single gene T-DNA knockouts in a Col-0 background demonstrates a high success rate (60%) when accounting for differences in environmental and Botrytis genetic variation. This study shows that the genetic architecture underlying host innate immune system is extremely complex and is likely able to sense and respond to differential virulence among pathogen genotypes.
Affiliation(s)
- Jason A. Corwin
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
| | - Daniel Copeland
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
| | - Julie Feusier
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
| | - Anushriya Subedy
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
| | - Robert Eshbaugh
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
| | - Christine Palmer
- Department of Plant Biology, College of Biological Sciences, University of California - Davis, Davis, California, United States of America
| | - Julin Maloof
- Department of Plant Biology, College of Biological Sciences, University of California - Davis, Davis, California, United States of America
| | - Daniel J. Kliebenstein
- Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California - Davis, Davis, California, United States of America
- DynaMo Center of Excellence, University of Copenhagen, Frederiksberg, Denmark
| |
|
36
|
Francisco M, Joseph B, Caligagan H, Li B, Corwin JA, Lin C, Kerwin RE, Burow M, Kliebenstein DJ. Genome Wide Association Mapping in Arabidopsis thaliana Identifies Novel Genes Involved in Linking Allyl Glucosinolate to Altered Biomass and Defense. FRONTIERS IN PLANT SCIENCE 2016; 7:1010. [PMID: 27462337 PMCID: PMC4940622 DOI: 10.3389/fpls.2016.01010] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/20/2016] [Accepted: 06/27/2016] [Indexed: 05/17/2023]
Abstract
A key limitation in modern biology is the ability to rapidly identify genes underlying newly identified complex phenotypes. Genome wide association studies (GWAS) have become an increasingly important approach for dissecting natural variation by associating phenotypes with genotypes at a genome wide level. Recent work is showing that the Arabidopsis thaliana defense metabolite, allyl glucosinolate (GSL), may provide direct feedback regulation, linking defense metabolism outputs to the growth and defense responses of the plant. However, there is still a need to identify genes that underlie this process. To start developing a deeper understanding of the mechanism(s) that modulate the ability of exogenous allyl GSL to alter growth and defense, we measured changes in plant biomass and defense metabolites in a collection of 96 natural A. thaliana accessions fed with 50 μM of allyl GSL. Exogenous allyl GSL was introduced exclusively to the roots and the compound was transported to the leaf, leading to a wide range of heritable effects upon plant biomass and endogenous GSL accumulation. Using natural variation we conducted GWAS to identify a number of new genes which potentially control allyl responses in various plant processes. This is one of the first instances in which this approach has been successfully utilized to begin dissecting a novel phenotype to the underlying molecular/polygenic basis.
Affiliation(s)
- Marta Francisco
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Group of Genetics, Breeding and Biochemistry of Brassicas, Department of Plant Genetics, Misión Biológica de Galicia, Spanish Council for Scientific Research, Pontevedra, Spain
| | - Bindu Joseph
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Hart Caligagan
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Baohua Li
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Jason A. Corwin
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Catherine Lin
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Rachel E. Kerwin
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
| | - Meike Burow
- DynaMo Center, University of Copenhagen, Copenhagen, Denmark
| | - Daniel J. Kliebenstein
- Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- DynaMo Center, University of Copenhagen, Copenhagen, Denmark
- *Correspondence: Daniel J. Kliebenstein
| |
|
37
|
Exploiting Linkage Disequilibrium for Ultrahigh-Dimensional Genome-Wide Data with an Integrated Statistical Approach. Genetics 2015; 202:411-26. [PMID: 26661113 DOI: 10.1534/genetics.115.179507] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/15/2015] [Accepted: 11/19/2015] [Indexed: 01/08/2023] Open
Abstract
Genome-wide data with millions of single-nucleotide polymorphisms (SNPs) can be highly correlated due to linkage disequilibrium (LD). The ultrahigh dimensionality of big data brings unprecedented challenges to statistical modeling such as noise accumulation, the curse of dimensionality, computational burden, spurious correlations, and a processing and storing bottleneck. The traditional statistical approaches lose their power due to p ≫ n (n is the number of observations and p is the number of SNPs) and the complex correlation structure among SNPs. In this article, we propose an integrated distance correlation ridge regression (DCRR) approach to accommodate the ultrahigh dimensionality, joint polygenic effects of multiple loci, and the complex LD structures. Initially, a distance correlation (DC) screening approach is used to extensively remove noise, after which LD structure is addressed using a ridge penalized multiple logistic regression (LRR) model. The false discovery rate, true positive discovery rate, and computational cost were simultaneously assessed through a large number of simulations. A binary trait of Arabidopsis thaliana, the hypersensitive response to the bacterial elicitor AvrRpm1, was analyzed in 84 inbred lines (28 susceptibilities and 56 resistances) with 216,130 SNPs. Compared to previous SNP discovery methods implemented on the same data set, the DCRR approach successfully detected the causative SNP while dramatically reducing spurious associations and computational time.
|
38
|
Dumancas GG, Ramasahayam S, Bello G, Hughes J, Kramer R. Chemometric regression techniques as emerging, powerful tools in genetic association studies. Trends Analyt Chem 2015. [DOI: 10.1016/j.trac.2015.05.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
|
39
|
Lachowiec J, Shen X, Queitsch C, Carlborg Ö. A Genome-Wide Association Analysis Reveals Epistatic Cancellation of Additive Genetic Variance for Root Length in Arabidopsis thaliana. PLoS Genet 2015; 11:e1005541. [PMID: 26397943 PMCID: PMC4580642 DOI: 10.1371/journal.pgen.1005541] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 06/10/2015] [Accepted: 08/27/2015] [Indexed: 12/19/2022] Open
Abstract
Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. Here, we examined the genetics of Arabidopsis thaliana root length and found that the genomic narrow-sense heritability for this trait in the examined population was statistically zero. The low amount of additive genetic variance that could be captured by the genome-wide genotypes likely explains why no associations to root length could be found using standard additive-model-based genome-wide association (GWA) approaches. However, as the broad-sense heritability for root length was significantly larger, and primarily due to epistasis, we also performed an epistatic GWA analysis to map loci contributing to the epistatic genetic variance. Four interacting pairs of loci were revealed, involving seven chromosomal loci that passed a standard multiple-testing corrected significance threshold. The genotype-phenotype maps for these pairs revealed epistasis that cancelled out the additive genetic variance, explaining why these loci were not detected in the additive GWA analysis. Small population sizes, such as in our experiment, increase the risk of identifying false epistatic interactions due to testing for associations with very large numbers of multi-marker genotypes in few phenotyped individuals. Therefore, we estimated the false-positive risk using a new statistical approach that suggested half of the associated pairs to be true positive associations. Our experimental evaluation of candidate genes within the seven associated loci suggests that this estimate is conservative; we identified functional candidate genes that affected root development in four loci that were part of three of the pairs. The statistical epistatic analyses were thus indispensable for confirming known, and identifying new, candidate genes for root length in this population of wild-collected A. thaliana accessions. 
We also illustrate how epistatic cancellation of the additive genetic variance explains the insignificant narrow-sense and significant broad-sense heritability by using a combination of careful statistical epistatic analyses and functional genetic experiments. Complex traits, such as many human diseases or climate adaptation and production traits in crops, arise through the action and interaction of many genes and environmental factors. Classic approaches to identify contributing genes generally assume that these factors contribute mainly additive genetic variance. Recent methods, such as genome-wide association studies, often adhere to this additive genetics paradigm. However, additive models of complex traits do not reflect that genes can also contribute with non-additive genetic variance. In this study, we use Arabidopsis thaliana to determine the additive and non-additive genetic contributions to the phenotypic variation in root length. Surprisingly, much of the observed phenotypic variation in root length across genetically divergent strains was explained by epistasis. We mapped seven loci contributing to the epistatic genetic variance and validated four genes in these loci with mutant analysis. For three of these genes, this is their first implication in root development. Together, our results emphasize the importance of considering both non-additive and additive genetic variance when dissecting complex trait variation, in order not to lose sensitivity in genetic analyses.
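As a toy illustration of the epistatic cancellation described in this abstract (not the authors' analysis), the following Python sketch assumes a hypothetical balanced panel of inbred lines with two biallelic loci and an invented XOR-like genotype-phenotype map. Under those assumptions an additive model explains essentially none of the variance, while a model with an interaction term explains all of it. The names `a`, `b`, `y`, and `r_squared` are illustrative only.

```python
import numpy as np

# Hypothetical balanced population of inbred lines: every two-locus genotype
# (a, b) with a, b in {0, 1} occurs equally often (assumption for illustration).
a = np.array([0, 0, 1, 1] * 25)
b = np.array([0, 1, 0, 1] * 25)

# Invented phenotype with pure epistasis: the trait is high only when the
# alleles at the two loci differ (an XOR-like genotype-phenotype map).
y = (a != b).astype(float)


def r_squared(design, response):
    """Fraction of variance explained by a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(response)), design])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    resid = response - X @ coef
    return 1.0 - resid.var() / response.var()


# Additive model: main effects of the two loci only.
additive_r2 = r_squared(np.column_stack([a, b]), y)
# Epistatic model: main effects plus the pairwise interaction term.
epistatic_r2 = r_squared(np.column_stack([a, b, a * b]), y)

print(f"additive R^2:  {additive_r2:.3f}")
print(f"epistatic R^2: {epistatic_r2:.3f}")
```

In this contrived setting each locus is uncorrelated with the phenotype on its own, which is why a single-marker additive scan would find nothing even though the trait is fully genetic.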
Affiliation(s)
- Jennifer Lachowiec
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, United States of America
| | - Xia Shen
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- * E-mail: (CQ); (ÖC)
| | - Örjan Carlborg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- * E-mail: (CQ); (ÖC)
| |
|
40
|
Kierczak M, Jabłońska J, Forsberg SKG, Bianchi M, Tengvall K, Pettersson M, Scholz V, Meadows JRS, Jern P, Carlborg Ö, Lindblad-Toh K. cgmisc: enhanced genome-wide association analyses and visualization. Bioinformatics 2015; 31:3830-1. [PMID: 26249815 PMCID: PMC4653382 DOI: 10.1093/bioinformatics/btv426] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/11/2015] [Accepted: 07/17/2015] [Indexed: 12/20/2022] Open
Abstract
High-throughput genotyping and sequencing technologies facilitate studies of complex genetic traits and provide new research opportunities. The increasing popularity of genome-wide association studies (GWAS) leads to the discovery of new associated loci and a better understanding of the genetic architecture underlying not only diseases, but also other monogenic and complex phenotypes. Several software packages are available for performing GWAS analyses, the R environment being one of them. Results: We present cgmisc, an R package that enables enhanced data analysis and visualization of results from GWAS. The package contains several utilities and modules that complement and enhance the functionality of the existing software. It also provides several tools for advanced visualization of genomic data and utilizes the power of the R language to aid in preparation of publication-quality figures. Some of the package functions are specific for the domestic dog (Canis familiaris) data. Availability and implementation: The package is operating system-independent and is available from: https://github.com/cgmisc-team/cgmisc. Contact: marcin.kierczak@imbim.uu.se. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcin Kierczak
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Computational Genetics Section, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jagoda Jabłońska
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Simon K G Forsberg
- Computational Genetics Section, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Matteo Bianchi
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Katarina Tengvall
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Mats Pettersson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Computational Genetics Section, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Veronika Scholz
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Patric Jern
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Örjan Carlborg
- Computational Genetics Section, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Broad Institute of MIT and Harvard, Boston, MA, USA
| |
|
41
|
The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics. BIOMED RESEARCH INTERNATIONAL 2015; 2015:143712. [PMID: 26273586 PMCID: PMC4529984 DOI: 10.1155/2015/143712] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 11/28/2014] [Accepted: 12/24/2014] [Indexed: 01/05/2023]
Abstract
In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on selection of markers and shrinkage of their effects. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) the computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study we gauge the current and future potential of ridge regression for prediction of human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given current sample sizes in most cohorts (i.e., N < 10,000) the predictive accuracy of ridge regression is slightly higher than the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that for large-scale initiatives, such as biobanks, sample sizes can be achieved where ridge regression compared to the classical approach improves predictive accuracy substantially.
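To make the two approaches compared in this abstract concrete, here is a minimal Python sketch on simulated data. It is not taken from the review: the sample size, SNP count, effect sizes, and penalty values are arbitrary assumptions, and `ridge_fit` and `marginal_fit` are hypothetical helper names. It contrasts the closed-form ridge estimate, which fits all SNPs jointly and stays solvable when there are more SNPs than individuals, with repeated simple regression (one regression per SNP).

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def marginal_fit(X, y):
    """One simple regression slope per SNP (columns assumed centered)."""
    return (X * y[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)


# Simulated genotypes (0/1/2 allele counts); all settings are illustrative.
rng = np.random.default_rng(0)
n, p = 50, 200                      # fewer individuals than SNPs
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                 # center each SNP column

beta = np.zeros(p)
beta[:5] = 0.5                      # five causal SNPs, assumed effect sizes
y = X @ beta + rng.normal(size=n)
y -= y.mean()

b_small = ridge_fit(X, y, lam=1.0)      # light shrinkage
b_large = ridge_fit(X, y, lam=100.0)    # heavy shrinkage
b_marg = marginal_fit(X, y)             # one regression per SNP

print("ridge norm (lam=1):  ", np.linalg.norm(b_small))
print("ridge norm (lam=100):", np.linalg.norm(b_large))
```

A larger penalty shrinks the joint coefficient vector toward zero; in practice the penalty would be chosen by cross-validation or a mixed-model variance estimate rather than fixed by hand as here.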
|
42
|
A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods. Heredity (Edinb) 2015. [PMID: 26126540 DOI: 10.1038/hdy.2015.57] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
|
43
|
Ratcliffe B, El-Dien OG, Klápště J, Porth I, Chen C, Jaquish B, El-Kassaby YA. A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods. Heredity (Edinb) 2015; 115:547-55. [PMID: 26126540 DOI: 10.1038/hdy.2015.57] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 02/27/2015] [Revised: 04/29/2015] [Accepted: 05/26/2015] [Indexed: 11/09/2022] Open
Abstract
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3-40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31-0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04-0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
Affiliation(s)
- B Ratcliffe
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, British Columbia, Canada
| | - O G El-Dien
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, British Columbia, Canada
| | - J Klápště
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, British Columbia, Canada; Department of Genetics and Physiology of Forest Trees, Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague, Praha 6, Czech Republic
| | - I Porth
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, British Columbia, Canada
| | - C Chen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA
| | - B Jaquish
- British Columbia Ministry of Forests, Lands and Natural Resource Operations, Tree Improvement Branch, Kalamalka Research Station and Seed Orchard, Vernon, British Columbia, Canada
| | - Y A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, British Columbia, Canada
| |
|
44
|
Gamal El-Dien O, Ratcliffe B, Klápště J, Chen C, Porth I, El-Kassaby YA. Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing. BMC Genomics 2015; 16:370. [PMID: 25956247 PMCID: PMC4424896 DOI: 10.1186/s12864-015-1597-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 01/21/2015] [Accepted: 04/28/2015] [Indexed: 02/02/2024] Open
Abstract
Background Genomic selection (GS) in forestry can substantially reduce the length of breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies made it possible to genotype large numbers of trees at a reasonable cost. Results Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumption about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site were high and better than those of single-sites while multi-site predictability produced the lowest accuracies reflecting type-b genetic correlations and deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates as half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principle component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. 
Conclusions The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1597-y) contains supplementary material, which is available to authorized users.
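The simplest of the four imputation strategies named in the abstract, mean value imputation (MI), can be sketched as follows. This is a minimal illustration only: the 0/1/2 genotype coding, the use of NaN for missing calls, and the function name `mean_impute` are assumptions made for the example, not details taken from the study.

```python
import numpy as np

def mean_impute(genotypes: np.ndarray) -> np.ndarray:
    """Replace missing calls (NaN) in each marker column with that column's observed mean.

    Assumes every marker has at least one observed call; an all-NaN column
    would yield NaN means and should be filtered out beforehand.
    """
    imputed = genotypes.astype(float).copy()
    col_means = np.nanmean(imputed, axis=0)          # per-marker mean over observed calls
    missing = np.isnan(imputed)                      # boolean mask of missing entries
    # np.where(missing)[1] lists the column index of each missing entry (row-major order),
    # which matches the order in which boolean-mask assignment fills values.
    imputed[missing] = np.take(col_means, np.where(missing)[1])
    return imputed

# Hypothetical toy data: 4 trees x 3 markers, coded 0/1/2 with missing calls as NaN.
toy = np.array([
    [0.0,    1.0,    np.nan],
    [2.0,    np.nan, 1.0],
    [1.0,    1.0,    2.0],
    [np.nan, 2.0,    0.0],
])
imputed = mean_impute(toy)
print(imputed)
```

Mean imputation ignores family structure entirely, which is one reason a family-aware method such as the kNN-Fam approach described above might be preferred when a large share of calls is missing.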
Affiliation(s)
- Omnia Gamal El-Dien
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada.
- Blaise Ratcliffe
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada.
- Jaroslav Klápště
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada; Department of Genetics and Physiology of Forest Trees, Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague, Kamycka 129, 165 21, Prague 6, Czech Republic.
- Charles Chen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078-3035, USA.
- Ilga Porth
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada.
- Yousry A El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada.
|
45
|
Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait. PLoS One 2015; 10:e0126880. [PMID: 25950439 PMCID: PMC4423967 DOI: 10.1371/journal.pone.0126880] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 11/03/2014] [Accepted: 04/08/2015] [Indexed: 12/01/2022] Open
Abstract
The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.
|
46
|
Lee YS, Kim HJ, Cho S, Kim H. The Usage of an SNP-SNP Relationship Matrix for Best Linear Unbiased Prediction (BLUP) Analysis Using a Community-Based Cohort Study. Genomics Inform 2015; 12:254-60. [PMID: 25705167 PMCID: PMC4330263 DOI: 10.5808/gi.2014.12.4.254] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/12/2014] [Revised: 08/18/2014] [Accepted: 09/16/2014] [Indexed: 11/25/2022] Open
Abstract
Best linear unbiased prediction (BLUP) has been used to estimate the fixed effects and random effects of complex traits. Traditionally, genomic relationship matrix (GRM)-based and random marker-based BLUP analyses have been the prevalent approaches for estimating the genetic values of complex traits. We used three methods: GRM-based prediction (G-BLUP), random marker-based prediction using an identity matrix (so-called single-nucleotide polymorphism [SNP]-BLUP), and prediction using an SNP-SNP variance-covariance matrix (so-called SNP-GBLUP). We used 35,675 SNPs and the R package "rrBLUP" for the BLUP analysis. The SNP-SNP relationship matrix was calculated using the GRM and the Sherman-Morrison-Woodbury lemma. The SNP-GBLUP results were very similar to those of G-BLUP in the prediction of genetic values. However, there were many discrepancies between SNP-BLUP and the other two BLUPs. SNP-GBLUP has the merit of being able to predict genetic values through SNP effects.
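The link between marker-based and relationship-matrix-based prediction rests on a standard matrix identity of the Sherman-Morrison-Woodbury type: (Z'Z + λI)⁻¹Z' = Z'(ZZ' + λI)⁻¹. The sketch below checks that identity numerically on simulated data. It is not a reproduction of the paper's analysis: the unscaled kernel `K = Z Z'`, the fixed shrinkage parameter `lam`, and the simulated genotypes are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 50, 2.0                      # individuals, markers, ridge penalty (all hypothetical)

Z = rng.integers(0, 3, size=(n, p)).astype(float)
Z -= Z.mean(axis=0)                          # center each marker column
y = rng.normal(size=n)
y -= y.mean()                                # center the phenotype

# Marker-space ridge regression: solve a p x p system for the SNP effects.
beta_marker = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

# Individual-space prediction: solve an n x n system using the kernel K = Z Z'.
K = Z @ Z.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)
g_hat = K @ alpha                            # predicted genetic values
beta_from_g = Z.T @ alpha                    # SNP effects recovered from the n x n solve

print(np.allclose(Z @ beta_marker, g_hat))   # genetic values agree
print(np.allclose(beta_marker, beta_from_g)) # SNP effects agree
```

When the number of markers greatly exceeds the number of individuals, the n x n formulation is the cheaper of the two, while the back-transformation `Z.T @ alpha` still yields per-SNP effects.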
Affiliation(s)
- Young-Sup Lee
- Department of Natural Science, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Heebal Kim
- Department of Natural Science, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea; C&K Genomics, Seoul 151-742, Korea; Department of Agricultural Biotechnology, Animal Biotechnology, and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea
|
47
|
Abstract
BACKGROUND A method for estimating genomic breeding values (GEBV) based on the Horseshoe prior was introduced and used in the analysis of the 16th QTLMAS workshop dataset, which resembles three milk production traits. The method was compared with five commonly used methods: Bayes A, Bayes B, Bayes C, Bayesian Lasso and GBLUP. METHODS The main difference between the methods is the prior distribution assumed during the estimation of the SNP effects. The distribution for the Bayesian Lasso is a Laplace distribution; for Bayes A, it is a Student-t; for Bayes B and Bayes C, it is a spike-and-slab prior combining a proportion of SNP without effect and a proportion with effect distributed as a Student-t or Gaussian for Bayes B and C, respectively; for GBLUP, it is similar to a ridge regression. The distribution for the Horseshoe prior behaves like $\log(1 + 1/\beta^{2})$ (up to a constant). It has an infinite spike at zero and heavy tails that decay as $\beta^{-2}$ (slower than the Laplace or the Student-t). All methods except GBLUP were implemented using an MCMC approach, where the relevant parameters defining the prior distributions were jointly estimated from the data. The GBLUP was done using ASREML. RESULTS The accuracy for all methods ranged from 0.74 to 0.83, representing an improvement of 44% to 78% over the traditional BLUP evaluation. GEBV with the highest accuracy were obtained with Bayes A, Bayes B and the Horseshoe prior. The Horseshoe tended to select a smaller number of SNP and assign them larger effects, while strongly shrinking the remaining SNP to have an effect closer to zero. CONCLUSIONS The Horseshoe prior showed a different shrinkage pattern than the other methods. While for this specific dataset this has little impact on the accuracy of the GEBV, it may prove a good property for discriminating true effects from noise, and thereby improve overall prediction under different scenarios.
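The two properties attributed to the Horseshoe prior above, an unbounded spike at zero and a tail that decays like β⁻², can be checked numerically from the stated functional form. The sketch below works with the unnormalized proxy log(1 + 1/β²) only; the function names are hypothetical, and the unit-scale Laplace density is included purely as a point of comparison.

```python
import numpy as np

def horseshoe_proxy(beta):
    """Unnormalized proxy for the Horseshoe density: log(1 + 1/beta^2)."""
    beta = np.asarray(beta, dtype=float)
    return np.log1p(1.0 / beta**2)

def laplace_density(beta, scale=1.0):
    """Laplace (double-exponential) density with the given scale."""
    beta = np.asarray(beta, dtype=float)
    return np.exp(-np.abs(beta) / scale) / (2.0 * scale)

# Spike at zero: the proxy keeps growing as beta approaches 0.
near_zero = horseshoe_proxy([1.0, 1e-3, 1e-6])

# Heavy tail: beta^2 * proxy(beta) approaches 1 for large beta, i.e. decay like beta^-2.
large = np.array([10.0, 100.0, 1000.0])
tail_ratio = large**2 * horseshoe_proxy(large)

print(near_zero)
print(tail_ratio)
print(horseshoe_proxy(20.0), laplace_density(20.0))  # polynomial vs exponential tail
```

The combination matters for shrinkage: mass piled near zero pulls small effects strongly toward zero, while the slowly decaying tail leaves large effects comparatively untouched.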
Affiliation(s)
- Ricardo Pong-Wong
- The Roslin Institute and the R(D)SVS, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
|
48
|
Shen X, De Jonge J, Forsberg SKG, Pettersson ME, Sheng Z, Hennig L, Carlborg Ö. Natural CMT2 variation is associated with genome-wide methylation changes and temperature seasonality. PLoS Genet 2014; 10:e1004842. [PMID: 25503602 PMCID: PMC4263395 DOI: 10.1371/journal.pgen.1004842] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 04/14/2014] [Accepted: 10/21/2014] [Indexed: 12/30/2022] Open
Abstract
As Arabidopsis thaliana has colonized a wide range of habitats across the world it is an attractive model for studying the genetic mechanisms underlying environmental adaptation. Here, we used public data from two collections of A. thaliana accessions to associate genetic variability at individual loci with differences in climates at the sampling sites. We use a novel method to screen the genome for plastic alleles that tolerate a broader climate range than the major allele. This approach reduces confounding with population structure and increases power compared to standard genome-wide association methods. Sixteen novel loci were found, including an association between Chromomethylase 2 (CMT2) and temperature seasonality where the genome-wide CHH methylation was different for the group of accessions carrying the plastic allele. Cmt2 mutants were shown to be more tolerant to heat-stress, suggesting genetic regulation of epigenetic modifications as a likely mechanism underlying natural adaptation to variable temperatures, potentially through differential allelic plasticity to temperature-stress.
Affiliation(s)
- Xia Shen
- Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
- Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, Stockholm, Sweden
- University of Edinburgh, MRC Institute of Genetics and Molecular Medicine, MRC Human Genetics Unit, Edinburgh, United Kingdom
- Jennifer De Jonge
- Swedish University of Agricultural Sciences, Department of Plant Biology, Uppsala, Sweden
- Simon K. G. Forsberg
- Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
- Mats E. Pettersson
- Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
- Zheya Sheng
- Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
- Lars Hennig
- Swedish University of Agricultural Sciences, Department of Plant Biology, Uppsala, Sweden
- Örjan Carlborg
- Swedish University of Agricultural Sciences, Department of Clinical Sciences, Division of Computational Genetics, Uppsala, Sweden
|
49
|
Falke KC, Mahone GS, Bauer E, Haseneyer G, Miedaner T, Breuer F, Frisch M. Genome-wide prediction methods for detecting genetic effects of donor chromosome segments in introgression populations. BMC Genomics 2014; 15:782. [PMID: 25213628 PMCID: PMC4169839 DOI: 10.1186/1471-2164-15-782] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/18/2014] [Accepted: 08/20/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Introgression populations are used to make the genetic variation of unadapted germplasm or wild relatives of crops available for plant breeding. They consist of introgression lines that carry small chromosome segments from an exotic donor in the genetic background of an elite line. The goal of our study was to investigate the detection of favorable donor chromosome segments in introgression lines with statistical methods developed for genome-wide prediction. RESULTS Computer simulations showed that genome-wide prediction employing heteroscedastic marker variances had greater power and a lower false positive rate than prediction with homoscedastic marker variances when the phenotypic difference between the donor and recipient lines was controlled by few genes. The simulations helped to interpret the analyses of glucosinolate and linolenic acid content in a rapeseed introgression population and plant height in a rye introgression population. These analyses support the superiority of genome-wide prediction approaches that use heteroscedastic marker variances. CONCLUSIONS We conclude that genome-wide prediction methods in combination with permutation tests can be employed for analysis of introgression populations. They are particularly useful when introgression lines carry several donor segments or when the donor segments of different introgression lines are overlapping.
Affiliation(s)
- Karen Christin Falke
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, 35392 Giessen, Germany.
|
50
|
Fabregat-Traver D, Sharapov SZ, Hayward C, Rudan I, Campbell H, Aulchenko Y, Bientinesi P. High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software. F1000Res 2014; 3:200. [PMID: 25717363 PMCID: PMC4329600 DOI: 10.12688/f1000research.4867.1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Accepted: 08/07/2014] [Indexed: 01/06/2023] Open
Abstract
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for the analysis of single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
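One generic way to avoid redundant computation in a mixed-model scan is to rotate the data once with the eigenvectors of the kinship matrix, so that every per-SNP generalized least squares fit reduces to an ordinary least squares fit on whitened data. The sketch below illustrates that textbook idea only. It is not OmicABEL's algorithm or API: the function name `gls_snp_effects`, the assumption of a known heritability `h2`, and the simulated inputs are all hypothetical.

```python
import numpy as np

def gls_snp_effects(y, snps, kinship, h2):
    """Per-SNP GLS effect estimates under Var(y) proportional to h2*K + (1-h2)*I.

    The kinship matrix is eigendecomposed once; each SNP then needs only an
    ordinary least squares fit (intercept + SNP) on the rotated, whitened data.
    Assumes 0 <= h2 < 1 and a positive semi-definite kinship matrix.
    """
    n = y.shape[0]
    s, U = np.linalg.eigh(kinship)                 # K = U diag(s) U'
    w = 1.0 / np.sqrt(h2 * s + (1.0 - h2))         # whitening weights per rotated coordinate
    y_r = w * (U.T @ y)
    ones_r = w * (U.T @ np.ones(n))
    snps_r = w[:, None] * (U.T @ snps)             # rotate all SNPs in one matrix product
    effects = np.empty(snps.shape[1])
    for j in range(snps.shape[1]):
        X = np.column_stack([ones_r, snps_r[:, j]])
        coef, *_ = np.linalg.lstsq(X, y_r, rcond=None)
        effects[j] = coef[1]                       # SNP effect (index 0 is the intercept)
    return effects

# Hypothetical toy data.
rng = np.random.default_rng(1)
n, p, m = 30, 200, 5
G = rng.normal(size=(n, p))
kinship = G @ G.T / p                              # simulated relationship matrix
snps = rng.integers(0, 3, size=(n, m)).astype(float)
y = rng.normal(size=n)

effects = gls_snp_effects(y, snps, kinship, h2=0.5)
print(effects)
```

The eigendecomposition costs O(n³) once, after which each SNP costs O(n) beyond the shared rotation. In practice the variance ratio would be estimated rather than fixed, and a production tool would also batch the per-SNP solves.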
Affiliation(s)
- Diego Fabregat-Traver
- Aachen Institute for Advanced Study in Computational Engineering Science, Aachen, 52062, Germany
- Sodbo Zh. Sharapov
- Institute of Cytology and Genetics, Siberian Division of the Russian Academy of Sciences, Novosibirsk, 630090, Russian Federation
- Novosibirsk State University, Novosibirsk, 630090, Russian Federation
- Caroline Hayward
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Igor Rudan
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK
- Split University, Split, 21000, Croatia
- Harry Campbell
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK
- Yurii Aulchenko
- Institute of Cytology and Genetics, Siberian Division of the Russian Academy of Sciences, Novosibirsk, 630090, Russian Federation
- Novosibirsk State University, Novosibirsk, 630090, Russian Federation
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK
- Paolo Bientinesi
- Aachen Institute for Advanced Study in Computational Engineering Science, Aachen, 52062, Germany
|