1
Gibbs PM, Paril JF, Fournier-Level A. Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations. Genetics 2025; 229:iyaf003. [PMID: 39814947 DOI: 10.1093/genetics/iyaf003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2024] [Accepted: 12/29/2024] [Indexed: 01/18/2025]
Abstract
Genomic prediction applies to any agro- or ecologically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology including epistasis. However, the applicability of different genomic prediction models, including non-linear, non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates genomic prediction sensitivity to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits in 1,000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalized regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering time and yield, were strongly correlated to population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.
Affiliation(s)
- Patrick M Gibbs
  - School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
- Jefferson F Paril
  - School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
  - Agriculture Victoria Research, Department of Energy, Environment and Climate Action, La Trobe University, AgriBio, 5 Ring Road, Bundoora, VIC 3083, Australia
- Alexandre Fournier-Level
  - School of BioSciences, The University of Melbourne, Royal Parade, Parkville, VIC 3010, Australia
2
John M, Korte A, Todesco M, Grimm DG. Population-aware permutation-based significance thresholds for genome-wide association studies. Bioinformatics Advances 2024; 4:vbae168. [PMID: 39678204 PMCID: PMC11639184 DOI: 10.1093/bioadv/vbae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 10/02/2024] [Accepted: 10/25/2024] [Indexed: 12/17/2024]
Abstract
Motivation: Permutation-based significance thresholds have been shown to be a robust alternative to classical Bonferroni significance thresholds in genome-wide association studies (GWAS) for skewed phenotype distributions. The recently published method permGWAS introduced a batch-wise approach to efficiently compute permutation-based GWAS. However, running multiple univariate tests in parallel leads to many repetitive computations and increased computational resources. More importantly, traditional permutation methods that permute only the phenotype break the underlying population structure. Results: We propose permGWAS2, an improved method that does not break the population structure during permutations and uses an elegant block matrix decomposition to optimize computations, thereby reducing redundancies. We show on synthetic data that this improved approach yields a lower false discovery rate for skewed phenotype distributions compared to the previous version and the commonly used Bonferroni correction. In addition, we re-analyze a dataset covering phenotypic variation in 86 traits in a population of 615 wild sunflowers (Helianthus annuus L.). This led to the identification of dozens of novel associations with putatively adaptive traits, and removed several likely false-positive associations with limited biological support. Availability and implementation: permGWAS2 is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.
Affiliation(s)
- Maura John
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Arthur Korte
  - Faculty of Biology, University of Würzburg, 97074 Würzburg, Germany
- Marco Todesco
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
  - Department of Biology, University of British Columbia, Kelowna, BC V1V 1V7, Canada
- Dominik G Grimm
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
  - Technical University of Munich, TUM School of Computation, Information and Technology, 85748 Garching, Germany
3
Arouisse B, Thoen MPM, Kruijer W, Kunst JF, Jongsma MA, Keurentjes JJB, Kooke R, de Vos RCH, Mumm R, van Eeuwijk FA, Dicke M, Kloth KJ. Bivariate GWA mapping reveals associations between aliphatic glucosinolates and plant responses to thrips and heat stress. The Plant Journal: for Cell and Molecular Biology 2024; 120:674-686. [PMID: 39316617 DOI: 10.1111/tpj.17009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/20/2024] [Indexed: 09/26/2024]
Abstract
Although plants harbor a huge phytochemical diversity, only a fraction of plant metabolites is functionally characterized. In this work, we aimed to identify the genetic basis of metabolite functions during harsh environmental conditions in Arabidopsis thaliana. With machine learning algorithms we predicted stress-specific metabolomes for 23 (a)biotic stress phenotypes of 300 natural Arabidopsis accessions. The prediction models identified several aliphatic glucosinolates (GSLs) and their breakdown products to be implicated in responses to heat stress in siliques and herbivory by Western flower thrips, Frankliniella occidentalis. Bivariate GWA mapping of the metabolome predictions and their respective (a)biotic stress phenotype revealed genetic associations with MAM, AOP, and GS-OH, all three involved in aliphatic GSL biosynthesis. We, therefore, investigated thrips herbivory on AOP, MAM, and GS-OH loss-of-function and/or overexpression lines. Arabidopsis accessions with a combination of MAM2 and AOP3, leading to 3-hydroxypropyl dominance, suffered less from thrips feeding damage. The requirement of MAM2 for this effect could, however, not be confirmed with an introgression line of ecotypes Cvi and Ler, most likely due to other, unknown susceptibility factors in the Ler background. However, AOP2 and GS-OH, adding alkenyl or hydroxy-butenyl groups, respectively, did not have major effects on thrips feeding. Overall, this study illustrates the complex implications of aliphatic GSL diversity in plant responses to heat stress and a cell-content-feeding herbivore.
Affiliation(s)
- Bader Arouisse
  - Biometris, Wageningen University and Research, Wageningen, the Netherlands
- Manus P M Thoen
  - Laboratory of Entomology, Wageningen University & Research, Wageningen, the Netherlands
  - Enza Seeds, Enkhuizen, the Netherlands
- Willem Kruijer
  - Biometris, Wageningen University and Research, Wageningen, the Netherlands
- Jonathan F Kunst
  - Biometris, Wageningen University and Research, Wageningen, the Netherlands
- Maarten A Jongsma
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, the Netherlands
- Joost J B Keurentjes
  - Laboratory of Genetics, Wageningen University and Research, Wageningen, the Netherlands
- Rik Kooke
  - Biometris, Wageningen University and Research, Wageningen, the Netherlands
  - Laboratory of Genetics, Wageningen University and Research, Wageningen, the Netherlands
- Ric C H de Vos
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, the Netherlands
- Roland Mumm
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, the Netherlands
- Fred A van Eeuwijk
  - Biometris, Wageningen University and Research, Wageningen, the Netherlands
- Marcel Dicke
  - Laboratory of Entomology, Wageningen University & Research, Wageningen, the Netherlands
- Karen J Kloth
  - Laboratory of Entomology, Wageningen University & Research, Wageningen, the Netherlands
4
John M, Korte A, Grimm DG. The benefits of permutation-based genome-wide association studies. Journal of Experimental Botany 2024; 75:5377-5389. [PMID: 38954539 PMCID: PMC11389838 DOI: 10.1093/jxb/erae280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 07/01/2024] [Indexed: 07/04/2024]
Abstract
Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis phenotypes from the AraPheno database.
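The contrast this abstract draws between a static correction and a permutation-based threshold can be sketched in a few lines. What follows is a minimal illustration under strong simplifying assumptions, not the authors' method: it uses plain marker-phenotype correlations instead of a linear mixed model, it permutes only the phenotype (so it ignores population structure), it assumes no monomorphic markers, and the function names `max_abs_corr`, `permutation_threshold`, and `bonferroni_alpha` are hypothetical.

```python
import numpy as np


def max_abs_corr(genotypes, phenotype):
    """Largest absolute Pearson correlation between any marker and the phenotype.

    genotypes: array of shape (n_samples, n_markers); assumed to contain
    no monomorphic (constant) markers, which would give a zero denominator.
    """
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    r = (g * y[:, None]).sum(axis=0) / denom
    return np.abs(r).max()


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Family-wise threshold on |r| from the null distribution of the maximum.

    Each permutation shuffles the phenotype, records the strongest
    association seen across all markers, and the (1 - alpha) quantile of
    those maxima is used as the significance cutoff.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array([
        max_abs_corr(genotypes, rng.permutation(phenotype))
        for _ in range(n_perm)
    ])
    return np.quantile(null_max, 1 - alpha)


def bonferroni_alpha(alpha, n_tests):
    """Static per-test p-value cutoff that depends only on the number of tests."""
    return alpha / n_tests
```

Given a genotype matrix `G` (samples by markers) and a phenotype vector `y`, `permutation_threshold(G, y)` returns a cutoff on the test statistic that adapts to the phenotype distribution, whereas `bonferroni_alpha(0.05, G.shape[1])` depends only on the marker count.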
Affiliation(s)
- Maura John
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Arthur Korte
  - University of Würzburg, Faculty of Biology, Julius-von-Sachs Institute, Julius-von-Sachs-Platz 3, 97082 Würzburg, Germany
- Dominik G Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
  - Technical University of Munich, TUM School of Computation, Information and Technology, Boltzmannstraße 3, 85748 Garching, Germany
5
Cassan O, Pimpare LL, Mozzanino T, Fizames C, Devidal S, Roux F, Milcu A, Lebre S, Gojon A, Martin A. Natural genetic variation underlying the negative effect of elevated CO2 on ionome composition in Arabidopsis thaliana. eLife 2024; 12:RP90170. [PMID: 38780431 PMCID: PMC11115449 DOI: 10.7554/elife.90170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The elevation of atmospheric CO2 leads to a decline in plant mineral content, which might pose a significant threat to food security in coming decades. Although few genes have been identified for the negative effect of elevated CO2 on plant mineral composition, several studies suggest the existence of genetic factors. Here, we performed a large-scale study to explore genetic diversity of plant ionome responses to elevated CO2, using six hundred Arabidopsis thaliana accessions, representing geographical distributions ranging from worldwide to regional and local environments. We show that growth under elevated CO2 leads to a global decrease of ionome content, whatever the geographic distribution of the population. We observed a high range of genetic diversity, ranging from the most negative effect to resilience or even to a benefit in response to elevated CO2. Using genome-wide association mapping, we identified a large set of genes associated with this response, and we demonstrated that the function of one of these genes is involved in the negative effect of elevated CO2 on plant mineral composition. This resource will contribute to understanding the mechanisms underlying the effect of elevated CO2 on plant mineral nutrition, and could help towards the development of crops adapted to a high-CO2 world.
Affiliation(s)
- Oceane Cassan
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Lea-Lou Pimpare
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Timothy Mozzanino
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Cecile Fizames
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Sebastien Devidal
  - Montpellier European Ecotron, Univ Montpellier, CNRS, Campus Baillarguet, Montpellier, France
- Fabrice Roux
  - Laboratoire des Interactions Plantes-Microbes-Environnement, Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, CNRS, Université de Toulouse, Castanet-Tolosan, France
- Alexandru Milcu
  - Montpellier European Ecotron, Univ Montpellier, CNRS, Campus Baillarguet, Montpellier, France
  - CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Alain Gojon
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Antoine Martin
  - IPSiM, Univ Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
6
Reichelt N, Korte A, Krischke M, Mueller MJ, Maag D. Natural variation of warm temperature-induced raffinose accumulation identifies TREHALOSE-6-PHOSPHATE SYNTHASE 1 as a modulator of thermotolerance. Plant, Cell & Environment 2023; 46:3392-3404. [PMID: 37427798 DOI: 10.1111/pce.14664] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/21/2023] [Revised: 06/27/2023] [Accepted: 06/28/2023] [Indexed: 07/11/2023]
Abstract
High-temperature stress limits plant growth and reproduction. Exposure to high temperature, however, also elicits a physiological response, which protects plants from the damage evoked by heat. This response involves a partial reconfiguration of the metabolome including the accumulation of the trisaccharide raffinose. In this study, we explored the intraspecific variation of warm temperature-induced raffinose accumulation as a metabolic marker for temperature responsiveness with the aim to identify genes that contribute to thermotolerance. By combining raffinose measurements in 250 Arabidopsis thaliana accessions following a mild heat treatment with genome-wide association studies, we identified five genomic regions that were associated with the observed trait variation. Subsequent functional analyses confirmed a causal relationship between TREHALOSE-6-PHOSPHATE SYNTHASE 1 (TPS1) and warm temperature-dependent raffinose synthesis. Moreover, complementation of the tps1-1 null mutant with functionally distinct TPS1 isoforms differentially affected carbohydrate metabolism under more severe heat stress. While higher TPS1 activity was associated with reduced endogenous sucrose levels and thermotolerance, disruption of trehalose 6-phosphate signalling resulted in higher accumulation of transitory starch and sucrose and was associated with enhanced heat resistance. Taken together, our findings suggest a role of trehalose 6-phosphate in thermotolerance, most likely through its regulatory function in carbon partitioning and sucrose homoeostasis.
Affiliation(s)
- Niklas Reichelt
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany
- Markus Krischke
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Martin J Mueller
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
- Daniel Maag
  - Department of Pharmaceutical Biology, Julius-von-Sachs-Institute of Biosciences, University of Würzburg, Würzburg, Germany
7
Aarabi F, Ghigi A, Ahchige MW, Bulut M, Geigenberger P, Neuhaus HE, Sampathkumar A, Alseekh S, Fernie AR. Genome-wide association study unveils ascorbate regulation by PAS/LOV PROTEIN during high light acclimation. Plant Physiology 2023; 193:2037-2054. [PMID: 37265123 PMCID: PMC10602610 DOI: 10.1093/plphys/kiad323] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/21/2023] [Revised: 05/10/2023] [Accepted: 05/10/2023] [Indexed: 06/03/2023]
Abstract
Varying light conditions elicit metabolic responses as part of acclimation with changes in ascorbate levels being an important component. Here, we adopted a genome-wide association-based approach to characterize the response in ascorbate levels on high light (HL) acclimation in a panel of 315 Arabidopsis (Arabidopsis thaliana) accessions. These studies revealed statistically significant SNPs for total and reduced ascorbate under HL conditions at a locus in chromosome 2. Ascorbate levels under HL and the region upstream and within PAS/LOV PROTEIN (PLP) were strongly associated. Intriguingly, subcellular localization analyses revealed that the PLPA and PLPB splice variants co-localized with VITAMIN C DEFECTIVE2 (VTC2) and VTC5 in both the cytosol and nucleus. Yeast 2-hybrid and bimolecular fluorescence complementation analyses revealed that PLPA and PLPB interact with VTC2 and that blue light diminishes this interaction. Furthermore, PLPB knockout mutants were characterized by 1.5- to 1.7-fold elevations in their ascorbate levels, whereas knockout mutants of the cry2 cryptochromes displayed 1.2- to 1.3-fold elevations compared to WT. Our results collectively indicate that PLP plays a critical role in the elevation of ascorbate levels, which is a signature response of HL acclimation. The results strongly suggest that this is achieved via the release of the inhibitory effect of PLP on VTC2 upon blue light illumination, as the VTC2-PLPB interaction is stronger under darkness. The conditional importance of the cryptochrome receptors under different environmental conditions suggests a complex hierarchy underpinning the environmental control of ascorbate levels. However, the data we present here clearly demonstrate that PLP dominates during HL acclimation.
Affiliation(s)
- Fayezeh Aarabi
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Andrea Ghigi
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Micha Wijesingha Ahchige
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Mustafa Bulut
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Peter Geigenberger
  - Department Biology I, Ludwig-Maximilians-University Munich, Planegg-Martinsried 82152, Germany
- H Ekkehard Neuhaus
  - Plant Physiology, University of Kaiserslautern, Kaiserslautern D-67653, Germany
- Arun Sampathkumar
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Saleh Alseekh
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
  - Crop Quantitative Genetics, Centre of Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
- Alisdair R Fernie
  - Central Metabolism, Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
  - Crop Quantitative Genetics, Centre of Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
8
Staunton PM, Peters AJ, Seoighe C. Somatic mutations inferred from RNA-seq data highlight the contribution of replication timing to mutation rate variation in a model plant. Genetics 2023; 225:iyad128. [PMID: 37450609 PMCID: PMC10550316 DOI: 10.1093/genetics/iyad128] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/23/2023] [Revised: 03/23/2023] [Accepted: 06/11/2023] [Indexed: 07/18/2023]
Abstract
Variation in the rates and characteristics of germline and somatic mutations across the genome of an organism is informative about DNA damage and repair processes and can also shed light on aspects of organism physiology and evolution. We adapted a recently developed method for inferring somatic mutations from bulk RNA-seq data and applied it to a large collection of Arabidopsis thaliana accessions. The wide range of genomic data types available for A. thaliana enabled us to investigate the relationships of multiple genomic features with the variation in the somatic mutation rate across the genome of this model plant. We observed that late replicated regions showed evidence of an elevated rate of somatic mutation compared to genomic regions that are replicated early. We identified transcriptional strand asymmetries, consistent with the effects of transcription-coupled damage and/or repair. We also observed a negative relationship between the inferred somatic mutation count and the H3K36me3 histone mark which is well documented in the literature of human systems. In addition, we were able to support previous reports of an inverse relationship between inferred somatic mutation count and guanine-cytosine content as well as a positive relationship between inferred somatic mutation count and DNA methylation for both cytosine and noncytosine mutations.
Affiliation(s)
- Patrick M Staunton
  - School of Mathematical and Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
- Andrew J Peters
  - School of Mathematical and Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
- Cathal Seoighe
  - School of Mathematical and Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
9
Córdoba SC, Tong H, Burgos A, Zhu F, Alseekh S, Fernie AR, Nikoloski Z. Identification of gene function based on models capturing natural variability of Arabidopsis thaliana lipid metabolism. Nat Commun 2023; 14:4897. [PMID: 37580345 PMCID: PMC10425450 DOI: 10.1038/s41467-023-40644-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 08/04/2023] [Indexed: 08/16/2023]
Abstract
Lipids play fundamental roles in regulating agronomically important traits. Advances in plant lipid metabolism have until recently largely been based on reductionist approaches, although modulation of its components can have system-wide effects. However, existing models of plant lipid metabolism provide lumped representations, hindering detailed study of component modulation. Here, we present the Plant Lipid Module (PLM) which provides a mechanistic description of lipid metabolism in the Arabidopsis thaliana rosette. We demonstrate that the PLM can be readily integrated in models of A. thaliana Col-0 metabolism, yielding accurate predictions (83%) of single lethal knock-outs and 75% concordance between measured transcript and predicted flux changes under extended darkness. Genome-wide associations with fluxes obtained by integrating the PLM in diel condition- and accession-specific models identify up to 65 candidate genes modulating A. thaliana lipid metabolism. Using mutant lines, we validate up to 40% of the candidates, paving the way for identification of metabolic gene function based on models capturing natural variability in metabolism.
Affiliation(s)
- Sandra Correa Córdoba
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
  - Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Hao Tong
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
  - Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Asdrúbal Burgos
  - Department of Zoology and Botany, University of Guadalajara, Guadalajara, Mexico
- Feng Zhu
  - National R&D Center for Citrus Preservation, Hubei Hongshan Laboratory, National Key Laboratory for Germplasm Innovation and Utilization for Horticultural Crops, Huazhong Agricultural University, Wuhan, China
- Saleh Alseekh
  - Central Metabolism, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
- Alisdair R Fernie
  - Central Metabolism, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
  - Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
10
Putra AR, Yen JDL, Fournier-Level A. Forecasting trait responses in novel environments to aid seed provenancing under climate change. Mol Ecol Resour 2023; 23:565-580. [PMID: 36308465 DOI: 10.1111/1755-0998.13728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2021] [Revised: 10/23/2022] [Accepted: 10/27/2022] [Indexed: 11/28/2022]
Abstract
Revegetation projects face the major challenge of sourcing optimal plant material. This is often done with limited information about plant performance and increasingly requires factoring resilience to climate change. Functional traits can be used as quantitative indices of plant performance and guide seed provenancing, but trait values expected under novel conditions are often unknown. To support climate-resilient provenancing efforts, we develop a trait prediction model that integrates the effect of genetic variation with fine-scale temperature variation. We train our model on multiple field plantings of Arabidopsis thaliana and predict two relevant fitness traits, days-to-bolting and fecundity, across the species' European range. Prediction accuracy was high for days-to-bolting and moderate for fecundity, with the majority of trait variation explained by temperature differences between plantings. Projection under future climate predicted a decline in fecundity, although this response was heterogeneous across the range. In response, we identified novel genotypes that could be introduced to genetically offset the fitness decay. Our study highlights the value of predictive models to aid seed provenancing and improve the success of revegetation projects.
Affiliation(s)
- Andhika R Putra
  - School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Jian D L Yen
  - Arthur Rylah Institute for Environmental Research, Heidelberg, Victoria, Australia
11
Analysis of dog breed diversity using a composite selection index. Sci Rep 2023; 13:1674. [PMID: 36717599 PMCID: PMC9886904 DOI: 10.1038/s41598-023-28826-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/05/2022] [Accepted: 01/25/2023] [Indexed: 01/31/2023]
Abstract
During breed development, domestic dogs have undergone genetic bottlenecks and sustained selective pressures; as a result, distinctive genomic diversity occurs to varying degrees within and between breed groups. This diversity can be identified using standard methods or combinations of these methods. This study explored the application of a combined selection index, composite selection signals (CSS), derived from multiple methods to an existing genotype dataset from three breed groups developed in distinct regions of Asia: Qinghai-Tibet plateau dogs (adapted to living at altitude), Xi dogs (with superior running ability) and Mountain hounds (used for hunting ability). The CSS analysis confirmed top ranked genomic regions on CFA10 and CFA21 in Qinghai-Tibet plateau dogs, CFA1 in Xi dogs and CFA5 in Mountain hounds. CSS analysis identified additional significant genomic regions in each group, defined by a total of 1,397, 1,475 and 1,675 significant SNPs in the Qinghai-Tibetan Plateau dogs, Xi dogs and Mountain hounds, respectively. Chitinase 3 Like 1 (CHI3L1) and Leucine Rich Repeat Containing G Protein-Coupled Receptor 6 (LGR6) genes were located in the top ranked region on CFA7 (0.02-1 Mb) in the Qinghai-Tibetan Plateau dogs. Both genes have been associated with hypoxia responses or altitude adaptation in humans. For the Xi dogs, the top ranked region on CFA25 contained the Transient Receptor Potential Cation Channel Subfamily C Member 4 (TRPC4) gene. This calcium channel is important for optimal muscle performance during exercise. The outstanding signals in the Mountain dogs were on CFA5 with 213 significant SNPs that spanned genes involved in cardiac development, sight and generation of biochemical energy. These findings support the use of the combined index approach for identifying novel regions of genome diversity in dogs.
As with other methods, the results do not prove causal links between these regions and phenotypes, but they may assist in focusing future studies that seek to identify functional pathways that contribute to breed diversity.
|
12
|
Almira Casellas MJ, Pérez‐Martín L, Busoms S, Boesten R, Llugany M, Aarts MGM, Poschenrieder C. A genome-wide association study identifies novel players in Na and Fe homeostasis in Arabidopsis thaliana under alkaline-salinity stress. The Plant Journal: For Cell and Molecular Biology 2023; 113:225-245. [PMID: 36433704 PMCID: PMC10108281 DOI: 10.1111/tpj.16042] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/26/2022] [Revised: 11/11/2022] [Accepted: 11/21/2022] [Indexed: 06/16/2023]
Abstract
In nature, multiple stress factors occur simultaneously. The screening of natural diversity panels and subsequent Genome-Wide Association Studies (GWAS) is a powerful approach to identify genetic components of various stress responses. Here, the nutritional status variation of a set of 270 natural accessions of Arabidopsis thaliana grown on a natural saline-carbonated soil is evaluated. We report significant natural variation in leaf Na (LNa) and Fe (LFe) concentrations in the studied accessions. Allelic variation in the NINJA and YUC8 genes is associated with LNa diversity, and variation in ALA3 is associated with LFe diversity. The allelic variation detected in these three genes leads to changes in their mRNA expression and correlates with plant differential growth performance when plants are exposed to alkaline salinity treatment under hydroponic conditions. We propose that YUC8 and NINJA expression patterns regulate auxin and jasmonic signaling pathways affecting plant tolerance to alkaline salinity. Finally, we describe an impairment in growth and leaf Fe acquisition associated with differences in root expression of ALA3, encoding a phospholipid translocase active in plasma membrane and the trans Golgi network which directly interacts with proteins essential for the trafficking of PIN auxin transporters, reinforcing the role of phytohormonal processes in regulating ion homeostasis under alkaline salinity.
Affiliation(s)
- Maria Jose Almira Casellas
- Plant Physiology Laboratory, Bioscience Faculty, Universitat Autònoma de Barcelona, C/de la Vall Moronta s/n, E‐08193 Bellaterra, Spain
| | - Laura Pérez‐Martín
- Plant Physiology Laboratory, Bioscience Faculty, Universitat Autònoma de Barcelona, C/de la Vall Moronta s/n, E‐08193 Bellaterra, Spain
- Department of Botany and Plant Biology, University of Geneva, 1211 Geneva, Switzerland
| | - Silvia Busoms
- Plant Physiology Laboratory, Bioscience Faculty, Universitat Autònoma de Barcelona, C/de la Vall Moronta s/n, E‐08193 Bellaterra, Spain
| | - René Boesten
- Laboratory of Genetics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Mercè Llugany
- Plant Physiology Laboratory, Bioscience Faculty, Universitat Autònoma de Barcelona, C/de la Vall Moronta s/n, E‐08193 Bellaterra, Spain
| | - Mark G. M. Aarts
- Laboratory of Genetics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Charlotte Poschenrieder
- Plant Physiology Laboratory, Bioscience Faculty, Universitat Autònoma de Barcelona, C/de la Vall Moronta s/n, E‐08193 Bellaterra, Spain
|
13
|
John M, Grimm D, Korte A. Predicting Gene Regulatory Interactions Using Natural Genetic Variation. Methods Mol Biol 2023; 2698:301-322. [PMID: 37682482 DOI: 10.1007/978-1-0716-3354-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Genome-wide association studies (GWAS) are a powerful tool to elucidate the genotype-phenotype map. Although GWAS are usually used to assess simple univariate associations between genetic markers and traits of interest, it is also possible to infer the underlying genetic architecture and to predict gene regulatory interactions. In this chapter, we describe the latest methods and tools to perform GWAS by calculating permutation-based significance thresholds. For this purpose, we first provide guidelines on univariate GWAS analyses that are extended in the second part of this chapter to more complex models that enable the inference of gene regulatory networks and how these networks vary.
Affiliation(s)
- Maura John
- Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
| | - Dominik Grimm
- Technical University of Munich & Weihenstephan-Triesdorf University of Applied Sciences, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
| | - Arthur Korte
- Center for Computational and Theoretical Biology, University of Würzburg, Würzburg, Germany.
|
14
|
Niehoff T, Pook T, Gholami M, Beissinger T. Imputation of low-density marker chip data in plant breeding: Evaluation of methods based on sugar beet. The Plant Genome 2022; 15:e20257. [PMID: 36258672 DOI: 10.1002/tpg2.20257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/02/2022] [Indexed: 06/16/2023]
Abstract
Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop, where these are realistic marker numbers for modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2 which is designed specifically for plant breeding. For Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the values for the parameter for the effective population size and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.
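The abstract reports imputation accuracies (for example 0.22, 0.67, 0.89) without defining the metric. A common choice, assumed here and not confirmed by the abstract, is the Pearson correlation between true and imputed allele counts at genotypes that were masked before imputation. The function name `imputation_accuracy` and its inputs are illustrative only; this is a minimal sketch of that assumed metric, not the authors' evaluation code.

```python
from math import sqrt
from statistics import mean


def imputation_accuracy(truth, imputed):
    """Pearson correlation between true and imputed allele counts (0, 1 or 2).

    Both arguments are flat lists covering only the genotypes that were
    masked before imputation, in the same order. (Assumed metric.)
    """
    if len(truth) != len(imputed) or len(truth) < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mt, mi = mean(truth), mean(imputed)
    cov = sum((t - mt) * (i - mi) for t, i in zip(truth, imputed))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_i = sum((i - mi) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # correlation is undefined without variation
    return cov / sqrt(var_t * var_i)
```

A correlation-based measure penalizes errors at rare alleles more than a simple concordance rate does, which is one reason it is often preferred when minor allele frequencies are low.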
Affiliation(s)
- Tobias Niehoff
- Animal Breeding and Genomics, Wageningen Univ. & Research, Postbox 338, 6700AH, Wageningen, The Netherlands
- Dep. of Crop Sciences, Division of Plant Breeding Methodology, Univ. of Göttingen, Göttingen, 37075, Germany
| | - Torsten Pook
- Animal Breeding and Genomics, Wageningen Univ. & Research, Postbox 338, 6700AH, Wageningen, The Netherlands
- Dep. of Animal Sciences, Animal Breeding and Genetics Group, Univ. of Göttingen, Göttingen, 37075, Germany
- Center for Integrated Breeding Research, Univ. of Göttingen, Göttingen, 37075, Germany
| | - Mahmood Gholami
- RD-SBCE-BTA, KWS SAAT SE & Co. KGaA, Grimsehlstr. 31, Einbeck, 37574, Germany
| | - Timothy Beissinger
- Dep. of Crop Sciences, Division of Plant Breeding Methodology, Univ. of Göttingen, Göttingen, 37075, Germany
- Center for Integrated Breeding Research, Univ. of Göttingen, Göttingen, 37075, Germany
|
15
|
López-Ruiz BA, Quezada-Rodríguez EH, Piñeyro-Nelson A, Tovar H, García-Ponce B, Sánchez MDLP, Álvarez-Buylla ER, Garay-Arroyo A. Combined Approach of GWAS and Phylogenetic Analyses to Identify New Candidate Genes That Participate in Arabidopsis thaliana Primary Root Development Using Cellular Measurements and Primary Root Length. Plants (Basel, Switzerland) 2022; 11:3162. [PMID: 36432890 PMCID: PMC9697774 DOI: 10.3390/plants11223162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 11/13/2022] [Accepted: 11/15/2022] [Indexed: 06/16/2023]
Abstract
Genome-wide association studies (GWAS) have allowed the identification of different loci associated with primary root (PR) growth, and Arabidopsis is an excellent model for these studies. The PR length is controlled by cell proliferation, elongation, and differentiation; however, the specific contribution of proliferation and differentiation in the control of PR growth is still poorly studied. To this end, we analyzed 124 accessions and used a GWAS approach to identify potential causal genomic regions related to four traits: PR length, growth rate, cell proliferation and cell differentiation. Twenty-three genes and five statistically significant SNPs were identified. The SNP with the highest score mapped to the fifth exon of NAC048 and this change makes a missense variant in only 33.3% of the accessions with a large PR, compared with the accessions with a short PR length. Moreover, we detected five more SNPs in this gene and in NAC3 that allow us to discover closely related accessions according to the phylogenetic tree analysis. We also found that the associations between genetic variants among the 18 genes with the highest scores in our GWAS and the phenotypic classes into which we divided our accessions are not straightforward and likely follow historical patterns.
Affiliation(s)
- Brenda Anabel López-Ruiz
- Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - Elsa H. Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México 04510, Mexico
| | - Alma Piñeyro-Nelson
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México 04510, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - Hugo Tovar
- División de Genómica Computacional, Instituto Nacional de Medicina Genómica (INMEGEN), Ciudad de México 14610, Mexico
| | - Berenice García-Ponce
- Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - María de la Paz Sánchez
- Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - Elena R. Álvarez-Buylla
- Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
| | - Adriana Garay-Arroyo
- Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Departamento de Ecología Funcional, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México 04510, Mexico
|
16
|
John M, Haselbeck F, Dass R, Malisi C, Ricca P, Dreischer C, Schultheiss SJ, Grimm DG. A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species. Frontiers in Plant Science 2022; 13:932512. [PMID: 36407627 PMCID: PMC9673477 DOI: 10.3389/fpls.2022.932512] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare 12 different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Affiliation(s)
- Maura John
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
| | - Florian Haselbeck
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
| | | | | | | | | | | | - Dominik G. Grimm
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
- Technical University of Munich, Department of Informatics, Garching, Germany
|
17
|
John M, Ankenbrand MJ, Artmann C, Freudenthal JA, Korte A, Grimm DG. Efficient permutation-based genome-wide association studies for normal and skewed phenotypic distributions. Bioinformatics 2022; 38:ii5-ii12. [PMID: 36124808 PMCID: PMC9486594 DOI: 10.1093/bioinformatics/btac455] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest while also accounting for population structure and cryptic relatedness. Assumptions of LMMs include a normal distribution of the residuals and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice, they are rarely implemented due to the high computational complexity. RESULTS We propose permGWAS, an efficient LMM reformulation based on 4D tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than 8 days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results. AVAILABILITY AND IMPLEMENTATION permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
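The idea of a permutation-based significance threshold can be illustrated with a small sketch. It is deliberately simplified: it uses the squared Pearson correlation per marker as the test statistic instead of a linear mixed model, runs on plain Python lists instead of tensors, and does not use the permGWAS API. The function names `pearson_r2` and `permutation_threshold` are hypothetical and chosen only for this illustration.

```python
import random
from statistics import mean


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker or constant phenotype carries no signal
    return (sxy * sxy) / (sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Return the (1 - alpha) empirical quantile of the per-permutation maximum.

    genotypes: list of markers, each a list of allele counts per individual.
    phenotype: list of trait values, one per individual.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        shuffled = phenotype[:]
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        # Keep only the strongest association seen across all markers.
        maxima.append(max(pearson_r2(m, shuffled) for m in genotypes))
    maxima.sort()
    # Index of the (1 - alpha) empirical quantile, clamped to the last element.
    k = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[k]


if __name__ == "__main__":
    rng = random.Random(42)
    markers = [[rng.choice([0, 1, 2]) for _ in range(50)] for _ in range(200)]
    trait = [rng.gauss(0.0, 1.0) for _ in range(50)]
    print(permutation_threshold(markers, trait, n_perm=100))
```

Because each permutation records the maximum statistic over all markers, the resulting threshold controls the family-wise error rate while respecting the correlation between markers and the actual shape of the phenotype distribution, which a Bonferroni correction does not take into account.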
Affiliation(s)
- Maura John
- To whom correspondence should be addressed.
| | - Markus J Ankenbrand
- Center for Computational and Theoretical Biology, University of Würzburg, 97078 Würzburg, Germany
| | - Carolin Artmann
- Center for Computational and Theoretical Biology, University of Würzburg, 97078 Würzburg, Germany
| | - Jan A Freudenthal
- Center for Computational and Theoretical Biology, University of Würzburg, 97078 Würzburg, Germany
|
18
|
Gloss AD, Vergnol A, Morton TC, Laurin PJ, Roux F, Bergelson J. Genome-wide association mapping within a local Arabidopsis thaliana population more fully reveals the genetic architecture for defensive metabolite diversity. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200512. [PMID: 35634919 PMCID: PMC9149790 DOI: 10.1098/rstb.2020.0512] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/01/2021] [Accepted: 03/08/2022] [Indexed: 12/16/2022] Open
Abstract
A paradoxical finding from genome-wide association studies (GWAS) in plants is that variation in metabolite profiles typically maps to a small number of loci, despite the complexity of underlying biosynthetic pathways. This discrepancy may partially arise from limitations presented by geographically diverse mapping panels. Properties of metabolic pathways that impede GWAS by diluting the additive effect of a causal variant, such as allelic and genetic heterogeneity and epistasis, would be expected to increase in severity with the geographical range of the mapping panel. We hypothesized that a population from a single locality would reveal an expanded set of associated loci. We tested this in a French Arabidopsis thaliana population (less than 1 km transect) by profiling and conducting GWAS for glucosinolates, a suite of defensive metabolites that have been studied in depth through functional and genetic mapping approaches. For two distinct classes of glucosinolates, we discovered more associations at biosynthetic loci than the previous GWAS with continental-scale mapping panels. Candidate genes underlying novel associations were supported by concordance between their observed effects in the TOU-A population and previous functional genetic and biochemical characterization. Local populations complement geographically diverse mapping panels to reveal a more complete genetic architecture for metabolic traits. This article is part of the theme issue 'Genetic basis of adaptation and speciation: from loci to causative mutations'.
Affiliation(s)
- Andrew D. Gloss
- Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
| | - Amélie Vergnol
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
| | - Timothy C. Morton
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
| | - Peter J. Laurin
- Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
| | - Fabrice Roux
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan, France
| | - Joy Bergelson
- Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
|
19
|
Lian Q, Solier V, Walkemeier B, Durand S, Huettel B, Schneeberger K, Mercier R. The megabase-scale crossover landscape is largely independent of sequence divergence. Nat Commun 2022; 13:3828. [PMID: 35780220 PMCID: PMC9250513 DOI: 10.1038/s41467-022-31509-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/10/2022] [Accepted: 06/20/2022] [Indexed: 02/01/2023] Open
Abstract
Meiotic recombination frequency varies along chromosomes and strongly correlates with sequence divergence. However, the causal relationship between recombination landscapes and polymorphisms is unclear. Here, we characterize the genome-wide recombination landscape in the quasi-absence of polymorphisms, using Arabidopsis thaliana homozygous inbred lines in which a few hundred genetic markers were introduced through mutagenesis. We find that megabase-scale recombination landscapes in inbred lines are strikingly similar to the recombination landscapes in hybrids, with the notable exception of heterozygous large rearrangements where recombination is prevented locally. In addition, the megabase-scale recombination landscape can be largely explained by chromatin features. Our results show that polymorphisms are not a major determinant of the shape of the megabase-scale recombination landscape but rather favour alternative models in which recombination and chromatin shape sequence divergence across the genome. The frequency of recombination varies along chromosomes and highly correlates with sequence divergence. Here, the authors show that polymorphisms are not a major determinant of the megabase-scale recombination landscape in Arabidopsis, which is rather determined by chromatin accessibility and DNA methylation.
Affiliation(s)
- Qichao Lian
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Victor Solier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Birgit Walkemeier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Stéphanie Durand
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Bruno Huettel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Korbinian Schneeberger
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany.
- Faculty of Biology, LMU Munich, 82152, Planegg-Martinsried, Germany.
| | - Raphael Mercier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany.
|
20
|
Fournier-Level A, Taylor MA, Paril JF, Martínez-Berdeja A, Stitzer MC, Cooper MD, Roe JL, Wilczek AM, Schmitt J. Adaptive significance of flowering time variation across natural seasonal environments in Arabidopsis thaliana. The New Phytologist 2022; 234:719-734. [PMID: 35090191 DOI: 10.1111/nph.17999] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/22/2021] [Accepted: 01/04/2022] [Indexed: 06/14/2023]
Abstract
The relevance of flowering time variation and plasticity to climate adaptation requires a comprehensive empirical assessment. We investigated natural selection and the genetic architecture of flowering time in Arabidopsis through field experiments in Europe across multiple sites and seasons. We estimated selection for flowering time, plasticity and canalization. Loci associated with flowering time, plasticity and canalization by genome-wide association studies were tested for a geographic signature of climate adaptation. Selection favored early flowering and increased canalization, except at the northernmost site, but was rarely detected for plasticity. Genome-wide association studies revealed significant associations with flowering traits and supported a substantial polygenic inheritance. Alleles associated with late flowering, including functional FRIGIDA variants, were more common in regions experiencing high annual temperature variation. Flowering time plasticity to fall vs spring and summer environments was associated with GIGANTEA SUPPRESSOR 5, which promotes early flowering under decreasing day length and temperature. The finding that late flowering genotypes and alleles are associated with climate is evidence for past adaptation. Real-time phenotypic selection analysis, however, reveals pervasive contemporary selection for rapid flowering in agricultural settings across most of the species range. The response to this selection may involve genetic shifts in environmental cuing compared to the ancestral state.
Affiliation(s)
| | - Mark A Taylor
- Department of Evolution and Ecology, University of California at Davis, Davis, CA, 95616, USA
| | - Jefferson F Paril
- School of BioSciences, The University of Melbourne, Parkville, Vic., 3010, Australia
| | | | - Michelle C Stitzer
- Department of Evolution and Ecology, University of California at Davis, Davis, CA, 95616, USA
| | - Martha D Cooper
- Department of Ecology and Evolution, Brown University, Providence, RI, 02912, USA
| | - Judith L Roe
- College of Arts and Sciences, Biology, Agricultural Science & Agribusiness, University of Maine at Presque Isle, Presque Isle, ME, 04769, USA
| | | | - Johanna Schmitt
- Department of Evolution and Ecology, University of California at Davis, Davis, CA, 95616, USA
|
21
|
Zhang H, Jiang H, Hu Z, Song Q, An YQC. Development of a versatile resource for post-genomic research through consolidating and characterizing 1500 diverse wild and cultivated soybean genomes. BMC Genomics 2022; 23:250. [PMID: 35361112 PMCID: PMC8973893 DOI: 10.1186/s12864-022-08326-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/17/2021] [Accepted: 01/20/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND With advances in next-generation sequencing technologies, an unprecedented number of soybean accessions has been sequenced by many individual studies and made available as raw sequencing reads for post-genomic research. RESULTS To develop a consolidated and user-friendly genomic resource for post-genomic research, we consolidated the raw resequencing data of 1465 publicly available soybean genomes and 91 newly sequenced, highly diverse wild soybean genomes. These altogether provided a collection of 1556 sequenced genomes of 1501 diverse accessions (1.5 K). The collection comprises wild accessions, landraces and elite cultivars of soybean that were grown in East Asia or major soybean cultivating areas around the world. Our extensive sequence analysis discovered 32 million single nucleotide polymorphisms (32mSNPs) and revealed a SNP density of 30 SNPs/kb and 12 non-synonymous SNPs/gene reflecting a high structural and functional genomic diversity of the new collection. Each SNP was annotated with 30 categories of structural and/or functional information. We further identified paired accessions between the 1.5 K and 20,087 (20 K) accessions in the US collection as genomic "equivalent" accessions sharing the highest genomic identity for minimizing the barriers in soybean germplasm exchange between countries. We also exemplified the utility of 32mSNPs in enhancing post-genomics research through in-silico genotyping, high-resolution GWAS, discovering and/or characterizing genes and alleles/mutations, identifying germplasms containing beneficial alleles that are potentially experiencing artificial selection. CONCLUSION The comprehensive analysis of publicly available large-scale genome sequencing data of diverse cultivated accessions and the newly in-house sequenced wild accessions greatly increased the soybean genome-wide variation resolution. This could facilitate a variety of genetic and molecular-level analyses in soybean. The 32mSNPs and 1.5 K accessions with their comprehensive annotation have been made available at the SoyBase and Ag Data Commons. The dataset could further serve as a versatile and expandable core resource for exploring the exponentially increasing genome sequencing data for a variety of post-genomic research.
Affiliation(s)
- Hengyou Zhang
- Donald Danforth Plant Science Center, St Louis, MO 63132, USA
| | - He Jiang
- Donald Danforth Plant Science Center, St Louis, MO 63132, USA
| | - Zhenbin Hu
- Donald Danforth Plant Science Center, St Louis, MO 63132, USA
| | - Qijian Song
- US Department of Agriculture, Agricultural Research Service, Soybean Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Yong-Qiang Charles An
- Donald Danforth Plant Science Center, St Louis, MO 63132, USA.
- US Department of Agriculture, Agricultural Research Service, Midwest Area, Plant Genetics Research Unit, 975 N Warson Rd, St. Louis, MO 63132, USA.
|
22
|
Zhu F, Alseekh S, Koper K, Tong H, Nikoloski Z, Naake T, Liu H, Yan J, Brotman Y, Wen W, Maeda H, Cheng Y, Fernie AR. Genome-wide association of the metabolic shifts underpinning dark-induced senescence in Arabidopsis. The Plant Cell 2022; 34:557-578. [PMID: 34623442 PMCID: PMC8774053 DOI: 10.1093/plcell/koab251] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 04/16/2021] [Accepted: 10/05/2021] [Indexed: 05/31/2023]
Abstract
Dark-induced senescence provokes profound metabolic shifts to recycle nutrients and to guarantee plant survival. To date, research on these processes has largely focused on characterizing mutants deficient in individual pathways. Here, we adopted a time-resolved genome-wide association-based approach to characterize dark-induced senescence by evaluating the photochemical efficiency and content of primary and lipid metabolites at the beginning, or after 3 or 6 days in darkness. We discovered six patterns of metabolic shifts and identified 215 associations with 81 candidate genes being involved in this process. Among these associations, we validated the roles of four genes associated with glycine, galactinol, threonine, and ornithine levels. We also demonstrated the function of threonine and galactinol catabolism during dark-induced senescence. Intriguingly, we determined that the association between tyrosine contents and TYROSINE AMINOTRANSFERASE 1 influences enzyme activity of the encoded protein and transcriptional activity of the gene under normal and dark conditions, respectively. Moreover, the single-nucleotide polymorphisms affecting the expression of THREONINE ALDOLASE 1 and the amino acid transporter gene AVT1B, respectively, only underlie the variation in threonine and glycine levels in the dark. Taken together, these results allow us to present a very detailed model of the metabolic aspects of dark-induced senescence, as well as the process itself.
Affiliation(s)
- Feng Zhu
  - National R&D Center for Citrus Preservation, Key Laboratory of Horticultural Plant Biology, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
- Saleh Alseekh
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
- Kaan Koper
  - Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
- Hao Tong
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam 14476, Germany
- Zoran Nikoloski
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv 4000, Bulgaria
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam 14476, Germany
- Thomas Naake
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
- Haijun Liu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - Gregor Mendel Institute, Austrian Academy of Sciences, Vienna 1030, Austria
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Yariv Brotman
  - Max-Planck-Institut für Molekulare Pflanzenphysiologie, Potsdam-Golm 14476, Germany
  - Department of Life Sciences, Ben-Gurion University of the Negev, Beersheba, Israel
- Weiwei Wen
  - National R&D Center for Citrus Preservation, Key Laboratory of Horticultural Plant Biology, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Hiroshi Maeda
  - Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
23
Deng S, Caddell DF, Xu G, Dahlen L, Washington L, Yang J, Coleman-Derr D. Genome wide association study reveals plant loci controlling heritability of the rhizosphere microbiome. The ISME Journal 2021; 15:3181-3194. [PMID: 33980999 PMCID: PMC8528814 DOI: 10.1038/s41396-021-00993-z] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 05/27/2020] [Revised: 04/02/2021] [Accepted: 04/20/2021] [Indexed: 02/03/2023]
Abstract
Host genetics has recently been shown to be a driver of plant microbiome composition. However, identifying the underlying genetic loci controlling microbial selection remains challenging. Genome-wide association studies (GWAS) represent a potentially powerful, unbiased method to identify microbes sensitive to the host genotype and to connect them with the genetic loci that influence their colonization. Here, we conducted a population-level microbiome analysis of the rhizospheres of 200 sorghum genotypes. Using 16S rRNA amplicon sequencing, we identify rhizosphere-associated bacteria exhibiting heritable associations with plant genotype, and identify significant overlap between these lineages and heritable taxa recently identified in maize. Furthermore, we demonstrate that GWAS can identify host loci that correlate with the abundance of specific subsets of the rhizosphere microbiome. Finally, we demonstrate that these results can be used to predict rhizosphere microbiome structure for an independent panel of sorghum genotypes based solely on knowledge of host genotypic information.
Affiliation(s)
- Siwen Deng
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Plant Gene Expression Center, USDA-ARS, Albany, CA, USA
- Gen Xu
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA
- Lindsay Dahlen
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Present address: Department of Plant Sciences, University of California, Davis, CA, USA
- Lorenzo Washington
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Jinliang Yang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, USA
- Devin Coleman-Derr
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Plant Gene Expression Center, USDA-ARS, Albany, CA, USA
24
Arouisse B, Theeuwen TPJM, van Eeuwijk FA, Kruijer W. Improving Genomic Prediction Using High-Dimensional Secondary Phenotypes. Front Genet 2021; 12:667358. [PMID: 34108993 PMCID: PMC8181460 DOI: 10.3389/fgene.2021.667358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 11/17/2022] Open
Abstract
In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances of high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits, in addition to the target trait of interest. This raises the important question whether these additional or “secondary” traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is however infeasible when secondary traits are not measured on the test set, and cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set. 
For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
Affiliation(s)
- Bader Arouisse
  - Biometris, Wageningen University and Research, Wageningen, Netherlands
- Tom P J M Theeuwen
  - Laboratory of Genetics, Wageningen University and Research, Wageningen, Netherlands
- Willem Kruijer
  - Biometris, Wageningen University and Research, Wageningen, Netherlands
25
Gao Y, Yang Z, Yang W, Yang Y, Gong J, Yang QY, Niu X. Plant-ImputeDB: an integrated multiple plant reference panel database for genotype imputation. Nucleic Acids Res 2021; 49:D1480-D1488. [PMID: 33137192 PMCID: PMC7779032 DOI: 10.1093/nar/gkaa953] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/04/2020] [Revised: 09/23/2020] [Accepted: 10/08/2020] [Indexed: 12/21/2022] Open
Abstract
Genotype imputation is a process that estimates missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs), boost the power to identify genetic association and promote the combination of genetic studies. However, there has been a lack of high-quality reference panels for most plants, which greatly hinders the application of genotype imputation. Here, we developed Plant-ImputeDB (http://gong_lab.hzau.edu.cn/Plant_imputeDB/), a comprehensive database with reference panels of 12 plant species for online genotype imputation, SNP and block search and free download. By integrating genotype data and whole-genome resequencing data of plants from various studies and databases, the current Plant-ImputeDB provides high-quality reference panels of 12 plant species, including ∼69.9 million SNPs from 34 244 samples. It also provides an easy-to-use online tool with the option of two popular tools specifically designed for genotype imputation. In addition, Plant-ImputeDB accepts submissions of different types of genomic variations, and provides free and open access to all publicly available data in support of related research worldwide. In general, Plant-ImputeDB may serve as an important resource for plant genotype imputation and greatly facilitate plant genetic research.
Affiliation(s)
- Yingjie Gao
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Zhiquan Yang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Wenqian Yang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Yanbo Yang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Jing Gong
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
  - College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Qing-Yong Yang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
  - College of Agriculture, Shihezi University, Xinjiang 832003, P.R. China
- Xiaohui Niu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
26
Kim MS, Lozano R, Kim JH, Bae DN, Kim ST, Park JH, Choi MS, Kim J, Ok HC, Park SK, Gore MA, Moon JK, Jeong SC. The patterns of deleterious mutations during the domestication of soybean. Nat Commun 2021; 12:97. [PMID: 33397978 PMCID: PMC7782591 DOI: 10.1038/s41467-020-20337-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 03/24/2020] [Accepted: 11/25/2020] [Indexed: 01/29/2023] Open
Abstract
Globally, soybean is a major protein and oil crop. Enhancing our understanding of the soybean domestication and improvement process helps boost genomics-assisted breeding efforts. Here we present a genome-wide variation map of 10.6 million single-nucleotide polymorphisms and 1.4 million indels for 781 soybean individuals which includes 418 domesticated (Glycine max), 345 wild (Glycine soja), and 18 natural hybrid (G. max/G. soja) accessions. We describe the enhanced detection of 183 domestication-selective sweeps and the patterns of putative deleterious mutations during domestication and improvement. This predominantly selfing species shows 7.1% reduction of overall deleterious mutations in domesticated soybean relative to wild soybean and a further 1.4% reduction from landrace to improved accessions. The detected domestication-selective sweeps also show reduced levels of deleterious alleles. Importantly, genotype imputation with this resource increases the mapping resolution of genome-wide association studies for seed protein and oil traits in a soybean diversity panel.
Affiliation(s)
- Myung-Shin Kim
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
  - Plant Immunity Research Center, Plant Genomics and Breeding Institute, College of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Korea
- Roberto Lozano
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Ji Hong Kim
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Dong Nyuk Bae
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Sang-Tae Kim
  - Department of Life Science, The Catholic University of Korea, Bucheon, 14662, Korea
- Jung-Ho Park
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea
- Man Soo Choi
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Jaehyun Kim
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Hyun-Choong Ok
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Soo-Kwon Park
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Jung-Kyung Moon
  - National Institute of Crop Science, Rural Development Administration, Wanju, Jeonbuk, 55365, Korea
  - Agricultural Genome Center, National Academy of Agricultural Sciences, Rural Development Administration, Jeonju, Jeonbuk, 55365, Korea
- Soon-Chun Jeong
  - Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Cheongju, Chungbuk, 28116, Korea