1
Xavier A, Runcie D, Habier D. Megavariate methods capture complex genotype-by-environment interactions. Genetics 2025; 229:iyae179. [PMID: 39495661 PMCID: PMC12005252 DOI: 10.1093/genetics/iyae179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Accepted: 10/26/2024] [Indexed: 11/06/2024] Open
Abstract
Genomic prediction models that capture genotype-by-environment (GxE) interaction are useful for predicting site-specific performance by leveraging information among related individuals and correlated environments, but implementing such models is computationally challenging. This study describes the algorithms behind several scalable approaches, including 2 models with latent representations of GxE interactions, namely MegaLMM and MegaSEM, and an efficient multivariate mixed-model solver, namely Pseudo-expectation Gauss-Seidel (PEGS), fitting different covariance structures [unstructured, extended factor analytic (XFA), and heteroskedastic compound symmetry (HCS)]. Accuracy and runtime are benchmarked on simulated scenarios with varying numbers of genotypes and environments. MegaLMM and the PEGS-based XFA and HCS models provided the highest accuracy under sparse testing with 100 testing environments. The PEGS-based unstructured model was orders of magnitude faster than restricted maximum likelihood (REML)-based multivariate genomic best linear unbiased prediction (GBLUP) while providing the same accuracy. MegaSEM provided the lowest runtime, fitting a model with 200 traits and 20,000 individuals in ∼5 min, and a model with 2,000 traits and 2,000 individuals in less than 3 min. With the Genomes to Fields data, the most accurate predictions were attained with the univariate model fitted across environments and by averaging environment-level genomic estimated breeding values (GEBVs) from models with HCS and XFA covariance structures.
Affiliation(s)
- Alencar Xavier
- Corteva Agrisciences, Seed Product Development, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Purdue University, Department of Agronomy, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Daniel Runcie
- University of California Davis, Department of Plant Sciences, One Shield Ave, Davis, CA 95616, USA
- David Habier
- Corteva Agrisciences, Seed Product Development, 8305 NW 62nd Ave, Johnston, IA 50131, USA
2
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Trucillo Silva I, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Soriano Chavez E, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Segura Abá K, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global genotype by environment prediction competition reveals that diverse modeling strategies can deliver satisfactory maize yield estimates. Genetics 2025; 229:iyae195. [PMID: 39576009 PMCID: PMC12054733 DOI: 10.1093/genetics/iyae195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/13/2024] [Indexed: 11/27/2024] Open
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields initiative Genotype by Environment prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over 9 years. The competition attracted registrants from around the world with representation from academic, government, industry, and nonprofit institutions as well as unaffiliated individuals. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved 2 models combining machine learning and traditional breeding tools: 1 model emphasized environment using features extracted by random forest, ridge regression, and least squares, and 1 focused on genetics. Other high-performing teams' methods included quantitative genetics, machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D Washburn
- USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- José Ignacio Varela
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Alencar Xavier
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, USA
- Qiuyue Chen
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- David Ertl
- Iowa Corn Promotion Board, Johnston, IA 50131, USA
- Joseph L Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- James B Holland
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- USDA-ARS, Plant Science Research Unit, Raleigh, NC 27695, USA
- Dayane Cristina Lima
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Marco Lopez-Cruz
- Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Gustavo de los Campos
- Departments of Epidemiology and Biostatistics and Statistics and Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr, East Lansing, MI 48823, USA
- Wesley Barber
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Fabiani Rocha
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA 50131, USA
- Renaud Rincent
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Daniel E Runcie
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA 95616, USA
- Kirill Gusev
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Andrei Slabodkin
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Phillip Bax
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE 19808, USA
- Julie Aubert
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Hugo Gangloff
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Tristan Mary-Huard
- Université Paris—Saclay, INRAE, CNRS, AgroParisTech, GQE—Le Moulon, 91190 Gif-sur-Yvette, France
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Theodore Vanrenterghem
- Université Paris—Saclay, AgroParisTech, INRAE, UMR MIA Paris—Saclay, 91120 Palaiseau, France
- Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
- Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
- Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
- MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R Kick
- USDA-ARS, MWA-PGRU, 302-A Curtis Hall, University of Missouri, Columbia, MO 65211, USA
- Emily S Bellis
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Jason L Causey
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Emilio Soriano Chavez
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Yixing Wang
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd, Jonesboro, AR 72401, USA
- Ved Piyush
- Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Gayara D Fernando
- Department of Statistics, University of Nebraska—Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Robert K Hu
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rachit Kumar
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA 19104, USA
- Annan J Timon
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA 19104, USA
- Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Huan Chen
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI 48824, USA
- Thilanka Ranaweera
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peiran Wang
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Max J Gordon
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- B Kirtley Amos
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Sebastiano Busato
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Daniel Perondi
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC 27606, USA
- Abhishek Gogna
- Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Dennis Psaroudakis
- Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben 6466, Germany
- Hawlader A Al-Mamun
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Monica F Danilevicz
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Shriprabha R Upadhyaya
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- David Edwards
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
- Natalia de Leon
- Department of Plant and Agroecosystem Sciences, University of Wisconsin—Madison, 1575 Linden Drive, Madison, WI 53706, USA
3
Week B, Ralph PL, Tavalire HF, Cresko WA, Bohannan BJM. Quantitative Genetics of Microbiome Mediated Traits. bioRxiv 2024:2024.12.16.628599. [PMID: 39763787 PMCID: PMC11702574 DOI: 10.1101/2024.12.16.628599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2025]
Abstract
Multicellular organisms host a rich assemblage of associated microorganisms, collectively known as their "microbiomes". Microbiomes have the capacity to influence their hosts' fitnesses, but the conditions under which such influences contribute to evolution are not clear. This is due in part to a lack of a comprehensive theoretical framework for describing the combined effects of host and associated microbes on phenotypic variation. Here we begin to address this gap by extending the foundations of quantitative genetic theory to include host-associated microbes, as well as alleles of hosts, as factors that explain quantitative host trait variation. We introduce a way to partition host-associated microbiomes into components relevant for predicting a microbiome-mediated response to selection. We then apply our general framework to a simulation model of microbiome inheritance to illustrate principles for predicting host trait dynamics, and to generalize classical narrow and broad sense heritabilities to account for microbial effects. We demonstrate that microbiome-mediated responses to host selection can arise from various transmission modes, not solely vertical, with the contribution of non-vertical modes depending on host life history. Our work lays a foundation for integrating microbiome-mediated host variation and adaptation into our understanding of natural variation.
4
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Silva IT, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Chavez ES, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Abá KS, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global Genotype by Environment Prediction Competition Reveals That Diverse Modeling Strategies Can Deliver Satisfactory Maize Yield Estimates. bioRxiv 2024:2024.09.13.612969. [PMID: 39345633 PMCID: PMC11429743 DOI: 10.1101/2024.09.13.612969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over nine years. The competition attracted registrants from around the world with representation from academic, government, industry, and non-profit institutions as well as unaffiliated individuals. These participants came from diverse disciplines, including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved two models combining machine learning and traditional breeding tools: one model emphasized environment using features extracted by Random Forest, Ridge Regression, and Least-squares, and one focused on genetics. Other high-performing teams' methods included quantitative genetics, classical machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D. Washburn
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- José Ignacio Varela
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Alencar Xavier
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, United States
- Qiuyue Chen
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- David Ertl
- Iowa Corn Promotion Board, Johnston, IA, 50131, USA
- Joseph L. Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- James B. Holland
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- USDA-ARS Plant Science Research Unit, Raleigh, NC, 27695, USA
- Dayane Cristina Lima
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Marco Lopez-Cruz
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
- Wesley Barber
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Cristiano Zimmer
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Fabiani Rocha
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Baber Ali
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Haixiao Hu
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Daniel E Runcie
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
- Kirill Gusev
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Andrei Slabodkin
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Phillip Bax
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
- Julie Aubert
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Hugo Gangloff
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Theodore Vanrenterghem
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Daniel Ariza-Suárez
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
- Argeo Ulrich
- Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
- Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
- Michele Wyler
- MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
- Daniel R. Kick
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
- Emily S. Bellis
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Jason L. Causey
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Emilio Soriano Chavez
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Yixing Wang
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
- Ved Piyush
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Gayara D. Fernando
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
- Robert K Hu
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rachit Kumar
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
- Annan J. Timon
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Huan Chen
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Thilanka Ranaweera
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Peiran Wang
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Max J. Gordon
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- B K. Amos
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Sebastiano Busato
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Daniel Perondi
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
- Abhishek Gogna
- Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
- Dennis Psaroudakis
- Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
- C. P. James Chen
- School of Animal Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
- Hawlader A. Al-Mamun
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Monica F. Danilevicz
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Shriprabha R. Upadhyaya
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
- Natalia de Leon
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
5
Li X, Chen X, Wang Q, Yang N, Sun C. Integrating Bioinformatics and Machine Learning for Genomic Prediction in Chickens. Genes (Basel) 2024; 15:690. [PMID: 38927626 PMCID: PMC11202573 DOI: 10.3390/genes15060690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/12/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024] Open
Abstract
Genomic prediction plays an increasingly important role in modern animal breeding, with predictive accuracy being a crucial aspect. The classical linear mixed model is increasingly unable to accommodate the growing number of target traits and the increasingly intricate genetic regulatory patterns. Hence, novel approaches are necessary for future genomic prediction. In this study, we used an Illumina 50K SNP chip to genotype 4190 egg-type female Rhode Island Red chickens. Machine learning (ML) and classical bioinformatics methods were integrated to fit genotypes with 10 economic traits in chickens. We evaluated the effectiveness of ML methods using Pearson correlation coefficients and the RMSE between predicted and actual phenotypic values and compared them with rrBLUP and BayesA. Our results indicated that ML algorithms exhibit significantly superior performance to rrBLUP and BayesA in predicting body weight and eggshell strength traits. Conversely, rrBLUP and BayesA demonstrated 2-58% higher predictive accuracy in predicting egg numbers. Additionally, the incorporation of suggestively significant SNPs obtained through the GWAS into the ML models resulted in an increase in the predictive accuracy of 0.1-27% across nearly all traits. These findings suggest the potential of combining classical bioinformatics methods with ML techniques to improve genomic prediction in the future.
Affiliation(s)
- Congjiao Sun
- State Key Laboratory of Animal Biotech Breeding and Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing 100193, China; (X.L.); (X.C.); (Q.W.); (N.Y.)
6
Lynn SC, Dunwell JM, Whitehouse AB, Cockerton HM. Genetic loci associated with tissue-specific resistance to powdery mildew in octoploid strawberry (Fragaria × ananassa). Frontiers in Plant Science 2024; 15:1376061. [PMID: 38742212 PMCID: PMC11089197 DOI: 10.3389/fpls.2024.1376061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 04/10/2024] [Indexed: 05/16/2024]
Abstract
Powdery mildew is one of the most problematic diseases in strawberry production. To date, few commercial strawberry cultivars are deemed to have complete resistance and as such, an extensive spray programme must be implemented to control the pathogen. Here, a large-scale field experiment was used to determine the powdery mildew resistance status of leaf and fruit tissues across a diverse panel of strawberry genotypes. This phenotypic data was used to identify Quantitative Trait Nucleotides (QTN) associated with tissue-specific powdery mildew resistance. In total, six stable QTN were found to be associated with foliar resistance, with one QTN on chromosome 7D associated with a 61% increase in resistance. In contrast to the foliage results, there were no QTN associated with fruit disease resistance and there was a high level of resistance observed on strawberry fruit, with no genetic correlation observed between fruit and foliar symptoms, indicating a tissue-specific response. Beyond the identification of genetic loci, we also demonstrate that genomic selection can lead to rapid gains in foliar resistance across genotypes, with the potential to capture >50% of the genetic foliage resistance present in the population. To date, breeding of robust powdery mildew resistance in strawberry has been impeded by the quantitative nature of natural resistance and a lack of knowledge relating to the genetic control of the trait. These results address this shortfall by providing the community with a wealth of information that could be utilized for genomics-informed breeding, implementation of which could deliver a natural resistance strategy for combatting powdery mildew.
Affiliation(s)
- Samantha C. Lynn
- Genetics, Genomics and Breeding, National Institute of Agricultural Botany (NIAB), Kent, United Kingdom
- Crop Science, University of Reading, Reading, United Kingdom
- Jim M. Dunwell
- Crop Science, University of Reading, Reading, United Kingdom
- Adam B. Whitehouse
- Genetics, Genomics and Breeding, National Institute of Agricultural Botany (NIAB), Kent, United Kingdom
|
7
|
Miller MJ, Song Q, Li Z. Genomic selection of soybean (Glycine max) for genetic improvement of yield and seed composition in a breeding context. THE PLANT GENOME 2023; 16:e20384. [PMID: 37749946 DOI: 10.1002/tpg2.20384] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/23/2023] [Revised: 06/19/2023] [Accepted: 08/01/2023] [Indexed: 09/27/2023]
Abstract
Genomic selection has been utilized for genetic improvement in both plant and animal breeding and is a favorable technique for quantitative trait development. Within this study, genomic selection was evaluated within a breeding program, using novel validation methods in addition to plant materials and data from a commercial soybean (Glycine max) breeding program. A total of 1501 inbred lines were used to test multiple genomic selection models for multiple traits. Validation included cross-validation, inter-environment, and empirical validation. The results indicated that the extended genomic best linear unbiased prediction (EGBLUP) model was the most effective model tested for yield, protein, and oil in cross-validation with accuracies of 0.50, 0.68, and 0.64, respectively. Increasing marker number from 1000 to 3000 to 6000 single nucleotide polymorphism markers leads to statistically significant increases in accuracy. Cross-environment predictions were statistically lower than cross-validation with accuracies of 0.24, 0.54, and 0.42 for yield, protein, and oil, respectively, using the extended genomic BLUP model. Empirical validation, predicting the yield of 510 soybean lines, had a prediction accuracy of 0.34, with the inclusion of a maturity covariate leading to a notable increase in accuracy. Genomic selection identified high-performance lines in inter-environment predictions: 34% of lines within the upper quartile of yield, and 51% and 48% of the highest quartile protein and oil lines, respectively. Statistically similar results occurred comparing rankings in empirical validation and selection for advancements in yield trials. These results indicate that genomic selection is a useful tool for selection decisions.
Affiliation(s)
- Mark J Miller
- Institute of Plant Breeding, Genetics and Genomics, and Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, USA
- Zenglu Li
- Institute of Plant Breeding, Genetics and Genomics, and Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, USA
|
8
|
Hu X, Jiang X, Li J, Zhao N, Gan H, Hu X, Li L, Liu X, Shan H, Bai Y, Pang P. Identification of potential genetic Loci and polygenic risk model for Budd-Chiari syndrome in Chinese population. iScience 2023; 26:107287. [PMID: 37539039 PMCID: PMC10393737 DOI: 10.1016/j.isci.2023.107287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 08/05/2023] Open
Abstract
Budd-Chiari syndrome (BCS) is characterized by hepatic venous outflow obstruction, posing life-threatening risks in severe cases. Reported risk factors include inherited and acquired hypercoagulable states or other predisposing factors. However, many patients have no identifiable etiology, and causes of BCS differ between the West and East. This study recruited 500 BCS patients and 696 normal individuals for whole-exome sequencing and developed a polygenic risk scoring (PRS) model using PLINK, LASSOSUM, BLUP, and BayesA methods. Risk factors for venous thromboembolism and vascular malformations were also assessed for BCS risk prediction. Ultimately, we discovered potential BCS risk mutations, such as rs1042331, and the optimal BayesA-generated PRS model presented an AUC >0.9 in the external replication cohort. This model provides particular insights into genetic risk differences between China and the West and suggests shared genetic risks among BCS, venous thromboembolism, and vascular malformations, offering different perspectives on BCS pathogenesis.
Affiliation(s)
- Xiaojun Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xiaosen Jiang
- BGI-Shenzhen, Shenzhen, China
- College of Life Sciences, University of the Chinese Academy of Sciences, Beijing, China
- Jia Li
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Hebei Industrial Technology Research Institute of Genomics in Maternal & Child Health, Shijiazhuang BGI Genomics Co., Ltd, Shijiazhuang, China
- Ni Zhao
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Hairun Gan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xinyan Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Luting Li
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xingtao Liu
- Changfeng Hospital of Jinjiang District, Chengdu, China
- Hong Shan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Pengfei Pang
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Key Laboratory of Biomedical Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Engineering Research Center of Molecular Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
|
9
|
Miller MJ, Song Q, Fallen B, Li Z. Genomic prediction of optimal cross combinations to accelerate genetic improvement of soybean (Glycine max). FRONTIERS IN PLANT SCIENCE 2023; 14:1171135. [PMID: 37235007 PMCID: PMC10206060 DOI: 10.3389/fpls.2023.1171135] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/21/2023] [Accepted: 04/17/2023] [Indexed: 05/28/2023]
Abstract
Improving yield is a primary soybean breeding goal, as yield is the main determinant of soybean's profitability. Within the breeding process, selection of cross combinations is one of the most important elements. Cross prediction will assist soybean breeders in identifying the best cross combinations among parental genotypes prior to crossing, increasing genetic gain and breeding efficiency. In this study, optimal cross selection methods were created and applied in soybean and validated using historical data from the University of Georgia soybean breeding program, under multiple training set compositions and marker densities, utilizing multiple genomic selection models for marker evaluation. Plant materials consisted of 702 advanced breeding lines evaluated in multiple environments and genotyped using SoySNP6k BeadChips. An additional marker set, the SoySNP3k marker set, was tested in this study as well. Optimal cross selection methods were used to predict the yield of 42 previously made crosses and compared to the performance of the cross's offspring in replicated field trials. The best prediction accuracy was obtained when using Extended Genomic BLUP with the SoySNP6k marker set, consisting of 3,762 polymorphic markers, with an accuracy of 0.56 with a training set maximally related to the crosses predicted and 0.4 in a training set with minimized relatedness to predicted crosses. Prediction accuracy was most significantly impacted by training set relatedness to the predicted crosses, marker density, and the genomic model used to predict marker effects. The usefulness criterion selected had an impact on prediction accuracy within training sets with low relatedness to the crosses predicted. Optimal cross prediction provides a useful method that assists plant breeders in selecting crosses in soybean breeding.
Affiliation(s)
- Mark J. Miller
- Institute of Plant Breeding, Genetics and Genomics, and Department of Crop and Soil Sciences, University of Georgia, Athens, GA, United States
- Qijian Song
- Soybean Genomics and Improvement Laboratory, United States Department of Agriculture - Agricultural Research Service, Beltsville, MD, United States
- Benjamin Fallen
- Soybean and Nitrogen Fixation Research Unit, United States Department of Agriculture - Agricultural Research Service, Raleigh, NC, United States
- Zenglu Li
- Institute of Plant Breeding, Genetics and Genomics, and Department of Crop and Soil Sciences, University of Georgia, Athens, GA, United States
|
10
|
Lopez MA, Moreira FF, Hearst A, Cherkauer K, Rainey KM. Physiological breeding for yield improvement in soybean: solar radiation interception-conversion, and harvest index. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:1477-1491. [PMID: 35275253 DOI: 10.1007/s00122-022-04048-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Accepted: 01/27/2022] [Indexed: 06/14/2023]
Abstract
KEY MESSAGE: Efficiency of light interception, radiation use efficiency, and harvest index can be used as targets to improve grain yield potential in soybean. Grain yield (GY) production can be expressed as the result of three main efficiencies: light interception (Ei), radiation use (RUE), and harvest index (HI). Although dissecting GY through these three efficiencies is not entirely new, there is a lack of knowledge about the phenotypic variation, the genetic architecture, and the relative contribution of these three efficiencies to GY in soybean. This knowledge gap, coupled with laborious phenotyping, prevents the active consideration of these efficiencies in breeding programs. This study aims to reveal the phenotypic variation, heritability, genetic relationships, genetic architecture, and genomic prediction for Ei, RUE, and HI in soybean. We evaluated a maturity control panel of 383 Recombinant Inbred Lines (RILs) selected from the soybean nested association mapping (SoyNAM) population. Ground-measured dry matter and canopy coverage (CC) from UAS imagery were collected in three environments. Light interception was modeled through a logistic curve using CC as a proxy. The total above-ground biomass collected during the growing season and its respective cumulative light intercepted were used to derive RUE by fitting linear models. Additive-genetic correlations, genome-wide association (GWA) and whole-genome regressions (WGR) were performed to evaluate the relationship between traits, their association with genomic regions, and the feasibility of predicting these efficiencies with genomic information. Correlation analyses considered three groups: the entire data set, and the high- and low-yielding RILs to determine association as a function of the GY. Our results revealed moderate to high phenotypic variation for Ei, RUE, and HI with ranges of 8.5%, 1.1 g MJ⁻¹, and 0.2, respectively. Additive-genetic correlation revealed a strong relationship of GY with HI and a moderate one with RUE and Ei when the whole data set was considered, but a negligible contribution of HI to GY when just the top 100 were analyzed. The GWA analyses showed that Ei is associated with three SNPs, two of them located on chromosome 7 and one on chromosome 11, with no previous quantitative trait loci (QTLs) reported for these regions. RUE is associated with four SNPs on chromosomes 1, 7, 11, and 18. Some of these QTLs are novel, while others are previously documented for plant architecture and chlorophyll content. Two SNPs positioned on chromosomes 13 and 15, with previous QTLs reported for plant height and seed set, weight and abortion, were associated with HI. WGR showed high predictive ability for Ei, RUE, and HI with maximum correlation ranging between 0.75 and 0.80. Future improvements in GY can be expected through strategies prioritizing Ei for short-term results when using high-yielding germplasm and RUE for medium- and long-term outcomes. This work is a pioneer attempt to integrate traditional physiological traits into the breeding process in the context of physiological breeding.
Affiliation(s)
- Anthony Hearst
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Keith Cherkauer
- Department of Agricultural and Biological Engineering, Purdue University, West Lafayette, IN, USA
|
11
|
Xu W, Liu X, Liao M, Xiao S, Zheng M, Yao T, Chen Z, Huang L, Zhang Z. FMixFN: A Fast Big Data-Oriented Genomic Selection Model Based on an Iterative Conditional Expectation algorithm. Front Genet 2021; 12:721600. [PMID: 34868200 PMCID: PMC8637923 DOI: 10.3389/fgene.2021.721600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Accepted: 10/22/2021] [Indexed: 11/13/2022] Open
Abstract
Genomic selection is an approach to selecting elite breeding stock using dense genetic markers, and it has led to the development of various models to derive a predictive equation. However, current genomic selection software faces several issues, such as low prediction accuracy, low computational efficiency, or an inability to handle large-scale sample data. We report the development of a genomic prediction model named FMixFN with four zero-mean normal distributions as the prior distributions to optimize the predictive ability and computing efficiency. The variance of the prior distributions in our model is precisely determined based on an F2 population, and genomic estimated breeding values (GEBV) can be obtained accurately and quickly in combination with an iterative conditional expectation algorithm. We demonstrated that FMixFN improves computational efficiency and predictive ability compared to other methods, such as GBLUP, ssGBLUP, MIX, BayesR, BayesA, and BayesB. Most importantly, FMixFN may handle large-scale sample data, and thus should be able to meet the needs of large breeding companies or combined breeding schedules. Our study developed a Bayesian genomic selection model called FMixFN, which combines stable predictive ability and high computational efficiency, and is a big data-oriented genomic selection model that has potential in the future. The FMixFN method can be freely accessed at https://zenodo.org/record/5560913 (DOI: 10.5281/zenodo.5560913).
Affiliation(s)
- Wenwu Xu
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Xiaodong Liu
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Mingfu Liao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Shijun Xiao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Min Zheng
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Tianxiong Yao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Zuoquan Chen
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Lusheng Huang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Zhiyan Zhang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
|
12
|
da Silva ÉDB, Xavier A, Faria MV. Impact of Genomic Prediction Model, Selection Intensity, and Breeding Strategy on the Long-Term Genetic Gain and Genetic Erosion in Soybean Breeding. Front Genet 2021; 12:637133. [PMID: 34539725 PMCID: PMC8440908 DOI: 10.3389/fgene.2021.637133] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/02/2020] [Accepted: 08/05/2021] [Indexed: 11/21/2022] Open
Abstract
Genomic-assisted breeding has become an important tool in soybean breeding. However, the impact of different genomic selection (GS) approaches on short- and long-term gains is not well understood. Such gains are conditional on the breeding design and may vary with a combination of the prediction model, family size, selection strategies, and selection intensity. To address these open questions, we evaluated various scenarios through a simulated closed soybean breeding program over 200 breeding cycles. Genomic prediction was performed using genomic best linear unbiased prediction (GBLUP), Bayesian methods, and random forest, benchmarked against selection on phenotypic values, true breeding values (TBV), and random selection. Breeding strategies included selections within family (WF), across family (AF), and within pre-selected families (WPSF), with selection intensities of 2.5, 5.0, 7.5, and 10.0%. Selections were performed at the F4 generation, where individuals were phenotyped and genotyped with a 6K single nucleotide polymorphism (SNP) array. Initial genetic parameters for the simulation were estimated from the SoyNAM population. WF selections provided the most significant long-term genetic gains. GBLUP and Bayesian methods outperformed random forest and provided most of the genetic gains within the first 100 generations, being outperformed by phenotypic selection after generation 100. All methods provided similar performances under WPSF selections. A faster decay in genetic variance was observed when individuals were selected AF and WPSF, as 80% of the genetic variance was depleted within 28-58 cycles, whereas WF selections preserved the variance up to cycle 184. Surprisingly, the selection intensity had less impact on long-term gains than did the breeding strategies. The study supports that genetic gains can be optimized in the long term with specific combinations of prediction models, family size, selection strategies, and selection intensity. 
A combination of strategies may be necessary for balancing the short-, medium-, and long-term genetic gains in breeding programs while preserving the genetic variance.
Affiliation(s)
- Alencar Xavier
- Department of Biostatistics, Corteva Agriscience, Johnston, IA, United States
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Marcos Ventura Faria
- Department of Agronomy, Universidade Estadual do Centro-Oeste, Guarapuava, Brazil
|
13
|
Beche E, Gillman JD, Song Q, Nelson R, Beissinger T, Decker J, Shannon G, Scaboo AM. Genomic prediction using training population design in interspecific soybean populations. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:15. [PMID: 37309481 PMCID: PMC10236090 DOI: 10.1007/s11032-021-01203-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/02/2020] [Accepted: 01/11/2021] [Indexed: 06/14/2023]
Abstract
Agronomically important traits generally have complex genetic architecture, where many genes have a small and largely additive effect. Genomic prediction has been demonstrated to increase genetic gain and efficiency in plant breeding programs beyond marker-assisted selection and phenotypic selection. The objective of this study was to evaluate the impact of allelic origin, marker density, training population size, and cross-validation schemes on the accuracy of genomic prediction models in an interspecific soybean nested association mapping (NAM) panel. Three cross-validation schemes were used: (a) Within-Family (WF): training population and predictions are made exclusively within each family; (b) Across All families (AF): all the individuals from the three families were randomly assigned to either the training or validation set; (c) Leave one Family out (LFO): each family is predicted using a training set that contains the other two families. Predictive abilities increased with training population size up to 350 individuals, but no significant gains were noted beyond 250 individuals in the training population. The number of markers had a limited impact on the observed predictive ability across traits; increasing markers used in the model above 1000 revealed no significant increases in prediction accuracy. Predictive abilities for AF were not significantly different from the WF method, and predictive abilities across populations for the WF method had a range of 0.58 to 0.70 for maturity, protein, meal, and oil. Our results also showed encouraging prediction accuracies for grain yield (0.58-0.69) using the WF method. Partitioning genomic prediction between G. max and G. soja alleles revealed useful information to select material with a larger allele contribution from both parents and could accelerate allele introgression from exotic germplasm into the elite soybean gene pool. 
Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-021-01203-6.
Affiliation(s)
- Eduardo Beche
- Division of Plant Science, University of Missouri, Columbia, MO USA
- Qijian Song
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD USA
- Randall Nelson
- Department of Crop Sciences, University of Illinois, and USDA-Agricultural Research Service (retired), 1101 W. Peabody Dr., Urbana, IL 61801 USA
- Tim Beissinger
- Division of Plant Breeding Methodology, Department of Crop Sciences, Georg-August-Universität, Göttingen, Germany
- Jared Decker
- Division of Animal Science, University of Missouri, Columbia, MO USA
- Grover Shannon
- Division of Plant Science, University of Missouri, Columbia, MO USA
- Andrew M. Scaboo
- Division of Plant Science, University of Missouri, Columbia, MO USA
|
14
|
Xavier A, Rainey KM. Quantitative Genomic Dissection of Soybean Yield Components. G3 (BETHESDA, MD.) 2020; 10:665-675. [PMID: 31818873 PMCID: PMC7003100 DOI: 10.1534/g3.119.400896] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/23/2019] [Accepted: 12/06/2019] [Indexed: 11/25/2022]
Abstract
Soybean is a crop of major economic importance with low rates of genetic gains for grain yield compared to other field crops. A deeper understanding of the genetic architecture of yield components may enable better ways to tackle the breeding challenges. Key yield components include the total number of pods, the number of nodes, and the ratio of pods per node. We evaluated the SoyNAM population, containing approximately 5600 lines from 40 biparental families that share a common parent, in 6 environments distributed across 3 years. The study indicates that the yield components under evaluation have low heritability, a reasonable amount of epistatic control, and partially oligogenic architecture: 18 quantitative trait loci were identified across the three yield components using multi-approach signal detection. Genetic correlation between yield and yield components was highly variable from family to family, ranging from -0.2 to 0.5. The genotype-by-environment correlation of yield components ranged from -0.1 to 0.4 within families. The number of pods can be utilized for indirect selection of yield. The selection of soybean for enhanced yield components can be successfully performed via genomic prediction, but the challenging data collections necessary to recalibrate models over time make the introgression of QTL a potentially more feasible breeding strategy. The genomic prediction of yield components was relatively accurate across families, but less accurate predictions were obtained for within-family predictions and when predicting families not included in the calibration set.
Affiliation(s)
- Alencar Xavier
- Department of Agronomy, Purdue University, West Lafayette IN 47907
- Department of Biostatistics, Corteva Agrisciences, Johnston IA 50131
- Katy M Rainey
- Department of Agronomy, Purdue University, West Lafayette IN 47907
|