1
Zhang Z, Wang X, Zhang Y, Zhou K, Yu G, Yang W, Li F, Guan X, Zhang X, Yang Z, Xu C, Xu Y. SPDC-HG: An accelerator of genomic hybrid breeding in maize. PLANT BIOTECHNOLOGY JOURNAL 2025; 23:1847-1861. [PMID: 40014659 PMCID: PMC12018846 DOI: 10.1111/pbi.70011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 01/22/2025] [Accepted: 02/04/2025] [Indexed: 03/01/2025]
Abstract
Integrating multiple modern breeding techniques in maize has always been challenging. This study aimed to address this issue by applying a flexible sparse partial diallel cross design composed of 945 maize hybrids derived from 266 inbred lines across different heterotic groups. The research integrated genome-wide association studies, genomic selection and genomic evaluation of parental inbred lines to accelerate the breeding process for developing single-cross hybrids. Significant associations with the general combining abilities (GCAs) of nine yield-related traits were identified for 7-25 stable single nucleotide polymorphisms (SNPs). Using the MaizeGDB and NCBI databases, 264 candidate genes were screened and functionally annotated based on significant SNPs detected by at least three statistical methods. The marker set developed from these GCA SNPs significantly improved the prediction accuracy of hybrids across all traits. The GCA estimates of the inbred lines involved in the top 100 and bottom 100 hybrids consistently ranked at the top and bottom, thereby confirming the accuracy of the predictions. Furthermore, the top 100 crosses selected using BayesB, GBLUP and LASSO showed a 105.4-108.6% increase in average ear weight compared to the bottom 100 crosses in field validation, demonstrating strong selection gains. Notably, amongst the top 100 hybrids, A017/A037 and A037/A169, each containing six superior genotypes, were registered as Suyu 161 and Tongyu 1701, respectively, by the National Crop Variety Approval Committee in China. These results highlight the effectiveness of genomic selection and provide valuable insights for advancing genomic hybrid breeding in maize.
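To make the notion of a general combining ability (GCA) estimate from a sparse cross design concrete, the following is a minimal sketch and not the method used in the study. It assumes a naive estimator (the mean of all hybrids that share a parent line, minus the grand mean); the function name `naive_gca` and the line labels and values in the usage note are hypothetical. In an unbalanced sparse design this simple average can be biased, which is one reason mixed models are usually preferred.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def naive_gca(records: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Naive GCA per line: mean of its hybrids minus the grand mean.

    `records` holds (parent_1, parent_2, hybrid_phenotype) tuples.
    This is an illustrative assumption, not the estimator of the paper.
    """
    records = list(records)
    grand_mean = sum(y for _, _, y in records) / len(records)
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for parent_1, parent_2, y in records:
        # Every hybrid contributes to the average of both of its parents.
        for line in (parent_1, parent_2):
            totals[line] += y
            counts[line] += 1
    return {line: totals[line] / counts[line] - grand_mean for line in totals}
```

For example, `naive_gca([("A", "B", 10.0), ("A", "C", 12.0), ("B", "C", 8.0)])` ranks line A above C and C above B, because the hybrids involving A have the highest average.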
Affiliation(s)
- Zhenliang Zhang
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
  - Jiangsu Yanjiang Institute of Agricultural Sciences, Nantong, China
- Xin Wang
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
  - College of Information Engineering, Yangzhou University, Yangzhou, Jiangsu, China
- Yuxiang Zhang
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Kai Zhou
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Guangning Yu
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Wenyan Yang
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Furong Li
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Xiusheng Guan
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Xuecai Zhang
  - International Maize and Wheat Improvement Center (CIMMYT), Texcoco, México
- Zefeng Yang
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Chenwu Xu
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
- Yang Xu
  - Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou, Jiangsu, China
2
Mertten D, McKenzie CM, Baldwin S, Thomson S, Souleyre EJF, Lenhard M, Datson PM. Genomic selection in a kiwiberry breeding programme: integrating intra- and inter-specific crossing. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2025; 45:31. [PMID: 40061125 PMCID: PMC11889281 DOI: 10.1007/s11032-025-01550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Accepted: 02/20/2025] [Indexed: 03/21/2025]
Abstract
Inter-specific hybridisation between natural populations within the genus Actinidia is a common phenomenon and has been used in breeding programmes. Hybridisation between species increases the diversity of breeding populations, incorporating new desirable traits into potential cultivars. We explored genomic prediction in Actinidia breeding, focusing on the closely related species Actinidia arguta and Actinidia melanandra. We investigated the potential of genomic selection by analysing four quantitative traits across intra-specific A. arguta crosses and inter-specific crosses between A. arguta and A. melanandra. The continuous distributions of the studied traits in both intra-specific and inter-specific crosses indicated a polygenic background. A linear mixed model approach was used, incorporating the factor of year of season and a marker-based relationship matrix instead of pedigree as a random effect. After evaluation, the best model was applied to assess variance components and heritability for each quantitative trait. Expanding beyond intra-specific crosses, predictive ability was calculated to investigate inter-specific cross effect. Considering predictive ability, this study explored the impacts of sample size and population structure. A reduction in sample size correlated with decreased predictive ability, while the influence of population structure was particularly pronounced in inter-specific crosses. Finally, the prediction accuracy of genomic estimated breeding values, for parental genotypes, revealed an inter-species effect on prediction confidence. Considering the imbalance in genotype numbers between intra- and inter-specific cross populations, this research highlights the difficulty of genomic prediction in hybrid populations. Understanding prediction accuracy in inter-species crossing designs provides valuable insights for optimising genomic selection. 
Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-025-01550-8.
Affiliation(s)
- Daniel Mertten
  - The New Zealand Institute for Plant and Food Research Ltd, Auckland, 1142 New Zealand
  - Institute for Biochemistry and Biology, University of Potsdam, 14476 Potsdam-Golm, Germany
- Catherine M. McKenzie
  - The New Zealand Institute for Plant and Food Research Ltd, Te Puke, 3182 New Zealand
- Samantha Baldwin
  - The New Zealand Institute for Plant and Food Research Ltd, Lincoln, 7608 New Zealand
- Susan Thomson
  - The New Zealand Institute for Plant and Food Research Ltd, Lincoln, 7608 New Zealand
- Edwige J. F. Souleyre
  - The New Zealand Institute for Plant and Food Research Ltd, Auckland, 1142 New Zealand
- Michael Lenhard
  - Institute for Biochemistry and Biology, University of Potsdam, 14476 Potsdam-Golm, Germany
3
Stricker C, Fernando RL, Melchinger A, Auinger HJ, Schoen CC. On the inverse association between the number of QTL and the trait-specific genomic relationship of a candidate to the training set. Genet Sel Evol 2024; 56:75. [PMID: 39673063 PMCID: PMC11639121 DOI: 10.1186/s12711-024-00940-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 10/29/2024] [Indexed: 12/15/2024]
Abstract
BACKGROUND: Accuracy of genomic prediction depends on the heritability of the trait, the size of the training set, the relationship of the candidates to the training set, and $\min(N_{\mathrm{QTL}}, M_e)$, where $N_{\mathrm{QTL}}$ is the number of QTL and $M_e$ is the number of independently segregating chromosomal segments. Due to LD, the number $Q_e$ of independently segregating QTL (effective QTL) can be lower than $\min(N_{\mathrm{QTL}}, M_e)$. In this paper, we show that $Q_e$ is inversely associated with the trait-specific genomic relationship of a candidate to the training set. This provides an explanation for the inverse association between $Q_e$ and the accuracy of prediction. METHODS: To quantify the genomic relationship of a candidate to all members of the training set, we considered the $k^2$ statistic that has been previously used for this purpose. It quantifies how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set. In this paper, we used Bayesian regression to make this statistic trait specific and argue that the trait-specific genomic relationship of a candidate to the training set is inversely associated with $Q_e$. Simulation was used to demonstrate the dependence of the trait-specific $k^2$ statistic on $Q_e$, which is related to $N_{\mathrm{QTL}}$. CONCLUSIONS: The posterior distributions of the trait-specific $k^2$ statistic showed that the trait-specific genomic relationship between a candidate and the training set is inversely associated with $Q_e$ and $N_{\mathrm{QTL}}$. Further, we show that the trait-specific genomic relationship between a candidate and the training set is directly related to the size of the training set.
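The description of the statistic as a measure of how well a candidate's marker vector is a linear combination of the training rows can be illustrated with a small projection calculation. The sketch below is one plausible reading of that sentence, not the authors' definition: it assumes the statistic is the share of the candidate's squared norm captured by its projection onto the row space of the training marker matrix. It covers only the trait-agnostic case; the trait-specific version obtained through Bayesian regression is not shown. The function name `k2_statistic` is hypothetical.

```python
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def k2_statistic(candidate: Sequence[float],
                 training: Sequence[Sequence[float]],
                 tol: float = 1e-10) -> float:
    """Share of the candidate's squared norm lying in the training row space.

    Assumed reading of the abstract: 1.0 means the candidate is an exact
    linear combination of training rows, 0.0 means it is orthogonal to them.
    """
    # Build an orthonormal basis of the row space (modified Gram-Schmidt).
    basis: List[List[float]] = []
    for row in training:
        residual = list(row)
        for q in basis:
            coef = _dot(residual, q)
            residual = [r - coef * qi for r, qi in zip(residual, q)]
        norm_sq = _dot(residual, residual)
        if norm_sq > tol:  # skip rows that add no new direction
            norm = norm_sq ** 0.5
            basis.append([r / norm for r in residual])

    total = _dot(candidate, candidate)
    if total == 0.0:
        raise ValueError("candidate marker vector must be non-zero")
    explained = sum(_dot(candidate, q) ** 2 for q in basis)
    return explained / total
```

With training rows `[1, 0, 0]` and `[0, 1, 0]`, the candidate `[1, 0, 1]` has half of its squared norm inside the row space, so the function returns 0.5.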
Affiliation(s)
- Christian Stricker
  - agn Genetics, Boertjistrasse 8b, Davos, 7260, Switzerland
  - Plant Breeding, School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Strasse 2, Freising, 85354, Germany
- Rohan L Fernando
  - Department of Animal Science, Iowa State University, Kildee Hall, Ames, 50011, IA, USA
- Albrecht Melchinger
  - Plant Breeding, School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Strasse 2, Freising, 85354, Germany
- Hans-Juergen Auinger
  - Plant Breeding, School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Strasse 2, Freising, 85354, Germany
- Chris-Carolin Schoen
  - Plant Breeding, School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Strasse 2, Freising, 85354, Germany
4
Durge AR, Shrimankar DD. DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences. Curr Genomics 2024; 25:185-201. [PMID: 39087000 PMCID: PMC11288165 DOI: 10.2174/0113892029268176240125055419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 08/02/2024]
Abstract
Background: Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying Bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration in response to evolving genomic datasets. Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of Bamboo species from genomic sequences. Methods: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. This process maximizes inter-class variance, leading to the selection of informative N-gram feature sets. Subsequently, intra-class variance levels are used to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer, combining multiple stratification models for species-specific categorization. Results: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM achieves remarkable improvements in accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance (4.5%). Importantly, the model maintains its performance even with an increased number of species classes due to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model. Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.
Affiliation(s)
- Aditi R Durge
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D Shrimankar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
5
Melchinger AE, Fernando R, Melchinger AJ, Schön CC. Optimizing selection based on BLUPs or BLUEs in multiple sets of genotypes differing in their population parameters. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:104. [PMID: 38622324 PMCID: PMC11018695 DOI: 10.1007/s00122-024-04592-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 03/05/2024] [Indexed: 04/17/2024]
Abstract
KEY MESSAGE Selection response in truncation selection across multiple sets of candidates hinges on their post-selection proportions, which can deviate grossly from their initial proportions. For BLUPs, using a uniform threshold for all candidates maximizes the selection response, irrespective of differences in population parameters. Plant breeding programs typically involve multiple families from either the same or different populations, varying in means, genetic variances and prediction accuracy of BLUPs or BLUEs for true genetic values (TGVs) of candidates. We extend the classical breeder's equation for truncation selection from single to multiple sets of genotypes, indicating that the expected overall selection response ($\Delta G_{\mathrm{Tot}}$) for TGVs depends on the selection response within individual sets and their post-selection proportions. For BLUEs, we show that maximizing $\Delta G_{\mathrm{Tot}}$ requires thresholds optimally tailored for each set, contingent on their population parameters. For BLUPs, we prove that $\Delta G_{\mathrm{Tot}}$ is maximized by applying a uniform threshold across all candidates from all sets. We provide explicit formulas for the origin of the selected candidates from different sets and show that their proportions before and after selection can differ substantially, especially for sets with inferior properties and low proportion. We discuss implications of these results for (a) optimum allocation of resources to training and prediction sets and (b) the need to counteract narrowing the genetic variation under genomic selection. For genomic selection of hybrids based on BLUPs of GCA of their parent lines, selecting distinct proportions in the two parent populations can be advantageous, if these differ substantially in the variance and/or prediction accuracy of GCA.
Our study sheds light on the complex interplay of selection thresholds and population parameters for the selection response in plant breeding programs, offering insights into the effective resource management and prudent application of genomic selection for improved crop development.
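The dependence of the overall response on post-selection proportions can be illustrated numerically. The sketch below is not the authors' derivation; it assumes that BLUPs within each set are normally distributed around the set mean with standard deviation equal to prediction accuracy times genetic standard deviation, and that the expected true genetic value of a candidate equals its BLUP. A single threshold is found by bisection so that a fraction `alpha` of all candidates is selected. The names `uniform_threshold` and `select_across_sets` and the example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Each set: (initial proportion, mean, SD of BLUPs). Values are illustrative.
CandidateSet = Tuple[float, float, float]


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def uniform_threshold(sets: Sequence[CandidateSet], alpha: float) -> float:
    """Bisection for the common threshold that selects a fraction alpha."""
    lo = min(m - 10.0 * s for _, m, s in sets)
    hi = max(m + 10.0 * s for _, m, s in sets)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        frac = sum(p * normal_sf((mid - m) / s) for p, m, s in sets)
        if frac > alpha:   # too many selected, so raise the threshold
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def select_across_sets(sets: Sequence[CandidateSet],
                       alpha: float) -> Tuple[float, List[float], float]:
    """Return (threshold, post-selection proportions, overall response)."""
    t = uniform_threshold(sets, alpha)
    fracs = [normal_sf((t - m) / s) for _, m, s in sets]
    selected_total = sum(p * f for (p, _, _), f in zip(sets, fracs))
    post = [p * f / selected_total for (p, _, _), f in zip(sets, fracs)]
    # Mean BLUP of the selected candidates in each set (truncated normal).
    sel_means = [m + s * normal_pdf((t - m) / s) / f if f > 0.0 else t
                 for (_, m, s), f in zip(sets, fracs)]
    mean_before = sum(p * m for p, m, _ in sets)
    mean_after = sum(q * sm for q, sm in zip(post, sel_means))
    return t, post, mean_after - mean_before
```

With two equally sized sets of equal mean, one with BLUP standard deviation 1.0 and one with 0.5, selecting the top 10% overall leaves the low-variance set with far less than half of the selected candidates, in line with the statement that proportions before and after selection can differ substantially.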
Affiliation(s)
- Albrecht E Melchinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Chris-Carolin Schön
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
6
Lorenzi A, Bauland C, Pin S, Madur D, Combes V, Palaffre C, Guillaume C, Touzy G, Mary-Huard T, Charcosset A, Moreau L. Portability of genomic predictions trained on sparse factorial designs across two maize silage breeding cycles. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:75. [PMID: 38453705 PMCID: PMC11341662 DOI: 10.1007/s00122-024-04566-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 01/30/2024] [Indexed: 03/09/2024]
Abstract
KEY MESSAGE We validated the efficiency of genomic predictions calibrated on sparse factorial training sets to predict the next generation of hybrids and tested different strategies for updating predictions along generations. Genomic selection offers new prospects for revisiting hybrid breeding schemes by replacing extensive phenotyping of individuals with genomic predictions. Finding the ideal design for training genomic prediction models is still an open question. Previous studies have shown promising predictive abilities using sparse factorial instead of tester-based training sets to predict single-cross hybrids from the same generation. This study aims to further investigate the use of factorials and their optimization to predict line general combining abilities (GCAs) and hybrid values across breeding cycles. It relies on two breeding cycles of a maize reciprocal genomic selection scheme involving multiparental connected reciprocal populations from flint and dent complementary heterotic groups selected for silage performances. Selection based on genomic predictions trained on a factorial design resulted in a significant genetic gain for dry matter yield in the new generation. Results confirmed the efficiency of sparse factorial training sets to predict candidate line GCAs and hybrid values across breeding cycles. Compared to a previous study based on the first generation, the advantage of factorial over tester training sets appeared lower across generations. Updating factorial training sets by adding single-cross hybrids between selected lines from the previous generation or a random subset of hybrids from the new generation both improved predictive abilities. The CDmean criterion helped determine the set of single-crosses to phenotype to update the training set efficiently. Our results validated the efficiency of sparse factorial designs for calibrating hybrid genomic prediction experimentally and showed the benefit of updating it along generations.
Affiliation(s)
- Alizarine Lorenzi
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
  - RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Cyril Bauland
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Sophie Pin
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Delphine Madur
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Valérie Combes
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Carine Palaffre
  - UE 0394 SMH, INRAE, 2297 Route de l'INRA, 40390, Saint-Martin-de-Hinx, France
- Gaëtan Touzy
  - RAGT2n, Genetics and Analytics Unit, 12510, Druelle, France
- Tristan Mary-Huard
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
  - Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Alain Charcosset
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
- Laurence Moreau
  - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Génétique Quantitative et Evolution (GQE) - Le Moulon, 91190, Gif-Sur-Yvette, France
7
Lanzl T, Melchinger AE, Schön CC. Influence of the mating design on the additive genetic variance in plant breeding populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:236. [PMID: 37906322 PMCID: PMC10618341 DOI: 10.1007/s00122-023-04447-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 08/14/2023] [Indexed: 11/02/2023]
Abstract
KEY MESSAGE Mating designs determine the realized additive genetic variance in a population sample. Deflated or inflated variances can lead to reduced or overly optimistic assessment of future selection gains. The additive genetic variance [Formula: see text] inherent to a breeding population is a major determinant of short- and long-term genetic gain. When estimated from experimental data, it is not only the additive variances at individual loci (QTL) but also covariances between QTL pairs that contribute to estimates of [Formula: see text]. Thus, estimates of [Formula: see text] depend on the genetic structure of the data source and vary between population samples. Here, we provide a theoretical framework for calculating the expectation and variance of [Formula: see text] from genotypic data of a given population sample. In addition, we simulated breeding populations derived from different numbers of parents (P = 2, 4, 8, 16) and crossed according to three different mating designs (disjoint, factorial and half-diallel crosses). We calculated the variance of [Formula: see text] and of the parameter b reflecting the covariance component in [Formula: see text] standardized by the genic variance. Our results show that mating designs resulting in large biparental families derived from few disjoint crosses carry a high risk of generating progenies exhibiting strong covariances between QTL pairs on different chromosomes. We discuss the consequences of the resulting deflated or inflated [Formula: see text] estimates for phenotypic and genome-based selection as well as for applying the usefulness criterion in selection. We show that even one round of recombination can effectively break negative and positive covariances between QTL pairs induced by the mating design. We suggest obtaining reliable estimates of [Formula: see text] and its components in a population sample by applying statistical methods differing in their treatment of QTL covariances.
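The split of the realized additive variance into a per-locus part and a between-locus covariance part can be checked on a toy sample. The sketch below rests on assumptions that are not in the abstract: QTL genotypes are coded numerically, additive effects are known, variances divide by the sample size, and b is taken as the covariance part divided by the genic variance, which is one reading of the abstract's description. The function name `variance_components` is hypothetical.

```python
from typing import List, Sequence, Tuple


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Sample covariance with divisor n (realized value in the sample)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / len(u)


def variance_components(genotypes: Sequence[Sequence[float]],
                        effects: Sequence[float]) -> Tuple[float, float, float]:
    """Return (total additive variance, genic variance, b).

    genotypes: individuals x QTL matrix of allele counts (assumed coding).
    effects:   additive effect per QTL (assumed known for illustration).
    """
    n_qtl = len(effects)
    columns: List[List[float]] = [[row[j] for row in genotypes]
                                  for j in range(n_qtl)]
    # Genic variance: sum of per-locus variances weighted by squared effects.
    genic = sum(effects[j] ** 2 * _cov(columns[j], columns[j])
                for j in range(n_qtl))
    # Covariance part: all ordered pairs of distinct QTL.
    covariance = sum(effects[j] * effects[k] * _cov(columns[j], columns[k])
                     for j in range(n_qtl) for k in range(n_qtl) if j != k)
    return genic + covariance, genic, covariance / genic
```

For two QTL with equal effects, a sample in which favourable alleles always occur together gives b = 1 (inflated variance), while a sample in which they never occur together gives b = -1 and a total additive variance of zero.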
Affiliation(s)
- Tobias Lanzl
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Albrecht E Melchinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Chris-Carolin Schön
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
8
Melchinger AE, Fernando R, Stricker C, Schön CC, Auinger HJ. Genomic prediction in hybrid breeding: I. Optimizing the training set design. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:176. [PMID: 37532821 PMCID: PMC10397156 DOI: 10.1007/s00122-023-04413-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/08/2023] [Accepted: 06/23/2023] [Indexed: 08/04/2023]
Abstract
KEY MESSAGE Training sets produced by maximizing the number of parent lines, each involved in one cross, had the highest prediction accuracy for H0 hybrids, but lowest for H1 and H2 hybrids. Genomic prediction holds great promise for hybrid breeding but optimum composition of the training set (TS) as determined by the number of parents ($n_{\mathrm{TS}}$) and crosses per parent ($c$) has received little attention. Our objective was to examine prediction accuracy ([Formula: see text]) of GCA for lines used as parents of the TS (I1 lines) or not (I0 lines), and H0, H1 and H2 hybrids, comprising crosses of type I0 × I0, I1 × I0 and I1 × I1, respectively, as a function of $n_{\mathrm{TS}}$ and $c$. In the theory part, we developed estimates for [Formula: see text] of GBLUPs for hybrids: (i) [Formula: see text] based on the expected prediction accuracy, and (ii) [Formula: see text] based on [Formula: see text] of GBLUPs of GCA and SCA effects. In the simulation part, hybrid populations were generated using molecular data from two experimental maize data sets. Additive and dominance effects of QTL borrowed from literature were used to simulate six scenarios of traits differing in the proportion ($\tau_{\mathrm{SCA}}$ = 1%, 6%, 22%) of SCA variance in $\sigma_G^2$ and heritability ($h^2$ = 0.4, 0.8). Values of [Formula: see text] and [Formula: see text] closely agreed with [Formula: see text] for hybrids. For given size $N_{\mathrm{TS}} = n_{\mathrm{TS}} \times c$ of TS, [Formula: see text] of H0 hybrids and GCA of I0 lines was highest for $c = 1$. Conversely, for GCA of I1 lines and H1 and H2 hybrids, $c = 1$ yielded lowest [Formula: see text] with concordant results across all scenarios for both data sets. In view of these opposite trends, the optimum choice of $c$ for maximizing selection response across all types of hybrids depends on the size and resources of the breeding program.
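The H0, H1 and H2 classes follow directly from how many parents of a candidate hybrid were used as parents in the training set. A minimal sketch of that bookkeeping is given below; the function names and line labels are hypothetical, and the sketch does not separate H2 hybrids that were themselves phenotyped in the training set from untested I1 × I1 combinations.

```python
from typing import Iterable, Set, Tuple

Cross = Tuple[str, str]


def training_parents(training_crosses: Iterable[Cross]) -> Set[str]:
    """All lines used as a parent of at least one training-set hybrid (I1 lines)."""
    return {parent for cross in training_crosses for parent in cross}


def hybrid_type(hybrid: Cross, ts_parents: Set[str]) -> str:
    """Classify a hybrid as H0 (I0 x I0), H1 (I1 x I0) or H2 (I1 x I1)."""
    n_tested_parents = sum(parent in ts_parents for parent in hybrid)
    return f"H{n_tested_parents}"
```

For a training set made of the crosses `("A1", "B1")` and `("A2", "B2")`, the untested combination `("A1", "B2")` is an H2 hybrid, `("A1", "B9")` is H1 and `("A9", "B9")` is H0.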
Affiliation(s)
- Albrecht E Melchinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Christian Stricker
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Chris-Carolin Schön
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Hans-Jürgen Auinger
  - Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
9
Durge AR, Shrimankar DD, Sawarkar AD. Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective. Curr Genomics 2022; 23:299-317. [PMID: 36778194 PMCID: PMC9878859 DOI: 10.2174/1389202923666220927105311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/22/2022]
Abstract
Genome sequences indicate a wide variety of characteristics, which include species and sub-species type, genotype, diseases, growth indicators, yield quality, etc. To analyze and study the characteristics of genome sequences across different species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Multilayer Perceptrons (MLPs), etc., which vary in terms of evaluation performance, area of application and species that are processed. Because the algorithmic implementations differ widely, it is difficult for research programmers to select the best genome processing model for their application. To facilitate this selection, the paper reviews a wide variety of such models and compares their performance in terms of accuracy, area of application, computational complexity, processing delay, precision and recall. The deep learning and machine learning models presented in this review achieve different accuracies for different applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves 99.7% accuracy, and for cancer genomic data, the CNN Bayesian method achieves 99.27% accuracy. For Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) exhibits the highest accuracy, 99.95%. The precision and recall of the different models are reviewed in a similar way. Finally, this paper concludes with some interesting observations related to the genomic processing models and recommends applications for their efficient use.
Affiliation(s)
- Aditi R. Durge
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
- Deepti D. Shrimankar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India (corresponding author; Tel: 9860606477)
- Ankush D. Sawarkar
  - Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology (VNIT), Nagpur, India
10
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
- Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
11
|
Atanda SA, Govindan V, Singh R, Robbins KR, Crossa J, Bentley AR. Sparse testing using genomic prediction improves selection for breeding targets in elite spring wheat. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:1939-1950. [PMID: 35348821 PMCID: PMC9205816 DOI: 10.1007/s00122-022-04085-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/10/2022] [Accepted: 03/16/2022] [Indexed: 06/08/2023]
Abstract
Sparse testing using genomic prediction can be efficiently used to increase the number of testing environments while maintaining selection intensity in the early yield testing stage without increasing the breeding budget. Sparse testing using genomic prediction enables expanded use of selection environments in early-stage yield testing without increasing phenotyping cost. We evaluated different sparse testing strategies in the yield testing stage of a CIMMYT spring wheat breeding pipeline characterized by multiple populations each with small family sizes of 1-9 individuals. Our results indicated that a substantial overlap between lines across environments should be used to achieve optimal prediction accuracy. As sparse testing leverages information generated within and across environments, the genetic correlations between environments and genomic relationships of lines across environments were the main drivers of prediction accuracy in multi-environment yield trials. Including information from previous evaluation years did not consistently improve the prediction performance. Genomic best linear unbiased prediction was found to be the best predictor of true breeding value, and therefore, we propose that it should be used as a selection decision metric in the early yield testing stages. We also propose it as a proxy for assessing prediction performance to mirror breeder's advancement decisions in a breeding program so that it can be readily applied for advancement decisions by breeding programs.
Affiliation(s)
- Velu Govindan
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Ravi Singh
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Kelly R Robbins
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA
- Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico.
|
12
|
Theoretical and experimental assessment of genome-based prediction in landraces of allogamous crops. Proc Natl Acad Sci U S A 2022; 119:e2121797119. [PMID: 35486687 PMCID: PMC9170147 DOI: 10.1073/pnas.2121797119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
Significance: Genetic variation inherent in landraces is essential for broadening the genetic diversity of our crops. This study pioneers the development of a theoretical framework to link molecular inventories of plant genetic resources to phenotypic variation, allowing an informed choice of landraces and their crossing partners. We show that genome-based prediction of genetic values can be implemented successfully in landrace-derived material, despite a strongly reduced level of relatedness compared with elite germplasm. Theoretical derivations are validated with unique experimental data collected on two different landraces. Our results are a pivotal contribution toward the optimization of genome-enabled prebreeding schemes.
|
13
|
Weiß TM, Zhu X, Leiser WL, Li D, Liu W, Schipprack W, Melchinger AE, Hahn V, Würschum T. Unraveling the potential of phenomic selection within and among diverse breeding material of maize (Zea mays L.). G3 (BETHESDA, MD.) 2022; 12:6509517. [PMID: 35100379 PMCID: PMC8895988 DOI: 10.1093/g3journal/jkab445] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/28/2021] [Accepted: 12/16/2021] [Indexed: 12/19/2022]
Abstract
Genomic selection is a well-investigated approach that facilitates and supports selection decisions for complex traits and has meanwhile become a standard tool in modern plant breeding. Phenomic selection has only recently been suggested and uses the same statistical procedures to predict the targeted traits but replaces marker data with near-infrared spectroscopy data. It may represent an attractive low-cost, high-throughput alternative but has not been sufficiently studied until now. Here, we used 400 genotypes of maize (Zea mays L.) comprising elite lines of the Flint and Dent heterotic pools as well as 6 Flint landraces, which were phenotyped in multienvironment trials for anthesis-silking-interval, early vigor, final plant height, grain dry matter content, grain yield, and phosphorus concentration in the maize kernels, to compare the predictive abilities of genomic as well as phenomic prediction under different scenarios. We found that both approaches generally achieved comparable predictive abilities within material groups. However, phenomic prediction was less affected by population structure and performed better than its genomic counterpart for predictions among diverse groups of breeding material. We therefore conclude that phenomic prediction is a promising tool for practical breeding, for instance when working with unknown and rather diverse germplasm. Moreover, it may make the highly monopolized sector of plant breeding more accessible also for low-tech institutions by combining well established, widely available, and cost-efficient spectral phenotyping with the statistical procedures elaborated for genomic prediction - while achieving similar or even better results than with marker data.
Affiliation(s)
- Thea Mi Weiß
- State Plant Breeding Institute, University of Hohenheim, Stuttgart 70593, Germany; Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart 70593, Germany
- Xintian Zhu
- State Plant Breeding Institute, University of Hohenheim, Stuttgart 70593, Germany; Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart 70593, Germany
- Willmar L Leiser
- State Plant Breeding Institute, University of Hohenheim, Stuttgart 70593, Germany
- Dongdong Li
- Key Laboratory of Crop Heterosis and Utilization, Ministry of Education, Key Laboratory of Crop Genetic Improvement, Beijing Municipality, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
- Wenxin Liu
- Key Laboratory of Crop Heterosis and Utilization, Ministry of Education, Key Laboratory of Crop Genetic Improvement, Beijing Municipality, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
- Wolfgang Schipprack
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart 70593, Germany
- Albrecht E Melchinger
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart 70593, Germany
- Volker Hahn
- State Plant Breeding Institute, University of Hohenheim, Stuttgart 70593, Germany
- Tobias Würschum
- Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart 70593, Germany
|
14
|
Gianola D. Opinionated Views on Genome-Assisted Inference and Prediction During a Pandemic. FRONTIERS IN PLANT SCIENCE 2021; 12:717284. [PMID: 34421971 PMCID: PMC8377666 DOI: 10.3389/fpls.2021.717284] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/30/2021] [Accepted: 06/30/2021] [Indexed: 06/13/2023]
|