1
Han D, Zhao X, Zhang D, Wang Z, Zhu Z, Sun H, Qu Z, Wang L, Liu Z, Zhu X, Yuan M. Genome-wide association studies reveal novel QTLs for agronomic traits in soybean. Frontiers in Plant Science 2024; 15:1375646. [PMID: 38807775 PMCID: PMC11132100 DOI: 10.3389/fpls.2024.1375646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 04/15/2024] [Indexed: 05/30/2024]
Abstract
Introduction: Soybean, as a globally significant crop, has garnered substantial attention due to its agricultural importance. The use of molecular approaches to enhance grain yield in soybean has gained popularity. Methods: In this study, we conducted a genome-wide association study (GWAS) using 156 Chinese soybean accessions over a two-year period. We employed the general linear model (GLM) and the mixed linear model (MLM) to analyze three agronomic traits: pod number, grain number, and grain weight. Results: Our findings revealed significant associations of the QTLs qgPNpP-98, qgGNpP-89, and qgHGW-85 with pod number, grain number, and grain weight, respectively. These QTLs were identified on chromosome 16, where a region spanning 413,171 bp was associated with all three traits. Discussion: The QTL markers identified in this study hold potential for improving yield and agronomic traits through marker-assisted selection and genomic selection in breeding programs.
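To make the GLM step in the abstract above concrete, the following is a minimal sketch of a single-marker association test: the phenotype is regressed on allele dosage for one SNP. It is not the authors' pipeline. The function name `single_marker_glm`, the dosage coding (0/1/2), and the toy pod counts are all assumptions made for illustration, and the population-structure covariates a real GLM-based GWAS would include are omitted.

```python
import math

def single_marker_glm(genotypes, phenotypes):
    """Regress phenotype on allele dosage (0/1/2) for one SNP.

    Returns (slope, r_squared, t_statistic). Covariates for population
    structure are deliberately left out of this sketch.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    slope = sxy / sxx                      # allele substitution effect
    intercept = mean_p - slope * mean_g
    residual_ss = sum((p - (intercept + slope * g)) ** 2
                      for g, p in zip(genotypes, phenotypes))
    r_squared = 1.0 - residual_ss / syy    # share of phenotypic variance explained
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, r_squared, slope / std_err

# Invented toy data: allele dosage at one SNP and pod number per plant
dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
pods = [30.0, 32.0, 36.0, 35.0, 41.0, 40.0, 31.0, 34.0, 42.0, 37.0]
slope, r2, t_stat = single_marker_glm(dosage, pods)
```

In a genome-wide scan this test would be repeated for every SNP and the resulting statistics compared against a multiple-testing threshold.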
Affiliation(s)
- Dongwei Han
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
  - Heilongjiang Chinese Academy of Sciences Qiuying Zhang Soybean Scientist Studio, Qiqihar, Heilongjiang, China
- Xi Zhao
  - Biotechnology Institute, Heilongjiang Academy of Agricultural Science, Harbin, Heilongjiang, China
- Di Zhang
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Zhen Wang
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Zhijia Zhu
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Haoyue Sun
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Zhongcheng Qu
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Lianxia Wang
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
- Zhangxiong Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xu Zhu
  - Department of Research and Development, Ruibiotech Co., Ltd, Beijing, China
- Ming Yuan
  - Qiqihar Branch of Heilongjiang Academy of Agricultural Science, Qiqihar, Heilongjiang, China
2
Jafari M, Daneshvar MH. Machine learning-mediated Passiflora caerulea callogenesis optimization. PLoS One 2024; 19:e0292359. [PMID: 38266002 PMCID: PMC10807783 DOI: 10.1371/journal.pone.0292359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 09/19/2023] [Indexed: 01/26/2024]
Abstract
Callogenesis is one of the most powerful biotechnological approaches for in vitro secondary metabolite production and indirect organogenesis in Passiflora caerulea. Comprehensive knowledge of callogenesis and an optimized protocol can be obtained by applying a combination of machine learning (ML) and optimization algorithms. In the present investigation, the callogenesis responses (i.e., callogenesis rate and callus fresh weight) of P. caerulea were predicted based on different types and concentrations of plant growth regulators (PGRs) (i.e., 2,4-dichlorophenoxyacetic acid (2,4-D), 6-benzylaminopurine (BAP), 1-naphthaleneacetic acid (NAA), and indole-3-butyric acid (IBA)) as well as explant types (i.e., leaf, node, and internode) using a multilayer perceptron (MLP). Moreover, the developed models were integrated into a genetic algorithm (GA) to optimize the concentration of PGRs and explant types for maximizing callogenesis responses. Furthermore, sensitivity analysis was conducted to assess the importance of each input variable for the callogenesis responses. The results showed that the MLP had high predictive accuracy (R² > 0.81) in both training and testing sets for modeling all studied parameters. Based on the results of the optimization process, the highest callogenesis rate (100%) would be obtained from the leaf explant cultured in medium supplemented with 0.52 mg/L IBA plus 0.43 mg/L NAA plus 1.4 mg/L 2,4-D plus 0.2 mg/L BAP. The results of the sensitivity analysis showed the explant-dependent impact of the exogenous application of PGRs on callogenesis. Generally, the results showed that a combination of MLP and GA can serve as a forward-looking aid to optimize and predict in vitro culture systems and consequently cope with several challenges currently faced in Passiflora tissue culture.
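The model-plus-optimizer idea described in this abstract can be sketched as a small genetic algorithm searching over PGR concentrations. This is only an illustration under stated assumptions: `surrogate` is a hypothetical stand-in for a trained predictive model (it is not an MLP and encodes no real data), the 0 to 2 mg/L bounds are invented, and the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism) are one common choice rather than the study's configuration.

```python
import random

BOUNDS = [(0.0, 2.0)] * 4  # hypothetical search range (mg/L) for four PGRs

def surrogate(conc):
    """Hypothetical stand-in for a trained model; peaks at 1.0 mg/L per PGR."""
    return 100.0 - 10.0 * sum((c - 1.0) ** 2 for c in conc)

def clip(value, low, high):
    return max(low, min(high, value))

def genetic_algorithm(fitness, bounds, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0][:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Arithmetic crossover followed by bounded Gaussian mutation
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            child = [clip(c + rng.gauss(0.0, 0.05), lo, hi)
                     for c, (lo, hi) in zip(child, bounds)]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best), history

best, best_score, history = genetic_algorithm(surrogate, BOUNDS)
```

Because the elite individual is carried forward, the best predicted response never decreases from one generation to the next; any optimum found this way is only as trustworthy as the surrogate model and still needs laboratory validation.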
Affiliation(s)
- Marziyeh Jafari
  - Department of Horticultural Science, College of Agriculture, Shiraz University, Shiraz, Iran
  - Department of Horticultural Sciences, Agricultural Sciences and Natural Resources University of Khuzestan, Mollasani, Iran
- Mohammad Hosein Daneshvar
  - Department of Horticultural Sciences, Agricultural Sciences and Natural Resources University of Khuzestan, Mollasani, Iran
3
Rezaei H, Mirzaie-asl A, Abdollahi MR, Tohidfar M. Enhancing petunia tissue culture efficiency with machine learning: A pathway to improved callogenesis. PLoS One 2023; 18:e0293754. [PMID: 37922261 PMCID: PMC10624318 DOI: 10.1371/journal.pone.0293754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 10/18/2023] [Indexed: 11/05/2023]
Abstract
An important feature of petunia in tissue culture is its unpredictable and genotype-dependent callogenesis, which poses challenges for efficient regeneration and biotechnology applications. To address this issue, machine learning (ML) can be considered a powerful tool to analyze callogenesis data, extract key parameters, and predict optimal conditions for petunia callogenesis, facilitating more controlled and productive tissue culture processes. The study aimed to develop a predictive model for callogenesis in petunia using ML algorithms and to optimize the concentrations of phytohormones to enhance callus formation rate (CFR) and callus fresh weight (CFW). The inputs for the model were BAP, KIN, IBA, and NAA, while the outputs were CFR and CFW. Three ML algorithms, namely MLP, RBF, and GRNN, were compared, and the results revealed that GRNN (R² ≥ 83) outperformed MLP and RBF in terms of accuracy. Furthermore, a sensitivity analysis was conducted to determine the relative importance of the four phytohormones. IBA exhibited the highest importance, followed by NAA, BAP, and KIN. Leveraging the superior performance of the GRNN model, a genetic algorithm (GA) was integrated to optimize the concentration of phytohormones for maximizing CFR and CFW. The genetic algorithm identified an optimized combination of phytohormones consisting of 1.31 mg/L BAP, 1.02 mg/L KIN, 1.44 mg/L NAA, and 1.70 mg/L IBA, resulting in 95.83% CFR. To validate the reliability of the predicted results, optimized combinations of phytohormones were tested in a laboratory experiment. The results of the validation experiment indicated no significant difference between the experimental and optimized results obtained through the GA. This study presents a novel approach combining ML, sensitivity analysis, and GA for modeling and predicting callogenesis in petunia.
The findings offer valuable insights into the optimization of phytohormone concentrations, facilitating improved callus formation and potential applications in plant tissue culture and genetic engineering.
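For readers unfamiliar with the GRNN named in this abstract, the sketch below shows its core computation: a prediction is a Gaussian-kernel-weighted average of the training targets. It is a minimal illustration, not the study's model. The function name `grnn_predict`, the smoothing parameter values, and the four training rows (BAP, KIN, IBA, NAA in mg/L with a CFR value) are invented for the example.

```python
import math

def grnn_predict(train_x, train_y, query, sigma=0.3):
    """Predict one output as a Gaussian-kernel-weighted average of training targets."""
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Query is far from every training point; fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Invented toy data: (BAP, KIN, IBA, NAA) in mg/L and a made-up CFR (%)
train_x = [(0.5, 0.5, 0.5, 0.5), (1.0, 1.0, 1.0, 1.0),
           (1.5, 1.0, 1.5, 1.5), (2.0, 2.0, 2.0, 2.0)]
train_y = [60.0, 80.0, 90.0, 70.0]

# A narrow kernel reproduces the nearest training target almost exactly
at_training_point = grnn_predict(train_x, train_y, (1.0, 1.0, 1.0, 1.0), sigma=0.1)
# A wider kernel blends neighbouring observations
between_points = grnn_predict(train_x, train_y, (1.2, 1.0, 1.2, 1.2), sigma=0.3)
```

The single smoothing parameter `sigma` controls how local the average is, which is the main quantity tuned when fitting this kind of model.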
Affiliation(s)
- Hamed Rezaei
  - Department of Plant Biotechnology, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran
- Asghar Mirzaie-asl
  - Department of Plant Biotechnology, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran
- Mohammad Reza Abdollahi
  - Department of Agronomy and Plant Breeding, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran
- Masoud Tohidfar
  - Department of Plant Biotechnology, Faculty of Life Science and Biotechnology, Shahid Beheshti University, Tehran, Iran
4
Canella Vieira C, Zhou J, Jarquin D, Zhou J, Diers B, Riechers DE, Nguyen HT, Shannon G. Genetic architecture of soybean tolerance to off-target dicamba. Frontiers in Plant Science 2023; 14:1230068. [PMID: 37877091 PMCID: PMC10590897 DOI: 10.3389/fpls.2023.1230068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 09/27/2023] [Indexed: 10/26/2023]
Abstract
The adoption of dicamba-tolerant (DT) soybean in the United States resulted in extensive off-target dicamba damage to non-DT vegetation across soybean-producing states. Although soybeans are highly sensitive to dicamba, the intensity of observed symptoms and yield losses are affected by the genetic background of genotypes. Thus, the objective of this study was to detect novel marker-trait associations and expand on previously identified genomic regions related to soybean response to off-target dicamba. A total of 551 non-DT advanced breeding lines derived from 232 unique bi-parental populations were phenotyped for off-target dicamba across nine environments for three years. Breeding lines were genotyped using the Illumina Infinium BARCSoySNP6K BeadChip. Filtered SNPs were included as predictors in Random Forest (RF) and Support Vector Machine (SVM) models in a forward stepwise selection loop to identify the combination of SNPs yielding the highest classification accuracy. Both RF and SVM models yielded high classification accuracies (0.76 and 0.79, respectively) with minor extreme misclassifications (observed tolerant predicted as susceptible, and vice-versa). Eight genomic regions associated with off-target dicamba tolerance were identified on chromosomes 6 [Linkage Group (LG) C2], 8 (LG A2), 9 (LG K), 10 (LG O), and 19 (LG L). Although the genetic architecture of tolerance is complex, high classification accuracies were obtained when including the major effect SNP identified on chromosome 6 as the sole predictor. In addition, candidate genes with annotated functions associated with phases II (conjugation of hydroxylated herbicides to endogenous sugar molecules) and III (transportation of herbicide conjugates into the vacuole) of herbicide detoxification in plants were co-localized with significant markers within each genomic region. 
Genomic prediction models, as reported in this study, can greatly facilitate the identification of genotypes with superior tolerance to off-target dicamba.
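The forward stepwise selection loop mentioned in this abstract can be sketched as a greedy search that keeps adding the SNP giving the largest gain in classification accuracy. To stay self-contained, the sketch swaps in a nearest-centroid classifier scored by leave-one-out accuracy in place of the Random Forest and SVM models used in the study; that substitution, the function names, and the eight toy genotypes are all assumptions for illustration.

```python
def nearest_centroid_loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen SNPs."""
    correct = 0
    for held_out in range(len(rows)):
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(rows, labels)):
            if i == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def distance(label):
            return sum((rows[held_out][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=distance)
        correct += predicted == labels[held_out]
    return correct / len(rows)

def forward_stepwise(rows, labels, score):
    """Greedily add the SNP that most improves the score; stop when nothing helps."""
    remaining = list(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        trials = [(score(rows, labels, selected + [f]), f) for f in remaining]
        # Ties are broken in favour of the lower SNP index
        top_score, top_feature = max(trials, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Invented toy data: three SNP dosages per line; only SNP 0 tracks the phenotype
rows = [(2, 0, 1), (2, 1, 0), (2, 2, 1), (2, 0, 2),
        (0, 1, 1), (0, 2, 0), (0, 0, 1), (0, 1, 2)]
labels = ["tolerant"] * 4 + ["susceptible"] * 4
selected, accuracy = forward_stepwise(rows, labels, nearest_centroid_loo_accuracy)
```

On this toy panel the loop stops after picking the single informative SNP, which mirrors the abstract's observation that one major-effect marker can carry most of the classification accuracy.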
Affiliation(s)
- Caio Canella Vieira
  - Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Jing Zhou
  - Biological Systems Engineering, University of Wisconsin-Madison, Madison, WI, United States
- Diego Jarquin
  - Agronomy Department, University of Florida, Gainesville, FL, United States
- Jianfeng Zhou
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Brian Diers
  - Department of Crop Sciences, University of Illinois, Urbana, IL, United States
- Dean E. Riechers
  - Department of Crop Sciences, University of Illinois, Urbana, IL, United States
- Henry T. Nguyen
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Grover Shannon
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
5
Haidar S, Lackey S, Charette M, Yoosefzadeh-Najafabadi M, Gahagan AC, Hotte T, Belzile F, Rajcan I, Golshani A, Morrison MJ, Cober ER, Samanfar B. Genome-wide analysis of cold imbibition stress in soybean, Glycine max. Frontiers in Plant Science 2023; 14:1221644. [PMID: 37670866 PMCID: PMC10476531 DOI: 10.3389/fpls.2023.1221644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/17/2023] [Indexed: 09/07/2023]
Abstract
In Canada, the length of the frost-free season necessitates planting crops as early as possible to ensure that the plants have enough time to reach full maturity before they are harvested. Early planting carries inherent risks of cold water imbibition (specifically less than 4°C) affecting seed germination. A marker dataset developed for a previously identified Canadian soybean GWAS panel was leveraged to investigate the effect of cold water imbibition on germination. Seeds from a panel of 137 soybean elite cultivars, grown in the field at Ottawa, ON, over three years, were placed on filter paper in petri dishes and allowed to imbibe water for 16 hours at either 4°C or 20°C prior to being transferred to a constant 20°C. Observations on seed germination, defined as the presence of a 1 cm radicle, were made from day two to day seven. A three-parameter exponential rise to a maximum equation (3PERM) was fitted to estimate germination, time to one-half maximum germination, and germination uniformity for each cultivar. Genotyping-by-sequencing was used to identify SNPs in the 137 soybean lines. Genome-wide association studies (GWAS; rMVP R package, with GLM, MLM, and FarmCPU as methods), haplotype block analysis, and assumed linkage blocks of ±100 kbp were then applied, with a significance threshold established using the qvalue package in R; five significant SNPs were identified on chromosomes 1, 3, 4, 6, and 13 for maximum germination after cold water imbibition. Percent of phenotypic variance explained (PVE) and allele substitution effect (ASE) eliminated two of the five candidate SNPs, leaving three QTL regions on chromosomes 3, 6, and 13 (Chr3-3419152, Chr6-5098454, and Chr13-29649544). Based on gene ontology (GO) enrichment analysis, 14 candidate genes whose predicted functions include germination- and cold tolerance-related pathways were identified.
The identified QTLs can be used to select future soybean cultivars tolerant to cold water imbibition and mitigate risks associated with early soybean planting.
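As a reading aid for the 3PERM fit mentioned in the abstract, one commonly used three-parameter form of an exponential rise to a maximum is written out below. Whether the study used exactly this parameterization is an assumption; the symbols G_0 (baseline), a (size of the rise), and b (rate constant) are introduced here for illustration only.

```latex
G(t) = G_0 + a\left(1 - e^{-b t}\right), \qquad
\lim_{t \to \infty} G(t) = G_0 + a, \qquad
G\!\left(t_{1/2}\right) = G_0 + \frac{a}{2}
\;\Longrightarrow\;
t_{1/2} = \frac{\ln 2}{b}
```

Under this form, the plateau G_0 + a plays the role of maximum germination, and the time to half of the rise depends only on the rate constant b.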
Affiliation(s)
- Siwar Haidar
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
  - Department of Biology, Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Simon Lackey
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
  - Department of Biology, Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Martin Charette
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- A. Claire Gahagan
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Thomas Hotte
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Francois Belzile
  - Department of Phytology, Institut de Biologie Intégrative et des Systèmes (IBIS), Université de Laval, Quebec City, QC, Canada
- Istvan Rajcan
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Ashkan Golshani
  - Department of Biology, Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Malcolm J. Morrison
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Elroy R. Cober
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Bahram Samanfar
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
  - Department of Biology, Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
6
Jafari M, Daneshvar MH. Prediction and optimization of indirect shoot regeneration of Passiflora caerulea using machine learning and optimization algorithms. BMC Biotechnol 2023; 23:27. [PMID: 37528396 PMCID: PMC10394921 DOI: 10.1186/s12896-023-00796-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/22/2023] [Accepted: 07/21/2023] [Indexed: 08/03/2023]
Abstract
BACKGROUND: Optimization of indirect shoot regeneration protocols is one of the key prerequisites for the development of Agrobacterium-mediated genetic transformation and/or genome editing in Passiflora caerulea. Comprehensive knowledge of indirect shoot regeneration and an optimized protocol can be obtained by applying a combination of machine learning (ML) and optimization algorithms. MATERIALS AND METHODS: In the present investigation, the indirect shoot regeneration responses (i.e., de novo shoot regeneration rate, the number of de novo shoots, and length of de novo shoots) of P. caerulea were predicted based on different types and concentrations of PGRs (i.e., TDZ, BAP, PUT, KIN, and IBA) as well as callus types (i.e., callus derived from different explants including leaf, node, and internode) using a generalized regression neural network (GRNN) and random forest (RF). The developed models were then integrated into a genetic algorithm (GA) to optimize the concentration of PGRs and callus types for maximizing indirect shoot regeneration responses. In addition, sensitivity analysis was conducted to assess the importance of each input variable for the studied parameters. RESULTS: The results showed that both algorithms (RF and GRNN) had high predictive accuracy (R² > 0.86) in both training and testing sets for modeling all studied parameters. Based on the results of the optimization process, the highest de novo shoot regeneration rate (100%) would be obtained from callus derived from nodal segments cultured in medium supplemented with 0.77 mg/L BAP plus 2.41 mg/L PUT plus 0.06 mg/L IBA. The results of the sensitivity analysis showed the explant-dependent impact of exogenous application of PGRs on indirect de novo shoot regeneration.
CONCLUSIONS: A combination of ML (GRNN and RF) and GA can serve as a forward-looking aid to optimize and predict in vitro culture systems and consequently cope with several challenges currently faced in Passiflora tissue culture.
Affiliation(s)
- Marziyeh Jafari
  - Department of Horticultural Science, College of Agriculture, Shiraz University, Shiraz, 7144113131, Iran
  - Department of Horticultural Sciences, Agricultural Sciences and Natural Resources University of Khuzestan, Mollasani, 6341773637, Iran
- Mohammad Hosein Daneshvar
  - Department of Horticultural Sciences, Agricultural Sciences and Natural Resources University of Khuzestan, Mollasani, 6341773637, Iran
7
Yoosefzadeh-Najafabadi M, Torabi S, Tulpan D, Rajcan I, Eskandari M. Application of SVR-Mediated GWAS for Identification of Durable Genetic Regions Associated with Soybean Seed Quality Traits. Plants (Basel, Switzerland) 2023; 12:2659. [PMID: 37514272 PMCID: PMC10383196 DOI: 10.3390/plants12142659] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Revised: 07/12/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Soybean (Glycine max L.) is an important food-grade strategic crop worldwide because of its high seed protein and oil contents. Due to the negative correlation between seed protein and oil percentage, there is a dire need to detect reliable quantitative trait loci (QTL) underlying these traits in order to be used in marker-assisted selection (MAS) programs. Genome-wide association study (GWAS) is one of the most common genetic approaches that is regularly used for detecting QTL associated with quantitative traits. However, the current approaches are mainly focused on estimating the main effects of QTL, and, therefore, a substantial statistical improvement in GWAS is required to detect associated QTL considering their interactions with other QTL as well. This study aimed to compare the support vector regression (SVR) algorithm as a common machine learning method to fixed and random model circulating probability unification (FarmCPU), a common conventional GWAS method in detecting relevant QTL associated with soybean seed quality traits such as protein, oil, and 100-seed weight using 227 soybean genotypes. The results showed a significant negative correlation between soybean seed protein and oil concentrations, with heritability values of 0.69 and 0.67, respectively. In addition, SVR-mediated GWAS was able to identify more relevant QTL underlying the target traits than the FarmCPU method. Our findings demonstrate the potential use of machine learning algorithms in GWAS to detect durable QTL associated with soybean seed quality traits suitable for genomic-based breeding approaches. This study provides new insights into improving the accuracy and efficiency of GWAS and highlights the significance of using advanced computational methods in crop breeding research.
Affiliation(s)
- Sepideh Torabi
  - Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Dan Tulpan
  - Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
- Istvan Rajcan
  - Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Milad Eskandari
  - Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
8
Massahiro Yassue R, Galli G, James Chen C, Fritsche‐Neto R, Morota G. Genome-wide association analysis of hyperspectral reflectance data to dissect the genetic architecture of growth-related traits in maize under plant growth-promoting bacteria inoculation. Plant Direct 2023; 7:e492. [PMID: 37102161 PMCID: PMC10123960 DOI: 10.1002/pld3.492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 03/09/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
Plant growth-promoting bacteria (PGPB) may be of use for increasing crop yield and plant resilience to biotic and abiotic stressors. Using hyperspectral reflectance data to assess growth-related traits may shed light on the underlying genetics as such data can help assess biochemical and physiological traits. This study aimed to integrate hyperspectral reflectance data with genome-wide association analyses to examine maize growth-related traits under PGPB inoculation. A total of 360 inbred maize lines with 13,826 single nucleotide polymorphisms (SNPs) were evaluated with and without PGPB inoculation; 150 hyperspectral wavelength reflectances at 386-1021 nm and 131 hyperspectral indices were used in the analysis. Plant height, stalk diameter, and shoot dry mass were measured manually. Overall, hyperspectral signatures produced similar or higher genomic heritability estimates than those of manually measured phenotypes, and they were genetically correlated with manually measured phenotypes. Furthermore, several hyperspectral reflectance values and spectral indices were identified by genome-wide association analysis as potential markers for growth-related traits under PGPB inoculation. Eight SNPs were detected, which were commonly associated with manually measured and hyperspectral phenotypes. Different genomic regions were found for plant growth and hyperspectral phenotypes between with and without PGPB inoculation. Moreover, the hyperspectral phenotypes were associated with genes previously reported as candidates for nitrogen uptake efficiency, tolerance to abiotic stressors, and kernel size. In addition, a Shiny web application was developed to explore multiphenotype genome-wide association results interactively. Taken together, our results demonstrate the usefulness of hyperspectral-based phenotyping for studying maize growth-related traits in response to PGPB inoculation.
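The abstract refers to hyperspectral indices derived from band reflectances without naming them. As an illustration of what such an index looks like, the sketch below computes a normalized difference between two bands, the form used by the well-known NDVI. Choosing NDVI, the band centres (800 nm and 670 nm), the function names, and the five-band toy spectrum are all assumptions; the study's own index list is not given in this entry.

```python
def nearest_band(wavelengths, target_nm):
    """Index of the measured band closest to the requested wavelength."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))

def normalized_difference(reflectance, wavelengths, band_a_nm, band_b_nm):
    """Compute (A - B) / (A + B) from the bands nearest to two wavelengths."""
    a = reflectance[nearest_band(wavelengths, band_a_nm)]
    b = reflectance[nearest_band(wavelengths, band_b_nm)]
    return (a - b) / (a + b)

# Invented toy spectrum for one plant: band centres (nm) and reflectance
wavelengths = [450, 550, 670, 800, 900]
reflectance = [0.04, 0.10, 0.05, 0.45, 0.47]

# NDVI-style index: near-infrared (about 800 nm) versus red (about 670 nm)
ndvi = normalized_difference(reflectance, wavelengths, 800, 670)
```

Each index computed this way becomes one phenotype column, which is how a few hundred bands and indices can be passed through the same association analysis as manually measured traits.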
Affiliation(s)
- Rafael Massahiro Yassue
  - Department of Genetics, ‘Luiz de Queiroz’ College of Agriculture, University of São Paulo, São Paulo, Brazil
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Giovanni Galli
  - Department of Genetics, ‘Luiz de Queiroz’ College of Agriculture, University of São Paulo, São Paulo, Brazil
- Chun‐Peng James Chen
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
  - Center for Advanced Innovation in Agriculture, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Roberto Fritsche‐Neto
  - Department of Genetics, ‘Luiz de Queiroz’ College of Agriculture, University of São Paulo, São Paulo, Brazil
  - Quantitative Genetics and Biometrics Cluster, International Rice Research Institute, Los Baños, Philippines
- Gota Morota
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
  - Center for Advanced Innovation in Agriculture, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
9
Yoosefzadeh Najafabadi M, Hesami M, Eskandari M. Machine Learning-Assisted Approaches in Modernized Plant Breeding Programs. Genes (Basel) 2023; 14:777. [PMID: 37107535 PMCID: PMC10137951 DOI: 10.3390/genes14040777] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/02/2023] [Revised: 03/11/2023] [Accepted: 03/21/2023] [Indexed: 04/29/2023]
Abstract
In the face of a growing global population, plant breeding is being used as a sustainable tool for increasing food security. A wide range of high-throughput omics technologies have been developed and used in plant breeding to accelerate crop improvement and develop new varieties with higher yield performance and greater resilience to climate changes, pests, and diseases. With the use of these new advanced technologies, large amounts of data have been generated on the genetic architecture of plants, which can be exploited for manipulating the key characteristics of plants that are important for crop improvement. Therefore, plant breeders have relied on high-performance computing, bioinformatics tools, and artificial intelligence (AI), such as machine-learning (ML) methods, to efficiently analyze this vast amount of complex data. The use of bigdata coupled with ML in plant breeding has the potential to revolutionize the field and increase food security. In this review, some of the challenges of this method along with some of the opportunities it can create will be discussed. In particular, we provide information about the basis of bigdata, AI, ML, and their related sub-groups. In addition, the bases and functions of some learning algorithms that are commonly used in plant breeding, three common data integration strategies for the better integration of different breeding datasets using appropriate learning algorithms, and future prospects for the application of novel algorithms in plant breeding will be discussed. The use of ML algorithms in plant breeding will equip breeders with efficient and effective tools to accelerate the development of new plant varieties and improve the efficiency of the breeding process, which are important for tackling some of the challenges facing agriculture in the era of climate change.
Affiliation(s)
- Mohsen Hesami
  - Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Milad Eskandari
  - Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
10
Canella Vieira C, Jarquin D, do Nascimento EF, Lee D, Zhou J, Smothers S, Zhou J, Diers B, Riechers DE, Xu D, Shannon G, Chen P, Nguyen HT. Identification of genomic regions associated with soybean responses to off-target dicamba exposure. Frontiers in Plant Science 2022; 13:1090072. [PMID: 36570921 PMCID: PMC9780662 DOI: 10.3389/fpls.2022.1090072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
The widespread adoption of genetically modified (GM) dicamba-tolerant (DT) soybean was followed by numerous reports of off-target dicamba damage and yield losses across most soybean-producing states. In this study, a subset of the USDA Soybean Germplasm Collection consisting of 382 genetically diverse soybean accessions originating from 15 countries was used to identify genomic regions associated with soybean response to off-target dicamba exposure. Accessions were genotyped with the SoySNP50K BeadChip and visually screened for damage in environments with prolonged exposure to off-target dicamba. Two models were implemented to detect significant marker-trait associations: the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) and a model that allows the inclusion of population structure in interaction with the environment (G×E) to account for variable patterns of genotype responses in different environments. Most accessions (84%) showed a moderate response, either moderately tolerant or moderately susceptible, with approximately 8% showing tolerance and susceptibility. No differences in off-target dicamba damage were observed across maturity groups and centers of origin. Both models identified significant associations in regions of chromosomes 10 and 19. The BLINK model identified additional significant marker-trait associations on chromosomes 11, 14, and 18, while the G×E model identified another significant marker-trait association on chromosome 15. The significant SNPs identified by both models are located within candidate genes possessing annotated functions involving different phases of herbicide detoxification in plants. These results entertain the possibility of developing non-GM soybean cultivars with improved tolerance to off-target dicamba exposure and potentially other synthetic auxin herbicides. 
Identification of genetic sources of tolerance and genomic regions conferring higher tolerance to off-target dicamba may sustain and improve the production of other non-DT herbicide soybean production systems, including the growing niche markets of organic and conventional soybean.
Affiliation(s)
- Caio Canella Vieira
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Diego Jarquin
  - Agronomy Department, University of Florida, Gainesville, FL, United States
- Emanuel Ferrari do Nascimento
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Dongho Lee
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Jing Zhou
  - Biological Systems Engineering, University of Wisconsin-Madison, Madison, WI, United States
- Scotty Smothers
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Jianfeng Zhou
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Brian Diers
  - Department of Crop Sciences, University of Illinois, Urbana, IL, United States
- Dean E. Riechers
  - Department of Crop Sciences, University of Illinois, Urbana, IL, United States
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Grover Shannon
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Pengyin Chen
  - Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Henry T. Nguyen
  - Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
11
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection based on phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype-by-environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
12
Rairdin A, Fotouhi F, Zhang J, Mueller DS, Ganapathysubramanian B, Singh AK, Dutta S, Sarkar S, Singh A. Deep learning-based phenotyping for genome wide association studies of sudden death syndrome in soybean. FRONTIERS IN PLANT SCIENCE 2022; 13:966244. [PMID: 36340398 PMCID: PMC9634489 DOI: 10.3389/fpls.2022.966244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/13/2022] [Accepted: 09/26/2022] [Indexed: 06/07/2023]
Abstract
Using a reliable and accurate method to phenotype disease incidence and severity is essential to unravel the complex genetic architecture of disease resistance in plants, and to develop disease-resistant cultivars. Genome-wide association studies (GWAS) involve phenotyping large numbers of accessions, and have been used for a myriad of traits. In field studies, genetic accessions are phenotyped across multiple environments and replications, which takes a significant amount of labor and resources. Deep learning (DL) techniques can be effective for analyzing image-based tasks; thus DL methods are becoming more routine for phenotyping traits to save time and effort. This research aims to conduct GWAS on sudden death syndrome (SDS) of soybean [Glycine max (L.) Merr.] using disease severity from both visual field ratings and DL-based (using images) severity ratings collected from 473 accessions. Images were processed through a DL framework that identified soybean leaflets with SDS symptoms, and then quantified the disease severity on those leaflets into a few classes with mean Average Precision of 0.34 on unseen test data. Both visual field ratings and image-based ratings identified significant single nucleotide polymorphism (SNP) markers associated with disease resistance. These significant SNP markers are either in the proximity of previously reported candidate genes for SDS or near potentially novel candidate genes. Four previously reported SDS QTL were identified that contained significant SNPs from this study under both a visual field rating and an image-based rating. The results of this study provide an exciting avenue of using DL to capture complex phenotypic traits from images to get comparable or more insightful results compared to subjective visual field phenotyping of traits for disease symptoms.
Affiliation(s)
- Ashlyn Rairdin
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Fateme Fotouhi
- Department of Mechanical Engineering, Iowa State University, Ames, IA, United States
- Department of Computer Science, Iowa State University, Ames, IA, United States
- Jiaoping Zhang
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Daren S. Mueller
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, United States
- Asheesh K. Singh
- Department of Agronomy, Iowa State University, Ames, IA, United States
- Somak Dutta
- Department of Statistics, Iowa State University, Ames, IA, United States
- Soumik Sarkar
- Department of Mechanical Engineering, Iowa State University, Ames, IA, United States
- Department of Computer Science, Iowa State University, Ames, IA, United States
- Arti Singh
- Department of Agronomy, Iowa State University, Ames, IA, United States
13
Ro N, Haile M, Kim B, Cho GT, Lee J, Lee YJ, Hyun DY. Genome-Wide Association Study for Agro-Morphological Traits in Eggplant Core Collection. PLANTS (BASEL, SWITZERLAND) 2022; 11:2627. [PMID: 36235493 PMCID: PMC9571982 DOI: 10.3390/plants11192627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 10/01/2022] [Accepted: 10/04/2022] [Indexed: 06/16/2023]
Abstract
Eggplant is one of the most economically and nutritionally important vegetables worldwide. The study of the association of phenotypic traits with genetic factors is vital for the rapid and efficient identification and selection of eggplant genetic resources with desired traits for breeding purposes. A total of 587 eggplant (Solanum) accessions collected from different countries, including Korea, were used to establish a core collection. From these, 288 accessions were selected based on 52 single nucleotide polymorphism (SNP) markers together with 17 morphological traits. This core collection was further used to analyze the genetic associations of eggplant morphological variations. A large variation was found among the evaluated eggplant accessions for some agro-morphological traits. Stem prickles and leaf prickles showed a significant positive correlation (r = 0.83***), followed by days to flowering and days to maturity (r = 0.64***). A total of 114,981 SNPs were filtered and used for phylogenetic tree analysis, population structure analysis, and genome-wide association study (GWAS). A total of 377 significantly associated SNPs were identified for six agro-morphological traits. These six traits and the number of SNPs were: days to maturity (51), flower size (121), fruit width (20), harvest fruit color (42), leaf prickles (38), and stem prickles (105). The largest fraction of significant SNPs (11.94%) was obtained on chromosome Ch01, followed by Ch07 and Ch06 with 11.67% and 10.08%, respectively. This study will help to develop markers linked to the most important agro-morphological traits of eggplant genetic resources and support the selection of desirable traits for eggplant breeding programs.
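The counts reported in this abstract can be cross-checked with simple arithmetic. The sketch below sums the per-trait SNP counts and back-calculates approximate per-chromosome counts from the reported percentages. The per-chromosome counts are an inference made here for illustration (the abstract gives only percentages), and the names `snps_per_trait`, `share_pct`, and `pct_of_total` are hypothetical.

```python
# Per-trait counts of significantly associated SNPs, as listed in the abstract.
snps_per_trait = {
    "days to maturity": 51,
    "flower size": 121,
    "fruit width": 20,
    "harvest fruit color": 42,
    "leaf prickles": 38,
    "stem prickles": 105,
}
total = sum(snps_per_trait.values())

# Share of significant SNPs per chromosome (percent), as reported in the abstract.
share_pct = {"Ch01": 11.94, "Ch07": 11.67, "Ch06": 10.08}

# Assumption: integer counts are recovered by rounding share * total.
implied_counts = {
    chrom: round(pct / 100 * total) for chrom, pct in share_pct.items()
}


def pct_of_total(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


print("total significant SNPs:", total)
for chrom, count in implied_counts.items():
    print(chrom, count, pct_of_total(count, total))
```

If the per-trait counts and the percentages are consistent, the total should match the stated 377 and each implied count should round back to its reported percentage.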
Affiliation(s)
- Nayoung Ro
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Mesfin Haile
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Bichsaem Kim
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Gyu-Taek Cho
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Jungro Lee
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Yoon-Jung Lee
- National Agrobiodiversity Center, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Do Yoon Hyun
- Department of Crops and Forestry, Korea National University of Agriculture and Fisheries, Jeonju 54874, Korea
14
Yoosefzadeh-Najafabadi M, Rajcan I, Vazin M. High-throughput plant breeding approaches: Moving along with plant-based food demands for pet food industries. Front Vet Sci 2022; 9:991844. [PMID: 36254260 PMCID: PMC9568371 DOI: 10.3389/fvets.2022.991844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 09/05/2022] [Indexed: 12/02/2022] Open
Affiliation(s)
- Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Mahsa Vazin
- PawCo Foods, San Francisco, CA, United States
- *Correspondence: Mahsa Vazin
15
Hong H, Najafabadi MY, Torkamaneh D, Rajcan I. Identification of quantitative trait loci associated with seed quality traits between Canadian and Ukrainian mega-environments using genome-wide association study. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:2515-2530. [PMID: 35716202 DOI: 10.1007/s00122-022-04134-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 05/17/2022] [Indexed: 06/15/2023]
Abstract
KEY MESSAGE: Identifying QTL associated with soybean seed quality traits from a diverse GWAS panel cultivated in Canadian and Ukrainian mega-environments may facilitate future cultivar development for foreign markets. Understanding the complex genetic basis of seed quality traits for soybean in the mega-environments (MEs) is critical for developing a marker-assisted selection program that will lead to breeding superior cultivars adapted to specific regions. This study aimed to analyze the accumulation of 14 soybean seed quality traits in the Canadian ME and two seed quality traits in the Ukrainian ME and identify associated ME-specific quantitative trait loci (QTLSP) and ME-universal QTL (QTLU) for protein and oil using a genome-wide association study (GWAS) panel consisting of 184 soybean genotypes. The panel was planted in three locations in Canada and two locations in Ukraine in 2018 and 2019. Genotype plus genotype-by-environment biplot analysis was conducted to assess the accumulation of individual seed compounds across different locations. The protein accumulation was high in the Canadian ME and low in the Ukrainian ME, whereas the oil concentration showed the opposite trend between the two MEs. No QTLU were identified across the MEs for protein and oil concentrations. In contrast, nine Canadian QTLSP for protein were identified on various chromosomes, which were co-located with QTL controlling other traits identified in the Canadian ME. The lack of common QTLU for protein and oil suggests that it may be necessary to use QTLSP associated with these traits separately for the Canadian and Ukrainian MEs. Additional Ukrainian data for seed compounds other than oil and protein are required to identify novel QTLSP and QTLU for such traits for the individual or combined Canadian and Ukrainian MEs.
Affiliation(s)
- Huilin Hong
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
- Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
16
Gabur I, Simioniuc DP, Snowdon RJ, Cristea D. Machine Learning Applied to the Search for Nonlinear Features in Breeding Populations. Front Artif Intell 2022; 5:876578. [PMID: 35669178 PMCID: PMC9164111 DOI: 10.3389/frai.2022.876578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 04/19/2022] [Indexed: 11/13/2022] Open
Abstract
Large plant breeding populations are traditionally a source of novel allelic diversity and are at the core of selection efforts for elite material. Finding rare diversity requires a deep understanding of biological interactions between the genetic makeup of one genotype and its environmental conditions. Most modern breeding programs still rely on linear regression models to solve this problem, generalizing the complex genotype by phenotype interactions through manually constructed linear features. However, the identification of positive alleles vs. background can be addressed using deep learning approaches that have the capacity to learn complex nonlinear functions for the inputs. Machine learning (ML) is an artificial intelligence (AI) approach involving a range of algorithms to learn from input data sets and predict outcomes in other related samples. This paper describes a variety of techniques that include supervised and unsupervised ML algorithms to improve our understanding of nonlinear interactions from plant breeding data sets. Feature selection (FS) methods are combined with linear and nonlinear predictors and compared to traditional prediction methods used in plant breeding. Recent advances in ML allowed the construction of complex models that have the capacity to better differentiate between positive alleles and the genetic background. Using real plant breeding program data, we show that ML methods have the ability to outperform current approaches, increase prediction accuracies, decrease the computing time drastically, and improve the detection of important alleles involved in qualitative or quantitative traits.
Affiliation(s)
- Iulian Gabur
- Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
- Department of Plant Sciences, Iasi University of Life Sciences, Iasi, Romania
- *Correspondence: Iulian Gabur
- Rod J. Snowdon
- Department of Plant Breeding, Justus-Liebig-University, Giessen, Germany
- Dan Cristea
- Institute of Computer Science, Romanian Academy, Iasi Branch, Iasi, Romania
17
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, these approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
18
Yoosefzadeh-Najafabadi M, Eskandari M, Torabi S, Torkamaneh D, Tulpan D, Rajcan I. Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components. Int J Mol Sci 2022; 23:5538. [PMID: 35628351 PMCID: PMC9141736 DOI: 10.3390/ijms23105538] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/05/2022] [Revised: 05/11/2022] [Accepted: 05/13/2022] [Indexed: 12/14/2022] Open
Abstract
A genome-wide association study (GWAS) is currently one of the most recommended approaches for discovering marker-trait associations (MTAs) for complex traits in plant species. Conventional GWAS methods often suffer from insufficient statistical power, especially in species with a narrow genetic base. Using sophisticated mathematical methods such as machine learning (ML) algorithms may address this issue and advance the application of this valuable genetic method in applied plant-breeding programs. In this study, we evaluated the potential use of two ML algorithms, support-vector regression (SVR) and random forest (RF), in a GWAS and compared them with two conventional methods, mixed linear models (MLM) and fixed and random model circulating probability unification (FarmCPU), for identifying MTAs for soybean-yield components. Important soybean-yield component traits, including the number of reproductive nodes (RNP), non-reproductive nodes (NRNP), total nodes (NP), and total pods (PP) per plant along with yield and maturity, were assessed using a panel of 227 soybean genotypes evaluated at two locations over two years (four environments). Using the SVR-mediated GWAS method, we were able to discover MTAs colocalized with previously reported quantitative trait loci (QTL) with potential causal effects on the target traits, supported by the functional annotation of candidate gene analyses. This study demonstrated the potential benefit of using sophisticated mathematical approaches, such as SVR, in a GWAS to complement conventional GWAS methods for identifying MTAs that can improve the efficiency of genomic-based soybean-breeding programs.
Affiliation(s)
- Milad Eskandari
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Sepideh Torabi
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC G1V 0A6, Canada
- Dan Tulpan
- Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
- Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON N1G 2W1, Canada
19
Canella Vieira C, Zhou J, Usovsky M, Vuong T, Howland AD, Lee D, Li Z, Zhou J, Shannon G, Nguyen HT, Chen P. Exploring Machine Learning Algorithms to Unveil Genomic Regions Associated With Resistance to Southern Root-Knot Nematode in Soybeans. FRONTIERS IN PLANT SCIENCE 2022; 13:883280. [PMID: 35592556 PMCID: PMC9111516 DOI: 10.3389/fpls.2022.883280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 04/08/2022] [Indexed: 06/15/2023]
Abstract
Southern root-knot nematode [SRKN, Meloidogyne incognita (Kofoid & White) Chitwood] is a plant-parasitic nematode that is challenging to control due to its short life cycle, wide range of hosts, and limited management options, of which genetic resistance is the main option to efficiently control the damage caused by SRKN. To date, a major quantitative trait locus (QTL) mapped on chromosome (Chr.) 10 plays an essential role in resistance to SRKN in soybean varieties. The confidence of trait-loci associations discovered by traditional methods is often limited by the assumptions that individual single nucleotide polymorphisms (SNPs) always act independently and that the phenotype follows a Gaussian distribution. Therefore, the objective of this study was to conduct machine learning (ML)-based genome-wide association studies (GWAS) utilizing Random Forest (RF) and Support Vector Machine (SVM) algorithms to unveil novel regions of the soybean genome associated with resistance to SRKN. A total of 717 breeding lines derived from 330 unique bi-parental populations were genotyped with the Illumina Infinium BARCSoySNP6K BeadChip and phenotyped for SRKN resistance in a greenhouse. A GWAS pipeline involving a supervised feature dimension reduction based on Variable Importance in Projection (VIP) and SNP detection based on classification accuracy was proposed. Minor-effect SNPs were detected by the proposed ML-GWAS methodology but not identified using Bayesian-information and linkage-disequilibrium Iteratively Nested Keyway (BLINK), Fixed and Random Model Circulating Probability Unification (FarmCPU), and Enriched Compressed Mixed Linear Model (ECMLM) models. Besides the genomic region on Chr. 10 that can explain most of the SRKN resistance variance, additional minor-effect SNPs were also identified on Chrs. 10 and 11. The findings in this study demonstrated that overfitting in GWAS may lead to lower prediction accuracy, and the detection of significant SNPs based on classification accuracy limited false-positive associations. The expansion of the basis of the genetic resistance to SRKN can potentially reduce the selection pressure over the major QTL on Chr. 10 and achieve higher levels of resistance.
Affiliation(s)
- Caio Canella Vieira
- Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Jing Zhou
- Biological Systems Engineering, University of Wisconsin–Madison, Madison, WI, United States
- Mariola Usovsky
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Tri Vuong
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Amanda D. Howland
- Department of Entomology, College of Agriculture and Natural Resources, Michigan State University, East Lansing, MI, United States
- Dongho Lee
- Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Zenglu Li
- Institute of Plant Breeding, Genetics, and Genomics, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, United States
- Jianfeng Zhou
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Grover Shannon
- Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States
- Henry T. Nguyen
- Division of Plant Science and Technology, University of Missouri, Columbia, MO, United States
- Pengyin Chen
- Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States