1. Xie Z, Xu X, Li L, Wu C, Ma Y, He J, Wei S, Wang J, Feng X. Residual networks without pooling layers improve the accuracy of genomic predictions. Theor Appl Genet 2024; 137:138. [PMID: 38771334 DOI: 10.1007/s00122-024-04649-2] [Received: 08/24/2023] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
Key message: Residual neural network genomic selection is the first GS algorithm to reach 35 layers, and its prediction accuracy surpasses that of previous algorithms. With decreasing DNA sequencing costs and the development of deep learning, the accuracy of phenotype prediction by genomic selection (GS) continues to improve. Residual networks, a widely validated deep learning technique, are introduced into deep learning for GS. Because each locus has a different weighted impact on the phenotype, strided convolutions are more suitable for GS problems than pooling layers. Building on these technical innovations, we propose a GS deep learning algorithm, residual neural network for genomic selection (ResGS). ResGS is the first neural network in GS to reach 35 layers. In 15 cases from four public datasets, the prediction accuracy of ResGS is higher than that of ridge-regression best linear unbiased prediction, support vector regression, random forest, gradient boosting regressor, and deep neural network genomic prediction in most cases. ResGS performs well in handling gene-environment interaction: phenotypes from other environments are fed into ResGS along with genetic data, and the predictions are much better than when genetic data alone are provided as input, which demonstrates the effectiveness of multi-modal learning for GS. Standard deviation is recommended as an auxiliary GS evaluation metric, which could improve the distribution of predicted results. Deep learning for GS, such as ResGS, is becoming more accurate in phenotype prediction.
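The abstract's argument for strided convolutions over pooling can be illustrated with a minimal NumPy sketch. It is not the ResGS code (the abstract does not give its layer configuration); it assumes a one-dimensional SNP vector coded 0/1/2, a single hand-set kernel, and a hypothetical pre-activation residual block whose shortcut is a strided 1×1 projection. Max pooling keeps only the largest value in each window and has no parameters, whereas a strided convolution assigns its own weight to every position in the window, so loci can contribute with different weights while the sequence is still downsampled.

```python
import numpy as np

def max_pool1d(x, size=2, stride=2):
    """Unweighted max pooling: each output is the maximum of one window."""
    n_out = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max() for i in range(n_out)])

def strided_conv1d(x, kernel, stride=2, bias=0.0):
    """Valid 1-D cross-correlation with a stride; each kernel position has its own weight."""
    k = len(kernel)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel) + bias
                     for i in range(n_out)])

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, kernel, shortcut_weight, stride=2):
    """Hypothetical downsampling residual block: ReLU(conv(ReLU(x)) + shortcut(x)).

    The shortcut is a strided 1x1 projection (one scalar weight applied to every
    stride-th element), trimmed so its length matches the main branch.
    """
    main = strided_conv1d(relu(x), kernel, stride=stride)
    shortcut = shortcut_weight * x[::stride][:len(main)]
    return relu(main + shortcut)

snps = np.array([0, 2, 1, 1, 2, 0, 0, 1], dtype=float)  # toy genotype vector
kernel = np.array([0.8, -0.3])                           # would be learned in a real network

print(max_pool1d(snps))
print(strided_conv1d(snps, kernel))
print(residual_block(snps, kernel, shortcut_weight=0.5))
```

Both paths halve the sequence length, but only the strided path has parameters that training could tune per locus, which is the property the abstract appeals to when it drops pooling layers.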
Affiliation(s)
- Xiaogang Xu
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, 310012, China
- Ling Li
- Zhejiang Laboratory, Hangzhou, 311100, China
- Cuiling Wu
- Zhejiang Laboratory, Hangzhou, 311100, China
- Yinxing Ma
- Zhejiang Laboratory, Hangzhou, 311100, China
- Jingjing He
- Zhejiang Laboratory, Hangzhou, 311100, China
- Sidi Wei
- Zhejiang Laboratory, Hangzhou, 311100, China
- Jun Wang
- Zhejiang Laboratory, Hangzhou, 311100, China
- Xianzhong Feng
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, 130102, China
2. Montesinos-López OA, Crespo-Herrera L, Pierre CS, Cano-Paez B, Huerta-Prado GI, Mosqueda-González BA, Ramos-Pulido S, Gerard G, Alnowibet K, Fritsche-Neto R, Montesinos-López A, Crossa J. Feature engineering of environmental covariates improves plant genomic-enabled prediction. Front Plant Sci 2024; 15:1349569. [PMID: 38812738 PMCID: PMC11135473 DOI: 10.3389/fpls.2024.1349569] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 05/31/2024]
Abstract
Introduction: Because genomic selection (GS) is a predictive methodology, it must guarantee high prediction accuracy for practical implementation. However, since many factors affect its prediction performance, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods: When environmental covariates are incorporated as inputs in genomic prediction models, this information only sometimes increases prediction performance. This investigation therefore explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that, across data sets, feature engineering reduced prediction error by 761.625% across predictors relative to including the environmental covariates without feature engineering. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy for incorporating environmental covariates.
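The abstract does not say which transformations were applied to the environmental covariates. As one illustrative possibility only, the sketch below expands a matrix of environmental covariates with squared terms and pairwise products, then standardizes each column before the features would be passed to a genomic prediction model. The function name `engineer_covariates`, the covariate names, and the numbers are hypothetical toy values, not data from the study.

```python
import numpy as np
from itertools import combinations

def engineer_covariates(ec, names):
    """Hypothetical feature engineering for environmental covariates (ECs).

    Adds squared terms and pairwise interaction terms, then z-scores every column.
    ec: array of shape (n_environments, n_covariates).
    """
    cols = [ec[:, j] for j in range(ec.shape[1])]
    out_names = list(names)
    for j, name in enumerate(names):                    # quadratic terms
        cols.append(ec[:, j] ** 2)
        out_names.append(f"{name}^2")
    for a, b in combinations(range(ec.shape[1]), 2):    # pairwise interactions
        cols.append(ec[:, a] * ec[:, b])
        out_names.append(f"{names[a]}*{names[b]}")
    X = np.column_stack(cols)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                                   # leave constant columns unscaled
    return (X - X.mean(axis=0)) / sd, out_names

# Toy ECs for three environments: mean temperature (°C) and rainfall (mm).
ec = np.array([[22.0, 310.0], [25.5, 180.0], [19.0, 420.0]])
X, feature_names = engineer_covariates(ec, ["temp", "rain"])
print(feature_names)
print(X.shape)
```

Standardizing after expansion keeps the squared and product terms on the same scale as the raw covariates, which matters for penalized or gradient-trained models; whether such terms help is exactly the data-set-dependent question the abstract raises.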
Affiliation(s)
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Bernabe Cano-Paez
- Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), México City, Mexico
- Sofia Ramos-Pulido
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Guillermo Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Khalid Alnowibet
- Department of Statistics and Operations Research, King Saud University, Riyadh, Saudi Arabia
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Edo. de Mexico, Mexico
- Louisiana State University, Baton Rouge, LA, United States
- Distinguished Scientist Fellowship Program, King Saud University, Riyadh, Saudi Arabia
- Instituto de Socioeconomía, Estadística e Informática, Colegio de Postgraduados, Montecillos, Edo. de México, Texcoco, Mexico
3. Duan H, Dai X, Shi Q, Cheng Y, Ge Y, Chang S, Liu W, Wang F, Shi H, Hu J. Enhancing genome-wide populus trait prediction through deep convolutional neural networks. Plant J 2024. [PMID: 38741374 DOI: 10.1111/tpj.16790] [Received: 03/12/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]
Abstract
As a promising model, genome-based plant breeding has greatly promoted the improvement of agronomic traits. Traditional methods typically adopt linear regression models with clear assumptions, which neither capture the linkage between phenotype and genotype nor provide good ideas for modification. Nonlinear models are well suited to capturing complex nonadditive effects, filling the gap left by traditional methods. Taking Populus as the research object, this paper constructs a deep learning method, DCNGP, which can effectively predict traits, including 65 phenotypes. The method was trained on three datasets and compared with four other classic models: Bayesian ridge regression (BRR), Elastic Net, support vector regression, and dualCNN. The results show that DCNGP has five typical performance advantages: (1) strong prediction ability on multiple experimental datasets; (2) batch normalization layers and early stopping, which enhance generalization and prediction stability on test data; (3) learning potent features from the data, thus circumventing tedious manual feature construction; (4) a Gaussian noise layer, which enhances predictive capability in the presence of inherent uncertainties or perturbations; and (5) fewer hyperparameters, which reduce tuning time across datasets and improve auto-search efficiency. In this way, DCNGP shows powerful predictive ability from genotype to phenotype, which provides an important theoretical reference for building more robust Populus breeding programs.
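Two of the regularizers the abstract credits, the Gaussian noise layer and early stopping, have simple generic definitions. The NumPy sketch below shows those generic versions, not DCNGP's code; the function `gaussian_noise`, the class `EarlyStopping`, the `patience`/`min_delta` parameters, and the validation-loss values are all hypothetical illustrations. The noise layer adds zero-mean Gaussian noise only during training and acts as the identity at inference; early stopping halts training once validation loss has failed to improve for a set number of epochs.

```python
import numpy as np

def gaussian_noise(x, stddev, training, rng):
    """Additive zero-mean Gaussian noise during training; identity at inference."""
    if not training:
        return x
    return x + rng.normal(0.0, stddev, size=x.shape)

class EarlyStopping:
    """Stop when validation loss fails to improve by more than min_delta for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = np.inf
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

rng = np.random.default_rng(0)
x = np.ones((2, 4))
noisy = gaussian_noise(x, stddev=0.1, training=True, rng=rng)
clean = gaussian_noise(x, stddev=0.1, training=False, rng=rng)

# Made-up validation losses: improve for three epochs, then plateau.
stopper = EarlyStopping(patience=3)
val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.69]
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best)
```

Here training stops at epoch 5 after three epochs without improvement, so the later 0.69 is never seen; in practice the weights from `best_epoch` would be restored.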
Affiliation(s)
- Huaichuan Duan
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Xiangwei Dai
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- Quanshan Shi
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Yan Cheng
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Yutong Ge
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Shan Chang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- Wei Liu
- School of Life Science, Leshan Normal University, Leshan, China
- Feng Wang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- School of Computer Engineering, Suzhou Vocational University, Suzhou, China
- Hubing Shi
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Jianping Hu
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
4. Li H, Li X, Zhang P, Feng Y, Mi J, Gao S, Sheng L, Ali M, Yang Z, Li L, Fang W, Wang W, Qian Q, Gu F, Zhou W. Smart Breeding Platform: A web-based tool for high-throughput population genetics, phenomics, and genomic selection. Mol Plant 2024; 17:677-681. [PMID: 38449308 DOI: 10.1016/j.molp.2024.03.002] [Received: 09/25/2023] [Revised: 02/04/2024] [Accepted: 03/04/2024] [Indexed: 03/08/2024]
Affiliation(s)
- Huihui Li
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Xin Li
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Peng Zhang
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Yingwei Feng
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Junri Mi
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Shang Gao
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Lele Sheng
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Mohsin Ali
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Zikun Yang
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Liang Li
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China
- Wei Fang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China
- Wensheng Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Qian Qian
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China; Yazhouwan National Laboratory, No. 8 Huanjin Road, Yazhou District, Sanya City, Hainan Province 572024, China; State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou, Zhejiang 310006, China
- Fei Gu
- DAMO Academy, Alibaba Group, Hangzhou 310023, China; Hupan Lab, Hangzhou 310023, China
- Wenbin Zhou
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100081, China
5. Yang X, Yu S, Yan S, Wang H, Fang W, Chen Y, Ma X, Han L. Progress in Rice Breeding Based on Genomic Research. Genes (Basel) 2024; 15:564. [PMID: 38790193 PMCID: PMC11121554 DOI: 10.3390/genes15050564] [Received: 03/21/2024] [Revised: 04/18/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024]
Abstract
The role of rice genomics in breeding progress is becoming increasingly important. Deeper research into the rice genome will contribute to the identification and utilization of outstanding functional genes, enriching the diversity and genetic basis of breeding materials and meeting the diverse demands for various improvements. Here, we review the significant contributions of rice genomics research to breeding progress over the last 25 years, discussing the profound impact of genomics on rice genome sequencing, functional gene exploration, and novel breeding methods, and we provide valuable insights for future research and breeding practices.
Affiliation(s)
- Xingye Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shicong Yu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wei Fang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yanqing Chen
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xiaoding Ma
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Longzhi Han
- National Crop Genebank, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
6. He L, Sui Y, Che Y, Liu L, Liu S, Wang X, Cao G. New Insights into the Genetic Basis of Lysine Accumulation in Rice Revealed by Multi-Model GWAS. Int J Mol Sci 2024; 25:4667. [PMID: 38731885 PMCID: PMC11083390 DOI: 10.3390/ijms25094667] [Received: 04/07/2024] [Revised: 04/21/2024] [Accepted: 04/22/2024] [Indexed: 05/13/2024]
Abstract
Lysine is an essential amino acid that cannot be synthesized by humans. Rice is a global staple food but has a rather low lysine content. Identifying the quantitative trait nucleotides (QTNs) and genes underlying lysine content is crucial for increasing lysine accumulation. In this study, five grain and three leaf lysine content datasets and 4,630,367 single nucleotide polymorphisms (SNPs) of 387 rice accessions were used to perform a genome-wide association study (GWAS) with ten statistical models. A total of 248 and 71 common QTNs associated with grain and leaf lysine content, respectively, were identified. The accuracy of RR-BLUP genomic selection/prediction models reached 0.85, and the significant correlation between the number of favorable alleles per accession and lysine content reached 0.71, which validated the reliability and additive effects of these QTNs. Several key genes for fine-tuning lysine accumulation were uncovered. Additionally, 20 and 30 QTN-by-environment interactions (QEIs) were detected in grains and leaves, respectively. In grains, LOC_Os01g21380 was identified as the candidate gene of QEI-sf0111954416 that putatively accounts for gene-by-environment interaction. These findings suggest that applying multi-model GWAS facilitates a better understanding of lysine accumulation in rice. The identified QTNs and genes hold potential for developing lysine-rich rice with a normal phenotype.
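The two validation checks in this abstract, RR-BLUP prediction accuracy and the correlation between favorable-allele counts and the phenotype, can be sketched on simulated data. This is a generic illustration, not the authors' pipeline: the genotypes, effects, and phenotypes are randomly simulated; the shrinkage ratio `lam` (σe²/σu²) is an assumed value rather than a REML estimate; accuracy is taken as the Pearson correlation between predicted and observed values in a held-out set; and "favorable allele" is assumed to mean the allele with a positive simulated effect at each causal marker.

```python
import numpy as np

def rrblup_fit(Z, y, lam):
    """Ridge-regression BLUP of marker effects for an assumed ratio lam = sigma_e^2 / sigma_u^2."""
    mu = y.mean()
    z_mean = Z.mean(axis=0)
    Zc = Z - z_mean                                    # centre marker columns
    p = Z.shape[1]
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, z_mean, u

def rrblup_predict(model, Z):
    mu, z_mean, u = model
    return mu + (Z - z_mean) @ u

# Simulated (not real) data: 200 inbred lines x 500 SNPs coded 0/1 (two homozygous classes).
rng = np.random.default_rng(42)
n, p = 200, 500
Z = rng.integers(0, 2, size=(n, p)).astype(float)
true_effects = np.zeros(p)
qtn = rng.choice(p, size=20, replace=False)            # 20 simulated causal markers
true_effects[qtn] = rng.normal(0.0, 1.0, size=20)
g = Z @ true_effects
y = g + rng.normal(0.0, g.std(), size=n)               # heritability roughly 0.5

train, test = np.arange(150), np.arange(150, 200)
model = rrblup_fit(Z[train], y[train], lam=100.0)      # lam is an assumed value
accuracy = np.corrcoef(rrblup_predict(model, Z[test]), y[test])[0, 1]

# Favorable-allele count: causal markers at which a line carries the allele with a positive effect.
favorable = (((Z[:, qtn] == 1) & (true_effects[qtn] > 0))
             | ((Z[:, qtn] == 0) & (true_effects[qtn] < 0)))
n_favorable = favorable.sum(axis=1)
r_fav = np.corrcoef(n_favorable, y)[0, 1]
print(round(accuracy, 2), round(r_fav, 2))
```

Solving the p × p ridge system is fine for a few hundred markers; with millions of SNPs, as in this study, the equivalent kernel (GBLUP) form or a dedicated mixed-model solver would normally be used instead.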
Affiliation(s)
- Liqiang He
- School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Yao Sui
- School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Yanru Che
- School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Lihua Liu
- School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Shuo Liu
- School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China
- Xiaobing Wang
- Institute of Tropical Crop Genetic Resources, Chinese Academy of Tropical Agricultural Sciences, Danzhou 571737, China
- Guangping Cao
- Hainan Key Laboratory of Crop Genetics and Breeding, Institute of Food Crops, Hainan Academy of Agricultural Sciences, Haikou 571100, China
7. Wang H, Chen M, Wei X, Xia R, Pei D, Huang X, Han B. Computational tools for plant genomics and breeding. Sci China Life Sci 2024. [PMID: 38676814 DOI: 10.1007/s11427-024-2578-6] [Received: 02/05/2024] [Accepted: 03/25/2024] [Indexed: 04/29/2024]
Abstract
Plant genomics and crop breeding are at the intersection of biotechnology and information technology. Driven by a combination of high-throughput sequencing, molecular biology, and data science, great advances have been made in omics technologies at every step along the central dogma, especially in genome assembly, genome annotation, epigenomic profiling, and transcriptome profiling. These advances have in turn revolutionized three directions of development. The first is the genetic dissection of complex traits in crops, along with genomic prediction and selection. The second is comparative genomics and evolution, which opens up new opportunities to depict the evolutionary constraints on biological sequences for deleterious variant discovery. The third is the development of deep learning approaches for the rational design of biological sequences, especially proteins, for synthetic biology. All three directions serve as the foundation for a new era of crop breeding in which agronomic traits are enhanced by genome design.
Affiliation(s)
- Hai Wang
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193, China
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Hainan Yazhou Bay Seed Laboratory, Sanya, 572025, China
- Mengjiao Chen
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Xin Wei
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Rui Xia
- College of Horticulture, South China Agricultural University, Guangzhou, 510640, China
- Dong Pei
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, 100091, China
- Xuehui Huang
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Bin Han
- National Center for Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200233, China
8. Fang C, Du H, Wang L, Liu B, Kong F. Mechanisms underlying key agronomic traits and implications for molecular breeding in soybean. J Genet Genomics 2024; 51:379-393. [PMID: 37717820 DOI: 10.1016/j.jgg.2023.09.004] [Received: 05/01/2023] [Revised: 09/05/2023] [Accepted: 09/05/2023] [Indexed: 09/19/2023]
Abstract
Soybean (Glycine max [L.] Merr.) is an important crop that provides protein and vegetable oil for human consumption. As soybean is a photoperiod-sensitive crop, its cultivation and yield are limited by the photoperiodic conditions in the field. In contrast to other major crops, soybean has a distinctive plant architecture and a distinctive symbiotic nitrogen fixation system, which represent two unique breeding directions. Thus, flowering time, plant architecture, and symbiotic nitrogen fixation are three critical or unique yield-determining factors. This review summarizes the progress made in our understanding of these three critical yield-determining factors in soybean. Meanwhile, we propose potential research directions to increase soybean production, discuss the application of genomics and genomic-assisted breeding, and explore research directions to address future challenges, particularly those posed by global climate change.
Affiliation(s)
- Chao Fang
- Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, Guangdong 510006, China
- Haiping Du
- Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, Guangdong 510006, China
- Lingshuang Wang
- Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, Guangdong 510006, China
- Baohui Liu
- Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, Guangdong 510006, China
- Fanjiang Kong
- Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, Guangdong 510006, China
9. Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. Mol Plant 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has advanced significantly in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size: we explored its benefits and the diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high GP accuracy (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that ultra-large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics, and proteomics), coupled with deep-learning algorithms, could overcome current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
- Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
- International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
10. Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024]
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to uncover information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from these massive datasets has become a critical aspect of many recent studies, particularly for annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still require extensive effort for functional verification. However, early bioinformatics algorithms and software primarily employed shallow learning techniques, so their ability to characterize data and learn features was limited. With the widespread adoption of RNA-Seq technology, scientists in the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
- Noor ul Ain
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- Qian Zhao
- State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
11
Liu W, He G, Deng XW. Toward understanding and utilizing crop heterosis in the age of biotechnology. iScience 2024; 27:108901. [PMID: 38533455 PMCID: PMC10964264 DOI: 10.1016/j.isci.2024.108901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
Heterosis, a universal phenomenon in nature that is mainly reflected in the superior productivity, quality, and fitness of F1 hybrids compared with their inbred parents, has been exploited in agriculture and has greatly benefited human society in terms of food security. However, flexible and efficient utilization of heterosis remains a challenge in hybrid breeding because of the limitations of the "three-line" and "two-line" systems. In the past two decades, rapidly developing biotechnologies have provided unprecedented opportunities for both understanding and utilizing heterosis. Notably, "third-generation" (3G) hybrid breeding technology, together with high-throughput sequencing and gene editing, has greatly improved the efficiency of hybrid breeding. Here, we review emerging ideas about the genetic and molecular mechanisms of heterosis and the development of the 3G hybrid breeding system in the age of biotechnology. In addition, we summarize opportunities and challenges for optimal heterosis utilization in the future.
Affiliation(s)
- Wenwen Liu
- School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, Shandong 261325, China
- Guangming He
- School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Xing Wang Deng
- School of Advanced Agricultural Sciences and School of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Weifang, Shandong 261325, China
12
Wu C, Luo J, Xiao Y. Multi-omics assists genomic prediction of maize yield with machine learning approaches. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2024; 44:14. [PMID: 38343399 PMCID: PMC10853138 DOI: 10.1007/s11032-024-01454-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/19/2024] [Indexed: 02/28/2024]
Abstract
With the improvement of high-throughput technologies in recent years, large multi-dimensional plant omics datasets have been produced, and big-data-driven yield prediction research has received increasing attention. Machine learning offers promising computational and analytical solutions for interpreting the biological meaning of large amounts of crop data. In this study, we used multi-omics datasets from 156 maize recombinant inbred lines, comprising 2496 single nucleotide polymorphisms (SNPs), 46 image traits (i-traits) from 16 developmental stages obtained through an automatic phenotyping platform, and 133 primary metabolites. In benchmark tests of different types of prediction models, several machine learning methods, such as Partial Least Squares (PLS), Random Forest (RF), and Gaussian process with a radial basis function kernel (GaussprRadial), achieved better prediction of maize yield, albeit with slight differences in method preference among i-trait, genomic, and metabolic data. We found that better yield prediction may result from the models' differing capabilities in ranking and filtering data features, and these features were linked to biological functions such as photosynthesis-related or kernel development-related regulation. Finally, by integrating multiple omics data with the RF approach, we further improved the prediction accuracy of grain yield from 0.32 to 0.43. Our research provides new ideas for applying plant omics data and artificial intelligence approaches to facilitate crop genetic improvement. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01454-z.
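The abstract reports that integrating several omics layers with RF raised grain-yield prediction accuracy from 0.32 to 0.43. In genomic prediction, accuracy is commonly reported as the Pearson correlation between predicted and observed phenotypes; the abstract does not state its definition, so treat that as an assumption. The minimal Python sketch below shows two generic pieces: early fusion, which concatenates each line's SNP, i-trait and metabolite vectors into one feature row, and the Pearson accuracy score. The function names and the dictionary layout are hypothetical, and the RF model itself is omitted.

```python
import math
from typing import Dict, List, Sequence


def fuse_omics(blocks: Dict[str, Dict[str, List[float]]],
               lines: Sequence[str]) -> List[List[float]]:
    """Early fusion (hypothetical helper): concatenate each line's feature
    vectors from every omics layer into one row.

    `blocks` maps a layer name (e.g. "snp", "itrait", "metabolite") to a
    dict of per-line feature vectors; every line must appear in every layer.
    """
    fused = []
    for line in lines:
        row: List[float] = []
        for name in sorted(blocks):  # fixed layer order keeps columns aligned
            row.extend(blocks[name][line])
        fused.append(row)
    return fused


def pearson(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed phenotypes."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)
```

A regressor such as a random forest would then be trained on the fused rows. Sorting the layer names is a simple way to guarantee that every line's row has the same column order.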
Affiliation(s)
- Chengxiu Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Jingyun Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Hubei Hongshan Laboratory, Wuhan, 430070 China
13
Cao Y, Tian D, Tang Z, Liu X, Hu W, Zhang Z, Song S. OPIA: an open archive of plant images and related phenotypic traits. Nucleic Acids Res 2024; 52:D1530-D1537. [PMID: 37930849 PMCID: PMC10767956 DOI: 10.1093/nar/gkad975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/11/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023]
Abstract
High-throughput plant phenotype acquisition technologies have been extensively used in plant phenomics studies, producing vast quantities of images and image-based phenotypic traits (i-traits) that are essential for accelerating germplasm screening, plant disease identification, and biotic and abiotic stress classification. Here, we present the Open Plant Image Archive (OPIA, https://ngdc.cncb.ac.cn/opia/), an open archive of plant images and i-traits derived from high-throughput phenotyping platforms. Currently, OPIA houses 56 datasets across 11 plants, comprising a total of 566 225 images with 2 417 186 labeled instances. Notably, it incorporates 56 i-traits of 93 rice and 105 wheat cultivars based on 18 644 individual RGB images, and these i-traits are further annotated based on the Plant Phenotype and Trait Ontology (PPTO) and cross-linked with GWAS Atlas. Additionally, each dataset in OPIA is assigned an evaluation score that takes into account image data volume, image resolution, and the number of labeled instances. More importantly, OPIA is equipped with tools for online image pre-processing and intelligent prediction. Collectively, OPIA provides open access to valuable datasets, pre-trained models, and phenotypic traits across diverse plants and thus has great potential to facilitate artificial intelligence-assisted breeding research.
Affiliation(s)
- Yongrong Cao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Dongmei Tian
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhixin Tang
- University of Chinese Academy of Sciences, Beijing 100049, China
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Xiaonan Liu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Weijuan Hu
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
14
Wu C, Zhang Y, Ying Z, Li L, Wang J, Yu H, Zhang M, Feng X, Wei X, Xu X. A transformer-based genomic prediction method fused with knowledge-guided module. Brief Bioinform 2023; 25:bbad438. [PMID: 38058185 PMCID: PMC10701102 DOI: 10.1093/bib/bbad438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/15/2023] [Accepted: 11/03/2023] [Indexed: 12/08/2023]
Abstract
Genomic prediction (GP) uses single nucleotide polymorphisms (SNPs) to establish associations between markers and phenotypes. Selecting individuals early on the basis of genomic estimated breeding values shortens the generation interval and speeds up breeding. Recently, methods based on deep learning (DL) have gained considerable attention in GP. In this study, we explore the application of Transformer-based structures to GP and develop a novel deep learning model named GPformer. GPformer obtains a global view by gleaning useful information from all relevant SNPs regardless of the physical distance between them. Comprehensive experiments on five crop datasets show that GPformer outperforms ridge-regression best linear unbiased prediction (RR-BLUP), support vector regression (SVR), light gradient boosting machine (LightGBM), and deep neural network genomic prediction (DNNGP) in terms of mean absolute error, Pearson's correlation coefficient, and the proposed consistent index metric. Furthermore, we introduce a knowledge-guided module (KGM) that extracts information from genome-wide association studies and fuses it into GPformer as prior knowledge. KGM is flexible and can be plugged into any DL network. Ablation studies of KGM on three datasets demonstrate its effectiveness. Moreover, GPformer is robust to hyperparameter choices and generalizes across the phenotypes of every dataset, making it suitable for practical applications.
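The "global view" claimed for GPformer comes from self-attention: each SNP token scores its similarity to every other token, so the weight linking two loci does not depend on their physical distance. The stdlib Python sketch below shows single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d))V, under the simplifying assumption that Q = K = V = X (no learned projections). It is not GPformer's architecture: the SNP embedding, multi-head layout and KGM are omitted, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x has one row per SNP token and d embedding columns. Each output row is
    a softmax-weighted mix of *all* input rows, so distant SNPs influence
    each other as directly as adjacent ones.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in x:
        # similarity of this token to every token, scaled by 1/sqrt(d)
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out
```

Because the attention weights in each row sum to 1, every output token is a convex combination of the input tokens; a full Transformer adds learned query, key and value projections on top of this core.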
Affiliation(s)
- Cuiling Wu
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Yiyi Zhang
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Zhiwen Ying
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Ling Li
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Jun Wang
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Hui Yu
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130012, China
- Mengchen Zhang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou 310006, China
- Xianzhong Feng
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130012, China
- Xinghua Wei
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou 310006, China
- Xiaogang Xu
- School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
15
Zhou G, Gao J, Zuo D, Li J, Li R. MSXFGP: combining improved sparrow search algorithm with XGBoost for enhanced genomic prediction. BMC Bioinformatics 2023; 24:384. [PMID: 37817077 PMCID: PMC10566073 DOI: 10.1186/s12859-023-05514-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 10/02/2023] [Indexed: 10/12/2023]
Abstract
BACKGROUND With the significant reduction in the cost of high-throughput sequencing, genomic selection has developed rapidly in plant breeding. Although researchers have proposed numerous genomic selection methods, existing methods still suffer from poor prediction accuracy in practical applications. RESULTS This paper proposes MSXFGP, a genomic prediction method that uses a multi-strategy improved sparrow search algorithm (SSA) to optimize XGBoost parameters and feature selection. First, logistic chaos mapping, elite learning, adaptive parameter adjustment, Lévy flight, and an early stopping strategy are incorporated into the SSA to enhance its global and local search capabilities, thereby improving its convergence accuracy and stability. The improved SSA is then used to optimize XGBoost parameters and feature selection simultaneously, yielding the new genomic selection method MSXFGP. Using both the coefficient of determination (R2) and the Pearson correlation coefficient as evaluation metrics, MSXFGP was compared with six existing genomic selection models across six datasets. Its prediction accuracy is comparable to or better than that of widely used genomic selection methods, and its advantage is larger when R2 is the assessment metric. Additionally, this research provides a user-friendly Python utility to help breeders apply the method effectively. MSXFGP is accessible at https://github.com/DIBreeding/MSXFGP . CONCLUSIONS The experimental results show that the prediction accuracy of MSXFGP is comparable to or better than that of existing genomic selection methods, providing a new approach for plant genomic selection.
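Logistic chaos mapping, the first strategy listed for the improved SSA, is typically used to spread the initial population more evenly than uniform random sampling. The abstract does not give MSXFGP's exact scheme, so the Python sketch below is a generic version under stated assumptions: the logistic map x_{k+1} = mu * x_k * (1 - x_k) with mu = 4, a seed of 0.7, and a linear rescaling of each chaotic value into the bounds of one search dimension. The function name and the example bounds (loosely, a learning rate and a tree depth) are hypothetical.

```python
from typing import List, Tuple


def logistic_chaos_population(n: int,
                              bounds: List[Tuple[float, float]],
                              x0: float = 0.7,
                              mu: float = 4.0) -> List[List[float]]:
    """Hypothetical helper: initialize n candidates with the logistic map.

    With mu = 4 and a seed in (0, 1) that is not 0.25, 0.5 or 0.75 (these
    collapse onto a fixed point), the sequence wanders chaotically over
    [0, 1]; each value is rescaled into the [low, high] bound of one
    search dimension.
    """
    x = x0
    population = []
    for _ in range(n):
        individual = []
        for low, high in bounds:
            x = mu * x * (1.0 - x)  # one logistic-map step
            individual.append(low + x * (high - low))
        population.append(individual)
    return population
```

For example, `logistic_chaos_population(5, [(0.01, 0.3), (3.0, 10.0)])` yields five two-dimensional candidates whose first coordinate is 0.01 + 0.84 * 0.29, since the first map step from 0.7 gives 0.84. In a full SSA these candidates would then be updated by the producer, scrounger and alarm rules, which this sketch does not cover.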
Affiliation(s)
- Ganghui Zhou
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China
- Jing Gao
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China
- Inner Mongolia Autonomous Region Big Data Center, Chilechuan Street No. 1, Hohhot, 010091, China
- Dongshi Zuo
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China
- Jin Li
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China
- Rui Li
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China
16
Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. FRONTIERS IN PLANT SCIENCE 2023; 14:1260089. [PMID: 37860239 PMCID: PMC10583549 DOI: 10.3389/fpls.2023.1260089] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2023] [Accepted: 09/13/2023] [Indexed: 10/21/2023]
Abstract
Crop breeding is one of the main approaches to increasing crop yield and improving crop quality. However, the breeding process faces challenges such as complex data, difficult data acquisition, and low prediction accuracy, which result in low breeding efficiency and long breeding cycles. Deep learning-based crop breeding applies deep learning techniques to improve and optimize the breeding process, accelerating crop improvement, enhancing breeding efficiency, and supporting the development of higher-yielding, more adaptable, and disease-resistant varieties for agricultural production. This perspective briefly discusses the mechanisms, key applications, and impact of deep learning in crop breeding. We also highlight the current challenges in this area and provide insights into its future prospects.
Affiliation(s)
- Xiaoding Wang
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Haitao Zeng
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Limei Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Yanze Huang
- School of Computer Science and Mathematics, Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
- Hui Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
- Youxiong Que
- Key Laboratory of Sugarcane Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, Fujian Agriculture and Forestry University, Fuzhou, China
- National Key Laboratory for Tropical Crop Breeding, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Hainan, China
17
Gao P, Zhao H, Luo Z, Lin Y, Feng W, Li Y, Kong F, Li X, Fang C, Wang X. SoyDNGP: a web-accessible deep learning framework for genomic prediction in soybean breeding. Brief Bioinform 2023; 24:bbad349. [PMID: 37824739 DOI: 10.1093/bib/bbad349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 10/14/2023]
Abstract
Soybean is a globally significant crop that plays a vital role in human nutrition and agriculture. Its complex genetic structure and wide trait variation, however, pose challenges for breeders and researchers aiming to optimize its yield and quality. Addressing this biological complexity requires innovative and accurate tools for trait prediction. In response, we developed SoyDNGP, a deep learning-based model that advances soybean trait prediction. Compared with existing methods such as DeepGS and DNNGP, SoyDNGP achieves higher predictive accuracy with only a minimal increase in parameter volume. In rigorous comparisons of prediction accuracy and model complexity, SoyDNGP outperformed its counterparts. Furthermore, it predicted complex traits with high precision, performing robustly across different sample sizes and trait complexities. We also tested SoyDNGP on multiple other crop species, including cotton, maize, rice and tomato, where it performed consistently and comparably, underscoring its potential as a versatile tool for genomic prediction across a broad range of crops. To make it accessible to users without extensive programming experience, we designed a user-friendly web server, available at http://xtlab.hzau.edu.cn/SoyDNGP. The server provides two features: 'Trait Lookup', which gives access to pre-computed trait predictions for over 500 soybean accessions, and 'Trait Prediction', which allows users to upload VCF files for trait estimation. By providing a high-performing, accessible tool for trait prediction, SoyDNGP opens new possibilities for optimized soybean breeding.
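Because SoyDNGP's 'Trait Prediction' feature takes VCF files, genotype calls must be turned into a numeric matrix before a network can use them. The abstract does not describe SoyDNGP's encoding, so the Python sketch below assumes a common convention: count the non-reference alleles in each sample's GT field (0/0 gives 0, 0/1 gives 1, 1/1 gives 2), treat phased (`|`) and unphased (`/`) calls alike, and mark missing calls with a hypothetical sentinel of -1. It reads uncompressed VCF text only.

```python
from typing import List


def gt_to_dosage(gt: str, missing: int = -1) -> int:
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return missing  # hypothetical sentinel for a missing call
    return sum(1 for a in alleles if a != "0")


def vcf_to_matrix(vcf_text: str) -> List[List[int]]:
    """Return a samples x variants dosage matrix from uncompressed VCF text."""
    rows_by_variant: List[List[int]] = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header line
        fields = line.split("\t")
        fmt = fields[8].split(":")  # FORMAT column, e.g. "GT:DP"
        gt_idx = fmt.index("GT")
        rows_by_variant.append(
            [gt_to_dosage(sample.split(":")[gt_idx]) for sample in fields[9:]]
        )
    # transpose so each row is one sample (accession), each column one variant
    return [list(col) for col in zip(*rows_by_variant)]
```

Note that a multi-allelic call such as 1/2 also yields 2 under this rule; a real pipeline might split or filter multi-allelic sites first.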
Affiliation(s)
- Pengfei Gao
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Haonan Zhao
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Zheng Luo
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Yifan Lin
- Hubei Hongshan Laboratory, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Wanjie Feng
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Yaling Li
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Fanjiang Kong
- Guangzhou Key Laboratory of Crop Gene Editing, Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Xia Li
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Chao Fang
- Guangzhou Key Laboratory of Crop Gene Editing, Guangdong Key Laboratory of Plant Adaptation and Molecular Design, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Xutong Wang
- National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
- Hubei Hongshan Laboratory, No. 1 Shizishan Road, Hongshan District, Wuhan, Hubei 430070, China
18
Zhang Y, Zhang N, Chai X, Sun T. Machine learning for image-based multi-omics analysis of leaf veins. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:4928-4941. [PMID: 37410807 DOI: 10.1093/jxb/erad251] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Veins are a critical component of the plant growth and development system, supporting and protecting leaves and transporting water, nutrients, and photosynthetic products. A comprehensive understanding of vein form and function requires a dual approach that combines plant physiology with cutting-edge image recognition technology. Recent advances in computer vision and machine learning have enabled algorithms that can identify vein networks and explore their developmental progression. Here, we review the functional, environmental, and genetic factors associated with vein networks, along with the current status of research on image analysis. In addition, we discuss methods for vein phenotype extraction and multi-omics association analysis using machine learning, which could provide a theoretical basis for improving crop productivity by optimizing vein network architecture.
Affiliation(s)
- Yubin Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Ning Zhang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Xiujuan Chai
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
- Tan Sun
- Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
- Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South St, Beijing 100081, China
19
Shahsavari M, Mohammadi V, Alizadeh B, Alizadeh H. Application of machine learning algorithms and feature selection in rapeseed (Brassica napus L.) breeding for seed yield. PLANT METHODS 2023; 19:57. [PMID: 37328913 DOI: 10.1186/s13007-023-01035-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 06/05/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Studying the relationships between rapeseed seed yield (SY) and its yield-related traits can help rapeseed breeders efficiently select high-yielding varieties indirectly. However, because conventional linear methods cannot interpret the complex relationships between SY and other traits, advanced machine learning algorithms are needed. Our main goal was to find the best combination of machine learning algorithms and feature selection methods to maximize the efficiency of indirect selection for rapeseed SY. RESULTS To achieve this, twenty-five regression-based machine learning algorithms and six feature selection methods were employed. SY and yield-related data from twenty rapeseed genotypes were collected from field experiments over 2 years (2019-2021). Root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2) were used to evaluate the algorithms. With all fifteen measured traits as inputs, the best performance was achieved by the Nu-support vector regression algorithm with a quadratic polynomial kernel function (R2 = 0.860, RMSE = 0.266, MAE = 0.210). The multilayer perceptron neural network algorithm with identity activation function (MLPNN-Identity), using three traits obtained from the stepwise and backward selection methods, appeared to be the most efficient combination of algorithm and feature selection method (R2 = 0.843, RMSE = 0.283, MAE = 0.224). Feature selection suggested that pods per plant and days to physiological maturity, along with plant height or first pod height from the ground, are the most influential traits for predicting rapeseed SY. CONCLUSION The results of this study showed that MLPNN-Identity combined with stepwise and backward selection can provide a robust approach for accurately predicting SY from fewer traits, and can therefore help optimize and accelerate rapeseed SY breeding programs.
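For reference, the three metrics used to rank the 25 algorithms can be computed as in the Python sketch below. It assumes R2 is the coefficient of determination, 1 - SS_res / SS_tot, rather than the squared Pearson correlation; the abstract does not say which form was used, and the two can differ when predictions are biased. The function name is illustrative.

```python
import math
from typing import Dict, Sequence


def regression_metrics(obs: Sequence[float],
                       pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and R2 (as 1 - SS_res / SS_tot) for observed vs predicted SY."""
    n = len(obs)
    residuals = [o - p for o, p in zip(obs, pred)]
    mean_obs = sum(obs) / n
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)    # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

A model that always predicts the mean of the observations scores R2 = 0 under this definition, which is a useful baseline when comparing many algorithms.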
Affiliation(s)
- Masoud Shahsavari
- Department of Agronomy and Plant Breeding, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Valiollah Mohammadi
- Department of Agronomy and Plant Breeding, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Bahram Alizadeh
- Seed and Plant Improvement Institute, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
- Houshang Alizadeh
- Department of Agronomy and Plant Breeding, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran