1
Singhal R, Izquierdo P, Ranaweera T, Segura Abá K, Brown BN, Lehti-Shiu MD, Shiu SH. Using supervised machine-learning approaches to understand abiotic stress tolerance and design resilient crops. Philos Trans R Soc Lond B Biol Sci 2025; 380:20240252. [PMID: 40439305 PMCID: PMC12121380 DOI: 10.1098/rstb.2024.0252] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/26/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 06/02/2025] Open
Abstract
Abiotic stresses such as drought, heat, cold, salinity and flooding significantly impact plant growth, development and productivity. As the planet has warmed, these abiotic stresses have increased in frequency and intensity, affecting the global food supply and making it imperative to develop stress-resilient crops. In the past 20 years, the development of omics technologies has contributed to the growth of datasets for plants grown under a wide range of abiotic environments. Integration of these rapidly growing data using machine-learning (ML) approaches can complement existing breeding efforts by providing insights into the mechanisms underlying plant responses to stressful conditions, which can be used to guide the design of resilient crops. In this review, we introduce ML approaches and provide examples of how researchers use these approaches to predict molecular activities, gene functions and genotype responses under stressful conditions. Finally, we consider the potential and challenges of using such approaches to enable the design of crops that are better suited to a changing environment. This article is part of the theme issue "Crops under stress: can we mitigate the impacts of climate change on agriculture and launch the 'Resilience Revolution'?".
Affiliation(s)
- Rajneesh Singhal
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Paulo Izquierdo
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Thilanka Ranaweera
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Kenia Segura Abá
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Brianna N.I. Brown
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
  - DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
2
Sanchez-Munoz R, Depaepe T, Samalova M, Hejatko J, Zaplana I, Van Der Straeten D. Machine-learning meta-analysis reveals ethylene as a central component of the molecular core in abiotic stress responses in Arabidopsis. Nat Commun 2025; 16:4778. [PMID: 40404615 PMCID: PMC12098884 DOI: 10.1038/s41467-025-59542-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 04/22/2025] [Indexed: 05/24/2025] Open
Abstract
Understanding how plants adapt their physiology to overcome severe and often multifactorial stress conditions in nature is vital in light of the climate crisis. This remains a challenge given the complex nature of the underlying molecular mechanisms. To provide a comprehensive picture of stress-mitigation mechanisms, an exhaustive analysis of publicly available stress-related transcriptomic data has been conducted. We combine a meta-analysis with an unsupervised machine-learning algorithm to identify a core of stress-related genes active at 1-6 h and 12-24 h of exposure in Arabidopsis thaliana shoots and roots. To ensure robustness and biological significance of the output, often lacking in meta-analyses, a triple validation is incorporated. We present a 'stress gene core': a set of key genes involved in plant tolerance to ten adverse environmental conditions and ethylene-precursor supplementation rather than individual conditions. Notably, ethylene plays a key regulatory role in this core, influencing gene expression and acting as a critical factor in stress tolerance. Additionally, the analysis provides insights into previously uncharacterized genes, key genes within large families, and gene expression dynamics, which are used to create biologically validated databases that can guide further abiotic stress research. These findings establish a strong framework for advancing multi-stress-resilient crops, paving the way for sustainable agriculture in the face of climate challenges.
Affiliation(s)
- Raul Sanchez-Munoz
  - Laboratory of Functional Plant Biology, Department of Biology, Faculty of Sciences, Ghent University, Gent, B-9000, Belgium
  - Department of Agri-Food Engineering and Biotechnology (DEAB), Universitat Politècnica de Catalunya - BarcelonaTech (UPC), Castelldefels, 08860, Barcelona, Spain
- Thomas Depaepe
  - Laboratory of Functional Plant Biology, Department of Biology, Faculty of Sciences, Ghent University, Gent, B-9000, Belgium
- Marketa Samalova
  - Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Jan Hejatko
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - National Centre for Biotechnological Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Isiah Zaplana
  - Institute of Industrial and Control Engineering (IOC), Universitat Politècnica de Catalunya - BarcelonaTech (UPC), Barcelona, 08028, Spain
- Dominique Van Der Straeten
  - Laboratory of Functional Plant Biology, Department of Biology, Faculty of Sciences, Ghent University, Gent, B-9000, Belgium
3
Pan SQ, Hum YC, Lai KW, Yap WS, Zhang Y, Heo HY, Tee YK. Artificial intelligence in chemical exchange saturation transfer magnetic resonance imaging. Artif Intell Rev 2025; 58:210. [DOI: 10.1007/s10462-025-11227-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/10/2025] [Indexed: 05/03/2025]
4
Hu Z, Li X, Yuan Y, Xu Q, Zhang W, Lei H. Development and validation of machine learning models for predicting venous thromboembolism in colorectal cancer patients: A cohort study in China. Int J Med Inform 2025; 195:105770. [PMID: 39732129 DOI: 10.1016/j.ijmedinf.2024.105770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 12/15/2024] [Accepted: 12/18/2024] [Indexed: 12/30/2024]
Abstract
BACKGROUND: With advancements in healthcare, traditional VTE risk assessment tools are increasingly insufficient to meet the demands of high-quality care, underscoring the need for innovative and specialized assessment methods. OBJECTIVE: Owing to the remarkable success of machine learning in supervised learning and disease prediction, our objective is to develop a reliable and efficient model for assessing VTE risk by leveraging the fundamental data and clinical characteristics of colorectal cancer patients within our medical facility. METHODS: Six commonly used machine learning algorithms were utilized in our study to predict the occurrence of VTE in patients with rectal cancer. In the modeling process, LASSO regression was employed to identify and exclude variables not associated with VTE. Additionally, hyperparameter tuning was conducted via 5-fold cross-validation to mitigate overfitting, and 200 bootstrap samples were used to adjust the apparent performance on the training set. The selection of the VTE assessment model was determined by a thorough evaluation of performance criteria, such as the AUC, ACC and F1 score. RESULTS: The RF model exhibits consistent and efficient performance. Specifically, in the internal validation dataset, where generalizability was adjusted, the RF model achieved the highest scores across multiple metrics: AD-AUC (0.895), AD-ACC (0.871), AD-F1 (0.311), AD-MCC (0.316), AD-Precision (0.241), AD-Specificity (0.888). For external validation on unseen colon cancer data, the RF model also performed best in terms of ACC (0.728), F1 (0.292), MCC (0.225), Precision (0.192), and Specificity (0.740), with a suboptimal AUC of 0.745 and a Sensitivity (Recall) of 0.615. Additionally, the RF model demonstrates strong performance not only on the original dataset but also on datasets processed via alternative imbalance handling techniques. CONCLUSIONS: Our research successfully established and validated a risk assessment model for assessing the risk of VTE in colorectal cancer patients.
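The threshold-based metrics named in this abstract (ACC, Precision, Recall, Specificity, F1, MCC) all derive from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Undefined ratios fall back to 0.0.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the call.
metrics = binary_metrics(tp=6, fp=2, tn=10, fn=2)
print(metrics)
```

When the positive class is rare, accuracy and specificity can remain high while precision and F1 stay low, which is one plausible reading of the gap between those figures in the abstract.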
Affiliation(s)
- Zuhai Hu
  - Chongqing Cancer Multiomics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China
- Xiaosheng Li
  - Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing 400030, China
- Yuliang Yuan
  - Chongqing Cancer Multiomics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China
- Qianjie Xu
  - Department of Health Statistics, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Wei Zhang
  - Chongqing Key Laboratory of Translational Research for Cancer Metastasis and Individualized Treatment, Chongqing University Cancer Hospital, Chongqing 400030, China
- Haike Lei
  - Chongqing Cancer Multiomics Big Data Application Engineering Research Center, Chongqing University Cancer Hospital, Chongqing 400030, China
5
MacNish TR, Danilevicz MF, Bayer PE, Bestry MS, Edwards D. Application of machine learning and genomics for orphan crop improvement. Nat Commun 2025; 16:982. [PMID: 39856113 PMCID: PMC11760368 DOI: 10.1038/s41467-025-56330-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 01/15/2025] [Indexed: 01/27/2025] Open
Abstract
Orphan crops are important sources of nutrition in developing regions and many are tolerant to biotic and abiotic stressors; however, modern crop improvement technologies have not been widely applied to orphan crops due to the lack of resources available. There are orphan crop representatives across major crop types and the conservation of genes between these related species can be used in crop improvement. Machine learning (ML) has emerged as a promising tool for crop improvement. Transferring knowledge from major crops to orphan crops and using machine learning to improve accuracy and efficiency can be used to improve orphan crops.
Affiliation(s)
- Tessa R MacNish
  - School of Biological Sciences, The University of Western Australia, Perth, Australia
  - Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
- Monica F Danilevicz
  - School of Biological Sciences, The University of Western Australia, Perth, Australia
  - Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
  - Australian Herbicide Resistance Initiative, The University of Western Australia, Perth, Australia
- Philipp E Bayer
  - Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
  - The UWA Oceans Institute, The University of Western Australia, Perth, Australia
  - Minderoo Foundation, Perth, Australia
- Mitchell S Bestry
  - School of Biological Sciences, The University of Western Australia, Perth, Australia
  - Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
- David Edwards
  - School of Biological Sciences, The University of Western Australia, Perth, Australia
  - Centre for Applied Bioinformatics, The University of Western Australia, Perth, Australia
6
Liu L, Bao Z, Liang Y, Deng H, Zhang X, Cao T, Zhou C, Zhang Z. Unsupervised learning for lake underwater vegetation classification: Constructing high-precision, large-scale aquatic ecological datasets. The Science of the Total Environment 2025; 958:177895. [PMID: 39647205 DOI: 10.1016/j.scitotenv.2024.177895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 11/15/2024] [Accepted: 12/01/2024] [Indexed: 12/10/2024]
Abstract
Monitoring underwater vegetation is vital for evaluating lake ecosystem health. Automated data collection and analysis play key roles in achieving large-scale, high-precision, and high-frequency monitoring. While technologies such as unmanned vessels have made data collection more efficient, challenges persist in the analysis process, particularly in addressing the varied needs of different lake environments. Supervised AI methods can automatically identify underwater vegetation but are heavily reliant on labeled datasets. In practice, models trained on public datasets often struggle with generalization due to differences in vegetation types, collection environments, and equipment, resulting in discrepancies between training and testing datasets. Moreover, traditional dataset construction methods that rely on manual annotation are time-consuming and costly, limiting their scalability and application. This study aims to overcome these challenges by proposing an unsupervised method for automatically classifying underwater vegetation data, aiming to reduce manual annotation efforts and construct unbiased datasets at lower costs with greater efficiency. Compared with existing unsupervised, self-supervised, and unsupervised domain adaptation methods, this method introduces two key innovations: 1) a two-step dimensionality reduction method that combines a pre-trained model and manifold learning to extract key features and 2) a multialgorithm voting mechanism to increase classification confidence. These features enable high-accuracy classification without prior data annotation. Experiments show 97.32% accuracy on a public dataset and 92.43% and 96.15% accuracy on private datasets from Erhai Lake and Wuhan East Lake, respectively, surpassing supervised methods and matching manual classification. Additionally, it drastically reduces the annotation effort, requiring only approximately 20 labeled images to classify thousands of points. By integrating unmanned vessel technology, this approach provides an efficient, cost-effective solution for large-scale, high-frequency underwater vegetation monitoring across diverse lakes.
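The abstract does not specify how the multialgorithm voting mechanism works. One common form of such voting is sketched below under stated assumptions: each algorithm assigns one label per sample, the label IDs have already been aligned across algorithms, and a sample is kept only when at least `min_agree` algorithms agree. The function name `vote` and the threshold parameter are hypothetical and are not drawn from the paper.

```python
from collections import Counter


def vote(labels_per_algorithm, min_agree):
    """Combine per-sample labels from several algorithms by majority vote.

    labels_per_algorithm: list of equal-length label lists, one per algorithm.
    min_agree: minimum number of algorithms that must agree on a label.
    Returns one label per sample, or None where agreement is too weak.
    Assumes label IDs mean the same thing across algorithms.
    """
    combined = []
    for sample_labels in zip(*labels_per_algorithm):
        label, count = Counter(sample_labels).most_common(1)[0]
        combined.append(label if count >= min_agree else None)
    return combined


# Three hypothetical algorithms labelling three samples.
predictions = [
    [0, 1, 1],
    [0, 1, 2],
    [0, 2, 2],
]
print(vote(predictions, min_agree=3))  # keeps only unanimous samples
print(vote(predictions, min_agree=2))  # keeps simple-majority samples
```

Raising `min_agree` trades coverage for confidence: fewer samples receive a label, but those that do are backed by more algorithms.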
Affiliation(s)
- Lei Liu
  - School of Engineering, Dali University, Yunnan 671003, China; National Observation and Research Station of Erhai Lake Ecosystem in Yunnan, Dali 671006, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
- Zhengsen Bao
  - School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
- Ying Liang
  - School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
- Huanxi Deng
  - School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
- Xiaolin Zhang
  - State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Te Cao
  - State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Chichun Zhou
  - School of Engineering, Dali University, Yunnan 671003, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
- Zhenyu Zhang
  - School of Engineering, Dali University, Yunnan 671003, China; National Observation and Research Station of Erhai Lake Ecosystem in Yunnan, Dali 671006, China; Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Yunnan 671003, China
7
Ding S, Hou H, Xu X, Zhang J, Guo L, Ding L. Graph-Based Semi-Supervised Deep Image Clustering With Adaptive Adjacency Matrix. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:18828-18837. [PMID: 38416618 DOI: 10.1109/tnnls.2024.3367322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
Image clustering is a research hotspot in machine learning and computer vision. Existing graph-based semi-supervised deep clustering methods suffer from three problems: 1) because clustering uses only high-level features, the detailed information contained in shallow-level features is ignored; 2) most feature extraction networks employ the step odd convolutional kernel, which results in an uneven distribution of receptive field intensity; and 3) because the adjacency matrix is precomputed and fixed, it cannot adapt to changes in the relationship between samples. To solve the above problems, we propose a novel graph-based semi-supervised deep clustering method for image clustering. First, the parity cross-convolutional feature extraction and fusion module is used to extract high-quality image features. Then, the clustering constraint layer is designed to improve the clustering efficiency, and the output layer is customized to achieve unsupervised regularization training. Finally, the adjacency matrix is inferred by actual network prediction. A graph-based regularization method is adopted for unsupervised training networks. Experimental results show that our method significantly outperforms state-of-the-art methods on USPS, MNIST, street view house numbers (SVHN), and fashion MNIST (FMNIST) datasets in terms of ACC, normalized mutual information (NMI), and ARI.
8
Hussain A, Saadia A, Alhussein M, Gul A, Aurangzeb K. Enhancing ransomware defense: deep learning-based detection and family-wise classification of evolving threats. PeerJ Comput Sci 2024; 10:e2546. [PMID: 39678277 PMCID: PMC11640932 DOI: 10.7717/peerj-cs.2546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 11/05/2024] [Indexed: 12/17/2024]
Abstract
Ransomware is a type of malware that locks access to or encrypts its victim's files and demands a ransom for restoring the locked or encrypted data. With the invention of obfuscation techniques, it became difficult to detect its new variants. Identifying the exact malware category and family can help to prepare for possible attacks. Traditional machine learning-based approaches failed to detect and classify advanced obfuscated ransomware variants using existing pattern-matching and signature-based detection techniques. Deep learning-based approaches have proven helpful in both detection and classification by analyzing obfuscated ransomware deeply. Researchers have contributed mainly to detection and minimally to family attribution. This research aims to address all these multi-class classification problems by leveraging the power of deep learning. We have proposed a novel group normalization-based bidirectional long short-term memory (GN-BiLSTM) method to detect and classify ransomware variants with high accuracy. To validate the technique, five other deep learning models are also trained on CIC-MalMem-2022, an obfuscated malware dataset. The proposed approach outperformed them, with an accuracy of 99.99% in detection, 85.48% in category-wise classification, and 74.65% in the identification of ransomware families. To verify its effectiveness, models are also trained on 10,876 self-collected latest samples of 26 malware families, and the proposed model has achieved 99.20% accuracy in detecting malware, 97.44% in classifying its category, and 96.23% in identifying its family. Our proposed approach has proven the best for detecting new variants of ransomware with high accuracy and can be implemented in real-world applications of ransomware detection.
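The abstract names group normalization but does not describe where it sits in the GN-BiLSTM architecture. The sketch below shows only the generic group-normalization operation on a single feature vector: features are split into equal groups and each group is standardized by its own mean and variance. The learnable scale and shift used in practice are omitted, and the function name `group_norm` is an illustrative assumption rather than the authors' code.

```python
import math


def group_norm(features, num_groups, eps=1e-5):
    """Standardize a feature vector within equal-sized groups of channels.

    features: list of floats (one value per channel).
    num_groups: number of groups; must divide len(features) evenly.
    eps: small constant added to the variance for numerical stability.
    """
    if num_groups <= 0 or len(features) % num_groups != 0:
        raise ValueError("num_groups must evenly divide the number of features")
    size = len(features) // num_groups
    normalized = []
    for g in range(num_groups):
        group = features[g * size:(g + 1) * size]
        mean = sum(group) / size
        var = sum((x - mean) ** 2 for x in group) / size
        normalized.extend((x - mean) / math.sqrt(var + eps) for x in group)
    return normalized


# Hypothetical four-channel feature vector split into two groups.
print(group_norm([1.0, 2.0, 3.0, 4.0], num_groups=2))
```

Because the statistics are computed per sample rather than per batch, the operation behaves the same way regardless of batch size.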
Affiliation(s)
- Amjad Hussain
  - Department of Cyber Security, Air University, Islamabad, Pakistan
- Ayesha Saadia
  - Department of Computer Science, Air University, Islamabad, Pakistan
- Musaed Alhussein
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Ammara Gul
  - Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, United Kingdom
- Khursheed Aurangzeb
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
9
Cheng Q, Wang X. Machine Learning for AI Breeding in Plants. Genomics, Proteomics & Bioinformatics 2024; 22:qzae051. [PMID: 38954837 PMCID: PMC11479635 DOI: 10.1093/gpbjnl/qzae051] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/11/2024] [Revised: 06/21/2024] [Accepted: 06/25/2024] [Indexed: 07/04/2024]
Affiliation(s)
- Qian Cheng
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
  - State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
10
Peng S, Rajjou L. Advancing plant biology through deep learning-powered natural language processing. Plant Cell Reports 2024; 43:208. [PMID: 39102077 DOI: 10.1007/s00299-024-03294-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024]
Abstract
The application of deep learning methods, specifically the utilization of Large Language Models (LLMs), in the field of plant biology holds significant promise for generating novel knowledge on plant cell systems. The LLM framework exhibits exceptional potential, particularly with the development of Protein Language Models (PLMs), allowing for in-depth analyses of nucleic acid and protein sequences. This analytical capacity facilitates the discernment of intricate patterns and relationships within biological data, encompassing multi-scale information within DNA or protein sequences. The contribution of PLMs extends beyond mere sequence patterns and structure-function recognition; it also supports advancements in genetic improvements for agriculture. The integration of deep learning approaches into the domain of plant sciences offers opportunities for major breakthroughs in basic research across multi-scale plant traits. Consequently, the strategic application of deep learning methodologies, particularly leveraging the potential of LLMs, will undoubtedly play a pivotal role in advancing plant sciences, plant production, plant uses and propelling the trajectory toward sustainable agroecological and agro-food transitions.
Affiliation(s)
- Shuang Peng
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France
- Loïc Rajjou
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France
11
Vasquez-Teuber P, Rouxel T, Mason AS, Soyer JL. Breeding and management of major resistance genes to stem canker/blackleg in Brassica crops. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2024; 137:192. [PMID: 39052130 PMCID: PMC11272824 DOI: 10.1007/s00122-024-04641-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 04/29/2024] [Indexed: 07/27/2024]
Abstract
Blackleg (also known as Phoma or stem canker) is a major, worldwide disease of Brassica crop species, notably B. napus (rapeseed, canola), caused by the ascomycete fungus Leptosphaeria maculans. The outbreak and severity of this disease depend on environmental conditions and management practices, as well as a complex interaction between the pathogen and its hosts. Genetic resistance is a major method to control the disease (and the only control method in some parts of the world, such as continental Europe), but efficient use of genetic resistance is faced with many difficulties: (i) the scarcity of germplasm/genetic resources available, (ii) the different history of use of resistance genes in different parts of the world and the different populations of the fungus the resistance genes are exposed to, (iii) the complexity of the interactions between the plant and the pathogen that expand beyond typical gene-for-gene interactions, (iv) the incredible evolutionary potential of the pathogen and the importance of knowing the molecular processes set up by the fungus to "break down" resistances, so that we may design high-throughput diagnostic tools for population surveys, and (v) the different strategies and options to build up the best resistances and to manage them so that they are durable. In this paper, we aim to provide a comprehensive overview of these different points, stressing the differences between the different continents and the current prospects to generate new and durable resistances to blackleg disease.
Affiliation(s)
- Paula Vasquez-Teuber
  - Department of Plant Breeding, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
  - Department of Plant Production, Faculty of Agronomy, University of Concepción, Av. Vicente Méndez 595, Chillán, Chile
  - Plant Breeding Department, University of Bonn, Katzenburgweg 5, 53115, Bonn, Germany
- Thierry Rouxel
  - Université Paris-Saclay, INRAE, UR BIOGER, 91120, Palaiseau, France
- Annaliese S Mason
  - Department of Plant Breeding, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
  - Plant Breeding Department, University of Bonn, Katzenburgweg 5, 53115, Bonn, Germany
- Jessica L Soyer
  - Université Paris-Saclay, INRAE, UR BIOGER, 91120, Palaiseau, France
12
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. Frontiers in Plant Science 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Dipro Sinha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Himanshushekhar Chaurasia
  - Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
- Soumya Sharma
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Ritwika Das
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Girish Kumar Jha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Sunil Archak
  - National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
13
Artemenko NV, Genaev MA, Epifanov RUI, Komyshev EG, Kruchinina YV, Koval VS, Goncharov NP, Afonnikov DA. Image-based classification of wheat spikes by glume pubescence using convolutional neural networks. Frontiers in Plant Science 2024; 14:1336192. [PMID: 38283969 PMCID: PMC10811101 DOI: 10.3389/fpls.2023.1336192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 12/20/2023] [Indexed: 01/30/2024]
Abstract
Introduction: Pubescence is an important phenotypic trait observed in both vegetative and generative plant organs. Pubescent plants demonstrate increased resistance to various environmental stresses such as drought, low temperatures, and pests. It serves as a significant morphological marker and aids in selecting stress-resistant cultivars, particularly in wheat. In wheat, pubescence is visible on leaves, leaf sheath, glumes and nodes. Regarding glumes, the presence of pubescence plays a pivotal role in its classification. It supplements other spike characteristics, aiding in distinguishing between different varieties within the wheat species. The determination of pubescence typically involves visual analysis by an expert. However, methods that do not use a binocular loupe tend to be subjective, while employing additional equipment is labor-intensive. This paper proposes an integrated approach to determine glume pubescence presence in spike images captured under laboratory conditions using a digital camera and convolutional neural networks.
Methods: Initially, image segmentation is conducted to extract the contour of the spike body, followed by cropping of the spike images to an equal size. These images are then classified based on glume pubescence (pubescent/glabrous) using various convolutional neural network architectures (ResNet-18, EfficientNet-B0, and EfficientNet-B1). The networks were trained and tested on a dataset comprising 9,719 spike images.
Results: For segmentation, the U-Net model with EfficientNet-B1 encoder was chosen, achieving the segmentation accuracy IoU = 0.947 for the spike body and 0.777 for awns. The classification model for glume pubescence with the highest performance utilized the EfficientNet-B1 architecture. On the test sample, the model exhibited prediction accuracy parameters of F1 = 0.85 and AUC = 0.96, while on the holdout sample it showed F1 = 0.84 and AUC = 0.89. Additionally, the study investigated the relationship between image scale, artificial distortions, and model prediction performance, revealing that higher magnification and smaller distortions yielded a more accurate prediction of glume pubescence.
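The two metrics quoted in this abstract, IoU for segmentation and F1 for classification, can be illustrated with a minimal sketch. The helper names `iou` and `f1_score` and the toy arrays are assumptions made for illustration; they are not the authors' code or data.

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return float(np.logical_and(pred, true).sum() / union)

def f1_score(y_true, y_pred):
    """F1 for binary labels (assumed coding: 1 = pubescent, 0 = glabrous)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # true positives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy segmentation masks (hypothetical): 2 shared pixels, 4 pixels in the union.
pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, true))           # 2 / 4 = 0.5

# Toy classification labels (hypothetical): tp = 2, fp = 1, fn = 1.
labels = [1, 1, 0, 0, 1]
preds = [1, 0, 0, 1, 1]
print(f1_score(labels, preds))   # 4 / 6, about 0.667
```

IoU penalizes both missed and spurious pixels, which is why thin structures such as awns tend to score lower than the compact spike body.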
Affiliation(s)
- Nikita V. Artemenko
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Mathematics and Mechanics, Novosibirsk State University, Novosibirsk, Russia
- Mikhail A. Genaev
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Center for Genome Research, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Rostislav UI. Epifanov
  - Department of Mathematics and Mechanics, Novosibirsk State University, Novosibirsk, Russia
- Evgeny G. Komyshev
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Yulia V. Kruchinina
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Center for Genome Research, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Vasiliy S. Koval
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Kurchatov Center for Genome Research, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Nikolay P. Goncharov
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Dmitry A. Afonnikov
  - Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Mathematics and Mechanics, Novosibirsk State University, Novosibirsk, Russia
  - Kurchatov Center for Genome Research, Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
14
|
Wen T, Li JH, Wang Q, Gao YY, Hao GF, Song BA. Thermal imaging: The digital eye facilitates high-throughput phenotyping traits of plant growth and stress responses. The Science of the Total Environment 2023; 899:165626. [PMID: 37481085 DOI: 10.1016/j.scitotenv.2023.165626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 07/13/2023] [Accepted: 07/16/2023] [Indexed: 07/24/2023]
Abstract
Plant phenotyping is important for plants to cope with environmental changes and ensure plant health. Imaging techniques are perceived as the most critical and reliable tools for studying plant phenotypes. Thermal imaging has opened up new opportunities for nondestructive imaging of plant phenotyping. However, a comprehensive summary of thermal imaging in plant phenotyping is still lacking. Here we discuss the progress and future prospects of thermal imaging for assessing plant growth and stress responses. First, we classify thermal imaging into ground-based and aerial platforms based on their adaptability to different experimental environments (including laboratory, greenhouse, and field). It is convenient to collect phenotypic information of different dimensions. Second, in order to enhance the efficiency of thermal image processing, automatic algorithms based on deep learning are employed instead of traditional manual methods, greatly reducing the time cost of experiments. Considering its ease of implementation, handling and instant response, thermal imaging has been widely used in research on environmental stress, crop yield, and seed vigor. We have found that thermal imaging can detect thermal energy dissipation caused by living organisms (e.g., pests, viruses, bacteria, fungi, and oomycetes), enabling early disease diagnosis. It also recognizes changes in leaf surface temperatures resulting from reduced transpiration rates caused by nutrient deficiency, drought, salinity, or freezing. Furthermore, thermal imaging predicts crop yield under different water states and forecasts the viability of dormant seeds after water absorption by monitoring temperature changes in the seeds. This work will assist biologists and agronomists in studying plant phenotypes and serve as a guide for breeders to develop high-yielding, stress-tolerant, and superior crops.
Affiliation(s)
- Ting Wen
  - National Key Laboratory of Green Pesticide, State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Jian-Hong Li
  - National Key Laboratory of Green Pesticide, State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Qi Wang
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, PR China
- Yang-Yang Gao
  - National Key Laboratory of Green Pesticide, State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China; Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
- Bao-An Song
  - National Key Laboratory of Green Pesticide, State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
|
15
|
Lu M, Gao P, Hu J, Hou J, Wang D. A classification method of stress in plants using unsupervised learning algorithm and chlorophyll fluorescence technology. Frontiers in Plant Science 2023; 14:1202092. [PMID: 37936937 PMCID: PMC10626557 DOI: 10.3389/fpls.2023.1202092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 09/29/2023] [Indexed: 11/09/2023]
Abstract
Introduction: Chilling injury is one of the most common meteorological disasters affecting cucumber production. A timely and precise assessment of chilling injury is crucial for implementing remedial measures as soon as possible and minimizing production loss.
Methods: To evaluate the possibility of detecting cucumber chilling injury using chlorophyll fluorescence (ChlF) technology, we investigated the continuous changes in ChlF parameters under various low-temperature conditions and created the criteria for evaluating chilling injury. The ChlF induction curves were first collected before low-temperature treatment as unstressed samples and then daily from 1 to 5 days after low-temperature treatment as chilling injury samples. Principal component analysis was employed to investigate the public information on ChlF parameters and evaluate the differences between samples with different degrees of chilling injury. The parameters (Fv/Fm, Y(NO), qP, and Fo) accounted for a large proportion in the principal components and could characterize chilling injury. The uniform manifold approximation and projection method was employed to extract new features (Feature 1, Feature 2, Feature 3, and Feature 4) from ChlF parameters for the subsequent classification model. Taking the four features as input, a classification model based on the Fuzzy C-means clustering algorithm was constructed in order to identify the chilling injury classes of cucumber seedlings. The cucumber seedlings with different chilling injury classes were analyzed for ChlF images, rapid light curves, and malondialdehyde content.
Results and discussion: The results demonstrated that the variations in these indicators among the different chilling injury classes supported the validity of the classification model. Our findings provide a better understanding of the relationship between ChlF parameters and the impact of low-temperature treatment on cucumber seedlings. This finding offers an additional perspective that can be used to evaluate the responses and damage that plants experience under stress.
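The Fuzzy C-means step named in the Methods can be sketched as follows. This is a generic NumPy implementation of the standard algorithm on synthetic four-feature data; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, and the two simulated groups are assumptions for illustration and do not reproduce the authors' model or measurements.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic stand-in for four extracted features (hypothetical values):
# 20 "unstressed" and 20 "injured" samples drawn around different means.
demo_rng = np.random.default_rng(1)
X = np.vstack([
    demo_rng.normal(0.0, 0.3, size=(20, 4)),
    demo_rng.normal(3.0, 0.3, size=(20, 4)),
])
centers, U = fuzzy_c_means(X, n_clusters=2)
hard_labels = U.argmax(axis=1)  # most likely class per sample
print(hard_labels)
```

Unlike k-means, each sample receives a graded membership in every class, which suits injury severity that changes gradually rather than in discrete jumps.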
Affiliation(s)
- Miao Lu
  - College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Pan Gao
  - College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Jin Hu
  - College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
  - College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
  - Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling, Shaanxi, China
- Junying Hou
  - College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
  - Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling, Shaanxi, China
- Dong Wang
  - College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
|
16
|
Scendoni R, Tomassini L, Cingolani M, Perali A, Pilati S, Fedeli P. Artificial Intelligence in Evaluation of Permanent Impairment: New Operational Frontiers. Healthcare (Basel) 2023; 11:1979. [PMID: 37510420 PMCID: PMC10378994 DOI: 10.3390/healthcare11141979] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/13/2023] [Revised: 07/01/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) span multiple disciplines, including the medico-legal sciences, also with reference to the concept of disease and disability. In this context, the International Classification of Diseases, Injuries, and Causes of Death (ICD) is a standard for the classification of diseases and related problems developed by the World Health Organization (WHO), and it represents a valid tool for statistical and epidemiological studies. Indeed, the International Classification of Functioning, Disability, and Health (ICF) is outlined as a classification that aims to describe the state of health of people in relation to their existential spheres (social, family, work). This paper lays the foundations for proposing an operating model for the use of AI in the assessment of impairments with the aim of making the information system as homogeneous as possible, starting from the main coding systems of the reference pathologies and functional damages. Providing a scientific basis for the understanding and study of health, as well as establishing a common language for the assessment of disability in its various meanings through AI systems, will allow for the improvement and standardization of communication between the various expert users.
Affiliation(s)
- Roberto Scendoni
  - Department of Law, Institute of Legal Medicine, University of Macerata, 62100 Macerata, Italy
- Luca Tomassini
  - International School of Advanced Studies, University of Camerino, 62032 Camerino, Italy
- Mariano Cingolani
  - Department of Law, Institute of Legal Medicine, University of Macerata, 62100 Macerata, Italy
- Andrea Perali
  - Physics Unit, School of Pharmacy, University of Camerino, 62032 Camerino, Italy
- Sebastiano Pilati
  - Physics Division, School of Science and Technology, University of Camerino, 62032 Camerino, Italy
- Piergiorgio Fedeli
  - School of Law, Legal Medicine, University of Camerino, 62032 Camerino, Italy
|
17
|
Wu X, Deng H, Wang Q, Lei L, Gao Y, Hao G. Meta-learning shows great potential in plant disease recognition under few available samples. The Plant Journal: for Cell and Molecular Biology 2023; 114:767-782. [PMID: 36883481 DOI: 10.1111/tpj.16176] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/27/2022] [Revised: 02/15/2023] [Accepted: 02/23/2023] [Indexed: 05/27/2023]
Abstract
Plant diseases worsen the threat of food shortage with the growing global population, and disease recognition is the basis for the effective prevention and control of plant diseases. Deep learning has made significant breakthroughs in the field of plant disease recognition. Compared with traditional deep learning, meta-learning can still maintain more than 90% accuracy in disease recognition with small samples. However, there is no comprehensive review on the application of meta-learning in plant disease recognition. Here, we mainly summarize the functions, advantages, and limitations of meta-learning research methods and their applications for plant disease recognition with a few data scenarios. Finally, we outline several research avenues for utilizing current and future meta-learning in plant science. This review may help plant science researchers obtain faster, more accurate, and more credible solutions through deep learning with fewer labeled samples.
Affiliation(s)
- Xue Wu
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, Guizhou, China
- Hongyu Deng
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, Guizhou, China
- Qi Wang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, Guizhou, China
- Liang Lei
  - School of Physics & Optoelectronic Engineering, Guangdong University of Technology, Guangzhou, 550000, Guangzhou, China
- Yangyang Gao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, Guizhou, China
- Gefei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, State Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, Guizhou, China
|
18
|
Li T, Jiang S, Fu R, Wang X, Cheng Q, Jiang S. IP4GS: Bringing genomic selection analysis to breeders. Frontiers in Plant Science 2023; 14:1131493. [PMID: 36950355 PMCID: PMC10025548 DOI: 10.3389/fpls.2023.1131493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract
Genomic selection (GS), a strategy to use genotypes to predict phenotypes via statistical or machine learning models, has become a routine practice in plant breeding programs. GS can speed up the genetic gain by reducing phenotyping costs and/or shortening the breeding cycles. GS analysis is complicated, involving data cleanup and formatting, training and test population analysis, model selection and evaluation, and parameter optimization. In addition, GS analysis requires some programming skills and knowledge of statistical modeling. Thus, breeders need more practical GS tools. To alleviate this difficulty, we developed the web-based platform IP4GS (https://ngdc.cncb.ac.cn/ip4gs/), which offers a user-friendly interface to perform GS analysis simply through point-and-click actions. IP4GS currently includes seven commonly used models, eleven evaluation metrics, and visualization modules, offering great convenience for plant breeders with limited bioinformatics knowledge to apply GS analysis.
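The workflow this abstract describes (train on a reference population, predict a test population, evaluate) can be shown with a small simulation. Ridge regression is used here as a stand-in model; the abstract does not list which seven models IP4GS offers, and the marker counts, effect sizes, and penalty `lam` below are all assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data (hypothetical): 300 lines genotyped at 100 biallelic
# markers coded 0/1/2, with 10 markers carrying a true effect.
n_lines, n_markers = 300, 100
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
true_effects = np.zeros(n_markers)
causal = rng.choice(n_markers, size=10, replace=False)
true_effects[causal] = rng.normal(0.0, 1.0, size=10)
y = X @ true_effects + rng.normal(0.0, 1.0, size=n_lines)

# Training and test populations.
train, test = np.arange(240), np.arange(240, 300)

# Center markers and phenotypes using training-set statistics only.
x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
Xc = X[train] - x_mean

# Ridge solution: beta = (X'X + lam * I)^-1 X'y
lam = 10.0  # shrinkage penalty, an assumed value
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers),
                       Xc.T @ (y[train] - y_mean))

# Predict unseen lines; report Pearson correlation as prediction accuracy.
y_hat = (X[test] - x_mean) @ beta + y_mean
accuracy = np.corrcoef(y_hat, y[test])[0, 1]
print(f"predictive correlation: {accuracy:.2f}")
```

In practice the penalty would be tuned by cross-validation within the training population, which is one of the parameter optimization steps the abstract mentions.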
Affiliation(s)
- Qian Cheng
  - Correspondence: Qian Cheng; Shuqin Jiang
|
19
|
Abstract
Over the past decade, advances in plant genotyping have been critical in enabling the identification of genetic diversity, in understanding evolution, and in dissecting important traits in both crops and native plants. The widespread popularity of single-nucleotide polymorphisms (SNPs) has prompted significant improvements to SNP-based genotyping, including SNP arrays, genotyping by sequencing, and whole-genome resequencing. More recent approaches, including genotyping structural variants, utilizing pangenomes to capture species-wide genetic diversity and exploiting machine learning to analyze genotypic data sets, are pushing the boundaries of what plant genotyping can offer. In this chapter, we highlight these innovations and discuss how they will accelerate and advance future genotyping efforts.
|
20
|
Guo T, Li X. Machine learning for predicting phenotype from genotype and environment. Curr Opin Biotechnol 2023; 79:102853. [PMID: 36463837 DOI: 10.1016/j.copbio.2022.102853] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/03/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 12/03/2022]
Abstract
Predicting phenotype with genomic and environmental information is critically needed and challenging. Machine learning methods have emerged as powerful tools to make accurate predictions from large and complex biological data. Here, we review the progress of phenotype prediction models enabled or improved by machine learning methods. We categorized the applications into three scenarios: prediction with genotypic information, with environmental information, and with both. In each scenario, we illustrate the practicality of prediction models, the advantages of machine learning, and the challenges of modeling complex relationships. We discuss the promising potential of leveraging machine learning and genetics theories to develop models that can predict phenotype and also interpret the biological consequences of changes in genotype and environment.
Affiliation(s)
- Tingting Guo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Xianran Li
  - USDA, Agricultural Research Service, Wheat Health, Genetics, and Quality Research Unit, Pullman, WA 99164, USA; Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
|
21
|
Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H. DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. Molecular Plant 2023; 16:279-293. [PMID: 36366781 DOI: 10.1016/j.molp.2022.11.004] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Received: 05/06/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
Genomic prediction is an effective way to accelerate the rate of agronomic trait improvement in plants. Traditional methods typically use linear regression models with clear assumptions; such methods are unable to capture the complex relationships between genotypes and phenotypes. Non-linear models (e.g., deep neural networks) have been proposed as a superior alternative to linear models because they can capture complex non-additive effects. Here we introduce a deep learning (DL) method, deep neural network genomic prediction (DNNGP), for integration of multi-omics data in plants. We trained DNNGP on four datasets and compared its performance with methods built with five classic models: genomic best linear unbiased prediction (GBLUP); two methods based on a machine learning (ML) framework, light gradient boosting machine (LightGBM) and support vector regression (SVR); and two methods based on a DL framework, deep learning genomic selection (DeepGS) and deep learning genome-wide association study (DLGWAS). DNNGP is novel in five ways. First, it can be applied to a variety of omics data to predict phenotypes. Second, the multilayered hierarchical structure of DNNGP dynamically learns features from raw data, avoiding overfitting and improving the convergence rate using a batch normalization layer and early stopping and rectified linear activation (rectified linear unit) functions. Third, when small datasets were used, DNNGP produced results that are competitive with results from the other five methods, showing greater prediction accuracy than the other methods when large-scale breeding data were used. Fourth, the computation time required by DNNGP was comparable with that of commonly used methods, up to 10 times faster than DeepGS. Fifth, hyperparameters can easily be batch tuned on a local machine. Compared with GBLUP, LightGBM, SVR, DeepGS and DLGWAS, DNNGP is superior to these existing widely used genomic selection (GS) methods. 
Moreover, DNNGP can generate robust assessments from diverse datasets, including omics data, and quickly incorporate complex and large datasets into usable models, making it a promising and practical approach for straightforward integration into existing GS platforms.
Affiliation(s)
- Kelin Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
- Awais Rasheed
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Department of Plant Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Jose Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Sarah Hearne
  - International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, Texcoco, D.F. 06600, Mexico
- Huihui Li
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT - China Office, 12 Zhongguancun South Street, Beijing 100081, China; Nanfan Research Institute, CAAS, Sanya, Hainan 572024, China
|
22
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Molecular Plant 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
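The statement that phenotype is determined by genotype, envirotype, and their interaction is often written as a linear model. The form below is the textbook additive decomposition, shown as an illustrative assumption rather than a formula taken from the article; the indices $i$ (genotype) and $j$ (environment), the overall mean $\mu$, and the residual $\varepsilon_{ij}$ are notation introduced for this sketch.

```latex
P_{ij} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ij}
```

Here $G_i$ is the main effect of genotype $i$, $E_j$ the main effect of environment $j$, and $(GE)_{ij}$ the genotype by environment interaction (GEI). A model that uses only genomic data can estimate $G_i$ but has no information about $E_j$ or $(GE)_{ij}$, which is the gap that adding envirotypic data is meant to close.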
Affiliation(s)
- Yunbi Xu
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
  - Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
  - CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
  - MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
23
|
Abstract
Rapid advances in hardware and software, accompanied by public- and private-sector investment, have led to a new generation of data-driven computational tools. Recently, there has been a particular focus on deep learning, a class of machine learning algorithms that uses deep neural networks to identify patterns in large and heterogeneous datasets. These developments have been accompanied by both hype and scepticism by ecologists and others. This review describes the context in which deep learning methods have emerged, the deep learning methods most relevant to ecosystem ecologists, and some of the problem domains they have been applied to. Deep learning methods have high predictive performance in a range of ecological contexts, leveraging the large data resources now available. Furthermore, deep learning tools offer ecosystem ecologists new ways to learn about ecosystem dynamics. In particular, recent advances in interpretable machine learning and in developing hybrid approaches combining deep learning and mechanistic models provide a bridge between pure prediction and causal explanation. We conclude by looking at the opportunities that deep learning tools offer ecosystem ecologists and assess the challenges in interpretability that deep learning applications pose.
|