1
Bose S, Banerjee S, Kumar S, Saha A, Nandy D, Hazra S. Review of applications of artificial intelligence (AI) methods in crop research. J Appl Genet 2024; 65:225-240. [PMID: 38216788 DOI: 10.1007/s13353-023-00826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 12/23/2023] [Accepted: 12/26/2023] [Indexed: 01/14/2024]
Abstract
Sophisticated and modern crop improvement techniques can bridge the gap for feeding the ever-increasing population. Artificial intelligence (AI) refers to the simulation of human intelligence in machines, which refers to the application of computational algorithms, machine learning (ML) and deep learning (DL) techniques. This is aimed to generalise patterns and relationships from historical data, employing various mathematical optimisation techniques thus making prediction models for facilitating selection of superior genotypes. These techniques are less resource intensive and can solve the problem based on the analysis of large-scale phenotypic datasets. ML for genomic selection (GS) uses high-throughput genotyping technologies to gather genetic information on a large number of markers across the genome. The prediction of GS models is based on the mathematical relation between genotypic and phenotypic data from the training population. ML techniques have emerged as powerful tools for genome editing through analysing large-scale genomic data and facilitating the development of accurate prediction models. Precise phenotyping is a prerequisite to advance crop breeding for solving agricultural production-related issues. ML algorithms can solve this problem through generating predictive models, based on the analysis of large-scale phenotypic datasets. DL models also have the potential reliability of precise phenotyping. This review provides a comprehensive overview on various ML and DL models, their applications, potential to enhance the efficiency, specificity and safety towards advanced crop improvement protocols such as genomic selection, genome editing, along with phenotypic prediction to promote accelerated breeding.
Affiliation(s)
- Suvojit Bose
  - Department of Vegetables and Spice Crops, Uttar Banga Krishi Viswavidyalaya, Pundibari, Cooch Behar, 736165, West Bengal, India
- Soumya Kumar
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Akash Saha
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Debalina Nandy
  - School of Agricultural Sciences, JIS University, Kolkata, 700109, West Bengal, India
- Soham Hazra
  - Department of Agriculture, Brainware University, Barasat, 700125, West Bengal, India
2
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. FRONTIERS IN PLANT SCIENCE 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Dipro Sinha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Himanshushekhar Chaurasia
  - Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
- Soumya Sharma
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Ritwika Das
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Girish Kumar Jha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Sunil Archak
  - National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
3
Fong WJ, Tan HM, Garg R, Teh AL, Pan H, Gupta V, Krishna B, Chen ZH, Purwanto NY, Yap F, Tan KH, Chan KYJ, Chan SY, Goh N, Rane N, Tan ESE, Jiang Y, Han M, Meaney M, Wang D, Keppo J, Tan GCY. Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation. Front Neuroinform 2024; 17:1244336. [PMID: 38449836 PMCID: PMC10915285 DOI: 10.3389/fninf.2023.1244336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 10/18/2023] [Indexed: 03/08/2024] Open
Abstract
Introduction: Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort.
Methods: Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the CYP2D6 gene and the impact of adding demographic data. The samples were split into training (75%) sets and test (25%) sets for validation. In Elastic Net model and XGBoost models, optimal hyperparameter search was done using 10-fold cross validation. Root Mean Square Error and R-squared values were obtained to investigate each models' performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified where several SNPs appeared to influence multiple CpG sites.
Results: Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.
Discussion: The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
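The evaluation protocol named in the abstract above (a 75%/25% train/test split scored by Root Mean Square Error and R-squared) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the data are synthetic, a one-feature ordinary least squares fit stands in for the Linear Regression, Elastic Net and XGBoost models, the 10-fold cross-validated hyperparameter search is omitted, and every function name (`split_75_25`, `fit_ols`, `rmse`, `r_squared`, `make_synthetic`, `evaluate`) is hypothetical.

```python
import math
import random


def split_75_25(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) as a 75%/25% split."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def fit_ols(train):
    """Fit y = intercept + slope * x by ordinary least squares on (x, y) pairs."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic(n=200, seed=1):
    """Assumed toy data: x is an allele dosage (0, 1 or 2), y a noisy methylation-like value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        y = 0.3 + 0.1 * x + rng.gauss(0, 0.02)
        rows.append((x, y))
    return rows


def evaluate(rows):
    """Split, fit on the training set, and score on the held-out test set."""
    train, test = split_75_25(rows)
    intercept, slope = fit_ols(train)
    y_true = [y for _, y in test]
    y_pred = [intercept + slope * x for x, _ in test]
    return len(train), len(test), rmse(y_true, y_pred), r_squared(y_true, y_pred)


if __name__ == "__main__":
    n_train, n_test, err, r2 = evaluate(make_synthetic())
    print(f"train={n_train} test={n_test} RMSE={err:.4f} R2={r2:.3f}")
```

Both metrics are computed on the held-out 25% only, so they reflect performance on samples the fitted model has not seen.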
Affiliation(s)
- Wei Jing Fong
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Hong Ming Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Rishabh Garg
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Ai Ling Teh
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Hong Pan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Varsha Gupta
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bernadus Krishna
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Zou Hui Chen
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Fabian Yap
  - KK Women's and Children's Hospital, Singapore, Singapore
- Kok Hian Tan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Kok Yen Jerry Chan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Shiao-Yng Chan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National University Hospital, Singapore, Singapore
- Nikita Rane
  - Institute of Mental Health, Singapore, Singapore
- Mei Han
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Michael Meaney
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Dennis Wang
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Jussi Keppo
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Geoffrey Chern-Yee Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
  - Institute of Mental Health, Singapore, Singapore
4
Comajoan Cara M, Mas Montserrat D, Ioannidis AG. PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2024; 29:327-340. [PMID: 38160290 PMCID: PMC10906137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research. Our code is available at https://github.com/AI-sandbox/PopGenAdapt.
Affiliation(s)
- Marçal Comajoan Cara
  - Dept. of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Dept. of Signal Theory & Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
5
Cara MC, Montserrat DM, Ioannidis AG. PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.10.561715. [PMID: 37873492 PMCID: PMC10592760 DOI: 10.1101/2023.10.10.561715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The lack of diversity in genomic datasets, currently skewed towards individuals of European ancestry, presents a challenge in developing inclusive biomedical models. The scarcity of such data is particularly evident in labeled datasets that include genomic data linked to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model which adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, setting SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.
Affiliation(s)
- Marçal Comajoan Cara
  - Dept. of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Dept. of Signal Theory & Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
- Daniel Mas Montserrat
  - Dept. of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Alexander G Ioannidis
  - Dept. of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Dept. of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA, USA
6
Hayes BJ, Chen C, Powell O, Dinglasan E, Villiers K, Kemper KE, Hickey LT. Advancing artificial intelligence to help feed the world. Nat Biotechnol 2023; 41:1188-1189. [PMID: 37524959 DOI: 10.1038/s41587-023-01898-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2023]
Affiliation(s)
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Chensong Chen
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Owen Powell
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Eric Dinglasan
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Kira Villiers
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Kathryn E Kemper
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Lee T Hickey
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
7
Bercovich N, Genze N, Todesco M, Owens GL, Légaré JS, Huang K, Rieseberg LH, Grimm DG. HeliantHOME, a public and centralized database of phenotypic sunflower data. Sci Data 2022; 9:735. [PMID: 36450875 PMCID: PMC9712528 DOI: 10.1038/s41597-022-01842-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 11/11/2022] [Indexed: 12/02/2022] Open
Abstract
Genomic studies often attempt to link natural genetic variation with important phenotypic variation. To succeed, robust and reliable phenotypic data, as well as curated genomic assemblies, are required. Wild sunflowers, originally from North America, are adapted to diverse and often extreme environments and have historically been a widely used model plant system for the study of population genomics, adaptation, and speciation. Moreover, cultivated sunflower, domesticated from a wild relative (Helianthus annuus) is a global oil crop, ranking fourth in production of vegetable oils worldwide. Public availability of data resources both for the plant research community and for the associated agricultural sector, are extremely valuable. We have created HeliantHOME ( http://www.helianthome.org ), a curated, public, and interactive database of phenotypes including developmental, structural and environmental ones, obtained from a large collection of both wild and cultivated sunflower individuals. Additionally, the database is enriched with external genomic data and results of genome-wide association studies. Finally, being a community open-source platform, HeliantHOME is expected to expand as new knowledge and resources become available.
Affiliation(s)
- Natalia Bercovich
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Nikita Genze
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
- Marco Todesco
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Gregory L. Owens
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Biology, University of Victoria, Victoria, BC, Canada
- Jean-Sébastien Légaré
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada
  - Data Science Institute, University of British Columbia, Vancouver, British Columbia, Canada
- Kaichi Huang
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Loren H. Rieseberg
  - Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Dominik G. Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Straubing, Germany
  - Technical University of Munich, Department of Informatics, Garching, Germany