1
Zhu W, Li W, Zhang H, Li L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2025; 67:722-739. [PMID: 39467106 PMCID: PMC11951406 DOI: 10.1111/jipb.13791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/25/2024] [Accepted: 09/10/2024] [Indexed: 10/30/2024]
Abstract
The past decade has witnessed rapid developments in gene discovery, biological big data (BBD), artificial intelligence (AI)-aided technologies, and molecular breeding. These advancements are expected to accelerate crop breeding under the pressure of increasing demands for food. Here, we first summarize current breeding methods and discuss the need for new ways to support breeding efforts. Then, we review how to combine BBD and AI technologies for genetic dissection, exploring functional genes, predicting regulatory elements and functional domains, and phenotypic prediction. Finally, we propose the concept of intelligent precision design breeding (IPDB) driven by AI technology and offer ideas about how to implement IPDB. We hope that IPDB will improve the predictability, efficiency, and cost-effectiveness of crop breeding compared with current technologies. As an example of IPDB, we explore the possibilities offered by CropGPT, which combines biological techniques, bioinformatics, and breeding art from breeders, and presents an open, shareable, and cooperative breeding system. IPDB provides integrated services and communication platforms for biologists, bioinformatics experts, germplasm resource specialists, breeders, dealers, and farmers, and should be well suited for future breeding.
Affiliation(s)
- Wanchao Zhu
- Key Laboratory of Biology and Genetic Improvement of Maize in Arid Area of Northwest Region, College of Agronomy, Northwest A&F University, Yangling 712100, China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Weifu Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan 430070, China
- Hongwei Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
2
Liu Z, Wang S, Zhang Y, Feng Y, Liu J, Zhu H. Artificial Intelligence in Food Safety: A Decade Review and Bibliometric Analysis. Foods 2023; 12:1242. [PMID: 36981168 PMCID: PMC10048131 DOI: 10.3390/foods12061242] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/14/2023] [Revised: 03/06/2023] [Accepted: 03/09/2023] [Indexed: 03/17/2023]
Abstract
Artificial Intelligence (AI) technologies have been powerful solutions used to improve food yield, quality, and nutrition, increase safety and traceability while decreasing resource consumption, and eliminate food waste. Compared with several qualitative reviews on AI in food safety, we conducted an in-depth quantitative and systematic review based on the Core Collection database of WoS (Web of Science). To discover the historical trajectory and identify future trends, we analysed the literature concerning AI technologies in food safety from 2012 to 2022 by CiteSpace. In this review, we used bibliometric methods to describe the development of AI in food safety, including performance analysis, science mapping, and network analysis by CiteSpace. Among the 1855 selected articles, China and the United States contributed the most literature, and the Chinese Academy of Sciences released the largest number of relevant articles. Among all the journals in this field, PLoS ONE and Computers and Electronics in Agriculture ranked first and second in terms of annual publications and co-citation frequency. The present character, hot spots, and future research trends of AI technologies in food safety research were determined. Furthermore, based on our analyses, we provide researchers, practitioners, and policymakers with the big picture of research on AI in food safety across the whole process, from precision agriculture to precision nutrition, through 28 enlightening articles.
Affiliation(s)
- Zhe Liu
- School of Management, Henan University of Technology, Zhengzhou 450001, China
- Shuzhe Wang
- School of Management, Henan University of Technology, Zhengzhou 450001, China
- Yudong Zhang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Yichen Feng
- School of Management, Henan University of Technology, Zhengzhou 450001, China
- Jiajia Liu
- School of Management, Henan University of Technology, Zhengzhou 450001, China
- Hengde Zhu
- School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
3
Yan J, Wang X. Machine learning bridges omics sciences and plant breeding. TRENDS IN PLANT SCIENCE 2023; 28:199-210. [PMID: 36153276 DOI: 10.1016/j.tplants.2022.08.018] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 03/08/2022] [Revised: 08/15/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
Some of the biological knowledge obtained from fundamental research will be implemented in applied plant breeding. To bridge basic research and breeding practice, machine learning (ML) holds great promise to translate biological knowledge and omics data into precision-designed plant breeding. Here, we review ML for multi-omics analysis in plants, including data dimensionality reduction, inference of gene-regulation networks, and gene discovery and prioritization. These applications will facilitate understanding trait regulation mechanisms and identifying target genes potentially applicable to knowledge-driven molecular design breeding. We also highlight applications of deep learning in plant phenomics and ML in genomic selection-assisted breeding, such as various ML algorithms that model the correlations among genotypes (genes), phenotypes (traits), and environments, to ultimately achieve data-driven genomic design breeding.
Affiliation(s)
- Jun Yan
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
4
Chau T, Timilsena P, Li S. Gene Regulatory Network Modeling Using Single-Cell Multi-Omics in Plants. Methods Mol Biol 2023; 2698:259-275. [PMID: 37682480 DOI: 10.1007/978-1-0716-3354-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Single-cell multi-omics technology can be applied to plant cells to characterize gene expression and open chromatin regions in individual cells. In this chapter, we describe a computational pipeline for the analysis of single-cell data to construct gene regulatory networks. The major steps of this pipeline are as follows: (1) normalize and integrate scRNA-seq and scATAC-seq data, (2) identify cluster marker genes, (3) perform motif finding for selected marker genes, and (4) identify regulatory networks with machine learning. The pipeline has been tested using data from the model species Arabidopsis and is generally applicable to other plant and animal species to characterize regulatory networks using single-cell multi-omics data.
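As a rough illustration of the first two steps named in this abstract (normalization and marker-gene identification), the sketch below uses plain Python on a toy count matrix. It is not the chapter's pipeline: the library-size scaling to 10,000 counts with a log1p transform, the mean-difference marker score, and all gene names and counts are assumptions chosen only to make the idea concrete.

```python
from math import log1p

def normalize(counts, scale=10_000):
    """Library-size normalize each cell (row) to `scale` counts, then apply log1p.
    The scale factor is an assumed convention, not taken from the chapter."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([log1p(c / total * scale) if total else 0.0 for c in cell])
    return out

def marker_genes(expr, labels, cluster, genes, top_n=1):
    """Rank genes by mean expression inside `cluster` minus mean expression outside it.
    Assumes the cluster and its complement are both non-empty."""
    inside = [row for row, lab in zip(expr, labels) if lab == cluster]
    outside = [row for row, lab in zip(expr, labels) if lab != cluster]
    scores = []
    for j, gene in enumerate(genes):
        mean_in = sum(row[j] for row in inside) / len(inside)
        mean_out = sum(row[j] for row in outside) / len(outside)
        scores.append((mean_in - mean_out, gene))
    scores.sort(reverse=True)
    return [gene for _, gene in scores[:top_n]]

# Hypothetical toy data: 4 cells x 3 genes, with cluster labels assumed to be known.
genes = ["geneA", "geneB", "geneC"]
counts = [
    [90, 5, 5],
    [80, 10, 10],
    [5, 90, 5],
    [10, 85, 5],
]
labels = [0, 0, 1, 1]

expr = normalize(counts)
markers_0 = marker_genes(expr, labels, 0, genes)
markers_1 = marker_genes(expr, labels, 1, genes)
print(markers_0, markers_1)
```

In practice, clustering, data integration, motif finding, and network inference would rely on dedicated single-cell tooling; the sketch only shows the shape of the computation.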
Affiliation(s)
- Tran Chau
- Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Blacksburg, VA, USA
- Prakash Timilsena
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
- Song Li
- Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Blacksburg, VA, USA
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA, USA
5
Abdelsalam IM, Ghosh S, AlKafaas SS, Bedair H, Malloum A, ElKafas SS, Saad-Allah KM. Nanotechnology as a tool for abiotic stress mitigation in horticultural crops. Biologia (Bratisl) 2022. [DOI: 10.1007/s11756-022-01251-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
6
Silva JCF, Ferreira MA, Carvalho TFM, Silva FF, de A. Silveira S, Brommonschenkel SH, Fontes EPB. RLPredictiOme, a Machine Learning-Derived Method for High-Throughput Prediction of Plant Receptor-like Proteins, Reveals Novel Classes of Transmembrane Receptors. Int J Mol Sci 2022; 23:12176. [PMID: 36293031 PMCID: PMC9603095 DOI: 10.3390/ijms232012176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 10/08/2022] [Accepted: 10/09/2022] [Indexed: 11/16/2022]
Abstract
Cell surface receptors play essential roles in perceiving and processing external and internal signals at the cell surface of plants and animals. The receptor-like protein kinases (RLK) and receptor-like proteins (RLPs), two major classes of proteins with membrane receptor configuration, play a crucial role in plant development and disease defense. Although RLPs and RLKs share a similar single-pass transmembrane configuration, RLPs harbor short divergent C-terminal regions instead of the conserved kinase domain of RLKs. This RLP receptor structural design precludes sequence comparison algorithms from being used for high-throughput predictions of the RLP family in plant genomes, as has been extensively performed for RLK superfamily predictions. Here, we developed the RLPredictiOme, implemented with machine learning models in combination with Bayesian inference, capable of predicting RLP subfamilies in plant genomes. The ML models were simultaneously trained using six types of features, along with three stages to distinguish RLPs from non-RLPs (NRLPs), RLPs from RLKs, and classify new subfamilies of RLPs in plants. The ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs with relatively high probability ranging from 0.79 to 0.99. The prediction of the method was assessed with three datasets, two of which contained leucine-rich repeats (LRR)-RLPs from Arabidopsis and rice, and the last one consisted of the complete set of previously described Arabidopsis RLPs. In these validation tests, more than 90% of known RLPs were correctly predicted via RLPredictiOme. In addition to predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome. These include probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryldiester phosphodiesterase (GDPD, GDPDL)-RLP subfamilies, yet to be characterized. 
Compared to the only Arabidopsis GDPDL-RLK, molecular evolution studies confirmed that the ectodomain of GDPDL-RLPs might have undergone purifying selection with a predominance of synonymous substitutions. Expression analyses revealed that predicted GDPDL-RLPs display a basal expression level and respond to developmental and biotic signals. The results of these biological assays indicate that these subfamily members have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme provides a framework for genome-wide surveys of the RLP superfamily as a foundation to rationalize functional studies of surface receptors and their relationships with different biological processes.
Affiliation(s)
- Jose Cleydson F. Silva
- National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Viçosa 36570-900, Brazil
- Marco Aurélio Ferreira
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Thales F. M. Carvalho
- Institute of Engineering, Science and Technology, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Janaúba 39447-814, Brazil
- Fabyano F. Silva
- Department of Animal Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Sabrina de A. Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Elizabeth P. B. Fontes
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
7
Khan MHU, Wang S, Wang J, Ahmar S, Saeed S, Khan SU, Xu X, Chen H, Bhat JA, Feng X. Applications of Artificial Intelligence in Climate-Resilient Smart-Crop Breeding. Int J Mol Sci 2022; 23:11156. [PMID: 36232455 PMCID: PMC9570104 DOI: 10.3390/ijms231911156] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/14/2022] [Revised: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 11/21/2022]
Abstract
Recently, Artificial intelligence (AI) has emerged as a revolutionary field, providing a great opportunity in shaping modern crop breeding, and is extensively used indoors for plant science. Advances in crop phenomics, enviromics, together with the other "omics" approaches are paving ways for elucidating the detailed complex biological mechanisms that motivate crop functions in response to environmental trepidations. These "omics" approaches have provided plant researchers with precise tools to evaluate the important agronomic traits for larger-sized germplasm at a reduced time interval in the early growth stages. However, the big data and the complex relationships within impede the understanding of the complex mechanisms behind genes driving the agronomic-trait formations. AI brings huge computational power and many new tools and strategies for future breeding. The present review will encompass how applications of AI technology, utilized for current breeding practice, assist to solve the problem in high-throughput phenotyping and gene functional analysis, and how advances in AI technologies bring new opportunities for future breeding, to make envirotyping data widely utilized in breeding. Furthermore, in the current breeding methods, linking genotype to phenotype remains a massive challenge and impedes the optimal application of high-throughput field phenotyping, genomics, and enviromics. In this review, we elaborate on how AI will be the preferred tool to increase the accuracy in high-throughput crop phenotyping, genotyping, and envirotyping data; moreover, we explore the developing approaches and challenges for multiomics big computing data integration. Therefore, the integration of AI with "omics" tools can allow rapid gene identification and eventually accelerate crop-improvement programs.
Affiliation(s)
- Muhammad Hafeez Ullah Khan
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Zhejiang Lab, Hangzhou 310012, China
- Shoudong Wang
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Zhejiang Lab, Hangzhou 310012, China
- Jun Wang
- Zhejiang Lab, Hangzhou 310012, China
- Sunny Ahmar
- Institute of Biology, Biotechnology and Environmental Protection, Faculty of Natural Sciences, University of Silesia, Jagiellonska 28, 40-032 Katowice, Poland
- Sumbul Saeed
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Shahid Ullah Khan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Xianzhong Feng
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Zhejiang Lab, Hangzhou 310012, China
8
Ray A. Machine learning in postgenomic biology and personalized medicine. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years Artificial Intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology capabilities. Massive data generated in biological sciences by rapid and deep gene sequencing and protein or other molecular structure determination, on the one hand, requires data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for the solution of biological problems that until recently had relied on mechanistic model-based approaches that are computationally expensive. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. Attempt is also made to indicate as far as possible the areas of research that are poised to make further impacts in these areas, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
9
Wang P, Moore BM, Uygun S, Lehti-Shiu MD, Barry CS, Shiu SH. Optimising the use of gene expression data to predict plant metabolic pathway memberships. THE NEW PHYTOLOGIST 2021; 231:475-489. [PMID: 33749860 DOI: 10.1111/nph.17355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/01/2021] [Accepted: 03/13/2021] [Indexed: 06/12/2023]
Abstract
Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored. Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimental validated genes, the pathway-best model prediction accuracy is 58.3%, significantly better compared with that for predicting annotated genes without experimental evidence (37.0%) or random guess (1.2%), demonstrating the importance of data quality. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.
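The "naive prediction" described in this abstract (assigning a gene to the pathway whose members are most similarly expressed) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: using mean Pearson correlation as the gene-to-pathway similarity is an assumption, and the pathway names and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def pathway_similarity(gene_expr, members):
    """Gene-to-pathway similarity: mean correlation with the pathway's member genes
    (an assumed definition for this sketch)."""
    return sum(pearson(gene_expr, m) for m in members) / len(members)

def naive_predict(gene_expr, pathways):
    """Return the name of the pathway with the highest expression similarity to the gene."""
    return max(pathways, key=lambda name: pathway_similarity(gene_expr, pathways[name]))

# Hypothetical expression profiles across four conditions.
pathways = {
    "pathway_A": [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.0]],
    "pathway_B": [[4.0, 3.0, 2.0, 1.0], [8.0, 6.2, 3.9, 2.0]],
}
query = [0.5, 1.1, 1.4, 2.0]

prediction = naive_predict(query, pathways)
print(prediction)
```

As the abstract notes, this kind of rule is error prone on real data because co-expression may hold only in specific conditions, so the choice of expression data combination matters at least as much as the rule itself.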
Affiliation(s)
- Peipei Wang
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Bethany M Moore
- Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
- Melissa D Lehti-Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Cornelius S Barry
- Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
10
Zenda T, Liu S, Dong A, Duan H. Advances in Cereal Crop Genomics for Resilience under Climate Change. Life (Basel) 2021; 11:502. [PMID: 34072447 PMCID: PMC8228855 DOI: 10.3390/life11060502] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/12/2021] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 12/12/2022]
Abstract
Adapting to climate change, providing sufficient human food and nutritional needs, and securing sufficient energy supplies will call for a radical transformation from the current conventional adaptation approaches to more broad-based and transformative alternatives. This entails diversifying the agricultural system and boosting productivity of major cereal crops through development of climate-resilient cultivars that can sustainably maintain higher yields under climate change conditions, expanding our focus to crop wild relatives, and better exploitation of underutilized crop species. This is facilitated by the recent developments in plant genomics, such as advances in genome sequencing, assembly, and annotation, as well as gene editing technologies, which have increased the availability of high-quality reference genomes for various model and non-model plant species. This has necessitated genomics-assisted breeding of crops, including underutilized species, consequently broadening genetic variation of the available germplasm; improving the discovery of novel alleles controlling important agronomic traits; and enhancing creation of new crop cultivars with improved tolerance to biotic and abiotic stresses and superior nutritive quality. Here, therefore, we summarize these recent developments in plant genomics and their application, with particular reference to cereal crops (including underutilized species). Particularly, we discuss genome sequencing approaches, quantitative trait loci (QTL) mapping and genome-wide association (GWAS) studies, directed mutagenesis, plant non-coding RNAs, precise gene editing technologies such as CRISPR-Cas9, and complementation of crop genotyping by crop phenotyping. We then conclude by providing an outlook that, as we step into the future, high-throughput phenotyping, pan-genomics, transposable elements analysis, and machine learning hold much promise for crop improvements related to climate resilience and nutritional superiority.
Affiliation(s)
- Tinashe Zenda
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Science, Faculty of Agriculture and Environmental Science, Bindura University of Science Education, Bindura P. Bag 1020, Zimbabwe
- Songtao Liu
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Anyi Dong
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Huijun Duan
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
11
Serrano-Ron L, Cabrera J, Perez-Garcia P, Moreno-Risueno MA. Unraveling Root Development Through Single-Cell Omics and Reconstruction of Gene Regulatory Networks. FRONTIERS IN PLANT SCIENCE 2021; 12:661361. [PMID: 34017350 PMCID: PMC8129646 DOI: 10.3389/fpls.2021.661361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/30/2021] [Accepted: 03/25/2021] [Indexed: 05/30/2023]
Abstract
Over the last decades, research on postembryonic root development has been facilitated by "omics" technologies. Among these technologies, microarrays first, and RNA sequencing (RNA-seq) later, have provided transcriptional information on the underlying molecular processes establishing the basis of System Biology studies in roots. Cell fate specification and development have been widely studied in the primary root, which involved the identification of many cell type transcriptomes and the reconstruction of gene regulatory networks (GRN). The study of lateral root (LR) development has not been an exception. However, the molecular mechanisms regulating cell fate specification during LR formation remain largely unexplored. Recently, single-cell RNA-seq (scRNA-seq) studies have addressed the specification of tissues from stem cells in the primary root. scRNA-seq studies are anticipated to be a useful approach to decipher cell fate specification and patterning during LR formation. In this review, we address the different scRNA-seq strategies used both in plants and animals and how we could take advantage of scRNA-seq to unravel new regulatory mechanisms and reconstruct GRN. In addition, we discuss how to integrate scRNA-seq results with previous RNA-seq datasets and GRN. We also address relevant findings obtained through single-cell based studies and how LR developmental studies could be facilitated by scRNA-seq approaches and subsequent GRN inference. The use of single-cell approaches to investigate LR formation could help to decipher fundamental biological mechanisms such as cell memory, synchronization, polarization, or pluripotency.
Affiliation(s)
- Miguel A. Moreno-Risueno
- Centro de Biotecnología y Genómica de Plantas (Universidad Politécnica de Madrid–Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria), Campus de Montegancedo, Pozuelo de Alarcón, Madrid, Spain
12
Zaborowski AB, Walther D. Determinants of correlated expression of transcription factors and their target genes. Nucleic Acids Res 2020; 48:11347-11369. [PMID: 33104784 PMCID: PMC7672440 DOI: 10.1093/nar/gkaa927] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/07/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 11/14/2022]
Abstract
While transcription factors (TFs) are known to regulate the expression of their target genes (TGs), only a weak correlation of expression between TFs and their TGs has generally been observed. As lack of correlation could be caused by additional layers of regulation, the overall correlation distribution may hide the presence of a subset of regulatory TF-TG pairs with tight expression coupling. Using reported regulatory pairs in the plant Arabidopsis thaliana along with comprehensive gene expression information and testing a wide array of molecular features, we aimed to discern the molecular determinants of high expression correlation of TFs and their TGs. TF-family assignment, stress-response process involvement, short genomic distances of the TF-binding sites to the transcription start site of their TGs, few required protein-protein-interaction connections to establish physical interactions between the TF and polymerase-II, unambiguous TF-binding motifs, increased numbers of miRNA target-sites in TF-mRNAs, and a young evolutionary age of TGs were found particularly indicative of high TF-TG correlation. The modulating roles of post-transcriptional, post-translational processes, and epigenetic factors have been characterized as well. Our study reveals that regulatory pairs with high expression coupling are associated with specific molecular determinants.
Affiliation(s)
- Adam B Zaborowski
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
13
Tong H, Madison I, Long TA, Williams CM. Computational solutions for modeling and controlling plant response to abiotic stresses: a review with focus on iron deficiency. CURRENT OPINION IN PLANT BIOLOGY 2020; 57:8-15. [PMID: 32619968 DOI: 10.1016/j.pbi.2020.05.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/30/2020] [Revised: 05/15/2020] [Accepted: 05/23/2020] [Indexed: 06/11/2023]
Abstract
Computational solutions enable plant scientists to model protein-mediated stress responses and characterize novel gene functions that coordinate responses to a variety of abiotic stress conditions. Recently, density functional theory was used to study protein active sites and elucidate enzyme conversion mechanisms involved in iron deficiency responsive signaling pathways. Computational approaches for protein homology modeling and the kinetic modeling of signaling pathways have also resolved the identity and function of proteins involved in iron deficiency signaling pathways. Significant changes in gene relationships under other stress conditions, such as heat or drought stress, have been recently identified using differential network analysis, suggesting that stress tolerance is achieved through asynchronous control. Moreover, the increasing development and use of statistical modeling and systematic modeling of transcriptomic data have provided significant insight into the gene regulatory mechanisms associated with abiotic stress responses. These types of in silico approaches have facilitated the plant science community's future goals of developing multi-scale models of responses to iron deficiency stress and other abiotic stress conditions.
Affiliation(s)
- Haonan Tong
  - Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Imani Madison
  - Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Terri A Long
  - Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Cranos M Williams
  - Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
|
14
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised Learning of Gene Regulatory Networks. CURRENT PROTOCOLS IN PLANT BIOLOGY 2020; 5:e20106. [PMID: 32207875 DOI: 10.1002/cppb.20106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Abstract
Identifying the entirety of gene regulatory interactions in a biological system offers the possibility to determine the key molecular factors that affect important traits on the level of cells, tissues, and whole organisms. Despite the development of experimental approaches and technologies for identification of direct binding of transcription factors (TFs) to promoter regions of downstream target genes, computational approaches that utilize large compendia of transcriptomics data are still the predominant methods used to predict direct downstream targets of TFs, and thus reconstruct genome-wide gene-regulatory networks (GRNs). These approaches can broadly be categorized into unsupervised and supervised, based on whether data about known, experimentally verified gene-regulatory interactions are used in the process of reconstructing the underlying GRN. Here, we first describe the generic steps of supervised approaches for GRN reconstruction, since they have been recently shown to result in improved accuracy of the resulting networks. We also illustrate how they can be used with data from model organisms to obtain more accurate prediction of gene regulatory interactions. © 2020 The Authors. Basic Protocol 1: Construction of features used in supervised learning of gene regulatory interactions; Basic Protocol 2: Learning the non-interacting TF-gene pairs; Basic Protocol 3: Learning a classifier for gene regulatory interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
  - Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
- Zoran Nikoloski
  - Systems Biology and Mathematical Modelling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany; Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
|
15
|
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023] Open
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises a support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
|
16
|
Song Q, Lee J, Akter S, Rogers M, Grene R, Li S. Prediction of condition-specific regulatory genes using machine learning. Nucleic Acids Res 2020; 48:e62. [PMID: 32329779 PMCID: PMC7293043 DOI: 10.1093/nar/gkaa264] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 11/06/2019] [Revised: 02/19/2020] [Accepted: 04/20/2020] [Indexed: 12/31/2022] Open
Abstract
Recent advances in genomic technologies have generated data on large-scale protein-DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5-25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
Affiliation(s)
- Qi Song
  - Graduate program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Jiyoung Lee
  - Graduate program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Shamima Akter
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Matthew Rogers
  - Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Ruth Grene
  - Graduate program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Song Li
  - Graduate program in Genetics, Bioinformatics and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
  - School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
|
17
|
Togninalli M, Seren Ü, Freudenthal JA, Monroe JG, Meng D, Nordborg M, Weigel D, Borgwardt K, Korte A, Grimm DG. AraPheno and the AraGWAS Catalog 2020: a major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucleic Acids Res 2020; 48:D1063-D1068. [PMID: 31642487 PMCID: PMC7145550 DOI: 10.1093/nar/gkz925] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 09/06/2019] [Revised: 09/26/2019] [Accepted: 10/08/2019] [Indexed: 12/23/2022] Open
Abstract
Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout-mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.
Affiliation(s)
- Matteo Togninalli
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Ümit Seren
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Jan A Freudenthal
  - Center for Computational and Theoretical Biology, University Würzburg, Würzburg, Germany
- J Grey Monroe
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dazhe Meng
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
  - Google, Mountain View, USA
- Magnus Nordborg
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Detlef Weigel
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Karsten Borgwardt
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Arthur Korte
  - Center for Computational and Theoretical Biology, University Würzburg, Würzburg, Germany
- Dominik G Grimm
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
|
18
|
Kimotho RN, Baillo EH, Zhang Z. Transcription factors involved in abiotic stress responses in Maize (Zea mays L.) and their roles in enhanced productivity in the post genomics era. PeerJ 2019; 7:e7211. [PMID: 31328030 PMCID: PMC6622165 DOI: 10.7717/peerj.7211] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 02/22/2019] [Accepted: 05/26/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Maize (Zea mays L.) is a principal cereal crop cultivated worldwide for human food, animal feed, and more recently as a source of biofuel. However, as a direct consequence of water insufficiency and climate change, frequent occurrences of both biotic and abiotic stresses have been reported in various regions around the world, and recently, this has become a constant threat to increasing global maize yields. Plants respond to abiotic stresses by utilizing the activities of transcription factors (TFs), which are families of genes coding for specific TF proteins. TF target genes form a regulon that is involved in the repression/activation of genes associated with abiotic stress responses. Therefore, it is of utmost importance to have a systematic study on each TF family, the downstream target genes they regulate, and the specific TF genes involved in multiple abiotic stress responses in maize and other staple crops.
METHOD: In this review, the main TF families, the specific TF genes and their regulons that are involved in abiotic stress regulation will be briefly discussed. Great emphasis will be given to maize abiotic stress improvement throughout this review, although other examples from different plants like rice, Arabidopsis, wheat, and barley will be used.
RESULTS: We have described in detail the main TF families in maize that take part in abiotic stress responses together with their regulons. Furthermore, we have also briefly described the utilization of high-efficiency technologies in the study and characterization of TFs involved in the abiotic stress regulatory networks in plants with an emphasis on increasing maize production. Examples of these technologies include next-generation sequencing, microarray analysis, machine learning, and RNA-Seq.
CONCLUSION: In conclusion, it is expected that all the information provided in this review will in time contribute to the use of TF genes in the research, breeding, and development of new abiotic stress tolerant maize cultivars.
Affiliation(s)
- Roy Njoroge Kimotho
  - Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
  - University of Chinese Academy of Sciences, Beijing, China
- Elamin Hafiz Baillo
  - Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhengbin Zhang
  - Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
|
19
|
Silva JCF, Teixeira RM, Silva FF, Brommonschenkel SH, Fontes EPB. Machine learning approaches and their current application in plant molecular biology: A systematic review. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2019; 284:37-47. [PMID: 31084877 DOI: 10.1016/j.plantsci.2019.03.020] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 12/18/2018] [Revised: 02/28/2019] [Accepted: 03/26/2019] [Indexed: 05/19/2023]
Abstract
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in molecular biology, thus allowing the exploitation of Big Data concepts in plant genomics. In this context, the main challenges are given in terms of how to analyze massive datasets and extract new knowledge in all levels of cellular systems research. In summary, ML techniques allow complex interactions to be inferred in several biological systems. Despite its potential, ML has been underused due to complex computational algorithms and definition terms. Therefore, a systematic review to disentangle ML approaches is relevant for plant scientists and has been considered in this study. We presented the main steps for ML development (from data selection to evaluation of classification/prediction models) with a respective discussion approaching functional genomics mainly in terms of pathogen effector genes in plant immunity. Additionally, we also considered how to access public source databases under an ML framework towards advancing plant molecular biology and introduced novel powerful tools, such as deep learning.
Affiliation(s)
- Jose Cleydson F Silva
  - National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Ruan M Teixeira
  - National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Fabyano F Silva
  - Department of Animal Science, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Sergio H Brommonschenkel
  - National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Plant Pathology Department/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Elizabeth P B Fontes
  - National Institute of Science and Technology in Plant-Pest Interactions, Bioagro, Universidade Federal de Viçosa, Av. PH Rolfs s/n, Centro, Viçosa, MG, 36570-000, Brazil; Department of Biochemistry and Molecular Biology/Bioagro, Universidade Federal de Viçosa, Viçosa, MG, Brazil
|
20
|
Predicting miRNA-lncRNA interactions and recognizing their regulatory roles in stress response of plants. Math Biosci 2019; 312:67-76. [PMID: 31034845 DOI: 10.1016/j.mbs.2019.04.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 09/28/2018] [Revised: 03/28/2019] [Accepted: 04/23/2019] [Indexed: 02/02/2023]
Abstract
It has been found that non-coding RNAs (ncRNAs) not only act through their target genes but also interact with each other to influence biological traits, and such interactions are common. Many studies focus mainly on the analysis of microRNA (miRNA) and messenger RNA (mRNA) interactions. In this study, we investigated miRNA and long non-coding RNA (lncRNA) interactions using support vector regression (SVR) to predict new target genes in Arabidopsis thaliana and identify some regulatory roles in stress response. The networks of miRNA-mRNA, miRNA-lncRNA and miRNA-mRNA-lncRNA were constructed. They were further analyzed and interpreted in R. We showed that miRNA with low sequence number targeted lncRNA with high sequence number, and miRNA with high sequence number targeted lncRNA with low sequence number. The experimental results showed that there is a regulatory relationship between miRNA and lncRNA. New RNA targets were predicted using SVR with new gene expression mechanism and the stress related functions were annotated.
|
21
|
Haque S, Ahmad JS, Clark NM, Williams CM, Sozzani R. Computational prediction of gene regulatory networks in plant growth and development. CURRENT OPINION IN PLANT BIOLOGY 2019; 47:96-105. [PMID: 30445315 DOI: 10.1016/j.pbi.2018.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 08/15/2018] [Revised: 10/05/2018] [Accepted: 10/18/2018] [Indexed: 05/22/2023]
Abstract
Plants integrate a wide range of cellular, developmental, and environmental signals to regulate complex patterns of gene expression. Recent advances in genomic technologies enable differential gene expression analysis at a systems level, allowing for improved inference of the network of regulatory interactions between genes. These gene regulatory networks, or GRNs, are used to visualize the causal regulatory relationships between regulators and their downstream target genes. Accordingly, these GRNs can represent spatial, temporal, and/or environmental regulations and can identify functional genes. This review summarizes recent computational approaches applied to different types of gene expression data to infer GRNs in the context of plant growth and development. Three stages of GRN inference are described: first, data collection and analysis based on the dataset type; second, network inference application based on data availability and proposed hypotheses; and third, validation based on in silico, in vivo, and in planta methods. In addition, this review relates data collection strategies to biological questions, organizes inference algorithms based on statistical methods and data types, discusses experimental design considerations, and provides guidelines for GRN inference with an emphasis on the benefits of integrative approaches, especially when a priori information is limited. Finally, this review concludes that computational frameworks integrating large-scale heterogeneous datasets are needed for a more accurate (e.g. fewer false interactions), detailed (e.g. discrimination between direct versus indirect interactions), and comprehensive (e.g. genetic regulation under various conditions and spatial locations) inference of GRNs.
Affiliation(s)
- Samiul Haque
  - Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Jabeen S Ahmad
  - Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Natalie M Clark
  - Plant and Microbial Biology, North Carolina State University, Raleigh, USA
- Cranos M Williams
  - Electrical and Computer Engineering, North Carolina State University, Raleigh, USA
- Rosangela Sozzani
  - Plant and Microbial Biology, North Carolina State University, Raleigh, USA
|
22
|
Mochida K, Koda S, Inoue K, Nishii R. Statistical and Machine Learning Approaches to Predict Gene Regulatory Networks From Transcriptome Datasets. FRONTIERS IN PLANT SCIENCE 2018; 9:1770. [PMID: 30555503 PMCID: PMC6281826 DOI: 10.3389/fpls.2018.01770] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 08/25/2018] [Accepted: 11/14/2018] [Indexed: 05/20/2023]
Abstract
Statistical and machine learning (ML)-based methods have recently advanced the construction of gene regulatory networks (GRNs) based on high-throughput biological datasets. GRNs underlie almost all cellular phenomena; hence, comprehensive GRN maps are essential tools to elucidate gene function, thereby facilitating the identification and prioritization of candidate genes for functional analysis. High-throughput gene expression datasets have yielded various statistical and ML-based algorithms to infer causal relationships between genes and decipher GRNs. This review summarizes the recent advancements in the computational inference of GRNs, based on large-scale transcriptome sequencing datasets of model plants and crops. We highlight strategies to select contextual genes for GRN inference, and statistical and ML-based methods for inferring GRNs based on transcriptome datasets from plants. Furthermore, we discuss the challenges and opportunities for the elucidation of GRNs based on large-scale datasets obtained from emerging transcriptomic applications, such as from population-scale, single-cell level, and life-course transcriptome analyses.
Affiliation(s)
- Keiichi Mochida
  - Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
  - Microalgae Production Control Technology Laboratory, RIKEN Baton Zone Program, RIKEN Cluster for Science, Technology and Innovation Hub, Yokohama, Japan
  - Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan
  - Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
- Satoru Koda
  - Graduate School of Mathematics, Kyushu University, Fukuoka, Japan
- Komaki Inoue
  - Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Ryuei Nishii
  - Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan
*Correspondence: Keiichi Mochida, Ryuei Nishii
|
23
|
Gutiérrez S, Fernández-Novales J, Diago MP, Tardaguila J. On-The-Go Hyperspectral Imaging Under Field Conditions and Machine Learning for the Classification of Grapevine Varieties. FRONTIERS IN PLANT SCIENCE 2018; 9:1102. [PMID: 30090110 PMCID: PMC6068396 DOI: 10.3389/fpls.2018.01102] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/22/2018] [Accepted: 07/09/2018] [Indexed: 05/22/2023]
Abstract
Grapevine varietal classification is an important plant phenotyping issue for grape growing and wine industry. This task has been achieved from destructive techniques like classic ampelography and DNA analysis under laboratory conditions. This work displays a new approach for the classification of a high number of grapevine (Vitis vinifera L.) varieties under field conditions using on-the-go hyperspectral imaging and different machine learning algorithms. On-the-go imaging was performed under natural illumination using a hyperspectral camera mounted on an all-terrain vehicle at 5 km/h. Spectra were acquired over two different leaf phenological stages on the canopy of 30 different varieties on a commercial vineyard located in La Rioja, Spain. A total of 1,200 spectral samples were generated. Support vector machines (SVM) and artificial neural networks (multilayer perceptrons, MLP) were used for the development of a large number of models, testing different algorithm parameters and spectral pre-processing techniques. Both classifiers yielded notable performance values and were able to train models with recall F1 scores and area under the receiver operating characteristic curve marks up to 0.99 for 5-fold cross validation. Statistical analyses supported that the best SVM kernel was linear and the best activation function for MLP was the hyperbolic tangent function. The prediction performance for individual varieties of MLP ranged from 0.94 to 0.99, displaying low levels of variability. In the case of SVM, slightly higher differences were obtained, ranging from 0.83 to 0.97 for individual varieties. These results support the possibility of deploying an on-the-go hyperspectral imaging system in the field capable of successfully classifying leaves from different grapevine varieties. This technology could thus be considered as a new useful non-destructive tool for plant phenotyping under field conditions.
|
24
|
Redekar N, Pilot G, Raboy V, Li S, Saghai Maroof MA. Inference of Transcription Regulatory Network in Low Phytic Acid Soybean Seeds. FRONTIERS IN PLANT SCIENCE 2017; 8:2029. [PMID: 29250090 PMCID: PMC5714895 DOI: 10.3389/fpls.2017.02029] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/24/2017] [Accepted: 11/14/2017] [Indexed: 05/26/2023]
Abstract
A dominant loss of function mutation in myo-inositol phosphate synthase (MIPS) gene and recessive loss of function mutations in two multidrug resistant protein type-ABC transporter genes not only reduce the seed phytic acid levels in soybean, but also affect the pathways associated with seed development, ultimately resulting in low emergence. To understand the regulatory mechanisms and identify key genes that intervene in the seed development process in low phytic acid crops, we performed computational inference of gene regulatory networks in low and normal phytic acid soybeans using time course transcriptomic data and multiple network inference algorithms. We identified a set of putative candidate transcription factors and their regulatory interactions with genes that have functions in myo-inositol biosynthesis, auxin-ABA signaling, and seed dormancy. We evaluated the performance of our unsupervised network inference method by comparing the predicted regulatory network with published regulatory interactions in Arabidopsis. Some contrasting regulatory interactions were observed in low phytic acid mutants compared to non-mutant lines. These findings provide important hypotheses on expression regulation of myo-inositol metabolism and phytohormone signaling in developing low phytic acid soybeans. The computational pipeline used for unsupervised network learning in this study is provided as open source software and is freely available at https://lilabatvt.github.io/LPANetwork/.
Affiliation(s)
- Neelam Redekar
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Guillaume Pilot
  - Department of Plant Pathology, Physiology, and Weed Science, Virginia Tech, Blacksburg, VA, United States
- Victor Raboy
  - National Small Grains Germplasm Research Center, Agricultural Research Service (USDA), Aberdeen, ID, United States
- Song Li
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- M. A. Saghai Maroof
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
|
25
|
Haak DC, Fukao T, Grene R, Hua Z, Ivanov R, Perrella G, Li S. Multilevel Regulation of Abiotic Stress Responses in Plants. FRONTIERS IN PLANT SCIENCE 2017; 8:1564. [PMID: 29033955 PMCID: PMC5627039 DOI: 10.3389/fpls.2017.01564] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 04/05/2017] [Accepted: 08/28/2017] [Indexed: 05/18/2023]
Abstract
The sessile lifestyle of plants requires them to cope with stresses in situ. Plants overcome abiotic stresses by altering structure/morphology, and in some extreme conditions, by compressing the life cycle to survive the stresses in the form of seeds. Genetic and molecular studies have uncovered complex regulatory processes that coordinate stress adaptation and tolerance in plants, which are integrated at various levels. Investigating natural variation in stress responses has provided important insights into the evolutionary processes that shape the integrated regulation of adaptation and tolerance. This review primarily focuses on the current understanding of how transcriptional, post-transcriptional, post-translational, and epigenetic processes along with genetic variation orchestrate stress responses in plants. We also discuss the current and future development of computational tools to identify biologically meaningful factors from high dimensional, genome-scale data and construct the signaling networks consisting of these components.
Affiliation(s)
- David C. Haak
  - Department of Plant Pathology, Physiology, and Weed Science, Virginia Tech, Blacksburg, VA, United States
- Takeshi Fukao
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
- Ruth Grene
  - Department of Plant Pathology, Physiology, and Weed Science, Virginia Tech, Blacksburg, VA, United States
- Zhihua Hua
  - Department of Environmental and Plant Biology, Interdisciplinary Program in Molecular and Cellular Biology, Ohio University, Athens, OH, United States
- Rumen Ivanov
  - Institut für Botanik, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
- Giorgio Perrella
  - Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- Song Li
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, VA, United States
|