1. Lell M, Gogna A, Kloesgen V, Avenhaus U, Dörnte J, Eckhoff WM, Eschholz T, Gils M, Kirchhoff M, Koch M, Kollers S, Pfeiffer N, Rapp M, Wimmer V, Wolf M, Reif J, Zhao Y. Breaking down data silos across companies to train genome-wide predictions: A feasibility study in wheat. Plant Biotechnology Journal 2025. [PMID: 40253615 DOI: 10.1111/pbi.70095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 03/07/2025] [Accepted: 04/07/2025] [Indexed: 04/22/2025]
Abstract
Big data, combined with artificial intelligence (AI) techniques, holds the potential to significantly enhance the accuracy of genome-wide predictions. Motivated by the success reported for wheat hybrids, we extended the scope to inbred lines by integrating phenotypic and genotypic data from four commercial wheat breeding programs. Acting as an academic data trustee, we merged these data with historical experimental series from previous public-private partnerships. The integrated data spanned 12 years, 168 environments, and provided a genomic prediction training set of up to ~9500 genotypes for grain yield, plant height and heading date. Despite the heterogeneous phenotypic and genotypic data, we were able to obtain high-quality data by implementing rigorous data curation, including SNP imputation. We utilized the data to compare genomic best linear unbiased predictions with convolutional neural network-based genomic prediction. Our analysis revealed that we could flexibly combine experimental series for genomic prediction, with prediction ability steadily improving as the training set sizes increased, peaking at around 4000 genotypes. As training set sizes were further increased, the gains in prediction ability decreased, approaching a plateau well below the theoretical limit defined by the square root of the heritability. Potential avenues, such as designed training sets or novel non-linear prediction approaches, could overcome this plateau and help to more fully exploit the high-value big data generated by breaking down data silos across companies.
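The plateau described in the abstract is measured against an upper bound: when predictions are correlated with observed phenotypes, the correlation cannot exceed the square root of the heritability, because even the true genetic values correlate with phenotypes only at that level. The following sketch illustrates this with simulated data. The heritability of 0.6, the sample size, and all function names are illustrative assumptions and are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_max_prediction_ability(h2, n=20000, seed=1):
    """Correlate true genetic values with phenotypes for a trait with heritability h2.

    Phenotypic variance is fixed at 1, so the genetic variance is h2 and the
    residual variance is 1 - h2.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, math.sqrt(h2)) for _ in range(n)]
    residual = [rng.gauss(0.0, math.sqrt(1.0 - h2)) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, residual)]
    return pearson(genetic, phenotype)


h2 = 0.6  # assumed heritability, chosen only for illustration
upper_bound = math.sqrt(h2)
empirical = simulate_max_prediction_ability(h2)
print(f"sqrt(h2) = {upper_bound:.3f}, simulated correlation = {empirical:.3f}")
```

A perfect predictor of genetic values would reach this bound; a plateau below it, as the abstract reports, indicates that part of the genetic signal is still not captured by the model.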
Affiliation(s)
- Moritz Lell
- Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Abhishek Gogna
- Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Vincent Kloesgen
- Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Ulrike Avenhaus
- W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
- Jost Dörnte
- Deutsche Saatveredelung AG, Lippstadt, Germany
- Mario Gils
- Nordsaat Saatzucht GmbH, Langenstein, Germany
- Matthias Rapp
- W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
- Jochen Reif
- Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
- Yusheng Zhao
- Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany
2. Mascher M, Jayakodi M, Shim H, Stein N. Promises and challenges of crop translational genomics. Nature 2024; 636:585-593. [PMID: 39313530 PMCID: PMC7616746 DOI: 10.1038/s41586-024-07713-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/13/2024] [Indexed: 09/25/2024]
Abstract
Crop translational genomics applies breeding techniques based on genomic datasets to improve crops. Technological breakthroughs in the past ten years have made it possible to sequence the genomes of increasing numbers of crop varieties and have assisted in the genetic dissection of crop performance. However, translating research findings to breeding applications remains challenging. Here we review recent progress and future prospects for crop translational genomics in bringing results from the laboratory to the field. Genetic mapping, genomic selection and sequence-assisted characterization and deployment of plant genetic resources utilize rapid genotyping of large populations. These approaches have all had an impact on breeding for qualitative traits, where single genes with large phenotypic effects exert their influence. Characterization of the complex genetic architectures that underlie quantitative traits such as yield and flowering time, especially in newly domesticated crops, will require further basic research, including research into regulation and interactions of genes and the integration of genomic approaches and high-throughput phenotyping, before targeted interventions can be designed. Future priorities for translation include supporting genomics-assisted breeding in low-income countries and adaptation of crops to changing environments.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
- Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Hyeonah Shim
- Department of Agriculture, Forestry and Bioresources, Plant Genomics and Breeding Institute, Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
- Martin Luther University Halle-Wittenberg, Halle, Germany.
3. Rjiba IB, Tóth-Nagy G, Rostási Á, Gyurácz-Németh P, Sebestyén V. How should climate actions be planned? Model lessons from published action plans. Journal of Environmental Management 2024; 370:122648. [PMID: 39378801 DOI: 10.1016/j.jenvman.2024.122648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 09/12/2024] [Accepted: 09/22/2024] [Indexed: 10/10/2024]
Abstract
To effectively protect against the increasingly pervasive effects of climate change, countries and cities around the world are tasked with formulating and implementing climate actions that effectively respond to the challenges ahead. However, choosing the optimal climate actions is complex, since it is necessary to consider many external impacts as early as the planning phase. Our novel methodology uncovers the identified climate actions of 443 European cities (from 32 countries) and the city structure-related features that influence the basic success of strategy creation, and integrates them into a first-of-its-kind decision support framework. The results highlight that the analyzed European cities need to adopt different ways of thinking depending on their budget, population density, development and energy consumption portfolio. The research results lay the foundation for the decision support of evidence-based climate action planning and contribute towards strengthening the role of cities worldwide in the fight against climate change in the future.
Affiliation(s)
- Iskander Ben Rjiba
- Sustainability Solutions Research Lab, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary.
- Georgina Tóth-Nagy
- Sustainability Solutions Research Lab, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
- Ágnes Rostási
- Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
- Petra Gyurácz-Németh
- Department of Tourism, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
- Viktor Sebestyén
- Sustainability Solutions Research Lab, University of Pannonia, Egyetem str. 10, Veszprém, H-8200, Hungary
4. Mansueto L, Kretzschmar T, Mauleon R, King GJ. Building a community-driven bioinformatics platform to facilitate Cannabis sativa multi-omics research. GigaByte 2024; 2024:gigabyte137. [PMID: 39469541 PMCID: PMC11515022 DOI: 10.46471/gigabyte.137] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2024] [Accepted: 10/06/2024] [Indexed: 10/30/2024] Open
Abstract
Global changes in cannabis legislation after decades of stringent regulation and heightened demand for its industrial and medicinal applications have spurred recent genetic and genomics research. An international research community emerged and identified the need for a web portal to host cannabis-specific datasets that seamlessly integrates multiple data sources and serves omics-type analyses, fostering information sharing. The Tripal platform was used to host public genome assemblies, gene annotations, quantitative trait loci and genetic maps, gene and protein expression data, metabolic profiles and their sample attributes. Single nucleotide polymorphisms were called using public resequencing datasets on three genomes. Additional applications, such as SNP-Seek and MapManJS, were embedded into Tripal. A multi-omics data integration web-service Application Programming Interface (API), developed on top of existing Tripal modules, returns generic tables of samples, properties and values. Use cases demonstrate the API's utility for various omics analyses, enabling researchers to perform multi-omics analyses efficiently. Availability and implementation: The web portal can be accessed at www.icgrc.info.
Affiliation(s)
- Locedie Mansueto
- Southern Cross University, Military Road, Lismore New South Wales, 2480, Australia
- Tobias Kretzschmar
- Southern Cross University, Military Road, Lismore New South Wales, 2480, Australia
- Ramil Mauleon
- Southern Cross University, Military Road, Lismore New South Wales, 2480, Australia
- International Rice Research Institute, Pili Drive, Los Baños Laguna, 4031, Philippines
- Graham J. King
- Southern Cross University, Military Road, Lismore New South Wales, 2480, Australia
- Recombics, Alstonville, New South Wales, 2480, Australia
5. García Brizuela J, Scharfenberg C, Scheuner C, Hoedt F, König P, Kranz A, Leidel A, Martini D, Schneider G, Schneider J, Singson LS, von Waldow H, Wehrmeyer N, Usadel B, Lesch S, Specka X, Lange M, Arend D. A roadmap for a middleware as a federation service for integrative data retrieval of agricultural data. J Integr Bioinform 2024; 21:jib-2024-0027. [PMID: 39501626 PMCID: PMC11602230 DOI: 10.1515/jib-2024-0027] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2024] [Accepted: 06/04/2024] [Indexed: 11/29/2024] Open
Abstract
Agriculture is confronted with several challenges such as climate change, the loss of biodiversity and stagnating productivity. The massively increasing amount of data and new digital technologies promise to overcome them, but they necessitate careful data integration and data management to make them usable. The FAIRagro consortium is part of the National Research Data Infrastructure (NFDI) in Germany and will develop FAIR-compliant infrastructure services for the agrosystems science community, which will be integrated into the existing research data infrastructure service landscape. Here we present the initial steps of designing and implementing the FAIRagro middleware infrastructure to connect existing data infrastructures. The middleware will feature services for seamless data integration across diverse infrastructures. Data and metadata are streamlined for research in agrosystems science by downstream processing in the central FAIRagro Search and Inventory Portal and the data integration and analysis workflow system "SciWIn".
Affiliation(s)
- Jorge García Brizuela
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany, https://www.ipk-gatersleben.de/
- Carsten Scharfenberg
- Leibniz Centre for Agricultural Landscape Research (ZALF), D-15374 Müncheberg, Germany, https://www.zalf.de/
- Carmen Scheuner
- Senckenberg Museum of Natural History Görlitz, D-02826 Görlitz, Germany, https://museumgoerlitz.senckenberg.de/
- Florian Hoedt
- Johann Heinrich von Thünen-Institut, D-38116 Braunschweig, Germany, https://www.thuenen.de/
- Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany, https://www.ipk-gatersleben.de/
- Angela Kranz
- Forschungszentrum Jülich GmbH (FZJ), IBG-4, D-52428 Jülich, Germany, https://www.fz-juelich.de/en
- Antonia Leidel
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany, https://www.ipk-gatersleben.de/
- Daniel Martini
- Kuratorium für Technik und Bauwesen in der Landwirtschaft (KTBL), D-64289 Darmstadt, Germany, https://www.ktbl.de/
- Gabriel Schneider
- ZB MED – Information Centre for Life Sciences, D-50931 Cologne, Germany, https://www.zbmed.de/
- Julian Schneider
- ZB MED – Information Centre for Life Sciences, D-50931 Cologne, Germany, https://www.zbmed.de/
- Lea Sophie Singson
- Leibniz Institute for Information Infrastructure (FIZ Karlsruhe), D-76344 Karlsruhe, Germany, https://www.fiz-karlsruhe.de/
- Harald von Waldow
- Johann Heinrich von Thünen-Institut, D-38116 Braunschweig, Germany, https://www.thuenen.de/
- Nils Wehrmeyer
- Forschungszentrum Jülich GmbH (FZJ), IBG-4, D-52428 Jülich, Germany, https://www.fz-juelich.de/en
- Björn Usadel
- Forschungszentrum Jülich GmbH (FZJ), IBG-4, D-52428 Jülich, Germany, https://www.fz-juelich.de/en
- Stephan Lesch
- Senckenberg Museum of Natural History Görlitz, D-02826 Görlitz, Germany, https://museumgoerlitz.senckenberg.de/
- Xenia Specka
- Leibniz Centre for Agricultural Landscape Research (ZALF), D-15374 Müncheberg, Germany, https://www.zalf.de/
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany, https://www.ipk-gatersleben.de/
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany, https://www.ipk-gatersleben.de/
6. Resende RT, Hickey L, Amaral CH, Peixoto LL, Marcatti GE, Xu Y. Satellite-enabled enviromics to enhance crop improvement. Molecular Plant 2024; 17:848-866. [PMID: 38637991 DOI: 10.1016/j.molp.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 04/04/2024] [Accepted: 04/11/2024] [Indexed: 04/20/2024]
Abstract
Enviromics refers to the characterization of micro- and macroenvironments based on large-scale environmental datasets. By providing genotypic recommendations with predictive extrapolation at a site-specific level, enviromics could inform plant breeding decisions across varying conditions and anticipate productivity in a changing climate. Enviromics-based integration of statistics, envirotyping (i.e., determining environmental factors), and remote sensing could help unravel the complex interplay of genetics, environment, and management. To support this goal, exhaustive envirotyping to generate precise environmental profiles would significantly improve predictions of genotype performance and genetic gain in crops. Already, informatics management platforms aggregate diverse environmental datasets obtained using optical, thermal, radar, and light detection and ranging (LiDAR) sensors that capture detailed information about vegetation, surface structure, and terrain. This wealth of information, coupled with freely available climate data, fuels innovative enviromics research. While enviromics holds immense potential for breeding, a few obstacles remain, such as the need for (1) integrative methodologies to systematically collect field data to scale and expand observations across the landscape with satellite data; (2) state-of-the-art AI models for data integration, simulation, and prediction; (3) cyberinfrastructure for processing big data across scales and providing seamless interfaces to deliver forecasts to stakeholders; and (4) collaboration and data sharing among farmers, breeders, physiologists, geoinformatics experts, and programmers across research institutions. Overcoming these challenges is essential for leveraging the full potential of big data captured by satellites to transform 21st century agriculture and crop improvement through enviromics.
Affiliation(s)
- Rafael T Resende
- Universidade Federal de Goiás (UFG), Agronomy Department, Plant Breeding Sector, Goiânia (GO) 74690-900, Brazil; TheCROP, a Precision-Breeding Startup: Enviromics, Phenomics, and Genomics, No Zip-code, Operating Virtually, Goiânia (GO) and Sete Lagoas (MG), Brazil.
- Lee Hickey
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD, Australia
- Cibele H Amaral
- Earth Lab, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80303, USA; Environmental Data Science Innovation & Inclusion Lab, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80303, USA
- Lucas L Peixoto
- Universidade Federal de Goiás (UFG), Agronomy Department, Plant Breeding Sector, Goiânia (GO) 74690-900, Brazil
- Gustavo E Marcatti
- TheCROP, a Precision-Breeding Startup: Enviromics, Phenomics, and Genomics, No Zip-code, Operating Virtually, Goiânia (GO) and Sete Lagoas (MG), Brazil; Universidade Federal de São João del-Rei, Forest Engineering Department, Campus Sete Lagoas, Sete Lagoas (MG) 35701-970, Brazil
- Yunbi Xu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China; BGI Bioverse, Shenzhen 518083, China.
7. Morales N, Anche MT, Kaczmar NS, Lepak N, Ni P, Romay MC, Santantonio N, Buckler ES, Gore MA, Mueller LA, Robbins KR. Spatio-temporal modeling of high-throughput multispectral aerial images improves agronomic trait genomic prediction in hybrid maize. Genetics 2024; 227:iyae037. [PMID: 38469622 PMCID: PMC11075545 DOI: 10.1093/genetics/iyae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 12/02/2023] [Accepted: 02/18/2024] [Indexed: 03/13/2024] Open
Abstract
Design randomizations and spatial corrections have increased understanding of genotypic, spatial, and residual effects in field experiments, but precisely measuring spatial heterogeneity in the field remains a challenge. To this end, our study evaluated approaches to improve spatial modeling using high-throughput phenotypes (HTP) via unoccupied aerial vehicle (UAV) imagery. The normalized difference vegetation index was measured by a multispectral MicaSense camera and processed using ImageBreed. In contrast to baseline agronomic trait spatial correction and a baseline multitrait model, a two-stage approach was proposed. Using longitudinal normalized difference vegetation index data, plot level permanent environment effects estimated spatial patterns in the field throughout the growing season. Normalized difference vegetation index permanent environment effects were separated from additive genetic effects using 2D spline, separable autoregressive models, or random regression models. The permanent environment effects were leveraged within agronomic trait genomic best linear unbiased prediction either by modeling an empirical covariance for random effects, or by modeling fixed effects as an average of permanent environment effects across time or split among three growth phases. Modeling approaches were tested using simulation data and Genomes-to-Fields hybrid maize (Zea mays L.) field experiments in 2015, 2017, 2019, and 2020 for grain yield, grain moisture, and ear height. The two-stage approach improved heritability, model fit, and genotypic effect estimation compared to baseline models. Electrical conductance and elevation from a 2019 soil survey significantly improved model fit, while 2D spline permanent environment effects were most strongly correlated with the soil parameters. Simulation of field effects demonstrated improved specificity for random regression models.
In summary, the use of longitudinal normalized difference vegetation index measurements increased experimental accuracy and understanding of field spatio-temporal heterogeneity.
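For readers unfamiliar with the index used throughout this abstract, NDVI is conventionally computed per pixel or plot from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). A minimal sketch follows. The reflectance values and the function name are hypothetical, and this is not the ImageBreed processing pipeline.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        # The index is undefined when both bands are zero; NaN marks a missing value.
        return float("nan")
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for one plot at three flight dates.
plot_time_series = [(0.30, 0.20), (0.50, 0.10), (0.45, 0.15)]

# One NDVI value per flight date gives the longitudinal series for this plot.
ndvi_series = [ndvi(nir, red) for nir, red in plot_time_series]
print([round(value, 3) for value in ndvi_series])
```

A series like this, one value per plot and flight date, is the kind of longitudinal input from which plot-level effects can then be modeled.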
Affiliation(s)
- Nicolas Morales
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Mahlet T Anche
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Nicholas S Kaczmar
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Nicholas Lepak
- United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Pengzun Ni
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenhe District, Shenyang, Liaoning Province, PR China
- Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Nicholas Santantonio
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Edward S Buckler
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Lukas A Mueller
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Boyce Thompson Institute, Ithaca, NY 14853, USA
- Kelly R Robbins
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
8. Lakhan A, Hamouda H, Abdulkareem KH, Alyahya S, Mohammed MA. Digital healthcare framework for patients with disabilities based on deep federated learning schemes. Comput Biol Med 2024; 169:107845. [PMID: 38118307 DOI: 10.1016/j.compbiomed.2023.107845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 11/21/2023] [Accepted: 12/11/2023] [Indexed: 12/22/2023]
Abstract
Utilizing digital healthcare services for patients who use wheelchairs is a vital and effective means to enhance their healthcare. Digital healthcare integrates various healthcare facilities, including local laboratories and centralized hospitals, to provide healthcare services for individuals in wheelchairs. In digital healthcare, the Internet of Medical Things (IoMT) allows local wheelchairs to connect with remote digital healthcare services and to generate sensor data from wheelchairs for monitoring and processing healthcare. Recently, it has been observed that wheelchair patients older than thirty suffer from high blood pressure, heart disease, body glucose problems, and other conditions due to reduced activity caused by their disabilities. However, existing wheelchair IoMT applications are simple and do not consider the diseases that wheelchair patients develop alongside their disabilities. This paper presents a novel digital healthcare framework for patients with disabilities based on deep federated learning schemes. In the proposed framework, we offer the federated learning deep convolutional neural network schemes (FL-DCNNS), which consist of different sub-schemes. The offloading scheme collects sensor readings, such as blood pressure, heartbeat, body glucose, and oxygen, from bio-sensors integrated with the wheelchair in the form of smartwatches. The smartwatches work with wearable devices for disabled patients in our framework. We present federated learning-enabled laboratories that train on the data and securely share the updated weights with the centralized node for decision and prediction. We present a decision forest with which centralized healthcare nodes decide on aggregation under different constraints: cost, energy, time, and accuracy. We implemented a deep CNN scheme in each laboratory to train and validate the model locally on the node, taking the available resources into consideration.
Simulation results show that FL-DCNNS obtained the optimal results on the sensor data, reducing energy by 25%, time by 19%, and cost by 28%, and improving the accuracy of disease prediction by 99%, as compared to existing digital healthcare schemes for wheelchair patients.
Affiliation(s)
- Abdullah Lakhan
- Department of Cybersecurity and Computer Science, Dawood University of Engineering and Technology, Karachi City 74800, Sindh, Pakistan.
- Hassen Hamouda
- Department of Business Administration, College of Science and Humanities at Alghat, Majmaah University, Al-Majmaah 11952, Saudi Arabia.
- Karrar Hameed Abdulkareem
- College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq; College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq.
- Saleh Alyahya
- Department of Electrical Engineering, College of Engineering and Information Technology, Onaizah Colleges, Onaizah 2053, Saudi Arabia.
- Mazin Abed Mohammed
- Department of Artificial Intelligence, College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq.
9. Yang W, Feng H, Hu X, Song J, Guo J, Lu B. An Overview of High-Throughput Crop Phenotyping: Platform, Image Analysis, Data Mining, and Data Management. Methods Mol Biol 2024; 2787:3-38. [PMID: 38656479 DOI: 10.1007/978-1-0716-3778-4_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
In this chapter, we explore the application of high-throughput crop phenotyping facilities for phenotype data acquisition and the extraction of significant information from the collected data through image processing and data mining methods. Additionally, the construction and outlook of crop phenotype databases are introduced and the need for global cooperation and data sharing is emphasized. High-throughput crop phenotyping significantly improves accuracy and efficiency compared to traditional measurements, making significant contributions to overcoming bottlenecks in the phenotyping field and advancing crop genetics.
Affiliation(s)
- Wanneng Yang
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China.
- Hui Feng
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
- Xiao Hu
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
- Jingyan Song
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
- Jing Guo
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
- Bingjie Lu
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
10. Papoutsoglou EA, Athanasiadis IN, Visser RGF, Finkers R. The benefits and struggles of FAIR data: the case of reusing plant phenotyping data. Sci Data 2023; 10:457. [PMID: 37443110 PMCID: PMC10345100 DOI: 10.1038/s41597-023-02364-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/03/2023] [Indexed: 07/15/2023] Open
Abstract
Plant phenotyping experiments are conducted under a variety of experimental parameters and settings for diverse purposes. The data they produce is heterogeneous, complicated, often poorly documented and, as a result, difficult to reuse. Meeting societal needs (nutrition, crop adaptation and stability) requires more efficient methods toward data integration and reuse. In this work, we examine what "making data FAIR" entails, and investigate the benefits and the struggles not only of reusing FAIR data, but also making data FAIR using genotype by environment and QTL by environment interactions for developmental traits in potato as a case study. We assume the role of a scientist discovering a phenotypic dataset on a FAIR data point, verifying the existence of related datasets with environmental data, acquiring both and integrating them. We report and discuss the challenges and the potential for reusability and reproducibility of FAIRifying existing datasets, using metadata standards such as MIAPPE, that were encountered in this process.
Affiliation(s)
- Evangelia A Papoutsoglou
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Taxonic B.V., De Meern, The Netherlands
- Ioannis N Athanasiadis
- Wageningen Data Competence Center and Geo-Information Science & Remote Sensing Lab, Wageningen University and Research, Wageningen, The Netherlands
- Richard G F Visser
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Richard Finkers
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands.
- GenNovation B.V., Wageningen, The Netherlands.
11. Karabulut E, Erkoç K, Acı M, Aydın M, Barriball S, Braley J, Cassetta E, Craine EB, Diaz-Garcia L, Hershberger J, Meyering B, Miller AJ, Rubin MJ, Tesdell O, Schlautman B, Şakiroğlu M. Sainfoin (Onobrychis spp.) crop ontology: supporting germplasm characterization and international research collaborations. Frontiers in Plant Science 2023; 14:1177406. [PMID: 37255566 PMCID: PMC10225502 DOI: 10.3389/fpls.2023.1177406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 04/18/2023] [Indexed: 06/01/2023]
Abstract
Sainfoin (Onobrychis spp.) is a perennial forage legume that is also attracting attention as a perennial pulse with potential for human consumption. The dual use of sainfoin underpins diverse research and breeding programs focused on improving sainfoin lines for forage and pulses, which is driving the generation of complex datasets describing high dimensional phenotypes in the post-omics era. To ensure that multiple user groups, for example, breeders selecting for forage and those selecting for edible seed, can utilize these rich datasets, it is necessary to develop common ontologies and accessible ontology platforms. One such platform, Crop Ontology, was created in 2008 by the Consortium of International Agricultural Research Centers (CGIAR) to host crop-specific trait ontologies that support standardized plant breeding databases. In the present study, we describe the sainfoin crop ontology (CO). An in-depth literature review was performed to develop a comprehensive list of traits measured and reported in sainfoin. Because the same traits can be measured in different ways, a set of 98 variables (variable = plant trait + method of measurement + scale of measurement) used to describe variation in sainfoin was ultimately identified. Variables were formatted and standardized based on guidelines provided here for inclusion in the sainfoin CO. The 98 variables contained a total of 82 traits from four trait classes, of which 24 were agronomic, 31 were morphological, 19 were seed and forage quality related, and 8 were phenological. In addition to the developed variables, we have provided a roadmap for developing and submitting new traits to the sainfoin CO.
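The abstract's definition of a variable (trait + method of measurement + scale of measurement) explains why there can be more variables than traits: one trait measured in two ways yields two variables. A minimal sketch of that structure follows. The class name, field names, and example entries are hypothetical and are not taken from the sainfoin CO itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One ontology variable: a trait combined with a method and a scale."""
    trait: str
    method: str
    scale: str

    def label(self) -> str:
        # Human-readable identifier built from the three components.
        return f"{self.trait} | {self.method} | {self.scale}"


# Hypothetical entries: the same trait recorded with two different methods.
variables = [
    Variable("plant height", "ruler measurement", "cm"),
    Variable("plant height", "visual score", "1-9 scale"),
]

# Distinct traits are fewer than variables when a trait has several methods.
traits = {variable.trait for variable in variables}
print(len(variables), "variables describe", len(traits), "trait")
for variable in variables:
    print(variable.label())
```

Making the class frozen keeps each variable hashable, so duplicates can be detected by placing variables in a set when entries from several sources are merged.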
Affiliation(s)
- Ebrar Karabulut
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
| | - Kübra Erkoç
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
| | - Murat Acı
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- The Land Institute, Salina, KS, United States
| | - Mahmut Aydın
- Department of Computer Engineering, Kafkas University, Kars, Türkiye
| | | | - Jackson Braley
- Donald Danforth Plant Science Center, St. Louis, MO, United States
| | | | | | - Luis Diaz-Garcia
- Department of Viticulture and Enology, University of California Davis, Davis, CA, United States
| | - Jenna Hershberger
- Plant and Environmental Sciences Department, Clemson University, Clemson, SC, United States
| | - Bo Meyering
- The Land Institute, Salina, KS, United States
| | - Allison J. Miller
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Department of Biology, Saint Louis University, St. Louis, MO, United States
| | - Matthew J. Rubin
- Donald Danforth Plant Science Center, St. Louis, MO, United States
| | - Omar Tesdell
- Department of Geography, Birzeit University, Birzeit, West Bank, Palestine
| | | | - Muhammet Şakiroğlu
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
|
12
|
Dipta B, Sood S, Devi R, Bhardwaj V, Mangal V, Thakur AK, Kumar V, Pandey N, Rathore A, Singh A. Digitalization of potato breeding program: Improving data collection and management. Heliyon 2023; 9:e12974. [PMID: 36747944 PMCID: PMC9898647 DOI: 10.1016/j.heliyon.2023.e12974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Revised: 01/02/2023] [Accepted: 01/10/2023] [Indexed: 01/22/2023] Open
Abstract
A plant breeding program involves hundreds of experiments, each with a number of entries, genealogy information, a linked experimental design, lists of treatments, observed traits, and data analysis. Traditionally, breeding program information and data records are not centralized but scattered across different file systems, which makes breeding information inconvenient to retrieve and results in poor data management and the loss of crucial data. Data administration requires a significant amount of manpower and resources to maintain nurseries, trials, germplasm lines, and pedigree records. Further, data transcription in scattered spreadsheets and files leads to nomenclature and typing mistakes, which affect data analysis and selection decisions in breeding programs. Accurate data recording and management tools could improve the efficiency of breeding programs. Recent interventions in data management using computer-based breeding databases and informatics applications and tools have made the breeder's life easier. Because the data are digital, they can be enriched further with images, voice recordings and other specific kinds of data. Public breeding programs are far behind the industry in the use of data management tools and software. In this article, we have compiled information on available data recording tools and breeding data management software, with major emphasis on potato breeding data management.
Affiliation(s)
- Bhawna Dipta
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Salej Sood
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India (corresponding author)
| | - Rasna Devi
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Vinay Bhardwaj
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Vikas Mangal
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Ajay Kumar Thakur
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Vinod Kumar
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - N.K. Pandey
- ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
| | - Abhishek Rathore
- CGIAR Excellence in Breeding Platform (EiB), International Maize and Wheat Improvement Center (CIMMYT), India
| | - A.K. Singh
- Division of Horticultural Science, KAB-II, Pusa, New Delhi-110012, India
|
13
|
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. PLANT PHYSIOLOGY 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas, that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
| | - Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
| | | | | | - Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
| | - Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
| | - Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
| | - Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
| | - Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
| | - Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
| | - Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
|
14
|
König P, Beier S, Mascher M, Stein N, Lange M, Scholz U. DivBrowse-interactive visualization and exploratory data analysis of variant call matrices. Gigascience 2022; 12:giad025. [PMID: 37083938 PMCID: PMC10120423 DOI: 10.1093/gigascience/giad025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 01/23/2023] [Accepted: 03/23/2023] [Indexed: 04/22/2023] Open
Abstract
BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community.
FINDINGS: We present DivBrowse, a web application for interactive visualization and exploratory analysis of genomic diversity data stored in Variant Call Format (VCF) files of any size. By seamlessly combining BLAST as an entry point together with interactive data analysis features such as principal component analysis in one graphical user interface, DivBrowse provides a novel and unique set of exploratory data analysis capabilities for genomic biodiversity datasets. The capability to integrate DivBrowse into existing web applications supports interoperability between different web applications. Built-in interactive computation of principal component analysis allows users to perform ad hoc analysis of the population structure based on specific genetic elements such as genes and exons. Data interoperability is supported by the ability to export genomic diversity data in VCF and General Feature Format 3 files.
CONCLUSION: DivBrowse offers a novel approach for interactive visualization and analysis of genomic diversity data and optionally also gene annotation data by including features like interactive calculation of variant frequencies and principal component analysis. The use of established standard file formats for data input supports interoperability and seamless deployment of application instances based on the data output of established bioinformatics pipelines.
Affiliation(s)
- Patrick König
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Sebastian Beier
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Institute of Bio- and Geosciences, IBG-4, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
| | - Martin Mascher
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Nils Stein
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
- Center for Integrated Breeding Research, Georg-August University, 37075 Göttingen, Germany
| | - Matthias Lange
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Uwe Scholz
- Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
|
15
|
Feser M, König P, Fiebig A, Arend D, Lange M, Scholz U. On the way to plant data commons - a genotyping use case. J Integr Bioinform 2022; 19:jib-2022-0033. [PMID: 36065132 PMCID: PMC9800039 DOI: 10.1515/jib-2022-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 08/04/2022] [Accepted: 08/11/2022] [Indexed: 01/09/2023] Open
Abstract
Over the last years it has been observed that the progress in data collection in life science has created increasing demand and opportunities for advanced bioinformatics. This includes data management as well as the individual data analysis and often covers the entire data life cycle. A variety of tools have been developed to store, share, or reuse the data produced in the different domains such as genotyping. Especially imputation, as a subfield of genotyping, requires good Research Data Management (RDM) strategies to enable use and re-use of genotypic data. To aim for sustainable software, it is necessary to develop tools and surrounding ecosystems, which are reusable and maintainable. Reusability in the context of streamlined tools can e.g. be achieved by standardizing the input and output of the different tools and adapting to open and broadly used file formats. By using such established file formats, the tools can also be connected with others, improving the overall interoperability of the software. Finally, it is important to build strong communities that maintain the tools by developing and contributing new features and maintenance updates. In this article, concepts for this will be presented for an imputation service.
Affiliation(s)
- Manuel Feser
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Anne Fiebig
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Seeland, Germany
|
16
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
| | - Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
| | - Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
| | - Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
| | - Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
| | - Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
17
|
Kotni P, van Hintum T, Maggioni L, Oppermann M, Weise S. EURISCO update 2023: the European Search Catalogue for Plant Genetic Resources, a pillar for documentation of genebank material. Nucleic Acids Res 2022; 51:D1465-D1469. [PMID: 36189883 PMCID: PMC9825528 DOI: 10.1093/nar/gkac852] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Revised: 09/13/2022] [Accepted: 09/23/2022] [Indexed: 01/30/2023] Open
Abstract
The European Search Catalogue for Plant Genetic Resources (EURISCO) is a central entry point for information on crop plant germplasm accessions from institutions in Europe and beyond. In total, it provides data on more than two million accessions, making an important contribution to unlocking the vast genetic diversity that lies deposited in >400 germplasm collections in 43 countries. EURISCO serves as the reference system for the Plant Genetic Resources Strategy for Europe and represents a significant approach for documenting and making available the world's agrobiological diversity. EURISCO is well established as a resource in this field and forms the basis for a wide range of research projects. In this paper, we present current developments of EURISCO, which is accessible at http://eurisco.ecpgr.org.
Affiliation(s)
- Pragna Kotni
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany
| | - Theo van Hintum
- Centre for Genetic Resources, The Netherlands (CGN), Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Lorenzo Maggioni
- European Cooperative Programme for Plant Genetic Resources (ECPGR), c/o Alliance of Bioversity International and CIAT, Via di San Domenico 1, 00153 Rome, Italy
| | - Markus Oppermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466 Seeland, Germany
| | - Stephan Weise
- To whom correspondence should be addressed. Tel: +49 39482 5 744; Fax: +49 39482 5 155
|
18
|
Droc G, Martin G, Guignon V, Summo M, Sempéré G, Durant E, Soriano A, Baurens FC, Cenci A, Breton C, Shah T, Aury JM, Ge XJ, Harrison PH, Yahiaoui N, D’Hont A, Rouard M. The banana genome hub: a community database for genomics in the Musaceae. HORTICULTURE RESEARCH 2022; 9:uhac221. [PMID: 36479579 PMCID: PMC9720444 DOI: 10.1093/hr/uhac221] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/13/2022] [Accepted: 09/22/2022] [Indexed: 06/17/2023]
Abstract
The Banana Genome Hub provides centralized access for genome assemblies, annotations, and the extensive related omics resources available for bananas and banana relatives. A series of tools and unique interfaces are implemented to harness the potential of genomics in bananas, leveraging the power of comparative analysis, while recognizing the differences between datasets. Besides effective genomic tools like BLAST and the JBrowse genome browser, additional interfaces enable advanced gene search and gene family analyses including multiple alignments and phylogenies. A synteny viewer enables the comparison of genome structures between chromosome-scale assemblies. Interfaces for differential expression analyses, metabolic pathways and GO enrichment were also added. A catalogue of variants spanning the banana diversity is made available for exploration, filtering, and export to a wide variety of software. Furthermore, we implemented new ways to graphically explore gene presence-absence in pangenomes as well as genome ancestry mosaics for cultivated bananas. Besides, to guide the community in future sequencing efforts, we provide recommendations for nomenclature of locus tags and a curated list of public genomic resources (assemblies, resequencing, high density genotyping) and upcoming resources-planned, ongoing or not yet public. The Banana Genome Hub aims at supporting the banana scientific community for basic, translational, and applied research and can be accessed at https://banana-genome-hub.southgreen.fr.
Affiliation(s)
| | - Guillaume Martin
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
| | - Valentin Guignon
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
| | - Marilyne Summo
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
| | - Guilhem Sempéré
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- CIRAD, UMR INTERTRYP, F-34398 Montpellier, France
- INTERTRYP, Université de Montpellier, CIRAD, IRD, 34398 Montpellier, France
| | - Eloi Durant
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Syngenta Seeds SAS, Saint-Sauveur, 31790, France
- DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France
| | - Alexandre Soriano
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
| | - Franc-Christophe Baurens
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
| | - Alberto Cenci
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
| | - Catherine Breton
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
| | | | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
| | - Xue-Jun Ge
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510520, China
- Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510520, China
| | - Pat Heslop Harrison
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510520, China
- Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
| | - Nabila Yahiaoui
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
| | - Angélique D’Hont
- CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
|
19
|
Rolling WR, Senalik D, Iorizzo M, Ellison S, Van Deynze A, Simon PW. CarrotOmics: a genetics and comparative genomics database for carrot (Daucus carota). Database (Oxford) 2022; 2022:6693759. [PMID: 36069936 PMCID: PMC9450951 DOI: 10.1093/database/baac079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 08/19/2022] [Accepted: 09/01/2022] [Indexed: 11/15/2022]
Abstract
CarrotOmics (https://carrotomics.org/) is a comprehensive database for carrot (Daucus carota L.) breeding and research. CarrotOmics was developed using resources available at the MainLab Bioinformatics core (https://www.bioinfo.wsu.edu/) and is implemented using Tripal with Drupal modules. The database delivers access to download or visualize the carrot reference genome with gene predictions, gene annotations and sequence assembly. Other genomic resources include information for 11 224 genetic markers from 73 linkage maps or genotyping-by-sequencing and descriptions of 371 mapped loci. There are records for 1601 Apiales species (or subspecies) and descriptions of 9408 accessions from 11 germplasm collections representing more than 600 of these species. Additionally, 204 Apiales species have phenotypic information, totaling 28 517 observations from 10 041 biological samples. Resources on CarrotOmics are freely available, search functions are provided to find data of interest and video tutorials are available to describe the search functions and genomic tools. CarrotOmics is a timely resource for the Apiaceae research community and for carrot geneticists developing improved cultivars with novel traits addressing challenges including an expanding acreage in tropical climates, an evolving consumer interested in sustainably grown vegetables and a dynamic environment due to climate change. Data from CarrotOmics can be applied in genomic-assisted selection and genetic research to improve basic research and carrot breeding efficiency.
Database URL
https://carrotomics.org/
Affiliation(s)
- William R Rolling
- Vegetable Crop Research Unit, USDA-ARS, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
- Department of Horticulture, University of Wisconsin-Madison, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
| | - Douglas Senalik
- Vegetable Crop Research Unit, USDA-ARS, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
- Department of Horticulture, University of Wisconsin-Madison, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
| | - Massimo Iorizzo
- Department of Horticultural Science and Plants for Human Health Institute, North Carolina State University, NC Research Campus, 600 Laureate Way, Kannapolis, NC 28081, USA
| | - Shelby Ellison
- Department of Horticulture, University of Wisconsin-Madison, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
| | - Allen Van Deynze
- College of Agricultural & Environmental Sciences, Seed Biotechnology Center, University of California-Davis, 150 Mrak Hall, One Shields Avenue, Davis, CA 95616, USA
| | - Philipp W Simon
- Vegetable Crop Research Unit, USDA-ARS, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
- Department of Horticulture, University of Wisconsin-Madison, Moore Hall, 1575 Linden Drive, Madison, WI 53706-1514, USA
|
20
|
Senger E, Osorio S, Olbricht K, Shaw P, Denoyes B, Davik J, Predieri S, Karhu S, Raubach S, Lippi N, Höfer M, Cockerton H, Pradal C, Kafkas E, Litthauer S, Amaya I, Usadel B, Mezzetti B. Towards smart and sustainable development of modern berry cultivars in Europe. THE PLANT JOURNAL: FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1238-1251. [PMID: 35751152 DOI: 10.1111/tpj.15876] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 06/15/2023]
Abstract
Fresh berries are a popular and important component of the human diet. The demand for high-quality berries and sustainable production methods is increasing globally, challenging breeders to develop modern berry cultivars that fulfill all desired characteristics. Since 1994, research projects have characterized genetic resources, developed modern tools for high-throughput screening, and published data in publicly available repositories. However, the key findings of different disciplines are rarely linked together, and only a limited range of traits and genotypes has been investigated. The Horizon2020 project BreedingValue will address these challenges by studying a broader panel of strawberry, raspberry and blueberry genotypes in detail, in order to recover the lost genetic diversity that has limited the aroma and flavor intensity of recent cultivars. We will combine metabolic analysis with sensory panel tests and surveys to identify the key components of taste, flavor and aroma in berries across Europe, leading to a high-resolution map of quality requirements for future berry cultivars. Traits linked to berry yields and the effect of environmental stress will be investigated using modern image analysis methods and modeling. We will also use genetic analysis to determine the genetic basis of complex traits for the development and optimization of modern breeding technologies, such as molecular marker arrays, genomic selection and genome-wide association studies. Finally, the results, raw data and metadata will be made publicly available on the open platform Germinate in order to meet FAIR data principles and provide the basis for sustainable research in the future.
Affiliation(s)
- Elisa Senger
  - Institute of Bio- and Geosciences, IBG-4 Bioinformatics, BioSC, CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Sonia Osorio
  - Departamento de Biología Molecular y Bioquímica, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga-Consejo Superior de Investigaciones Científicas, Campus de Teatinos, Málaga, Spain
- Paul Shaw
  - Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Scotland, UK
- Béatrice Denoyes
  - Université de Bordeaux, UMR BFP, INRAE, Villenave d'Ornon, France
- Jahn Davik
  - Department of Molecular Plant Biology, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- Stefano Predieri
  - Bio-Agrofood Department, Institute for Bioeconomy, IBE-CNR, Italian National Research Council, Bologna, Italy
- Saila Karhu
  - Natural Resources Institute Finland (Luke), Turku, Finland
- Sebastian Raubach
  - Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Scotland, UK
- Nico Lippi
  - Bio-Agrofood Department, Institute for Bioeconomy, IBE-CNR, Italian National Research Council, Bologna, Italy
- Monika Höfer
  - Institute of Breeding Research on Fruit Crops, Federal Research Centre for Cultivated Plants (JKI), Dresden, Germany
- Helen Cockerton
  - Genetics, Genomics and Breeding Department, NIAB, East Malling, UK
- Christophe Pradal
  - CIRAD and UMR AGAP Institute, Montpellier, France
  - INRIA and LIRMM, University Montpellier, CNRS, Montpellier, France
- Ebru Kafkas
  - Department of Horticulture, Faculty of Agriculture, Çukurova University, Balcalı, Adana, Turkey
- Iraida Amaya
  - Unidad Asociada de I+D+i IFAPA-CSIC Biotecnología y Mejora en Fresa, Málaga, Spain
  - Laboratorio de Genómica y Biotecnología, Centro IFAPA de Málaga, Instituto Andaluz de Investigación y Formación Agraria y Pesquera, Málaga, Spain
- Björn Usadel
  - Institute of Bio- and Geosciences, IBG-4 Bioinformatics, BioSC, CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Bruno Mezzetti
  - Department of Agricultural, Food and Environmental Sciences, Università Politecnica delle Marche, Ancona, Italy
21
Morales N, Ogbonna AC, Ellerbrock BJ, Bauchet GJ, Tantikanjana T, Tecle IY, Powell AF, Lyon D, Menda N, Simoes CC, Saha S, Hosmani P, Flores M, Panitz N, Preble RS, Agbona A, Rabbi I, Kulakow P, Peteti P, Kawuki R, Esuma W, Kanaabi M, Chelangat DM, Uba E, Olojede A, Onyeka J, Shah T, Karanja M, Egesi C, Tufan H, Paterne A, Asfaw A, Jannink JL, Wolfe M, Birkett CL, Waring DJ, Hershberger JM, Gore MA, Robbins KR, Rife T, Courtney C, Poland J, Arnaud E, Laporte MA, Kulembeka H, Salum K, Mrema E, Brown A, Bayo S, Uwimana B, Akech V, Yencho C, de Boeck B, Campos H, Swennen R, Edwards JD, Mueller LA. Breedbase: a digital ecosystem for modern plant breeding. G3 GENES|GENOMES|GENETICS 2022; 12:6564228. [PMID: 35385099 PMCID: PMC9258556 DOI: 10.1093/g3journal/jkac078] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/19/2021] [Accepted: 02/14/2022] [Indexed: 01/17/2023]
Abstract
Modern breeding methods integrate next-generation sequencing and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles to ultimately create improved cultivars able to sustain high adoption rates by farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to (1) track breeding materials, (2) store experimental evaluations, (3) record phenotypic measurements using consistent ontologies, (4) store genotypic information, and (5) implement algorithms for analysis, prediction, and selection decisions. Because of the complexity of the breeding process, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present a breeding database system, Breedbase (https://breedbase.org/, last accessed 4/18/2022). Originally initiated as Cassavabase (https://cassavabase.org/, last accessed 4/18/2022) with the NextGen Cassava project (https://www.nextgencassava.org/, last accessed 4/18/2022), and later developed into a crop-agnostic system, it is presently used for dozens of different crops and projects. The system is web based and is available as open source software on GitHub (https://github.com/solgenomics/, last accessed 4/18/2022), and it is packaged in a Docker image for deployment (https://hub.docker.com/u/breedbase, last accessed 4/18/2022). The Breedbase system enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem.
Affiliation(s)
- Nicolas Morales
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - Cornell University, Ithaca, NY 14853, USA
- Alex C Ogbonna
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - Cornell University, Ithaca, NY 14853, USA
- David Lyon
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Naama Menda
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Surya Saha
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Ezenwanyi Uba
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Adeyemi Olojede
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Joseph Onyeka
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Chiedozie Egesi
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - IITA Ibadan, 200001 Ibadan, Nigeria
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Hale Tufan
  - Cornell University, Ithaca, NY 14853, USA
- Jean-Luc Jannink
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- Clay L Birkett
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- David J Waring
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- Trevor Rife
  - Kansas State University, Manhattan, KS 66506, USA
- Jesse Poland
  - Kansas State University, Manhattan, KS 66506, USA
- Craig Yencho
  - North Carolina State University (NCSU), Raleigh, NC 27695, USA
22
Bradbury PJ, Casstevens T, Jensen SE, Johnson LC, Miller ZR, Monier B, Romay MC, Song B, Buckler ES. The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation. Bioinformatics 2022; 38:3698-3702. [PMID: 35748708 PMCID: PMC9344836 DOI: 10.1093/bioinformatics/btac410] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/02/2021] [Revised: 02/28/2022] [Accepted: 06/22/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Pangenomes provide novel insights for population and quantitative genetics, genomics, and breeding that are not available from studying a single reference genome; a species is better represented by a pangenome, or collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low density sequence or variant data.
RESULTS: The Practical Haplotype Graph (PHG) is a pangenome pipeline, database (PostgreSQL and SQLite), data model (Java, Kotlin, or R), and Breeding API (BrAPI) web service. The PHG has already been able to accurately represent diversity in four major crops including maize, one of the most genomically diverse species, with up to 1000-fold data compression. Using simulated data, we show that, at even 0.1X coverage, with appropriate reads and sequence alignment, imputation results in extremely accurate haplotype reconstruction. The PHG is a platform and environment for the understanding and application of genomic diversity.
AVAILABILITY: All resources listed here are freely available. The PHG Docker image used to generate the simulation results is available from https://hub.docker.com/ as maizegenetics/phg:0.0.27. PHG source code is at https://bitbucket.org/bucklerlab/practicalhaplotypegraph/src/master/. The code used for the analysis of simulated data is at https://bitbucket.org/bucklerlab/phg-manuscript/src/master/. The PHG database of NAM parent haplotypes is in the CyVerse data store (https://de.cyverse.org/de/) and named /iplant/home/shared/panzea/panGenome/PHG_db_maize/phg_v5Assemblies_20200608.db.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- P J Bradbury
  - United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center, Ithaca, NY 14853, USA
- T Casstevens
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- S E Jensen
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- L C Johnson
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Z R Miller
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- B Monier
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- M C Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- B Song
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- E S Buckler
  - United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center, Ithaca, NY 14853, USA
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
23
Raubach S, Schreiber M, Shaw PD. GridScore: a tool for accurate, cross-platform phenotypic data collection and visualization. BMC Bioinformatics 2022; 23:214. [PMID: 35668357 PMCID: PMC9169276 DOI: 10.1186/s12859-022-04755-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 05/30/2022] [Indexed: 11/17/2022]
Abstract
BACKGROUND: Plant breeding and crop research rely on experimental phenotyping trials. These trials generate data for large numbers of traits and plant varieties that needs to be captured efficiently and accurately to support further research and downstream analysis. Traditionally scored by hand, phenotypic data is nowadays collected using spreadsheets or specialized apps. While many solutions exist that increase efficiency and reduce errors, none offers the same familiarity as printed field plans, which have been used for decades and offer an intuitive overview of the trial setup, previously recorded data and plots still requiring scoring.
RESULTS: We introduce GridScore, which uses cutting-edge web technologies to reproduce the familiarity of printed field plans while enhancing the phenotypic data collection process with advanced features like georeferencing, image tagging and speech recognition. GridScore is a cross-platform open-source plant phenotyping app that combines barcode-based systems with a guided data collection approach while offering a top-down view of the data collected in a field layout. GridScore is compared to existing tools across a wide spectrum of criteria including support for barcodes, multiple platforms, and visualizations.
CONCLUSION: Compared to its competition, GridScore shows strong performance across the board, offering a complete manual phenotyping experience.
Affiliation(s)
- Sebastian Raubach
  - Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland
- Miriam Schreiber
  - Department of Life Science, University of Dundee, Dundee, Scotland
- Paul D Shaw
  - Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland
24
Beier S, Fiebig A, Pommier C, Liyanage I, Lange M, Kersey PJ, Weise S, Finkers R, Koylass B, Cezard T, Courtot M, Contreras-Moreira B, Naamati G, Dyer S, Scholz U. Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR. F1000Res 2022; 11. [PMID: 35811804 PMCID: PMC9218589 DOI: 10.12688/f1000research.109080.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 05/17/2022] [Indexed: 11/20/2022]
Abstract
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
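To make the metadata point concrete, the following is a minimal sketch of how the `##` meta-information lines of a VCF header could be read and checked for a set of expected keys. It is only an illustration: the function names and the `REQUIRED_KEYS` list are hypothetical and are not the fields recommended in the article.

```python
import re

# Hypothetical list of metadata keys a project might insist on.
# This is an assumption for illustration, NOT the article's recommendation.
REQUIRED_KEYS = ("fileformat", "reference", "source")

# Structured line, e.g. ##INFO=<ID=DP,Number=1,Type=Integer,Description="...">
STRUCTURED = re.compile(r'^##(?P<key>[^=]+)=<(?P<body>.*)>$')
# Simple line, e.g. ##fileformat=VCFv4.3
SIMPLE = re.compile(r'^##(?P<key>[^=]+)=(?P<value>.*)$')
# key=value pairs inside <...>; a value may be a double-quoted string
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_vcf_header(lines):
    """Collect '##' meta-information lines into a dict of key -> list of values."""
    meta = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("##"):
            break  # the '#CHROM' column header ends the meta-information block
        match = STRUCTURED.match(line)
        if match:
            fields = {k: v.strip('"') for k, v in PAIR.findall(match.group("body"))}
            meta.setdefault(match.group("key"), []).append(fields)
            continue
        match = SIMPLE.match(line)
        if match:
            meta.setdefault(match.group("key"), []).append(match.group("value"))
    return meta


def missing_keys(meta, required=REQUIRED_KEYS):
    """Return the required keys that do not occur in the parsed header."""
    return [key for key in required if key not in meta]
```

A check of this kind only tests that a key is present; deciding which keys and vocabularies to require is exactly the community agreement the abstract argues for.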
Affiliation(s)
- Sebastian Beier
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
  - Institute of Bio- and Geosciences, Bioinformatics (IBG-4), Forschungszentrum Jülich GmbH, Jülich, 52425, Germany
- Anne Fiebig
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Cyril Pommier
  - BioinfOmics, Plant bioinformatics facility, Université Paris-Saclay, INRAE, Versailles, France
- Isuru Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Matthias Lange
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Stephan Weise
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
  - Gennovation B.V., Wageningen, The Netherlands
- Baron Koylass
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Timothee Cezard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  - Ontario Institute for Cancer Research, Toronto, Canada
- Bruno Contreras-Moreira
  - Laboratorio de Biología Computacional y Estructural, Estación Experimental Aula Dei-CSIC, Zaragoza, 50059, Spain
- Guy Naamati
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sarah Dyer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Uwe Scholz
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
25
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or to deal with high-dimensional datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, and discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding as well as the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
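As background to the GBLUP baseline mentioned above, the sketch below builds a genomic relationship matrix from marker allele counts, using one common construction (centred allele counts scaled by twice the summed p(1 - p) across markers). The abstract does not say which construction the reviewed tools use, so the choice of formula and the function name are assumptions made for illustration.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from allele counts.

    genotypes: list of individuals, each a list of allele counts
    (0, 1 or 2) per marker. Assumes no missing calls.
    """
    n_individuals = len(genotypes)
    n_markers = len(genotypes[0])
    # Frequency of the counted allele at each marker
    freqs = [
        sum(ind[j] for ind in genotypes) / (2 * n_individuals)
        for j in range(n_markers)
    ]
    # Centre every marker column by its expected count 2 * p_j
    centred = [
        [ind[j] - 2 * freqs[j] for j in range(n_markers)]
        for ind in genotypes
    ]
    # Scaling factor 2 * sum_j p_j * (1 - p_j)
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / scale for row_k in centred]
        for row_i in centred
    ]
```

In a GBLUP analysis a matrix of this kind serves as the covariance structure of the genetic effects in a linear mixed model. That linearity is the property the abstract contrasts with machine learning approaches.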
Affiliation(s)
- Monica F. Danilevicz
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
  - School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
  - Correspondence: David Edwards
26
Liyanage I, Burdett T, Droesbeke B, Erdos K, Fernandez R, Gray A, Haseeb M, Jupp S, Penim F, Pommier C, Rocca-Serra P, Courtot M, Coppens F. ELIXIR biovalidator for semantic validation of life science metadata. Bioinformatics 2022; 38:3141-3142. [PMID: 35380605 PMCID: PMC9154242 DOI: 10.1093/bioinformatics/btac195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 02/25/2022] [Accepted: 04/01/2022] [Indexed: 01/14/2023]
Abstract
SUMMARY: To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents.
AVAILABILITY AND IMPLEMENTATION: Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
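The difference between syntactic and ontology-based validation can be illustrated with a small, self-contained sketch. This is not the biovalidator API or its schema keywords: the `ONTOLOGY` dictionary, the term IDs, the `SCHEMA` layout and the function names are all hypothetical stand-ins, and a real validator would query an ontology service rather than a hard-coded table.

```python
import json

# Hypothetical mini-ontology: term ID -> parent term ID (None for the root).
# The IDs are made up for this example.
ONTOLOGY = {
    "PLANT:0001": None,          # plant structure
    "PLANT:0002": "PLANT:0001",  # leaf
    "PLANT:0003": "PLANT:0002",  # leaf blade
}

# Hypothetical schema: field -> (expected type, required ancestor term or None)
SCHEMA = {
    "sample_id": (str, None),
    "tissue": (str, "PLANT:0001"),
}


def is_descendant(term, ancestor):
    """Walk parent links upward; True if `ancestor` is `term` or one of its ancestors."""
    while term is not None:
        if term == ancestor:
            return True
        term = ONTOLOGY.get(term)
    return False


def validate(document):
    """Return a list of error strings; an empty list means the document passed."""
    errors = []
    for field, (expected_type, ancestor) in SCHEMA.items():
        if field not in document:
            errors.append(f"{field}: missing")                       # syntactic
        elif not isinstance(document[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")  # syntactic
        elif ancestor is not None and not is_descendant(document[field], ancestor):
            errors.append(f"{field}: not a descendant of {ancestor}")     # semantic
    return errors


if __name__ == "__main__":
    sample = json.loads('{"sample_id": "S1", "tissue": "PLANT:0003"}')
    print(validate(sample))
```

The first two checks are purely structural. Only the third needs knowledge of the ontology hierarchy, which is the part a plain JSON Schema validator cannot express on its own.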
Affiliation(s)
- Isuru Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Tony Burdett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Bert Droesbeke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Karoly Erdos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Rolando Fernandez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh EH14 4AS, UK
- Muhammad Haseeb
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Simon Jupp
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Flavia Penim
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Cyril Pommier
  - INRAE, BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, 78026 Versailles, France
  - INRAE, URGI, Université Paris-Saclay, 78026 Versailles, France
- Philippe Rocca-Serra
  - Department of Engineering Science, University of Oxford e-Research Centre, University of Oxford, Oxford OX1 3QG, UK
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
  - Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
  - To whom correspondence should be addressed.
- Frederik Coppens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
27
Ninomiya S. High-throughput field crop phenotyping: current status and challenges. BREEDING SCIENCE 2022; 72:3-18. [PMID: 36045897 PMCID: PMC8987842 DOI: 10.1270/jsbbs.21069] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/31/2021] [Accepted: 12/16/2021] [Indexed: 05/03/2023]
Abstract
In contrast to the rapid advances made in plant genotyping, plant phenotyping is considered a bottleneck in plant science. This has promoted high-throughput plant phenotyping (HTP) studies, resulting in an exponential increase in phenotyping-related publications. The development of HTP was originally intended for use as indoor HTP technologies for model plant species under controlled environments. However, this subsequently shifted to HTP for use in crops in fields. Although HTP in fields is much more difficult to conduct due to unstable environmental conditions compared to HTP in controlled environments, recent advances in HTP technology have allowed these difficulties to be overcome, allowing for rapid, efficient, non-destructive, non-invasive, quantitative, repeatable, and objective phenotyping. Recent HTP developments have been accelerated by the advances in data analysis, sensors, and robot technologies, including machine learning, image analysis, three dimensional (3D) reconstruction, image sensors, laser sensors, environmental sensors, and drones, along with high-speed computational resources. This article provides an overview of recent HTP technologies, focusing mainly on canopy-based phenotypes of major crops, such as canopy height, canopy coverage, canopy biomass, and canopy stressed appearance, in addition to crop organ detection and counting in the fields. Current topics in field HTP are also presented, followed by a discussion on the low rates of adoption of HTP in practical breeding programs.
Affiliation(s)
- Seishi Ninomiya
  - Graduate School of Agriculture and Life Sciences, The University of Tokyo, Nishitokyo, Tokyo 188-0002, Japan
  - Plant Phenomics Research Center, Nanjing Agricultural University, Nanjing, China
28
Sun D, Robbins K, Morales N, Shu Q, Cen H. Advances in optical phenotyping of cereal crops. TRENDS IN PLANT SCIENCE 2022; 27:191-208. [PMID: 34417079 DOI: 10.1016/j.tplants.2021.07.015] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 03/12/2021] [Revised: 07/22/2021] [Accepted: 07/24/2021] [Indexed: 06/13/2023]
Abstract
Optical sensors and sensing-based phenotyping techniques have become mainstream approaches in high-throughput phenotyping for improving trait selection and genetic gains in crops. We review recent progress and contemporary applications of optical sensing-based phenotyping (OSP) techniques in cereal crops and highlight optical sensing principles for spectral response and sensor specifications. Further, we group phenotypic traits determined by OSP into four categories - morphological, biochemical, physiological, and performance traits - and illustrate appropriate sensors for each extraction. In addition to the current status, we discuss the challenges of OSP and provide possible solutions. We propose that optical sensing-based traits need to be explored further, and that standardization of the language of phenotyping and worldwide collaboration between phenotyping researchers and other fields need to be established.
Affiliation(s)
- Dawei Sun
  - College of Biosystems Engineering and Food Science, and State Key Laboratory of Modern Optical Instrumentation, Zhejiang University, Hangzhou 310058, PR China
  - Key Laboratory of Spectroscopy Sensing, Ministry of Agriculture and Rural Affairs, Hangzhou 310058, PR China
- Kelly Robbins
  - Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Nicolas Morales
  - Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Qingyao Shu
  - Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Zhejiang University, Hangzhou, PR China
  - State Key Laboratory of Rice Biology, Zhejiang University, Hangzhou 310058, PR China
- Haiyan Cen
  - College of Biosystems Engineering and Food Science, and State Key Laboratory of Modern Optical Instrumentation, Zhejiang University, Hangzhou 310058, PR China
  - Key Laboratory of Spectroscopy Sensing, Ministry of Agriculture and Rural Affairs, Hangzhou 310058, PR China
29
Paux E, Lafarge S, Balfourier F, Derory J, Charmet G, Alaux M, Perchet G, Bondoux M, Baret F, Barillot R, Ravel C, Sourdille P, Le Gouis J, on behalf of the BREEDWHEAT Consortium. Breeding for Economically and Environmentally Sustainable Wheat Varieties: An Integrated Approach from Genomics to Selection. BIOLOGY 2022; 11:149. [PMID: 35053148 PMCID: PMC8773325 DOI: 10.3390/biology11010149] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/17/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022]
Abstract
There is currently a strong societal demand for sustainability, quality, and safety in bread wheat production. To address these challenges, new and innovative knowledge, resources, tools, and methods to facilitate breeding are needed. This starts with the development of high throughput genomic tools including single nucleotide polymorphism (SNP) arrays, high density molecular marker maps, and full genome sequences. Such powerful tools are essential to perform genome-wide association studies (GWAS), to implement genomic and phenomic selection, and to characterize the worldwide diversity. They also help breeders to broaden the genetic basis of elite varieties through the introduction of novel sources of genetic diversity. Improvement in varieties particularly relies on the detection of genomic regions involved in agronomical traits including tolerance to biotic (diseases and pests) and abiotic (drought, nutrient deficiency, high temperature) stresses. When enough resolution is achieved, this can result in the identification of candidate genes that could further be characterized to identify relevant alleles. Breeding must also now be approached through in silico modeling to simulate plant development, investigate genotype × environment interactions, and introduce marker-trait linkage information in the models to better implement genomic selection. Breeders must be aware of new developments and the information must be made available to the world wheat community to develop new high-yielding varieties that can meet the challenge of higher wheat production in a sustainable and fluctuating agricultural context. In this review, we compiled all knowledge and tools produced during the BREEDWHEAT project to show how they may help to meet this challenge in the coming years.
Affiliation(s)
- Etienne Paux
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
| | - Stéphane Lafarge
- Limagrain, Chappes Research Center, Route d’Ennezat, 63720 Chappes, France; (S.L.); (J.D.)
| | - François Balfourier
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
| | - Jérémy Derory
- Limagrain, Chappes Research Center, Route d’Ennezat, 63720 Chappes, France; (S.L.); (J.D.)
| | - Gilles Charmet
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
| | - Michael Alaux
- Université Paris-Saclay—INRAE, URGI, 78026 Versailles, France;
- Université Paris-Saclay—INRAE, BioinfOmics, Plant Bioinformatics Facility, 78026 Versailles, France
| | - Geoffrey Perchet
- Vegepolys Valley, Maison du Végétal, 26 Rue Jean Dixmeras, 49066 Angers, France;
| | - Marion Bondoux
- INRAE—Transfert, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France;
| | - Frédéric Baret
- UMR EMMAH, INRAE—Université d’Avignon et des Pays de Vaucluse, 84914 Avignon, France;
| | | | - Catherine Ravel
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
| | - Pierre Sourdille
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
| | - Jacques Le Gouis
- UMR GDEC Genetics, Diversity & Ecophysiology of Cereals, INRAE—Université Clermont-Auvergne, 5, Chemin de Beaulieu, 63000 Clermont-Ferrand, France; (E.P.); (F.B.); (G.C.); (C.R.); (P.S.)
30
Noor A. Improving bioinformatics software quality through incorporation of software engineering practices. PeerJ Comput Sci 2022; 8:e839. [PMID: 35111923 PMCID: PMC8771759 DOI: 10.7717/peerj-cs.839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/01/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Bioinformatics software is developed for collecting, analyzing, integrating, and interpreting life science datasets that are often enormous. Bioinformatics engineers often lack the software engineering skills necessary for developing robust, maintainable, reusable software. This study presents a review and discussion of the findings and efforts made to improve the quality of bioinformatics software. METHODOLOGY A systematic review was conducted of related literature that identifies core software engineering concepts for improving bioinformatics software development: requirements gathering, documentation, testing, and integration. The findings are presented with the aim of illuminating trends within the research that could lead to viable solutions to the struggles faced by bioinformatics engineers when developing scientific software. RESULTS The findings suggest that bioinformatics engineers could significantly benefit from the incorporation of software engineering principles into their development efforts. This leads to the suggestion of both cultural changes within bioinformatics research communities and the adoption of software engineering disciplines into the formal education of bioinformatics engineers. Open management of scientific bioinformatics development projects can result in improved software quality through collaboration between bioinformatics engineers and software engineers. CONCLUSIONS While strides have been made in both the identification and the solution of issues of particular import to bioinformatics software development, there is still room for improvement in terms of shifts in both the formal education of bioinformatics engineers and the culture and approaches of managing scientific bioinformatics research and development efforts.
31
Sempéré G, Larmande P, Rouard M. Managing High-Density Genotyping Data with Gigwa. Methods Mol Biol 2022; 2443:415-427. [PMID: 35037218 DOI: 10.1007/978-1-0716-2067-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Next generation sequencing technologies enabled high-density genotyping for large numbers of samples. Nowadays SNP calling pipelines produce up to millions of such markers, but which need to be filtered in various ways according to the type of analyses. One of the main challenges still lies in the management of an increasing volume of genotyping files that are difficult to handle for many applications. Here, we provide a practical guide for efficiently managing large genomic variation data using Gigwa, a user-friendly, scalable and versatile application that may be deployed either remotely on web servers or on a local machine.
Affiliation(s)
- Guilhem Sempéré
- CIRAD, UMR INTERTRYP, Montpellier, France
- INTERTRYP, Univ Montpellier, CIRAD, IRD, Montpellier, France
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- Pierre Larmande
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France.
- DIADE, Univ Montpellier, IRD, Montpellier, France.
- Mathieu Rouard
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, Montpellier, France
32
Larmande P, Tagny Ngompe G, Venkatesan A, Ruiz M. AgroLD: A Knowledge Graph Database for Plant Functional Genomics. Methods Mol Biol 2022; 2443:527-540. [PMID: 35037225 DOI: 10.1007/978-1-0716-2067-0_28] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Recent advances in high-throughput technologies have resulted in tremendous increase in the amount of data in the agronomic domain. There is an urgent need to effectively integrate complementary information to understand the biological system in its entirety. We have developed AgroLD, a knowledge graph that exploits the Semantic Web technology and some of the relevant standard domain ontologies, to integrate information on plant species and in this way facilitating the formulation of new scientific hypotheses. This chapter outlines some integration results of the project, which initially focused on genomics, proteomics and phenomics.
Affiliation(s)
- Pierre Larmande
- DIADE, IRD, CIRAD, Univ. Montpellier, Montpellier, France.
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France.
- Gildas Tagny Ngompe
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- AGAP, CIRAD, INRAE, Univ. Montpellier, av Agropolis, Montpellier, France
- Manuel Ruiz
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- AGAP, CIRAD, INRAE, Univ. Montpellier, av Agropolis, Montpellier, France
33
Langstroff A, Heuermann MC, Stahl A, Junker A. Opportunities and limits of controlled-environment plant phenotyping for climate response traits. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:1-16. [PMID: 34302493 PMCID: PMC8741719 DOI: 10.1007/s00122-021-03892-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 09/19/2020] [Accepted: 06/17/2021] [Indexed: 05/19/2023]
Abstract
Rising temperatures and changing precipitation patterns will affect agricultural production substantially, exposing crops to extended and more intense periods of stress. Therefore, breeding of varieties adapted to the constantly changing conditions is pivotal to enable a quantitatively and qualitatively adequate crop production despite the negative effects of climate change. As it is not yet possible to select for adaptation to future climate scenarios in the field, simulations of future conditions in controlled-environment (CE) phenotyping facilities contribute to the understanding of the plant response to special stress conditions and help breeders to select ideal genotypes which cope with future conditions. CE phenotyping facilities enable the collection of traits that are not easy to measure under field conditions and the assessment of a plant's phenotype under repeatable, clearly defined environmental conditions using automated, non-invasive, high-throughput methods. However, extrapolation and translation of results obtained under controlled environments to field environments is ambiguous. This review outlines the opportunities and challenges of phenotyping approaches under controlled environments complementary to conventional field trials. It gives an overview on general principles and introduces existing phenotyping facilities that take up the challenge of obtaining reliable and robust phenotypic data on climate response traits to support breeding of climate-adapted crops.
Affiliation(s)
- Anna Langstroff
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich Buff-Ring 26, 35392, Giessen, Germany
- Marc C Heuermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, OT Gatersleben, 06466, Seeland, Germany
- Andreas Stahl
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University Giessen, Heinrich Buff-Ring 26, 35392, Giessen, Germany
- Institute for Resistance Research and Stress Tolerance, Federal Research Centre for Cultivated Plants, Julius Kühn-Institut (JKI), Erwin-Baur-Strasse 27, 06484, Quedlinburg, Germany
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, OT Gatersleben, 06466, Seeland, Germany.
34
Danilevicz MF, Bayer PE, Nestor BJ, Bennamoun M, Edwards D. Resources for image-based high-throughput phenotyping in crops and data sharing challenges. PLANT PHYSIOLOGY 2021; 187:699-715. [PMID: 34608963 PMCID: PMC8561249 DOI: 10.1093/plphys/kiab301] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/04/2020] [Accepted: 05/26/2021] [Indexed: 05/06/2023]
Abstract
High-throughput phenotyping (HTP) platforms are capable of monitoring the phenotypic variation of plants through multiple types of sensors, such as red green and blue (RGB) cameras, hyperspectral sensors, and computed tomography, which can be associated with environmental and genotypic data. Because of the wide range of information provided, HTP datasets represent a valuable asset to characterize crop phenotypes. As HTP becomes widely employed with more tools and data being released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may exploit these datasets either for phenotype comparison or employ them as a benchmark to assess tool performance and to support the development of tools that are better at generalizing between different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detecting pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and offer a comprehensive list of available datasets to assist crop breeders and tool developers to leverage these resources in order to accelerate crop breeding.
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Benjamin J. Nestor
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Mohammed Bennamoun
- Department of Computer Science and Software Engineering, University of Western Australia, Perth, Western Australia 6009, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
35
Johnson D, Batista D, Cochrane K, Davey RP, Etuk A, Gonzalez-Beltran A, Haug K, Izzo M, Larralde M, Lawson TN, Minotto A, Moreno P, Nainala VC, O'Donovan C, Pireddu L, Roger P, Shaw F, Steinbeck C, Weber RJM, Sansone SA, Rocca-Serra P. ISA API: An open platform for interoperable life science experimental metadata. Gigascience 2021; 10:giab060. [PMID: 34528664 PMCID: PMC8444265 DOI: 10.1093/gigascience/giab060] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/13/2020] [Revised: 03/19/2021] [Accepted: 08/23/2021] [Indexed: 02/04/2023]
Abstract
BACKGROUND The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed. RESULTS In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community. CONCLUSIONS The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
Affiliation(s)
- David Johnson
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Department of Informatics and Media, Uppsala University, Box 513, 75120 Uppsala, Sweden
- Dominique Batista
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Keeva Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Robert P Davey
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Anthony Etuk
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Alejandra Gonzalez-Beltran
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Science and Technology Facilities Council, Scientific Computing Department, Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
- Kenneth Haug
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Genome Research Limited, Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden, CB10 1RQ, UK
- Massimiliano Izzo
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Martin Larralde
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
- Thomas N Lawson
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Alice Minotto
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Venkata Chandrasekhar Nainala
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Luca Pireddu
- Distributed Computing Group, CRS4: Center for Advanced Studies, Research & Development in Sardinia, Pula 09050, Italy
- Pierrick Roger
- CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
- Felix Shaw
- Earlham Institute, Data infrastructure and algorithms, Norwich Research Park, Norwich NR4 7UZ, UK
- Christoph Steinbeck
- Cheminformatics and Computational Metabolomics, Institute for Analytical Chemistry, Lessingstr. 8, 07743 Jena, Germany
- Ralf J M Weber
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
36
Mayer G, Müller W, Schork K, Uszkoreit J, Weidemann A, Wittig U, Rey M, Quast C, Felden J, Glöckner FO, Lange M, Arend D, Beier S, Junker A, Scholz U, Schüler D, Kestler HA, Wibberg D, Pühler A, Twardziok S, Eils J, Eils R, Hoffmann S, Eisenacher M, Turewicz M. Implementing FAIR data management within the German Network for Bioinformatics Infrastructure (de.NBI) exemplified by selected use cases. Brief Bioinform 2021; 22:bbab010. [PMID: 33589928 PMCID: PMC8425304 DOI: 10.1093/bib/bbab010] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/17/2020] [Revised: 12/21/2020] [Accepted: 01/06/2021] [Indexed: 12/21/2022]
Abstract
This article describes some use case studies and self-assessments of FAIR status of de.NBI services to illustrate the challenges and requirements for the definition of the needs of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines joint in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.
Affiliation(s)
- Gerhard Mayer
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Ulm University, Institute of Medical Systems Biology, Ulm, Germany
- Wolfgang Müller
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Karin Schork
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Julian Uszkoreit
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Andreas Weidemann
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Ulrike Wittig
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Maja Rey
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Janine Felden
- Jacobs University Bremen gGmbH, Bremen, Germany
- University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Frank Oliver Glöckner
- Jacobs University Bremen gGmbH, Bremen, Germany
- University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Alfred Wegener Institute - Helmholtz Center for Polar- and Marine Research, Bremerhaven, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sebastian Beier
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hans A Kestler
- Ulm University, Institute of Medical Systems Biology, Ulm, Germany
- Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Daniel Wibberg
- Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Alfred Pühler
- Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Sven Twardziok
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Jürgen Eils
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Roland Eils
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Heidelberg University Hospital and BioQuant, Health Data Science Unit, Heidelberg, Germany
- Steve Hoffmann
- Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Martin Eisenacher
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Michael Turewicz
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
37
Sanderson LA, Caron CT, Tan RL, Bett KE. A PostgreSQL Tripal solution for large-scale genotypic and phenotypic data. Database (Oxford) 2021; 2021:baab051. [PMID: 34389844 PMCID: PMC8363843 DOI: 10.1093/database/baab051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/05/2021] [Revised: 05/11/2021] [Accepted: 08/03/2021] [Indexed: 11/15/2022]
Abstract
Researchers are seeking cost-effective solutions for management and analysis of large-scale genotypic and phenotypic data. Open-source software is uniquely positioned to fill this need through user-focused, crowd-sourced development. Tripal, an open-source toolkit for developing biological data web portals, uses the GMOD Chado database schema to achieve flexible, ontology-driven storage in PostgreSQL. Tripal also aids research-focused web portals in providing data according to findable, accessible, interoperable, reusable (FAIR) principles. We describe here a fully relational PostgreSQL solution to handle large-scale genotypic and phenotypic data that is implemented as a collection of freely available, open-source modules. These Tripal extension modules provide a holistic approach for importing, storage, display and analysis within a relational database schema. Furthermore, they embody the Tripal approach to FAIR data by providing multiple search tools and ensuring metadata is fully described and interoperable. Our solution focuses on data integrity, as well as optimizing performance to provide a fully functional system that is currently being used in the production of Tripal portals for crop species. We fully describe the implementation of our solution and discuss why a PostgreSQL-powered web portal provides an efficient environment for researcher-driven genotypic and phenotypic data analysis.
Affiliation(s)
- Lacey-Anne Sanderson
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Carolyn T Caron
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Reynold L Tan
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Kirstin E Bett
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
38
Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021; 22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022]
Abstract
Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.
Affiliation(s)
- Ethalinda Cannon
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA USA
- Kay Faaberg
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Emily Grau
- University of Connecticut, Storrs, CT USA
- Sook Jung
- Washington State University, Pullman, WA USA
- Kelly Lager
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Dorrie Main
- Washington State University, Pullman, WA USA
- Monica Poelchau
- USDA-ARS, National Agricultural Library, Beltsville, MD, USA
- Joe West
- University of Tennessee, Knoxville, TN USA
39
Andrés-Hernández L, Halimi RA, Mauleon R, Mayes S, Baten A, King GJ. Challenges for FAIR-compliant description and comparison of crop phenotype data with standardized controlled vocabularies. Database (Oxford) 2021; 2021:baab028. [PMID: 33991093 PMCID: PMC8122365 DOI: 10.1093/database/baab028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/27/2020] [Revised: 04/14/2021] [Accepted: 04/30/2021] [Indexed: 12/04/2022]
Abstract
Crop phenotypic data underpin many pre-breeding efforts to characterize variation within germplasm collections. Although there has been an increase in the global capacity for accumulating and comparing such data, a lack of consistency in the systematic description of metadata often limits integration and sharing. We therefore aimed to understand some of the challenges facing findable, accessible, interoperable and reusable (FAIR) curation and annotation of phenotypic data from minor and underutilized crops. We used bambara groundnut (Vigna subterranea) as an exemplar underutilized crop to assess the ability of the Crop Ontology system to facilitate curation of trait datasets, so that they are accessible for comparative analysis. This involved generating a controlled vocabulary Trait Dictionary of 134 terms. Systematic quantification of syntactic and semantic cohesiveness of the full set of 28 crop-specific COs identified inconsistencies between trait descriptor names, a relative lack of cross-referencing to other ontologies and a flat ontological structure for classifying traits. We also evaluated the Minimal Information About a Phenotyping Experiment and FAIR compliance of bambara trait datasets curated within the CropStoreDB schema. We discuss specifications for a more systematic and generic approach to trait controlled vocabularies, which would benefit from representation of terms that adhere to Open Biological and Biomedical Ontologies principles. In particular, we focus on the benefits of reuse of existing definitions within pre- and post-composed axioms from other domains in order to facilitate the curation and comparison of datasets from a wider range of crops. Database URL: https://www.cropstoredb.org/cs_bambara.html.
Affiliation(s)
- Liliana Andrés-Hernández
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Razlin Azman Halimi
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Ramil Mauleon
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Sean Mayes
- School of Biosciences, University of Nottingham, Sutton Bonington, Leicestershire, LE12 5RD, Nottingham, UK
- Abdul Baten
- Institute of Precision Medicine & Bioinformatics, Sydney Local Health District, Royal Prince Alfred Hospital, Missenden Road, Camperdown, NSW 2050, Australia
- Graham J King
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
40
Williamson HF, Brettschneider J, Caccamo M, Davey RP, Goble C, Kersey PJ, May S, Morris RJ, Ostler R, Pridmore T, Rawlings C, Studholme D, Tsaftaris SA, Leonelli S. Data management challenges for artificial intelligence in plant and agricultural research. F1000Res 2021; 10:324. [PMID: 36873457 PMCID: PMC9975417 DOI: 10.12688/f1000research.52204.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 04/12/2021] [Indexed: 09/14/2024]
Abstract
Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of Machine Learning (ML), which holds much promise for this domain.
Collapse
Affiliation(s)
- Hugh F. Williamson
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
| | | | - Mario Caccamo
- NIAB, National Research Institute of Brewing, East Malling, UK
| | | | - Carole Goble
- Department of Computer Science, University of Manchester, Manchester, UK
| | | | - Sean May
- School of Biosciences, University of Nottingham, Loughborough, UK
| | | | - Richard Ostler
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | - Tony Pridmore
- School of Computer Science, University of Nottingham, Nottingham, UK
| | - Chris Rawlings
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | | | - Sotirios A. Tsaftaris
- Institute of Digital Communications, University of Edinburgh, Edinburgh, UK
- Alan Turing Institute, London, UK
| | - Sabina Leonelli
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
- Alan Turing Institute, London, UK
|
41
|
Williamson HF, Brettschneider J, Caccamo M, Davey RP, Goble C, Kersey PJ, May S, Morris RJ, Ostler R, Pridmore T, Rawlings C, Studholme D, Tsaftaris SA, Leonelli S. Data management challenges for artificial intelligence in plant and agricultural research. F1000Res 2021; 10:324. [PMID: 36873457 PMCID: PMC9975417 DOI: 10.12688/f1000research.52204.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of Machine Learning (ML), which holds much promise for this domain.
Affiliation(s)
- Hugh F. Williamson
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
| | | | - Mario Caccamo
- NIAB, National Research Institute of Brewing, East Malling, UK
| | | | - Carole Goble
- Department of Computer Science, University of Manchester, Manchester, UK
| | | | - Sean May
- School of Biosciences, University of Nottingham, Loughborough, UK
| | | | - Richard Ostler
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | - Tony Pridmore
- School of Computer Science, University of Nottingham, Nottingham, UK
| | - Chris Rawlings
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | | | - Sotirios A. Tsaftaris
- Institute of Digital Communications, University of Edinburgh, Edinburgh, UK
- Alan Turing Institute, London, UK
| | - Sabina Leonelli
- Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
- Alan Turing Institute, London, UK
|
42
|
Abstract
Technological developments have revolutionized measurements on plant genotypes and phenotypes, leading to routine production of large, complex data sets. This has led to increased efforts to extract meaning from these measurements and to integrate various data sets. Concurrently, machine learning has rapidly evolved and is now widely applied in science in general and in plant genotyping and phenotyping in particular. Here, we review the application of machine learning in the context of plant science and plant breeding. We focus on analyses at different phenotype levels, from biochemical to yield, and in connecting genotypes to these. In this way, we illustrate how machine learning offers a suite of methods that enable researchers to find meaningful patterns in relevant plant data.
Affiliation(s)
- Aalt Dirk Jan van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Gert Kootstra
- Farm Technology, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Willem Kruijer
- Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen 6708 PB, the Netherlands
|
43
|
Arias-Baldrich C, Silva MC, Bergeretti F, Chaves I, Miguel C, Saibo NJM, Sobral D, Faria D, Barros PM. CorkOakDB-The Cork Oak Genome Database Portal. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:6056470. [PMID: 33382885 PMCID: PMC7774466 DOI: 10.1093/database/baaa114] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/19/2020] [Revised: 12/04/2020] [Accepted: 12/11/2020] [Indexed: 11/13/2022]
Abstract
Quercus suber (cork oak) is an evergreen tree native to the Mediterranean basin, which plays a key role in the ecology and economy of this area. Over the last decades, this species has gone through an observable decline, mostly due to environmental factors. Deciphering the mechanisms of cork oak's response to the environment and getting a deep insight into its biology are crucial to counteract biotic and abiotic stresses compromising the stability of a unique ecosystem. In the light of these setbacks, the publication of the genome in 2018 was a major step towards understanding the genetic make-up of this species. In an effort to integrate this information in a comprehensive, accessible and intuitive format, we have developed The Cork Oak Genome Database Portal (CorkOakDB). The CorkOakDB is supported by the BioData.pt e-infrastructure, the Portuguese ELIXIR node for biological data. The portal gives public access to search and explore the curated genomic and transcriptomic data on this species. Moreover, CorkOakDB provides a user-friendly interface and functional tools to help the research community take advantage of the increased accessibility to genomic information. A study case is provided to highlight the functionalities of the portal. CorkOakDB guarantees the update, curation and data collection, aiming to collect data besides the genetic/genomic information, in order to become the main repository in cork oak research. Database URL: http://corkoakdb.org/.
Affiliation(s)
- Cirenia Arias-Baldrich
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande, Oeiras 2780-156, Lisboa, Portugal; Department of Biological and Medical Sciences, Oxford Brookes University, Headington Campus, Oxford OX3 0BP, UK
| | - Marta Contreiras Silva
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade NOVA de Lisboa, Av. da República, Oeiras 2780-157, Lisboa, Portugal
| | - Filippo Bergeretti
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade NOVA de Lisboa, Av. da República, Oeiras 2780-157, Lisboa, Portugal
| | - Inês Chaves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade NOVA de Lisboa, Av. da República, Oeiras 2780-157, Lisboa, Portugal; Instituto de Biologia Experimental Tecnológica (iBET), Av. da República, 2780-157 Oeiras, Lisboa, Portugal
| | - Célia Miguel
- Instituto de Biologia Experimental Tecnológica (iBET), Av. da República, 2780-157 Oeiras, Lisboa, Portugal; Biosystems & Integrative Sciences Institute (BioISI), Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
| | - Nelson J M Saibo
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade NOVA de Lisboa, Av. da República, Oeiras 2780-157, Lisboa, Portugal
| | - Daniel Sobral
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande, Oeiras 2780-156, Lisboa, Portugal; UCIBIO, Departamento de Ciências da Vida, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Campus de Caparica, Caparica 2825-149, Setúbal, Portugal
| | - Daniel Faria
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande, Oeiras 2780-156, Lisboa, Portugal; INESC-ID - Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento, Rua Alves Redol, Lisboa 1000-029, Portugal
| | - Pedro M Barros
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade NOVA de Lisboa, Av. da República, Oeiras 2780-157, Lisboa, Portugal
|
44
|
Pommier C, Garnett T, Lawrence-Dill CJ, Pridmore T, Watt M, Pieruschka R, Ghamkhar K. Editorial: Phenotyping; From Plant, to Data, to Impact and Highlights of the International Plant Phenotyping Symposium - IPPS 2018. FRONTIERS IN PLANT SCIENCE 2020; 11:618342. [PMID: 33343612 PMCID: PMC7746651 DOI: 10.3389/fpls.2020.618342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/16/2020] [Accepted: 11/10/2020] [Indexed: 06/12/2023]
Affiliation(s)
- Cyril Pommier
- Université Paris-Saclay, INRAE, URGI, Versailles, France
- Université Paris-Saclay, INRAE, BioinfOmics, Plant Bioinformatics Facility, Versailles, France
| | - Trevor Garnett
- The Plant Accelerator, Australian Plant Phenomics Facility, School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, SA, Australia
| | | | - Tony Pridmore
- University of Nottingham, Nottingham, United Kingdom
| | - Michelle Watt
- Faculty of Science, School of BioSciences, University of Melbourne, Parkville, VIC, Australia
| | - Roland Pieruschka
- Forschungszentrum Jülich, BG-2: Plant Sciences, Institute for Bio- and Geosciences, Jülich, Germany
| | - Kioumars Ghamkhar
- Grasslands Research Centre, AgResearch, Palmerston North, New Zealand
|
45
|
Morales N, Bauchet GJ, Tantikanjana T, Powell AF, Ellerbrock BJ, Tecle IY, Mueller LA. High density genotype storage for plant breeding in the Chado schema of Breedbase. PLoS One 2020; 15:e0240059. [PMID: 33175872 PMCID: PMC7657515 DOI: 10.1371/journal.pone.0240059] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/11/2020] [Accepted: 09/17/2020] [Indexed: 12/24/2022] Open
Abstract
Modern breeding programs routinely use genome-wide information for selecting individuals to advance. The large volumes of genotypic information required present a challenge for data storage and query efficiency. Major use cases require genotyping data to be linked with trait phenotyping data. In contrast to phenotyping data that are often stored in relational database schemas, next-generation genotyping data are traditionally stored in non-relational storage systems due to their extremely large scope. This study presents a novel data model implemented in Breedbase (https://breedbase.org/) for uniting relational phenotyping data and non-relational genotyping data within the open-source PostgreSQL database engine. Breedbase is an open-source, web-database designed to manage all of a breeder's informatics needs: management of field experiments, phenotypic and genotypic data collection and storage, and statistical analyses. The genotyping data is stored in a PostgreSQL data-type known as binary JavaScript Object Notation (JSONb), where the JSON structures closely follow the Variant Call Format (VCF) data model. The Breedbase genotyping data model can handle different ploidy levels, structural variants, and any genotype encoded in VCF. JSONb is both compressed and indexed, resulting in a space and time efficient system. Furthermore, file caching maximizes data retrieval performance. Integration of all breeding data within the Chado database schema retains referential integrity that may be lost when genotyping and phenotyping data are stored in separate systems. Benchmarking demonstrates that the system is fast enough for computation of a genomic relationship matrix (GRM) and genome wide association study (GWAS) for datasets involving 1,325 diploid Zea mays, 314 triploid Musa acuminata, and 924 diploid Manihot esculenta samples genotyped with 955,690, 142,119, and 287,952 genotype-by-sequencing (GBS) markers, respectively.
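As a rough illustration of the workflow this abstract describes (genotypes held as JSON structures that follow the VCF data model, then used to compute a genomic relationship matrix), the Python sketch below parses a small JSON document and builds a GRM. Everything in it is an assumption made for illustration: the JSON layout, the marker and sample names, and the choice of VanRaden's first method for diploids. It does not reproduce the Breedbase schema or its implementation.

```python
import json

# Hypothetical per-sample genotype documents keyed by marker name.
# The layout and the "GT" key loosely mirror VCF; they are NOT the Breedbase schema.
SAMPLES_JSON = """
{
  "sampleA": {"S1_100": {"GT": "0/0"}, "S1_200": {"GT": "0/1"}, "S1_300": {"GT": "1/1"}},
  "sampleB": {"S1_100": {"GT": "0/1"}, "S1_200": {"GT": "1/1"}, "S1_300": {"GT": "0/1"}},
  "sampleC": {"S1_100": {"GT": "1/1"}, "S1_200": {"GT": "0/0"}, "S1_300": {"GT": "0/0"}}
}
"""


def alt_dosage(gt):
    """Count non-reference alleles in a VCF-style GT string (assumes no missing calls)."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele != "0")


def load_dosages(text):
    """Turn the JSON text into {sample: [dosage per marker]} with a fixed marker order."""
    records = json.loads(text)
    markers = sorted(next(iter(records.values())))
    matrix = {
        sample: [alt_dosage(calls[m]["GT"]) for m in markers]
        for sample, calls in records.items()
    }
    return markers, matrix


def vanraden_grm(matrix):
    """Genomic relationship matrix for diploids: G = ZZ' / (2 * sum(p * (1 - p)))."""
    samples = sorted(matrix)
    n_markers = len(matrix[samples[0]])
    # Alternate-allele frequency per marker, estimated from the samples themselves.
    freqs = [
        sum(matrix[s][j] for s in samples) / (2 * len(samples))
        for j in range(n_markers)
    ]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Centre each dosage by its expected value 2p.
    centred = {
        s: [matrix[s][j] - 2 * freqs[j] for j in range(n_markers)] for s in samples
    }
    grm = [
        [sum(a * b for a, b in zip(centred[r], centred[c])) / scale for c in samples]
        for r in samples
    ]
    return samples, grm


markers, matrix = load_dosages(SAMPLES_JSON)
samples, grm = vanraden_grm(matrix)
for name, row in zip(samples, grm):
    print(name, [round(value, 3) for value in row])
```

In a database-backed setting the JSON text would presumably come from a query result rather than a string literal, and real data would need handling for missing calls and monomorphic markers, both of which this sketch ignores.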
Affiliation(s)
- Nicolas Morales
- Plant Breeding and Genetics, Cornell University, Ithaca, NY, United States of America
- Boyce Thompson Institute, Ithaca, NY, United States of America
| | | | | | | | | | - Isaak Y. Tecle
- Boyce Thompson Institute, Ithaca, NY, United States of America
|
46
|
Arnaud E, Laporte MA, Kim S, Aubert C, Leonelli S, Miro B, Cooper L, Jaiswal P, Kruseman G, Shrestha R, Buttigieg PL, Mungall CJ, Pietragalla J, Agbona A, Muliro J, Detras J, Hualla V, Rathore A, Das RR, Dieng I, Bauchet G, Menda N, Pommier C, Shaw F, Lyon D, Mwanzia L, Juarez H, Bonaiuti E, Chiputwa B, Obileye O, Auzoux S, Yeumo ED, Mueller LA, Silverstein K, Lafargue A, Antezana E, Devare M, King B. The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. PATTERNS (NEW YORK, N.Y.) 2020; 1:100105. [PMID: 33205138 PMCID: PMC7660444 DOI: 10.1016/j.patter.2020.100105] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/06/2020] [Revised: 05/28/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive full set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Affiliation(s)
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Soonho Kim
- Markets, Trade and Institutions Division (MTID), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Céline Aubert
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Sabina Leonelli
- Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences (Egenis), University of Exeter, Exeter, UK
| | - Berta Miro
- Agrifood Policy Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Gideon Kruseman
- Socio-Economics Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, Mexico
| | - Rosemary Shrestha
- Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, México
| | - Pier Luigi Buttigieg
- Helmholtz Metadata Collaboration, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Afolabi Agbona
- Cassava Breeding Program, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
| | | | - Jeffrey Detras
- Bioinformatics Cluster, Strategic Innovation Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Vilma Hualla
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Abhishek Rathore
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Roma Rani Das
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Ibnou Dieng
- Biometrics Unit, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Guillaume Bauchet
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Naama Menda
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Cyril Pommier
- BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Felix Shaw
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
| | - David Lyon
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Leroy Mwanzia
- Performance, Innovation and Strategic Analysis, International Center for Tropical Agriculture (CIAT), Regional Office for Africa, Nairobi, Kenya
| | - Henry Juarez
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Enrico Bonaiuti
- Monitoring, Evaluation and Learning Team, International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut, Lebanon
| | - Brian Chiputwa
- Research Methods Group (RMG), World Agroforestry (ICRAF), Nairobi, Kenya
| | - Olatunbosun Obileye
- Data Management Section, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Sandrine Auzoux
- UPR AIDA, The French Agricultural Research Centre for International Development (CIRAD), Sainte-Clotilde, Réunion, France
- Université de Montpellier, Montpellier, France
| | - Esther Dzalé Yeumo
- Unité Délégation à l’Information Scientifique et Technique - DIST, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Lukas A. Mueller
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | | | | | - Erick Antezana
- Bayer Crop Science SA-NV, Diegem, Belgium
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Medha Devare
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Brian King
- CGIAR Platform for Big Data in Agriculture, International Center for Tropical Agriculture (CIAT), Cali, Colombia
|
47
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/1yxvzv] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/08/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
Affiliation(s)
- Evangelia A Papoutsoglou
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Daniel Faria
- BioData.pt, Instituto Gulbenkian de Ciência, 2780-156, Oeiras, Portugal
- INESC-ID, 1000-029, Lisboa, Portugal
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Elizabeth Arnaud
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Ioannis N Athanasiadis
- Geo-Information Science and Remote Sensing Laboratory, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708PB, the Netherlands
| | - Inês Chaves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- Instituto de Biologia Experimental e Tecnológica (iBET), 2780-157, Oeiras, Portugal
| | - Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | | | - Bruno V Costa
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | - Hanna Ćwiek-Kupczyńska
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Bert Droesbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | - Richard Finkers
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Graham J King
- Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2480, Australia
| | - Paweł Krajewski
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Marie-Angélique Laporte
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Célia Michotey
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
| | - Markus Oppermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Richard Ostler
- Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
| | - Hendrik Poorter
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
| | | | - Živa Ramšak
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Jochen C Reif
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - François Tardieu
- INRA, Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux, UMR759, Montpellier, 34060, France
| | - Cristobal Uauy
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
| | - Björn Usadel
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074, Aachen, Germany
| | - Richard G F Visser
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Stephan Weise
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | | | - Célia M Miguel
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | | | - Cyril Pommier
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
|
48
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020. [PMID: 32171029 DOI: 10.15454/ah6u4a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
Affiliation(s)
- Evangelia A Papoutsoglou
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Daniel Faria
- BioData.pt, Instituto Gulbenkian de Ciência, 2780-156, Oeiras, Portugal
- INESC-ID, 1000-029, Lisboa, Portugal
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Elizabeth Arnaud
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Ioannis N Athanasiadis
- Geo-Information Science and Remote Sensing Laboratory, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708PB, the Netherlands
| | - Inês Chaves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- Instituto de Biologia Experimental e Tecnológica (iBET), 2780-157, Oeiras, Portugal
| | - Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | | | - Bruno V Costa
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | - Hanna Ćwiek-Kupczyńska
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Bert Droesbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
| | - Richard Finkers
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Graham J King
- Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2480, Australia
| | - Paweł Krajewski
- Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Marie-Angélique Laporte
- Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
| | - Célia Michotey
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
| | - Markus Oppermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Richard Ostler
- Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
| | - Hendrik Poorter
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
| | | | - Živa Ramšak
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
| | - Jochen C Reif
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | - François Tardieu
- INRA, Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux, UMR759, Montpellier, 34060, France
| | - Cristobal Uauy
- Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
| | - Björn Usadel
- Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
- Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074, Aachen, Germany
| | - Richard G F Visser
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
| | - Stephan Weise
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
| | | | - Célia M Miguel
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
| | | | - Cyril Pommier
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
|
49
|
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek‐Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte M, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez‐Gonzalez R, Ramšak Ž, Reif JC, Rocca‐Serra P, Sansone S, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam‐Blondon A, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. THE NEW PHYTOLOGIST 2020; 227:260-273. [PMID: 32171029 PMCID: PMC7317793 DOI: 10.1111/nph.16544] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 11/22/2019] [Accepted: 02/24/2020] [Indexed: 05/21/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
|
50
|
Khadka K, Earl HJ, Raizada MN, Navabi A. A Physio-Morphological Trait-Based Approach for Breeding Drought Tolerant Wheat. FRONTIERS IN PLANT SCIENCE 2020; 11:715. [PMID: 32582249 PMCID: PMC7286286 DOI: 10.3389/fpls.2020.00715] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 11/14/2019] [Accepted: 05/06/2020] [Indexed: 05/18/2023]
Abstract
In the past, there have been drought events in different parts of the world, which have negatively influenced the productivity and production of various crops including wheat (Triticum aestivum L.), one of the world's three important cereal crops. Breeding new high-yielding drought-tolerant wheat varieties is a research priority specifically in regions where climate change is predicted to result in more drought conditions. Commonly in breeding for drought tolerance, grain yield is the basis for selection, but it is a complex, late-stage trait, affected by many factors aside from drought. A strategy that evaluates genotypes for physiological responses to drought at earlier growth stages may be more targeted to drought and time-efficient. Such an approach may be enabled by recent advances in high-throughput phenotyping platforms (HTPPs). In addition, the success of new genomic and molecular approaches relies on the quality of phenotypic data, which is utilized to dissect the genetics of complex traits such as drought tolerance. Therefore, the first objective of this review is to describe the growth-stage based physio-morphological traits that could be targeted by breeders to develop drought-tolerant wheat genotypes. The second objective is to describe recent advances in high-throughput phenotyping of drought tolerance-related physio-morphological traits primarily under field conditions. We discuss how these strategies can be integrated into a comprehensive breeding program to mitigate the impacts of climate change. The review concludes that there is a need for comprehensive high-throughput phenotyping of physio-morphological traits that is growth stage-based to improve the efficiency of breeding drought-tolerant wheat.
Affiliation(s)
- Kamal Khadka
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
|