1
Arnaud E, Menda N, Tran T, Asiimwe A, Kanaabi M, Meghar K, Forsythe L, Kawuki R, Ellebrock B, Kayondo IS, Agbona A, Zhang X, Mendes T, Laporte MA, Nakitto M, Ssali RT, Asfaw A, Uwimana B, Ogbete CE, Makunde G, Maraval I, Mueller LA, Bouniol A, Fauvelle E, Dufour D. Connecting data for consumer preferences, food quality, and breeding in support of market-oriented breeding of root, tuber, and banana crops. J Sci Food Agric 2024; 104:4514-4526. [PMID: 37226655 DOI: 10.1002/jsfa.12710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 03/25/2023] [Accepted: 05/24/2023] [Indexed: 05/26/2023]
Abstract
The 5-year project 'Breeding roots, tubers and banana products for end user preferences' (RTBfoods) focused on collecting consumers' preferences on 12 food products to guide breeding programmes. It involved multidisciplinary teams from Africa, Latin America, and Europe. Diverse data types were generated on preferred qualities of users (farmers, family and entrepreneurial processors, traders or retailers, and consumers). Country-based target product profiles were produced with a comprehensive market analysis, disaggregating gender's role and preferences, providing prioritised lists of traits for the development of new plant varieties. We describe the approach taken to create, in the roots, tubers, and banana breeding databases, a centralised and meaningful open access to sensory information on food products and genotypes. Biochemical, instrumental textural, and sensory analysis data are then directly connected to the specific plant record while user survey data, bearing personal information, were analysed, anonymised, and uploaded in a repository. Names and descriptions of food quality traits were added into the Crop Ontology for labelling data in the databases, along with the various methods of measurement used by the project. The development and application of standard operating procedures, data templates, and adapted trait ontologies improved the data quality and its format, enabling the linking of these to the plant material studied when uploaded in the breeding databases or in repositories. Some modifications to the database model were necessary to accommodate the food sensory traits and sensory panel trials. © 2023 The Authors. Journal of The Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
Affiliation(s)
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Naama Menda
  - Boyce Thompson Institute (BTI), Ithaca, NY, USA
- Thierry Tran
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
  - International Centre for Tropical Agriculture (CIAT), Cali, Colombia
- Amos Asiimwe
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
- Michael Kanaabi
  - National Crops Resources Research Institute (NaCRRI), Kampala, Uganda
- Karima Meghar
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
- Lora Forsythe
  - Natural Resources Institute (NRI), Faculty of Engineering & Science, Livelihoods and Institutions Department, University of Greenwich, London, UK
- Robert Kawuki
  - National Crops Resources Research Institute (NaCRRI), Kampala, Uganda
- Afolabi Agbona
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Xiaofei Zhang
  - International Centre for Tropical Agriculture (CIAT), Cali, Colombia
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Asrat Asfaw
  - International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Brigitte Uwimana
  - International Institute of Tropical Agriculture (IITA), Kampala, Uganda
- Isabelle Maraval
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
- Alexandre Bouniol
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
  - CIRAD, UMR QualiSud, Cotonou, Bénin
- Eglantine Fauvelle
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
- Dominique Dufour
  - CIRAD, UMR QualiSud, Montpellier, France
  - University of Montpellier, Avignon Université, CIRAD, Institut Agro, IRD, Université de La Réunion, Montpellier, France
2
Shen Z, Shen E, Yang K, Fan Z, Zhu QH, Fan L, Ye CY. BreedingAIDB: A database integrating crop genome-to-phenotype paired data with machine learning tools applicable to breeding. Plant Commun 2024:100894. [PMID: 38571312 DOI: 10.1016/j.xplc.2024.100894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 03/04/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Affiliation(s)
- Zijie Shen
  - Hainan Institute, Zhejiang University, Sanya 572025, China; Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Enhui Shen
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Kun Yang
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Zuoqian Fan
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Qian-Hao Zhu
  - CSIRO Agriculture and Food, Canberra, ACT 2601, Australia
- Longjiang Fan
  - Hainan Institute, Zhejiang University, Sanya 572025, China; Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
- Chu-Yu Ye
  - Institute of Crop Science & Institute of Bioinformatics, College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
3
Vargas-Rojas L, Ting TC, Rainey KM, Reynolds M, Wang DR. AgTC and AgETL: open-source tools to enhance data collection and management for plant science research. Front Plant Sci 2024; 15:1265073. [PMID: 38450403 PMCID: PMC10915008 DOI: 10.3389/fpls.2024.1265073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 01/30/2024] [Indexed: 03/08/2024]
Abstract
Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier for downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data integration process where data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
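The Extract-Transform-Load flow described in this abstract can be illustrated with a minimal sketch. This is not AgETL's code or API: the function names, the column aliases, the table schema, and the two sample files are all hypothetical, and a plain dictionary stands in for the YAML configuration the abstract mentions. It only shows the general pattern of reading differently formatted files, mapping them onto one standard schema, and loading them into a database (here an in-memory SQLite database from the Python standard library).

```python
import csv
import io
import sqlite3

# Hypothetical mapping from lab-specific headers to one standard schema.
# A dict stands in for the YAML configuration files mentioned in the abstract.
COLUMN_ALIASES = {
    "plot": "plot_id", "Plot ID": "plot_id",
    "geno": "genotype", "Genotype": "genotype",
    "ht_cm": "height_cm", "Height (cm)": "height_cm",
}


def extract(text):
    """Extract: parse one CSV document into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: rename columns to the standard schema and coerce types."""
    standardized = []
    for row in rows:
        std = {COLUMN_ALIASES[k]: v.strip() for k, v in row.items() if k in COLUMN_ALIASES}
        std["height_cm"] = float(std["height_cm"])
        standardized.append(std)
    return standardized


def load(conn, rows):
    """Load: append standardized rows to a single observations table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(plot_id TEXT, genotype TEXT, height_cm REAL)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (:plot_id, :genotype, :height_cm)", rows
    )
    conn.commit()


def run_etl(sources, conn):
    """Run all three steps for each source and return the stored row count."""
    for text in sources:
        load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]


# Two hypothetical input files with different header conventions.
FILE_A = "plot,geno,ht_cm\nP1,G01,101.5\nP2,G02,98.0\n"
FILE_B = "Plot ID,Genotype,Height (cm)\nP3,G01,110\n"

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl([FILE_A, FILE_B], connection))
```

Keeping the header mapping in one place, separate from the three functions, mirrors the idea of a central configuration: supporting a new file layout means adding aliases rather than editing the pipeline.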
Affiliation(s)
- Luis Vargas-Rojas
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- To-Chia Ting
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Katherine M. Rainey
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Matthew Reynolds
  - Wheat Physiology Group, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Diane R. Wang
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
4
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Front Plant Sci 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine- interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
5
Clarke JL, Cooper LD, Poelchau MF, Berardini TZ, Elser J, Farmer AD, Ficklin S, Kumari S, Laporte MA, Nelson RT, Sadohara R, Selby P, Thessen AE, Whitehead B, Sen TZ. Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium. Database (Oxford) 2023; 2023:baad076. [PMID: 37971715 PMCID: PMC10653126 DOI: 10.1093/database/baad076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.
Affiliation(s)
- Jennifer L Clarke
  - Department of Statistics and Department of Food Science and Technology, University of Nebraska–Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Laurel D Cooper
  - Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Monica F Poelchau
  - USDA, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Ave, Beltsville 20705, USA
- Tanya Z Berardini
  - The Arabidopsis Information Resource and Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA, USA
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Andrew D Farmer
  - National Center for Genome Resources, 2935 Rodeo Park Dr. E., Santa Fe, NM 87505, USA
- Stephen Ficklin
  - Department of Horticulture, Washington State University, 249 Clark Hall, PO Box 646414, Pullman, WA 99164, USA
- Sunita Kumari
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Marie-Angélique Laporte
  - Digital Inclusion, Bioversity International, Parc Scientifique Agropolis II, 1990 Bd de la Lironde, Montpellier 34397, France
- Rex T Nelson
  - USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, 716 Farmhouse Lane, Ames, IA 50011, USA
- Rie Sadohara
  - Department of Plant, Soil, and Microbial Sciences, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
- Peter Selby
  - School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, 215 Garden Avenue, Ithaca, NY 14850, USA
- Anne E Thessen
  - Department of Biomedical Informatics, University of Colorado Anschutz, 1890 N. Revere Court, Mailstop F600, Aurora CO 80045, USA
- Brandon Whitehead
  - Data Science and Informatics, Manaaki Whenua—Landcare Research, Ltd., Riddet Road, Massey University, Palmerston North 4472, New Zealand
- Taner Z Sen
  - USDA, Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany 94710, USA
  - Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA 94720, USA
6
Sparks AH, Ponte EMD, Alves KS, Foster ZSL, Grünwald NJ. Openness and Computational Reproducibility in Plant Pathology: Where We Stand and a Way Forward. Phytopathology 2023; 113:1159-1170. [PMID: 36624724 DOI: 10.1094/phyto-10-21-0430-per] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Open research practices have been highlighted extensively during the last 10 years in many fields of scientific study as essential standards needed to promote transparency and reproducibility of scientific results. Scientific claims can only be evaluated based on how protocols, materials, equipment, and methods were described; data were collected and prepared; and analyses were conducted. Openly sharing protocols, data, and computational code is central to current scholarly dissemination and communication, but in many fields, including plant pathology, adoption of these practices has been slow. We randomly selected 450 articles published from 2012 to 2021 across 21 journals representative of the plant pathology discipline and assigned them scores reflecting their openness and computational reproducibility. We found that most of the articles did not follow protocols for open science and failed to share data or code in a reproducible way. We propose that use of open-source tools facilitates computationally reproducible work and analyses, benefitting not just readers but the authors as well. Finally, we provide ideas and suggest tools to promote open, reproducible computational research practices among plant pathologists. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
Affiliation(s)
- Adam H Sparks
  - Department of Primary Industries and Regional Development, Perth, WA 6000, Australia
  - University of Southern Queensland, Centre for Crop Health, Toowoomba, Qld 4350, Australia
- Kaique S Alves
  - Departamento de Fitopatologia, Universidade Federal de Viçosa, Brazil
- Zachary S L Foster
  - Horticultural Crops Disease and Pest Management Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Corvallis, OR 97330, U.S.A
- Niklaus J Grünwald
  - Horticultural Crops Disease and Pest Management Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Corvallis, OR 97330, U.S.A
7
Karabulut E, Erkoç K, Acı M, Aydın M, Barriball S, Braley J, Cassetta E, Craine EB, Diaz-Garcia L, Hershberger J, Meyering B, Miller AJ, Rubin MJ, Tesdell O, Schlautman B, Şakiroğlu M. Sainfoin (Onobrychis spp.) crop ontology: supporting germplasm characterization and international research collaborations. Front Plant Sci 2023; 14:1177406. [PMID: 37255566 PMCID: PMC10225502 DOI: 10.3389/fpls.2023.1177406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 04/18/2023] [Indexed: 06/01/2023]
Abstract
Sainfoin (Onobrychis spp.) is a perennial forage legume that is also attracting attention as a perennial pulse with potential for human consumption. The dual use of sainfoin underpins diverse research and breeding programs focused on improving sainfoin lines for forage and pulses, which is driving the generation of complex datasets describing high dimensional phenotypes in the post-omics era. To ensure that multiple user groups, for example, breeders selecting for forage and those selecting for edible seed, can utilize these rich datasets, it is necessary to develop common ontologies and accessible ontology platforms. One such platform, Crop Ontology, was created in 2008 by the Consortium of International Agricultural Research Centers (CGIAR) to host crop-specific trait ontologies that support standardized plant breeding databases. In the present study, we describe the sainfoin crop ontology (CO). An in-depth literature review was performed to develop a comprehensive list of traits measured and reported in sainfoin. Because the same traits can be measured in different ways, ultimately, a set of 98 variables (variable = plant trait + method of measurement + scale of measurement) used to describe variation in sainfoin were identified. Variables were formatted and standardized based on guidelines provided here for inclusion in the sainfoin CO. The 98 variables contained a total of 82 traits from four trait classes of which 24 were agronomic, 31 were morphological, 19 were seed and forage quality related, and 8 were phenological. In addition to the developed variables, we have provided a roadmap for developing and submitting new traits to the sainfoin CO.
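The definition used in this abstract (variable = plant trait + method of measurement + scale of measurement) explains why the number of variables (98) exceeds the number of traits (82): one trait measured in two ways yields two variables. A minimal sketch of that data structure follows. The class name, field names, and the two example variables are hypothetical illustrations and are not taken from the sainfoin CO; only the four trait-class counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One variable: a trait combined with a method and a scale."""
    trait: str
    method: str
    scale: str

    @property
    def name(self):
        # Hypothetical naming convention, used here only for display.
        return f"{self.trait} | {self.method} | {self.scale}"


# Hypothetical examples: the same trait recorded with two methods and scales.
height_ruler = Variable("plant height", "ruler measurement", "cm")
height_visual = Variable("plant height", "visual estimate", "1-5 score")

variables = [height_ruler, height_visual]
traits = {v.trait for v in variables}  # two variables share one trait

# Trait counts per class as reported in the abstract.
TRAIT_CLASS_COUNTS = {
    "agronomic": 24,
    "morphological": 31,
    "seed and forage quality": 19,
    "phenological": 8,
}

if __name__ == "__main__":
    for v in variables:
        print(v.name)
    print(len(variables), "variables,", len(traits), "trait")
    print(sum(TRAIT_CLASS_COUNTS.values()), "traits in total")
```

The four class counts sum to the 82 traits reported, which serves as a quick consistency check on the figures in the abstract.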
Affiliation(s)
- Ebrar Karabulut
  - Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- Kübra Erkoç
  - Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- Murat Acı
  - Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
  - The Land Institute, Salina, KS, United States
- Mahmut Aydın
  - Department of Computer Engineering, Kafkas University, Kars, Türkiye
- Jackson Braley
  - Donald Danforth Plant Science Center, St. Louis, MO, United States
- Luis Diaz-Garcia
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, United States
- Jenna Hershberger
  - Plant and Environmental Sciences Department, Clemson University, Clemson, SC, United States
- Bo Meyering
  - The Land Institute, Salina, KS, United States
- Allison J. Miller
  - Donald Danforth Plant Science Center, St. Louis, MO, United States
  - Department of Biology, Saint Louis University, St. Louis, MO, United States
- Matthew J. Rubin
  - Donald Danforth Plant Science Center, St. Louis, MO, United States
- Omar Tesdell
  - Department of Geography, Birzeit University, Birzeit, West Bank, Palestine
- Muhammet Şakiroğlu
  - Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
8
Dipta B, Sood S, Devi R, Bhardwaj V, Mangal V, Thakur AK, Kumar V, Pandey N, Rathore A, Singh A. Digitalization of potato breeding program: Improving data collection and management. Heliyon 2023; 9:e12974. [PMID: 36747944 PMCID: PMC9898647 DOI: 10.1016/j.heliyon.2023.e12974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 01/02/2023] [Accepted: 01/10/2023] [Indexed: 01/22/2023]
Abstract
A plant breeding program involves hundreds of experiments, each having a number of entries, genealogy information, a linked experimental design, lists of treatments, observed traits, and data analysis. In the traditional approach, breeding program information and data records are not centralized but scattered across different file systems, which makes breeding information inconvenient to retrieve and results in poor data management and the loss of crucial data. Data administration requires a significant amount of manpower and resources to maintain nurseries, trials, germplasm lines, and pedigree records. Further, data transcription in scattered spreadsheets and files leads to nomenclature and typing mistakes, which affect data analysis and selection decisions in breeding programs. Accurate data recording and management tools could improve the efficiency of breeding programs. Recent interventions in data management using computer-based breeding databases and informatics applications and tools have made the breeder's life easier. Because of its digital nature, the data obtained is improved even further, allowing for the acquisition of images, voice recordings, and other specific data types. Public breeding programs are far behind the industry in the use of data management tools and software. In this article, we have compiled information on available data recording tools and breeding data management software, with major emphasis on potato breeding data management.
Affiliation(s)
- Bhawna Dipta
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Salej Sood
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India (corresponding author)
- Rasna Devi
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Vinay Bhardwaj
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Vikas Mangal
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Ajay Kumar Thakur
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Vinod Kumar
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- N.K. Pandey
  - ICAR-Central Potato Research Institute (CPRI), Shimla, Himachal Pradesh-171001, India
- Abhishek Rathore
  - CGIAR Excellence in Breeding Platform (EiB), International Maize and Wheat Improvement Center (CIMMYT), India
- A.K. Singh
  - Division of Horticultural Science, KAB-II, Pusa, New Delhi-110012, India
9
Hoyt CT, Balk M, Callahan TJ, Domingo-Fernández D, Haendel MA, Hegde HB, Himmelstein DS, Karis K, Kunze J, Lubiana T, Matentzoglu N, McMurry J, Moxon S, Mungall CJ, Rutz A, Unni DR, Willighagen E, Winston D, Gyori BM. Unifying the identification of biomedical entities with the Bioregistry. Sci Data 2022; 9:714. [DOI: 10.1038/s41597-022-01807-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 10/26/2022] [Indexed: 11/21/2022]
Abstract
The standardized identification of biomedical entities is a cornerstone of interoperability, reuse, and data integration in the life sciences. Several registries have been developed to catalog resources maintaining identifiers for biomedical entities such as small molecules, proteins, cell lines, and clinical trials. However, existing registries have struggled to provide sufficient coverage and metadata standards that meet the evolving needs of modern life sciences researchers. Here, we introduce the Bioregistry, an integrative, open, community-driven metaregistry that synthesizes and substantially expands upon 23 existing registries. The Bioregistry addresses the need for a sustainable registry by leveraging public infrastructure and automation, and employing a progressive governance model centered around open code and open data to foster community contribution. The Bioregistry can be used to support the standardized annotation of data, models, ontologies, and scientific literature, thereby promoting their interoperability and reuse. The Bioregistry can be accessed through https://bioregistry.io and its source code and data are available under the MIT and CC0 Licenses at https://github.com/biopragmatics/bioregistry.
10
Morales N, Ogbonna AC, Ellerbrock BJ, Bauchet GJ, Tantikanjana T, Tecle IY, Powell AF, Lyon D, Menda N, Simoes CC, Saha S, Hosmani P, Flores M, Panitz N, Preble RS, Agbona A, Rabbi I, Kulakow P, Peteti P, Kawuki R, Esuma W, Kanaabi M, Chelangat DM, Uba E, Olojede A, Onyeka J, Shah T, Karanja M, Egesi C, Tufan H, Paterne A, Asfaw A, Jannink JL, Wolfe M, Birkett CL, Waring DJ, Hershberger JM, Gore MA, Robbins KR, Rife T, Courtney C, Poland J, Arnaud E, Laporte MA, Kulembeka H, Salum K, Mrema E, Brown A, Bayo S, Uwimana B, Akech V, Yencho C, de Boeck B, Campos H, Swennen R, Edwards JD, Mueller LA. Breedbase: a digital ecosystem for modern plant breeding. G3 Genes|Genomes|Genetics 2022; 12:6564228. [PMID: 35385099 PMCID: PMC9258556 DOI: 10.1093/g3journal/jkac078] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/19/2021] [Accepted: 02/14/2022] [Indexed: 01/17/2023]
Abstract
Modern breeding methods integrate next-generation sequencing and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles to ultimately create improved cultivars able to sustain high adoption rates by farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to (1) track breeding materials, (2) store experimental evaluations, (3) record phenotypic measurements using consistent ontologies, (4) store genotypic information, and (5) implement algorithms for analysis, prediction, and selection decisions. Because of the complexity of the breeding process, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present a breeding database system, Breedbase (https://breedbase.org/, last accessed 4/18/2022). Originally initiated as Cassavabase (https://cassavabase.org/, last accessed 4/18/2022) with the NextGen Cassava project (https://www.nextgencassava.org/, last accessed 4/18/2022), and later developed into a crop-agnostic system, it is presently used by dozens of different crops and projects. The system is web based and is available as open source software. It is available on GitHub (https://github.com/solgenomics/, last accessed 4/18/2022) and packaged in a Docker image for deployment (https://hub.docker.com/u/breedbase, last accessed 4/18/2022). The Breedbase system enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem.
Affiliation(s)
- Nicolas Morales
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - Cornell University, Ithaca, NY 14853, USA
- Alex C Ogbonna
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - Cornell University, Ithaca, NY 14853, USA
- David Lyon
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Naama Menda
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Surya Saha
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Ezenwanyi Uba
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Adeyemi Olojede
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Joseph Onyeka
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Chiedozie Egesi
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
  - IITA Ibadan, 200001 Ibadan, Nigeria
  - National Root Crops Research Institute (NRCRI), 463109 Umudike, Nigeria
- Hale Tufan
  - Cornell University, Ithaca, NY 14853, USA
- Jean-Luc Jannink
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- Clay L Birkett
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- David J Waring
  - Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Ithaca, NY 14853, USA
- Trevor Rife
  - Kansas State University, Manhattan, KS 66506, USA
- Jesse Poland
  - Kansas State University, Manhattan, KS 66506, USA
- Craig Yencho
  - North Carolina State University (NCSU), Raleigh, NC 27695, USA
|
11
|
Hassall KL, Coleman K, Dixit PN, Granger SJ, Zhang Y, Sharp RT, Wu L, Whitmore AP, Richter GM, Collins AL, Milne AE. Exploring the effects of land management change on productivity, carbon and nutrient balance: Application of an Ensemble Modelling Approach to the upper River Taw observatory, UK. Sci Total Environ 2022; 824:153824. [PMID: 35182632 PMCID: PMC9022088 DOI: 10.1016/j.scitotenv.2022.153824] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/10/2021] [Revised: 01/31/2022] [Accepted: 02/08/2022] [Indexed: 06/14/2023]
Abstract
Agriculture is challenged to produce healthy food and to contribute to cleaner energy whilst mitigating climate change and protecting ecosystems. To achieve this, policy-driven scenarios need to be evaluated with available data and models to explore trade-offs with robust accounting for the uncertainty in predictions. We developed a novel model ensemble using four complementary state-of-the-art agroecosystems models to explore the impacts of land management change. The ensemble was used to simulate key agricultural and environmental outputs under various scenarios for the upper River Taw observatory, UK. Scenarios assumed (i) reducing livestock production whilst simultaneously increasing the area of arable where it is feasible to cultivate (PG2A), (ii) reducing livestock production whilst simultaneously increasing bioenergy production in areas of the catchment that are amenable to growing bioenergy crops (PG2BE) and (iii) increasing both arable and bioenergy production (PG2A + BE). Our ensemble approach combined model uncertainty using the tower property of expectation and the law of total variance. Results show considerable uncertainty for predicted nutrient losses with different models partitioning the uncertainty into different pathways. Bioenergy crops were predicted to produce greatest yields from Miscanthus in lowland and from SRC-willow (cv. Endurance) in uplands. Each choice of management is associated with trade-offs; e.g. PG2A results in a significant increase of edible calories (6736 Mcal ha⁻¹) but reduced soil C (-4.32 t C ha⁻¹). Model ensembles in the agroecosystem context are difficult to implement due to challenges of model availability and input and output alignment. Despite these challenges, we show that ensemble modelling is a powerful approach for applications such as ours, offering benefits such as capturing structural as well as data uncertainty and allowing greater combinations of variables to be explored.
Furthermore, the ensemble provides a robust means for combining uncertainty at different scales and enables us to identify weaknesses in system understanding.
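As an illustration of the combination rule named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes each model in an ensemble reports a predictive mean and variance for one output, and that the models are weighted equally unless weights are given. The function name `combine_ensemble` and the numbers in the usage lines are hypothetical.

```python
def combine_ensemble(means, variances, weights=None):
    """Combine per-model predictive means and variances (illustrative sketch).

    means[i] and variances[i] are E[Y | model i] and Var[Y | model i].
    weights are model probabilities; equal weights are assumed by default.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and equal length")
    n = len(means)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the models and sum to 1")

    # Tower property: E[Y] = E_M[ E[Y | M] ]
    mean = sum(w * m for w, m in zip(weights, means))
    # Law of total variance: Var[Y] = E_M[ Var[Y | M] ] + Var_M[ E[Y | M] ]
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within + between, within, between


# Hypothetical two-model example (values are made up for illustration).
mean, total_var, within, between = combine_ensemble([1.0, 3.0], [0.5, 1.5])
print(mean, total_var, within, between)
```

The `between` term is the part of the variance that comes from disagreement among models, which is how an ensemble can expose structural uncertainty that a single model would hide.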
Affiliation(s)
- Kirsty L Hassall
  - Computational and Analytical Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Kevin Coleman
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Prakash N Dixit
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Steve J Granger
  - Sustainable Agriculture Sciences Department, Rothamsted Research, North Wyke, Okehampton EX20 2SB, UK
- Yusheng Zhang
  - Sustainable Agriculture Sciences Department, Rothamsted Research, North Wyke, Okehampton EX20 2SB, UK
- Ryan T Sharp
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Lianhai Wu
  - Sustainable Agriculture Sciences Department, Rothamsted Research, North Wyke, Okehampton EX20 2SB, UK
- Andrew P Whitmore
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Goetz M Richter
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
- Adrian L Collins
  - Sustainable Agriculture Sciences Department, Rothamsted Research, North Wyke, Okehampton EX20 2SB, UK
- Alice E Milne
  - Sustainable Agriculture Sciences Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK
|
12
|
Abstract
The deployment of various networks (e.g., Internet of Things [IoT] and mobile networks), databases (e.g., nutrition tables and food compositional databases), and social media (e.g., Instagram and Twitter) generates huge amounts of food data, which present researchers with an unprecedented opportunity to study various problems and applications in food science and industry via data-driven computational methods. However, these multi-source heterogeneous food data appear as information silos, leading to difficulty in fully exploiting these food data. The knowledge graph provides a unified and standardized conceptual terminology in a structured form, and thus can effectively organize these food data to benefit various applications. In this review, we provide a brief introduction to knowledge graphs and the evolution of food knowledge organization mainly from food ontology to food knowledge graphs. We then summarize seven representative applications of food knowledge graphs, such as new recipe development, diet-disease correlation discovery, and personalized dietary recommendation. We also discuss future directions in this field, such as multimodal food knowledge graph construction and food knowledge graphs for human health.
Affiliation(s)
- Weiqing Min
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Chunlin Liu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Leyi Xu
  - Soochow University, Suzhou, Jiangsu 215006, China
- Shuqiang Jiang
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
13
|
Kruseman G. A Flexible, Extensible, Machine-Readable, Human-Intelligible, and Ontology-Agnostic Metadata Schema (OIMS). Front Sustain Food Syst 2022. [DOI: 10.3389/fsufs.2022.767863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
This paper presents a lightweight, flexible, extensible, machine readable and human-intelligible metadata schema that does not depend on a specific ontology. The metadata schema for metadata of data files is based on the concept of data lakes where data is stored as they are. The purpose of the schema is to enhance data interoperability. The lack of interoperability of messy socio-economic datasets that contain a mixture of structured, semi-structured, and unstructured data means that many datasets are underutilized. Adding a minimum set of rich metadata and describing new and existing data dictionaries in a standardized way goes a long way to make these high-variety datasets interoperable and reusable and hence allows timely and actionable information to be gleaned from those datasets. The presented metadata schema OIMS can help to standardize the description of metadata. The paper introduces overall concepts of metadata, discusses design principles of metadata schemes, and presents the structure and an applied example of OIMS.
|
14
|
Ali B, Dahlhaus P. Roles of Selective Agriculture Practices in Sustainable Agricultural Performance: A Systematic Review. Sustainability 2022; 14:3185. [DOI: 10.3390/su14063185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Feeding the growing global population while improving the Earth’s economic, environmental, and social values is a challenge recognised in both the United Nations Sustainable Development Goals and the United Nations Framework Convention on Climate Change. Sustaining global agricultural performance requires regular revision of current farming models, attitudes, and practices. In systematically reviewing the international literature through the lens of the sustainability framework, this paper specifically identifies precision conservation agriculture (PCA), digital agriculture (DA), and resilient agriculture (RA) practices as being of value in meeting future challenges. Each of these adaptations carries significantly positive relationships with sustaining agricultural performance, as well as positively mediating and/or moderating each other. While it is clear from the literature that adopting PCA, DA, and RA would substantially improve the sustainability of agricultural performance, the uptake of these adaptations generally lags. More in-depth social science research is required to understand the value propositions that would encourage uptake of these adaptations and the barriers that prevent them. Recommendations are made to explore the specific knowledge gap that needs to be understood to motivate agriculture practitioners to adopt these changes in practice.
|
15
|
Filter M, Nauta M, Pires SM, Guillier L, Buschhardt T. Towards efficient use of data, models and tools in food microbiology. Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
16
|
Min H, Alemi F, Hane CA, Nori VS. Improving the Accuracy of Predictive Models for Outcomes of Antidepressants by Using an Ontological Adjustment Approach. Applied Sciences 2022; 12:1479. [DOI: 10.3390/app12031479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
For patients with rare comorbidities, there are insufficient observations to accurately estimate the effectiveness of treatment. At the same time, all diagnoses, including rare diagnoses, are part of the International Classification of Disease (ICD). Grouping ICD into broader concepts (i.e., ontology adjustment) can not only increase accuracy of estimating antidepressant effectiveness for patients with rare conditions but also prevent overfitting in big data analysis. In this study, 3,678,082 depressed patients treated with antidepressants were obtained from OptumLabs® Data Warehouse (OLDW). For rare diagnoses, adjustments were made by using the likelihood ratio of the immediate broader concept in the ICD hierarchies. The accuracy of models in training (90%) and test (10%) sets was examined using the area under the receiver operating curves (AROC). The gap in training and test AROC shows how much random noise was modeled. If the gap is large, then the parameters of the model, including the reported effectiveness of the antidepressant for patients with rare conditions, are suspect. There was, on average, a 9.0% reduction in the AROC gap after using the ontological adjustment. Therefore, ontology adjustment can reduce model overfitting, leading to better parameter estimates from the training set.
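The train/test AROC gap used in this abstract as an overfitting signal can be sketched as follows. This is an illustrative computation only, not the study's code: it assumes binary labels (1 = outcome, 0 = no outcome) and model scores, computes the AROC as the probability that a randomly chosen positive case outscores a randomly chosen negative case, and takes the difference between training and test values. The function names and the toy data are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_gap(train_labels, train_scores, test_labels, test_scores):
    """Training AROC minus test AROC; a large gap suggests modeled noise."""
    return auc(train_labels, train_scores) - auc(test_labels, test_scores)


# Hypothetical toy data: the model separates the training set perfectly
# but misorders one positive/negative pair in the test set.
gap = auc_gap(
    [0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9],
    [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
)
print(gap)
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; a rank-based computation would be the usual choice at the scale of the study's data.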
|
17
|
Gorman L, Browne WJ, Woods CJ, Eisler MC, van Wijk MT, Dowsey AW, Hammond J. What's Stopping Knowledge Synthesis? A Systematic Review of Recent Practices in Research on Smallholder Diversity. Front Sustain Food Syst 2021. [DOI: 10.3389/fsufs.2021.727425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
A systematic review of recent publications was conducted to assess the extent to which contemporary micro-level research on smallholders facilitates data re-use and knowledge synthesis. Following PRISMA standards for systematic review, 1,182 articles were identified (published between 2018 and 2020), and 261 articles were selected for review in full. The themes investigated were: (i) data management, including data source, variables collected, granularity, and availability of the data; (ii) the statistical methods used, including analytical approach and reproducibility; and (iii) the interpretation of results, including the scope and objectives of the study, development issues addressed, scale of recommendations made relative to the scale of the sample, and the audience for recommendations. It was observed that household surveys were the most common data source and tended to be representative at the local (community) level. There was little harmonization of the variables collected between studies. Over three quarters of the studies (77%) drew on data which was not in the public domain, 14% published newly open data, and 9% drew on datasets which were already open. Other than descriptive statistics, linear and logistic regression methods were the most common analytical method used (64% of articles). In the vast majority of those articles, regression was used as an explanatory tool, as opposed to a predictive tool. More than half of the articles (59%) made claims or recommendations which extended beyond the coverage of their datasets. In combination these two common practices may lead to erroneous understanding: the tendency to rely upon simple regressions to explain context-specific and complex associations; and the tendency to generalize beyond the remit of the data collected. 
We make four key recommendations: (1) increased data sharing and variable harmonization would enable data to be re-used between studies; (2) providing detailed meta-data on sampling frames and study-context would enable more powerful meta-analyses; (3) methodological openness and predictive modeling could help test the transferability of approaches; (4) more precise language in study conclusions could help decision makers understand the relevance of findings for policy planning. Following these practices could leverage greater benefits from the substantial investment already made in data collection on smallholder farms.
|
18
|
Holmgren SD, Boyles RR, Cronk RD, Duncan CG, Kwok RK, Lunn RM, Osborn KC, Thessen AE, Schmitt CP. Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Community-Driven Harmonized Language. Int J Environ Res Public Health 2021; 18:8985. [PMID: 34501574 PMCID: PMC8430534 DOI: 10.3390/ijerph18178985] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/07/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 01/10/2023]
Abstract
Harmonized language is critical for helping researchers to find data, collecting scientific data to facilitate comparison, and performing pooled and meta-analyses. Using standard terms to link data to knowledge systems facilitates knowledge-driven analysis, allows for the use of biomedical knowledge bases for scientific interpretation and hypothesis generation, and increasingly supports artificial intelligence (AI) and machine learning. Due to the breadth of environmental health sciences (EHS) research and the continuous evolution in scientific methods, the gaps in standard terminologies, vocabularies, ontologies, and related tools hamper the capabilities to address large-scale, complex EHS research questions that require the integration of disparate data and knowledge sources. The results of prior workshops to advance a harmonized environmental health language demonstrate that future efforts should be sustained and grounded in scientific need. We describe a community initiative whose mission was to advance integrative environmental health sciences research via the development and adoption of a harmonized language. The products, outcomes, and recommendations developed and endorsed by this community are expected to enhance data collection and management efforts for NIEHS and the EHS community, making data more findable and interoperable. This initiative will provide a community of practice space to exchange information and expertise, be a coordination hub for identifying and prioritizing activities, and a collaboration platform for the development and adoption of semantic solutions. We encourage anyone interested in advancing this mission to engage in this community.
Affiliation(s)
- Stephanie D. Holmgren
  - Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
- Christopher G. Duncan
  - Genes, Environment, and Health Branch, Division of Extramural Research and Training, NIEHS, Durham, NC 27709, USA
- Richard K. Kwok
  - Epidemiology Branch, Division of Intramural Research, NIEHS, Durham, NC 27709, USA
  - Office of the Director, NIEHS, Bethesda, MD 20892, USA
- Ruth M. Lunn
  - Integrative Health Assessment Branch, Division of the National Toxicology Program, NIEHS, Durham, NC 27709, USA
- Anne E. Thessen
  - Environmental and Molecular Toxicology Department, Oregon State University, Corvallis, OR 97331, USA
- Charles P. Schmitt
  - Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
|
19
|
Bozada T, Borden J, Workman J, Del Cid M, Malinowski J, Luechtefeld T. Sysrev: A FAIR Platform for Data Curation and Systematic Evidence Review. Front Artif Intell 2021; 4:685298. [PMID: 34423285 PMCID: PMC8374944 DOI: 10.3389/frai.2021.685298] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/24/2021] [Accepted: 07/13/2021] [Indexed: 11/16/2022]
Abstract
Well-curated datasets are essential to evidence-based decision making and to the integration of artificial intelligence with human reasoning across disciplines. However, many sources of data remain siloed, unstructured, and/or unavailable for complementary and secondary research. Sysrev was developed to address these issues. First, Sysrev was built to aid in systematic evidence reviews (SER), where digital documents are evaluated according to a well-defined process, and where Sysrev provides an easy-to-access, publicly available and free platform for collaborating in SER projects. Secondly, Sysrev addresses the issue of unstructured, siloed, and inaccessible data in the context of generalized data extraction, where human and machine learning algorithms are combined to extract insights and evidence for better decision making across disciplines. Sysrev uses FAIR - Findability, Accessibility, Interoperability, and Reuse of digital assets - as primary principles in design. Sysrev was developed primarily because of an observed need to reduce redundancy, reduce inefficient use of human time and increase the impact of evidence-based decision making. This publication is an introduction to Sysrev as a novel technology, with an overview of the features, motivations and use cases of the tool. Methods: Sysrev.com is a FAIR-motivated web platform for data curation and SER. Sysrev allows users to create data curation projects called "sysrevs" wherein users upload documents, define review tasks, recruit reviewers, perform review tasks, and automate review tasks. Conclusion: Sysrev is a web application designed to facilitate data curation and SERs. Thousands of publicly accessible Sysrev projects have been created, accommodating research in a wide variety of disciplines. Described use cases include data curation, managed reviews, and SERs.
Affiliation(s)
- Thomas Luechtefeld
  - Insilica LLC, Bethesda, MD, United States
  - Toxtrack LLC, Baltimore, MD, United States
|
20
|
Abstract
An ontology is a formal representation of domain knowledge, which can be interpreted by machines. In recent years, ontologies have become a major tool for domain knowledge representation and a core component of many knowledge management systems, decision-support systems and other intelligent systems, inter alia, in the context of agriculture. A review of the existing literature on agricultural ontologies, however, reveals that most of the studies, which propose agricultural ontologies, are lacking an explicit evaluation procedure. This is undesired because without well-structured evaluation processes, it is difficult to consider the value of ontologies to research and practice. Moreover, it is difficult to rely on such ontologies and share them on the Semantic Web or between semantic-aware applications. With the growing number of ontology-based agricultural systems and the increasing popularity of the Semantic Web, it becomes essential that such evaluation methods are applied during the ontology development process. Our work contributes to the literature on agricultural ontologies by presenting a framework that guides the selection of suitable evaluation methods, which seems to be missing from most existing studies on agricultural ontologies. The framework supports the matching of appropriate evaluation methods for a given ontology based on the ontology’s purpose.
|
21
|
Andrés-Hernández L, Halimi RA, Mauleon R, Mayes S, Baten A, King GJ. Challenges for FAIR-compliant description and comparison of crop phenotype data with standardized controlled vocabularies. Database (Oxford) 2021; 2021:baab028. [PMID: 33991093 PMCID: PMC8122365 DOI: 10.1093/database/baab028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/27/2020] [Revised: 04/14/2021] [Accepted: 04/30/2021] [Indexed: 12/04/2022]
Abstract
Crop phenotypic data underpin many pre-breeding efforts to characterize variation within germplasm collections. Although there has been an increase in the global capacity for accumulating and comparing such data, a lack of consistency in the systematic description of metadata often limits integration and sharing. We therefore aimed to understand some of the challenges facing findable, accessible, interoperable and reusable (FAIR) curation and annotation of phenotypic data from minor and underutilized crops. We used bambara groundnut (Vigna subterranea) as an exemplar underutilized crop to assess the ability of the Crop Ontology system to facilitate curation of trait datasets, so that they are accessible for comparative analysis. This involved generating a controlled vocabulary Trait Dictionary of 134 terms. Systematic quantification of syntactic and semantic cohesiveness of the full set of 28 crop-specific COs identified inconsistencies between trait descriptor names, a relative lack of cross-referencing to other ontologies and a flat ontological structure for classifying traits. We also evaluated the Minimal Information About a Phenotyping Experiment and FAIR compliance of bambara trait datasets curated within the CropStoreDB schema. We discuss specifications for a more systematic and generic approach to trait controlled vocabularies, which would benefit from representation of terms that adhere to Open Biological and Biomedical Ontologies principles. In particular, we focus on the benefits of reuse of existing definitions within pre- and post-composed axioms from other domains in order to facilitate the curation and comparison of datasets from a wider range of crops. Database URL: https://www.cropstoredb.org/cs_bambara.html.
Affiliation(s)
- Liliana Andrés-Hernández
  - Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Razlin Azman Halimi
  - Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Ramil Mauleon
  - Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Sean Mayes
  - School of Biosciences, University of Nottingham, Sutton Bonington, Leicestershire, LE12 5RD, Nottingham, Nottingham, UK
- Abdul Baten
  - Institute of Precision Medicine & Bioinformatics, Sydney Local Health District, Royal Prince Alfred Hospital, Missenden Road, Camperdown, NSW 2050, Australia
- Graham J King
  - Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
|
22
|
Williamson HF, Brettschneider J, Caccamo M, Davey RP, Goble C, Kersey PJ, May S, Morris RJ, Ostler R, Pridmore T, Rawlings C, Studholme D, Tsaftaris SA, Leonelli S. Data management challenges for artificial intelligence in plant and agricultural research. F1000Res 2021; 10:324. [PMID: 36873457 PMCID: PMC9975417 DOI: 10.12688/f1000research.52204.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2023] [Indexed: 01/19/2023]
Abstract
Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of Machine Learning (ML), which holds much promise for this domain.
Affiliation(s)
- Hugh F Williamson
  - Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
- Mario Caccamo
  - NIAB, National Research Institute of Brewing, East Malling, UK
- Carole Goble
  - Department of Computer Science, University of Manchester, Manchester, UK
- Sean May
  - School of Biosciences, University of Nottingham, Loughborough, UK
- Richard Ostler
  - Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
- Tony Pridmore
  - School of Computer Science, University of Nottingham, Nottingham, UK
- Chris Rawlings
  - Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
- Sotirios A Tsaftaris
  - Institute of Digital Communications, University of Edinburgh, Edinburgh, UK
  - Alan Turing Institute, London, UK
- Sabina Leonelli
  - Exeter Centre for the Study of the Life Sciences & Institute for Data Science and Artificial Intelligence, University of Exeter, Exeter, UK
  - Alan Turing Institute, London, UK
|