1
Sterner B, Elliott S, Gilbert EE, Franz NM. Unified and pluralistic ideals for data sharing and reuse in biodiversity. Database (Oxford) 2023; 2023:baad048. [PMID: 37465916 PMCID: PMC10354506 DOI: 10.1093/database/baad048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 05/30/2023] [Accepted: 06/27/2023] [Indexed: 07/20/2023]
Abstract
How should billions of species observations worldwide be shared and made reusable? Many biodiversity scientists assume the ideal solution is to standardize all datasets according to a single, universal classification and aggregate them into a centralized, global repository. This ideal has known practical and theoretical limitations, however, which justifies investigating alternatives. To support better community deliberation and normative evaluation, we develop a novel conceptual framework showing how different organizational models, regulative ideals and heuristic strategies are combined to form shared infrastructures supporting data reuse. The framework is anchored in a general definition of data pooling as an activity of making a taxonomically standardized body of information available for community reuse via digital infrastructure. We describe and illustrate unified and pluralistic ideals for biodiversity data pooling and show how communities may advance toward these ideals using different heuristic strategies. We present evidence for the strengths and limitations of the unified and pluralistic ideals based on the systemic relationships of power, responsibility and benefit they establish among stakeholders, and we conclude that the pluralistic ideal is better suited for biodiversity data.
Affiliation(s)
- Beckett Sterner
  - School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
- Steve Elliott
  - School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
- Edward E Gilbert
  - School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
- Nico M Franz
  - School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
2
Juanes Cortés B, Vera-Ramos JA, Lovering RC, Gaudet P, Laegreid A, Logie C, Schulz S, Roldán-García MDM, Kuiper M, Fernández-Breis JT. Formalization of gene regulation knowledge using ontologies and gene ontology causal activity models. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2021; 1864:194766. [PMID: 34710644 DOI: 10.1016/j.bbagrm.2021.194766] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2021] [Revised: 09/13/2021] [Accepted: 10/11/2021] [Indexed: 02/02/2023]
Abstract
Computational research on gene regulation requires handling and integrating large amounts of heterogeneous data. The Gene Ontology has demonstrated that ontologies play a fundamental role in biological data interoperability and integration. Ontologies help to express data and knowledge in a machine-processable way, which enables complex querying and advanced exploitation of distributed data. Improving data interoperability in gene regulation is a major objective of the GREEKC Consortium, which aims to develop a standardized gene regulation knowledge commons. GREEKC proposes the use of ontologies and semantic tools for developing interoperable gene regulation knowledge models, which should support data annotation. In this work, we study how such knowledge models can be generated from cartoons of gene regulation scenarios. The proposed method consists of generating natural-language descriptions of the cartoons; extracting the entities from the texts; finding those entities in existing ontologies to reuse as much content as possible, especially from well-known and well-maintained ontologies such as the Gene Ontology, the Sequence Ontology, the Relations Ontology and ChEBI; and implementing the knowledge models. The models have been implemented using Protégé, a general ontology editor, and Noctua, the tool developed by the Gene Ontology Consortium for building causal activity models, which capture more comprehensive annotations of genes and link their activities in a causal framework for Gene Ontology Annotations. We applied the method to two gene regulation scenarios and illustrate how the generated models can support the annotation of data from research articles.
Affiliation(s)
- Belén Juanes Cortés
  - Departamento de Informatica y Sistemas, University of Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Campus de Espinardo, 30100 Murcia, Spain.
- José Antonio Vera-Ramos
  - Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria.
- Ruth C Lovering
  - Institute of Cardiovascular Science, Faculty of Pop Health Sciences, University College London, Rayne Building, 5 University Street, London WC1E 6JF, United Kingdom.
- Pascale Gaudet
  - Swiss Institute of Bioinformatics, 1, rue Michel Servet, 1211 Geneva 4, Switzerland.
- Astrid Laegreid
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Gastrosenteret, 431.03.046, Øya, Prinsesse Kristinas gate 1, Trondheim, Norway.
- Colin Logie
  - Faculty of Science, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 28, 6525, GA, Nijmegen, the Netherlands.
- Stefan Schulz
  - Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria.
- María Del Mar Roldán-García
  - Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Bulevard Louis Pasteur 35, 29071 Málaga, Spain; ITIS Software, University of Málaga, Calle Arquitecto Francisco Peñalosa s/n, 29071 Málaga, Spain; Biomedical Research Institute of Málaga (IBIMA), University of Málaga, Calle Doctor Miguel Díaz Recio, 28, 29010 Málaga, Spain.
- Martin Kuiper
  - Department of Biology, Norwegian University of Science and Technology, Realfagbygget, Høgskoleringen 5, 7034 Trondheim, Norway.
- Jesualdo Tomás Fernández-Breis
  - Departamento de Informatica y Sistemas, University of Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Campus de Espinardo, 30100 Murcia, Spain.
3
Sterner BW, Gilbert EE, Franz NM. Decentralized but Globally Coordinated Biodiversity Data. Front Big Data 2021; 3:519133. [PMID: 33693407 PMCID: PMC7931950 DOI: 10.3389/fdata.2020.519133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/10/2019] [Accepted: 08/31/2020] [Indexed: 11/22/2022]
Abstract
Centralized biodiversity data aggregation is too often failing societal needs due to pervasive and systemic data quality deficiencies. We argue for a novel approach that embodies the spirit of the Web (“small pieces loosely joined”) through the decentralized coordination of data across scientific languages and communities. The upfront cost of decentralization can be offset by the long-term benefit of achieving sustained expert engagement, higher-quality data products, and ultimately more societal impact for biodiversity data. Our decentralized approach encourages the emergence and evolution of multiple self-identifying communities of practice that are regionally, taxonomically, or institutionally localized. Each community is empowered to control the social and informational design and versioning of its local data infrastructures and signals. With no single aggregator to exert centralized control over biodiversity data, decentralization generates loosely connected networks of mid-level aggregators. Global coordination is nevertheless feasible through automatable data sharing agreements that enable efficient propagation and translation of biodiversity data across communities. The decentralized model also poses novel integration challenges, among which the explicit and continuous articulation of conflicting systematic classifications and phylogenies remains the most challenging. We discuss available solutions and remaining challenges, and outline next steps: the global effort of coordination should focus on developing shared languages for data signal translation, as opposed to homogenizing the data signal itself.
Affiliation(s)
- Beckett W Sterner
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Edward E Gilbert
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Nico M Franz
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States