2
Deck J, Guralnick R, Walls R, Blum S, Haendel M, Matsunaga A, Wieczorek J. Meeting report: Identifying practical applications of ontologies for biodiversity informatics. Stand Genomic Sci 2015. [PMCID: PMC4511409 DOI: 10.1186/s40793-015-0014-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
This report describes the outcomes of a recent workshop, building on a series of workshops from the last three years with the goal of integrating genomics and biodiversity research. The more specific goal here was to express terms in Darwin Core and Audubon Core, where class constructs have been historically underspecified, in a Biological Collections Ontology (BCO) framework. For the purposes of this workshop, the BCO provided the context for fully defining classes as well as object and data properties, including domain and range information, for both the Darwin Core and Audubon Core. In addition, the workshop participants reviewed technical specifications and approaches for annotating instance data with BCO terms. Finally, we laid out proposed activities for the next 3 to 18 months to continue this work.
3
Guralnick RP, Cellinese N, Deck J, Pyle RL, Kunze J, Penev L, Walls R, Hagedorn G, Agosti D, Wieczorek J, Catapano T, Page RDM. Community next steps for making globally unique identifiers work for biocollections data. Zookeys 2015:133-54. [PMID: 25901117 PMCID: PMC4400380 DOI: 10.3897/zookeys.494.9352] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/07/2015] [Accepted: 03/17/2015] [Indexed: 11/12/2022]
Abstract
Biodiversity data is being digitized and made available online at a rapidly increasing rate but current practices typically do not preserve linkages between these data, which impedes interoperation, provenance tracking, and assembly of larger datasets. For data associated with biocollections, the biodiversity community has long recognized that an essential part of establishing and preserving linkages is to apply globally unique identifiers at the point when data are generated in the field and to persist these identifiers downstream, but this is seldom implemented in practice. There has neither been coalescence towards one single identifier solution (as in some other domains), nor even a set of recommended best practices and standards to support multiple identifier schemes sharing consistent responses. In order to further progress towards a broader community consensus, a group of biocollections and informatics experts assembled in Stockholm in October 2014 to discuss community next steps to overcome current roadblocks. The workshop participants divided into four groups focusing on: identifier practice in current field biocollections; identifier application for legacy biocollections; identifiers as applied to biodiversity data records as they are published and made available in semantically marked-up publications; and cross-cutting identifier solutions that bridge across these domains. The main outcome was consensus on key issues, including recognition of differences between legacy and new biocollections processes, the need for identifier metadata profiles that can report information on identifier persistence missions, and the unambiguous indication of the type of object associated with the identifier. Current identifier characteristics are also summarized, and an overview of available schemes and practices is provided.
Affiliation(s)
- Robert P Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-2710 USA
- Nico Cellinese
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-2710 USA
- John Deck
- Berkeley Natural History Museums, University of California, Berkeley, California, USA
- Richard L Pyle
- Department of Natural Sciences, Bernice P. Bishop Museum, Honolulu, HI 96817, USA
- John Kunze
- California Digital Library, University of California Office of the President, Oakland, CA USA
- Lyubomir Penev
- Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences, and Pensoft Publishers, Sofia, Bulgaria
- Ramona Walls
- iPlant Collaborative, University of Arizona, Tucson, AZ 85721
- Gregor Hagedorn
- Museum für Naturkunde, Leibniz-Institut für Evolutions- und Biodiversitätsforschung, Invalidenstraße 43, 10115 Berlin, Germany
- John Wieczorek
- Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720-3160, USA
- Roderic D M Page
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
4
Walls RL, Guralnick R, Deck J, Buntzman A, Buttigieg PL, Davies N, Denslow MW, Gallery RE, Parnell JJ, Osumi-Sutherland D, Robbins RJ, Rocca-Serra P, Wieczorek J, Zheng J. Meeting report: advancing practical applications of biodiversity ontologies. Stand Genomic Sci 2014. [PMCID: PMC4334987 DOI: 10.1186/1944-3277-9-17] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023]
Abstract
We describe the outcomes of three recent workshops aimed at advancing development of the Biological Collections Ontology (BCO), the Population and Community Ontology (PCO), and tools to annotate data using those and other ontologies. The first workshop gathered use cases to help grow the PCO, agreed upon a format for modeling challenging concepts such as ecological niche, and developed ontology design patterns for defining collections of organisms and population-level phenotypes. The second focused on mapping datasets to ontology terms and converting them to Resource Description Framework (RDF), using the BCO. To follow up, a BCO hackathon was held concurrently with the 16th Genomic Standards Consortium Meeting, during which we converted additional datasets to RDF, developed a Material Sample Core for the Global Biodiversity Information Facility, created a Web Ontology Language (OWL) file for importing Darwin Core classes and properties into BCO, and developed a workflow for converting biodiversity data among formats.
5
Guralnick R, Conlin T, Deck J, Stucky BJ, Cellinese N. The trouble with triplets in biodiversity informatics: a data-driven case against current identifier practices. PLoS One 2014; 9:e114069. [PMID: 25470125 PMCID: PMC4254916 DOI: 10.1371/journal.pone.0114069] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/18/2014] [Accepted: 10/31/2014] [Indexed: 11/28/2022]
Abstract
The biodiversity informatics community has discussed aspirations and approaches for assigning globally unique identifiers (GUIDs) to biocollections for nearly a decade. During that time, and despite misgivings, the de facto standard identifier has become the “Darwin Core Triplet”, which is a concatenation of values for institution code, collection code, and catalog number associated with biocollections material. Our aim is not to rehash the challenging discussions regarding which GUID system in theory best supports the biodiversity informatics use case of discovering and linking digital data across the Internet, but how well we can link those data together at this moment, utilizing the current identifier schemes that have already been deployed. We gathered Darwin Core Triplets from a subset of VertNet records, along with vertebrate records from GenBank and the Barcode of Life Data System, in order to determine how Darwin Core Triplets are deployed “in the wild”. We asked if those triplets follow the recommended structure and whether they provide an easy and unambiguous means to track from specimen records to genetic sequence records. We show that Darwin Core Triplets are often riddled with semantic and syntactic errors when deployed and curated in practice, despite specifications about how to construct them. Our results strongly suggest that Darwin Core Triplets that have not been carefully curated are not currently serving a useful role for relinking data. We briefly consider needed next steps to overcome current limitations.
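As a rough illustration of the concatenation described in this abstract, the sketch below builds and parses a triplet from its three parts. The colon separator, the function names, and the sample values are assumptions made for illustration only; they are not taken from the paper or from any library.

```python
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    """The three parts the abstract names (field names are hypothetical)."""
    institution_code: str
    collection_code: str
    catalog_number: str


def build_triplet(institution_code: str, collection_code: str,
                  catalog_number: str, sep: str = ":") -> str:
    """Concatenate the three parts; the ':' separator is an assumption."""
    parts = [p.strip() for p in (institution_code, collection_code, catalog_number)]
    # Reject empty parts and parts containing the separator, since either
    # would make the resulting string ambiguous to split again.
    if any(not p or sep in p for p in parts):
        raise ValueError("each part must be non-empty and must not contain the separator")
    return sep.join(parts)


def parse_triplet(text: str, sep: str = ":") -> Optional[Triplet]:
    """Return a Triplet if text has exactly three non-empty parts, else None."""
    parts = [p.strip() for p in text.split(sep)]
    if len(parts) != 3 or any(not p for p in parts):
        return None
    return Triplet(*parts)


# Made-up sample values, for illustration only.
example = build_triplet("MVZ", "Mamm", "12345")
parsed = parse_triplet(example)
malformed = parse_triplet("MVZ 12345")  # wrong structure, so None
```

A check like `parse_triplet` only catches syntactic problems such as a missing part; the semantic errors the abstract mentions (for example, a wrong collection code) would need comparison against the source records.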
Affiliation(s)
- Robert Guralnick
- Department of Ecology and Evolutionary Biology and the CU Museum of Natural History, University of Colorado, Boulder, Colorado, United States of America
- Tom Conlin
- CU Museum of Natural History, University of Colorado, Boulder, Colorado, United States of America
- John Deck
- Berkeley Natural History Museums, University of California, Berkeley, California, United States of America
- Brian J. Stucky
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America
- Nico Cellinese
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America