1
Weaver WN, Ruhfel BR, Lough KJ, Smith SA. Herbarium specimen label transcription reimagined with large language models: Capabilities, productivity, and risks. Am J Bot 2023; 110:e16256. [PMID: 37938801 DOI: 10.1002/ajb2.16256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 11/10/2023]
Affiliation(s)
- William N Weaver
  - University of Michigan, Department of Ecology and Evolutionary Biology, 1105 N University Ave, Ann Arbor, 48109, MI, USA
- Brad R Ruhfel
  - University of Michigan, Department of Ecology and Evolutionary Biology, 1105 N University Ave, Ann Arbor, 48109, MI, USA
  - University of Michigan Herbarium, Research Museums Center, 3600 Varsity Drive, Ann Arbor, 48108, MI, USA
- Kyle J Lough
  - University of Michigan, Department of Ecology and Evolutionary Biology, 1105 N University Ave, Ann Arbor, 48109, MI, USA
  - University of Michigan Herbarium, Research Museums Center, 3600 Varsity Drive, Ann Arbor, 48108, MI, USA
- Stephen A Smith
  - University of Michigan, Department of Ecology and Evolutionary Biology, 1105 N University Ave, Ann Arbor, 48109, MI, USA
  - University of Michigan Herbarium, Research Museums Center, 3600 Varsity Drive, Ann Arbor, 48108, MI, USA
2
Dove S, Böhm M, Freeman R, McRae L, Murrell DJ. Quantifying reliability and data deficiency in global vertebrate population trends using the Living Planet Index. Glob Chang Biol 2023; 29:4966-4982. [PMID: 37376728 DOI: 10.1111/gcb.16841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Accepted: 04/15/2023] [Indexed: 06/29/2023]
Abstract
Global biodiversity is facing a crisis, which must be solved through effective policies and on-the-ground conservation. But governments, NGOs, and scientists need reliable indicators to guide research, conservation actions, and policy decisions. Developing reliable indicators is challenging because the data underlying those tools is incomplete and biased. For example, the Living Planet Index tracks the changing status of global vertebrate biodiversity, but taxonomic, geographic and temporal gaps and biases are present in the aggregated data used to calculate trends. However, without a basis for real-world comparison, there is no way to directly assess an indicator's accuracy or reliability. Instead, a modelling approach can be used. We developed a model of trend reliability, using simulated datasets as stand-ins for the "real world", degraded samples as stand-ins for indicator datasets (e.g., the Living Planet Database), and a distance measure to quantify reliability by comparing partially sampled to fully sampled trends. The model revealed that the proportion of species represented in the database is not always indicative of trend reliability. Important factors are the number and length of time series, as well as their mean growth rates and variance in their growth rates, both within and between time series. We found that many trends in the Living Planet Index need more data to be considered reliable, particularly trends across the global south. In general, bird trends are the most reliable, while reptile and amphibian trends are most in need of additional data. We simulated three different solutions for reducing data deficiency, and found that collating existing data (where available) is the most efficient way to improve trend reliability, whereas revisiting previously studied populations is a quick and efficient way to improve trend reliability until new long-term studies can be completed and made available.
Affiliation(s)
- Shawn Dove
  - Centre for Biodiversity and Environment Research, University College London, London, UK
  - Institute of Zoology, Zoological Society of London, London, UK
- Monika Böhm
  - Institute of Zoology, Zoological Society of London, London, UK
  - Global Center for Species Survival, Indianapolis Zoo, Indianapolis, Indiana, USA
- Robin Freeman
  - Institute of Zoology, Zoological Society of London, London, UK
- Louise McRae
  - Institute of Zoology, Zoological Society of London, London, UK
- David J Murrell
  - Centre for Biodiversity and Environment Research, University College London, London, UK
3
Powell C, Shaw J. Performant barcode decoding for herbarium specimen images using vector-assisted region proposals (VARP). Appl Plant Sci 2021; 9:e11436. [PMID: 34141497 PMCID: PMC8202828 DOI: 10.1002/aps3.11436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 05/06/2021] [Indexed: 06/12/2023]
Abstract
PREMISE: The scale and associated costs of herbarium digitization make process automation appealing. One such process for many workflows is the association of specimen image files with barcode values stored with the specimen. Here, an innovation is presented that improves the speed and accuracy of decoding barcodes from specimen images. METHODS AND RESULTS: Geometric features common in barcodes are used to identify the regions of specimen images that are likely to contain a barcode. The proposed regions are then combined into a significantly reduced composite image that is decoded using traditional barcode reading libraries. Tested against existing solutions, this method demonstrated the highest success rate (96.5%) and the second fastest processing time (617 ms). CONCLUSIONS: This method was developed to support a larger effort to automate specimen image post-processing in real-time, highlighting the importance of execution time. Although initially designed for herbarium digitization, this method may be useful for other high-resolution applications.
Affiliation(s)
- Caleb Powell
  - Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Joey Shaw
  - Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
4
Powell C, Krakowiak A, Fuller R, Rylander E, Gillespie E, Krosnick S, Ruhfel B, Morris AB, Shaw J. Estimating herbarium specimen digitization rates: Accounting for human experience. Appl Plant Sci 2021; 9:e11415. [PMID: 33968496 PMCID: PMC8085955 DOI: 10.1002/aps3.11415] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/05/2020] [Accepted: 02/02/2021] [Indexed: 06/12/2023]
Abstract
PREMISE: Herbaria are invaluable sources for understanding the natural world, and in recent years there has been a concerted effort to digitize these collections. To organize such efforts, a method for estimating the necessary labor is desired. This work analyzes digitization productivity reports of 105 participants from eight herbaria, deriving generalized labor estimates that account for human experience. METHODS AND RESULTS: Individuals' rates of digitization were grouped based on cumulative time performing each task and then used to estimate a series of generalized labor projection models. In most cases, productivity was shown to improve with experience, suggesting longer technician retention can reduce labor requirements by 20%. CONCLUSIONS: Using student labor is a common tactic for digitization efforts, and the resulting outreach exposes future professionals to natural history collections. However, overcoming the learning curve should be considered when estimating the labor necessary to digitize a collection.
Affiliation(s)
- Caleb Powell
  - Department of Biology, Geology, and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Alaina Krakowiak
  - Department of Biology, Geology, and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Rachel Fuller
  - Department of Biology, Geology, and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Erica Rylander
  - Department of Biology, Geology, and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Emily Gillespie
  - Department of Biological Sciences, Butler University, 4600 Sunset Avenue, Indianapolis, Indiana 46208, USA
- Shawn Krosnick
  - Department of Biology, Tennessee Tech University, 1 William L. Jones Drive, Cookeville, Tennessee 38505, USA
- Brad Ruhfel
  - University of Michigan Herbarium, University of Michigan, 3600 Varsity Drive, Ann Arbor, Michigan 48108, USA
- Ashley B. Morris
  - Department of Biology, Furman University, 3300 Poinsett Highway, Greenville, South Carolina 29613, USA
- Joey Shaw
  - Department of Biology, Geology, and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
5
Abstract
Centralized biodiversity data aggregation is too often failing societal needs due to pervasive and systemic data quality deficiencies. We argue for a novel approach that embodies the spirit of the Web (“small pieces loosely joined”) through the decentralized coordination of data across scientific languages and communities. The upfront cost of decentralization can be offset by the long-term benefit of achieving sustained expert engagement, higher-quality data products, and ultimately more societal impact for biodiversity data. Our decentralized approach encourages the emergence and evolution of multiple self-identifying communities of practice that are regionally, taxonomically, or institutionally localized. Each community is empowered to control the social and informational design and versioning of their local data infrastructures and signals. With no single aggregator to exert centralized control over biodiversity data, decentralization generates loosely connected networks of mid-level aggregators. Global coordination is nevertheless feasible through automatable data sharing agreements that enable efficient propagation and translation of biodiversity data across communities. The decentralized model also poses novel integration challenges, among which the explicit and continuous articulation of conflicting systematic classifications and phylogenies remain the most challenging. We discuss the development of available solutions, challenges, and outline next steps: the global effort of coordination should focus on developing shared languages for data signal translation, as opposed to homogenizing the data signal itself.
Affiliation(s)
- Beckett W Sterner
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Edward E Gilbert
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Nico M Franz
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
6
Martellos S, Bartolucci F, Conti F, Galasso G, Moro A, Pennesi R, Peruzzi L, Pittao E, Nimis PL. FlorItaly - the portal to the Flora of Italy. PhytoKeys 2020; 156:55-71. [PMID: 32913408 PMCID: PMC7455585 DOI: 10.3897/phytokeys.156.54023] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/07/2020] [Accepted: 06/23/2020] [Indexed: 05/27/2023]
Abstract
Digital data concerning the flora of Italy are largely fragmented among different resources hosted on different platforms, and often with different data standards, which are neither connected by a common access point, nor by web services, thus constituting a relevant obstacle to data access and usage. Taxonomic incongruences add a further complication. This paper describes "FlorItaly", an online information system which allows users to access and query updated information on the checklist of the flora of Italy, aiming at becoming an aggregator for Italian botanical resources. "FlorItaly" was developed in a collaborative effort by more than 50 taxonomists, with the support of the Italian Botanical Society, and of Project "Dryades" (University of Trieste), to provide a better and reliable organization of botanical knowledge in Italy, as well as a relevant simplification for data retrieval, and a further stimulus towards a more collaborative approach in botanical research.
Affiliation(s)
- Stefano Martellos
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Fabio Conti
  - University of Camerino, Camerino, Italy
- Gabriele Galasso
  - Museo di Storia Naturale di Milano, Milan, Italy
- Andrea Moro
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Riccardo Pennesi
  - University of Camerino, Camerino, Italy
- Elena Pittao
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Pier Luigi Nimis
  - Department of Life Sciences, University of Trieste, Trieste, Italy
7
Rowley JJL, Callaghan CT. The FrogID dataset: expert-validated occurrence records of Australia's frogs collected by citizen scientists. Zookeys 2020; 912:139-151. [PMID: 32123502 PMCID: PMC7040047 DOI: 10.3897/zookeys.912.38253] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/14/2019] [Accepted: 01/09/2020] [Indexed: 11/12/2022]
Abstract
This dataset represents expert-validated occurrence records of calling frogs across Australia collected via the national citizen science project FrogID (http://www.frogid.net.au). FrogID relies on participants recording calling frogs using smartphone technology, after which point the frogs are identified by expert validators, resulting in a database of georeferenced frog species records. This dataset represents one full year of the project (10 November 2017-9 November 2018), including 54,864 records of 172 species, 71% of the known frog species in Australia. This is the first instalment of the dataset, and we anticipate providing updated datasets on an annual basis.
Affiliation(s)
- Jodi J L Rowley
  - Australian Museum Research Institute, Australian Museum, 1 William Street, Sydney, New South Wales 2010, Australia
  - Centre for Ecosystem Science, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Corey T Callaghan
  - Centre for Ecosystem Science, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
8
Hackett RA, Belitz MW, Gilbert EE, Monfils AK. A data management workflow of biodiversity data from the field to data users. Appl Plant Sci 2019; 7:e11310. [PMID: 31890356 PMCID: PMC6923704 DOI: 10.1002/aps3.11310] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/21/2019] [Accepted: 10/21/2019] [Indexed: 06/10/2023]
Abstract
PREMISE: Heterogeneity of biodiversity data from the collections, research, and management communities presents challenges for data findability, accessibility, interoperability, and reusability. Workflows designed with data collection, standards, dissemination, and reuse in mind will generate better information across geopolitical, administrative, and institutional boundaries. Here, we present our data workflow as a case study of how we collected, shared, and used data from multiple sources. METHODS: In 2012, we initiated the collection of biodiversity data relating to Michigan prairie fens, including data on plant communities and the federally endangered Poweshiek skipperling (Oarisma poweshiek). RESULTS: Over 23,000 occurrence records were compiled in a database following Darwin Core standards. The records were linked with media and biological, chemical, and geometric measurements. We published the data as Global Biodiversity Information Facility data sets and in Symbiota SEINet portals. DISCUSSION: We highlight data collection techniques that optimized transcription time, including the use of predetermined and controlled vocabulary, Darwin Core terms, and data dictionaries. The validity and longevity of our data were supported by voucher specimens, metadata with measurement records, and published manuscripts detailing methods and data sets. Key to our data dissemination was cooperation among partners and the utilization of dynamic tools. To increase data interoperability, we need flexible and customizable data collection templates, coding, and enhanced communication among communities using biodiversity data.
Affiliation(s)
- Rachel A. Hackett
  - Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
  - Michigan Natural Features Inventory, Michigan State University Extension, P.O. Box 13036, Lansing, Michigan 48901-3036, USA
- Michael W. Belitz
  - Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
  - Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
- Anna K. Monfils
  - Department of Biology, Institute for Great Lakes Research, Central Michigan University, Bioscience Building 2100, 1455 Calumet Court, Mount Pleasant, Michigan 48859, USA
9
Powell C, Motley J, Qin H, Shaw J. A born-digital field-to-database solution for collections-based research using collNotes and collBook. Appl Plant Sci 2019; 7:e11284. [PMID: 31467807 PMCID: PMC6711348 DOI: 10.1002/aps3.11284] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/05/2019] [Accepted: 07/16/2019] [Indexed: 06/01/2023]
Abstract
PREMISE: The digitization of natural history collections includes transcribing specimen label data into standardized formats. Born-digital specimen data initially gathered in digital formats do not need to be transcribed, enabling their efficient integration into digitized collections. Modernizing field collection methods for born-digital workflows requires the development of new tools and processes. METHODS AND RESULTS: collNotes, a mobile application, was developed for Android and iOS to supplement traditional field journals. Designed for efficiency in the field, collNotes avoids redundant data entries and does not require cellular service. collBook, a companion desktop application, refines field notes into database-ready formats and produces specimen labels. CONCLUSIONS: collNotes and collBook can be used in combination as a field-to-database solution for gathering born-digital voucher specimen data for plants and fungi. Both programs are open source and use common file types simplifying either program's integration into existing workflows.
Affiliation(s)
- Caleb Powell
  - Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Jacob Motley
  - Computational Science and Engineering, Georgia Institute of Technology Online, 84 5th St NW, Atlanta, Georgia 30308, USA
- Hong Qin
  - Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
- Joey Shaw
  - Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, 615 McCallie Avenue, Chattanooga, Tennessee 37403, USA
10
Hobern D, Baptiste B, Copas K, Guralnick R, Hahn A, van Huis E, Kim ES, McGeoch M, Naicker I, Navarro L, Noesgaard D, Price M, Rodrigues A, Schigel D, Sheffield CA, Wieczorek J. Connecting data and expertise: a new alliance for biodiversity knowledge. Biodivers Data J 2019; 7:e33679. [PMID: 30886531 PMCID: PMC6420472 DOI: 10.3897/bdj.7.e33679] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 02/05/2019] [Accepted: 03/04/2019] [Indexed: 11/12/2022]
Abstract
There has been major progress over the last two decades in digitising historical knowledge of biodiversity and in making biodiversity data freely and openly accessible. Interlocking efforts bring together international partnerships and networks, national, regional and institutional projects and investments and countless individual contributors, spanning diverse biological and environmental research domains, government agencies and non-governmental organisations, citizen science and commercial enterprise. However, current efforts remain inefficient and inadequate to address the global need for accurate data on the world's species and on changing patterns and trends in biodiversity. Significant challenges include imbalances in regional engagement in biodiversity informatics activity, uneven progress in data mobilisation and sharing, the lack of stable persistent identifiers for data records, redundant and incompatible processes for cleaning and interpreting data and the absence of functional mechanisms for knowledgeable experts to curate and improve data. Recognising the need for greater alignment between efforts at all scales, the Global Biodiversity Information Facility (GBIF) convened the second Global Biodiversity Informatics Conference (GBIC2) in July 2018 to propose a coordination mechanism for developing shared roadmaps for biodiversity informatics. GBIC2 attendees reached consensus on the need for a global alliance for biodiversity knowledge, learning from examples such as the Global Alliance for Genomics and Health (GA4GH) and the open software communities under the Apache Software Foundation. These initiatives provide models for multiple stakeholders with decentralised funding and independent governance to combine resources and develop sustainable solutions that address common needs. This paper summarises the GBIC2 discussions and presents a set of 23 complementary ambitions to be addressed by the global community in the context of the proposed alliance. 
The authors call on all who are responsible for describing and monitoring natural systems, all who depend on biodiversity data for research, policy or sustainable environmental management and all who are involved in developing biodiversity informatics solutions to register interest at https://biodiversityinformatics.org/ and to participate in the next steps to establishing a collaborative alliance. The supplementary materials include brochures in a number of languages (English, Arabic, Spanish, Basque, French, Japanese, Dutch, Portuguese, Russian, Traditional Chinese and Simplified Chinese). These summarise the need for an alliance for biodiversity knowledge and call for collaboration in its establishment.
Affiliation(s)
- Donald Hobern
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Brigitte Baptiste
  - Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Bogotá, Colombia
- Kyle Copas
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Robert Guralnick
  - Vertnet, Florida, United States of America
  - University of Colorado, Boulder; University of Colorado Museum of Natural History, Boulder, United States of America
  - Univ. of Florida, Gainesville, United States of America
- Andrea Hahn
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Edwin van Huis
  - Naturalis, Amsterdam, Netherlands
- Eun-Shik Kim
  - Kookmin University, Seoul, South Korea
- Melodie McGeoch
  - Monash University, Clayton, Australia
- Isayvani Naicker
  - African Academy of Sciences, Nairobi, Kenya
- Laetitia Navarro
  - German Centre for Integrative Biodiversity Research, Leipzig, Germany
- Daniel Noesgaard
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Michelle Price
  - Conservatoire et Jardin botaniques de la Ville de Genève, Geneva, Switzerland
- Andrew Rodrigues
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Dmitry Schigel
  - Global Biodiversity Information Facility Secretariat, Copenhagen, Denmark
- Carolyn A Sheffield
  - Smithsonian Libraries/Biodiversity Heritage Library, Washington, DC, United States of America
- John Wieczorek
  - VertNet, Bariloche, Argentina
  - Museum of Vertebrate Zoology, University of California, Berkeley, United States of America
11
Muñoz G, Kissling WD, van Loon EE. Biodiversity Observations Miner: A web application to unlock primary biodiversity data from published literature. Biodivers Data J 2019; 7:e28737. [PMID: 30692868 PMCID: PMC6344444 DOI: 10.3897/bdj.7.e28737] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/30/2018] [Accepted: 12/19/2018] [Indexed: 11/28/2022]
Abstract
Background: A considerable portion of primary biodiversity data is digitally locked inside published literature which is often stored as pdf files. Large-scale approaches to biodiversity science could benefit from retrieving this information and making it digitally accessible and machine-readable. Nonetheless, the amount and diversity of digitally published literature pose many challenges for knowledge discovery and retrieval. Text mining has been extensively used for data discovery tasks in large quantities of documents. However, text mining approaches for knowledge discovery and retrieval have been limited in biodiversity science compared to other disciplines. New information: Here, we present a novel, open source text mining tool, the Biodiversity Observations Miner (BOM). This web application, written in R, allows the semi-automated discovery of punctual biodiversity observations (e.g. biotic interactions, functional or behavioural traits and natural history descriptions) associated with the scientific names present inside a corpus of scientific literature. Furthermore, BOM enables users to rapidly screen large quantities of literature based on word co-occurrences that match custom biodiversity dictionaries. This tool aims to increase the digital mobilisation of primary biodiversity data and is freely accessible via GitHub or through a web server.
Affiliation(s)
- Gabriel Muñoz
  - NASUA, Biodiversity research and conservation section, Quito, Ecuador
  - Faculty of Arts and Science, Department of Biology, Concordia University, Montreal, Canada
- W Daniel Kissling
  - Faculty of Science, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands
- E Emiel van Loon
  - Faculty of Science, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands
12
Peterson AT, Asase A, Canhos DAL, de Souza S, Wieczorek J. Data Leakage and Loss in Biodiversity Informatics. Biodivers Data J 2018; 6:e26826. [PMID: 30473617 PMCID: PMC6235996 DOI: 10.3897/bdj.6.e26826] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/21/2018] [Accepted: 10/17/2018] [Indexed: 11/12/2022]
Abstract
The field of biodiversity informatics is in a massive, "grow-out" phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data "leakage" or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
Affiliation(s)
- A Townsend Peterson
- Biodiversity Institute, University of Kansas, Lawrence, United States of America
- Alex Asase
- University of Ghana, Accra, Ghana
- John Wieczorek
- Museum of Vertebrate Zoology, University of California, Berkeley, United States of America
13
James SA, Soltis PS, Belbin L, Chapman AD, Nelson G, Paul DL, Collins M. Herbarium data: Global biodiversity and societal botanical needs for novel research. Appl Plant Sci 2018; 6:e1024. [PMID: 29732255 PMCID: PMC5851569 DOI: 10.1002/aps3.1024] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 10/03/2017] [Accepted: 12/30/2017] [Indexed: 05/11/2023]
Abstract
Building on centuries of research based on herbarium specimens gathered through time and around the globe, a new era of discovery, synthesis, and prediction using digitized collections data has begun. This paper provides an overview of how aggregated, open access botanical and associated biological, environmental, and ecological data sets, from genes to the ecosystem, can be used to document the impacts of global change on communities, organisms, and society; predict future impacts; and help to drive the remediation of change. Advocacy for botanical collections and their expansion is needed, including ongoing digitization and online publishing. The addition of non-traditional digitized data fields, user annotation capability, and born-digital field data collection enables the rapid access of rich, digitally available data sets for research, education, informed decision-making, and other scholarly and creative activities. Researchers are receiving enormous benefits from data aggregators including the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), the Atlas of Living Australia (ALA), and the Biodiversity Heritage Library (BHL), but effective collaboration around data infrastructures is needed when working with large and disparate data sets. Tools for data discovery, visualization, analysis, and skills training are increasingly important for inspiring novel research that improves the intrinsic value of physical and digital botanical collections.
Affiliation(s)
- Shelley A. James
- National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Mrs Macquaries Road, Sydney, New South Wales 2000, Australia
- Pamela S. Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
- Lee Belbin
- Atlas of Living Australia, CSIRO, Clunies Ross Street, Acton, Australian Capital Territory 2601, Australia
- Arthur D. Chapman
- Australian Biodiversity Information Services, Ballan, Victoria 3342, Australia
- Gil Nelson
- iDigBio, Florida State University, Tallahassee, Florida 32306, USA
- Matthew Collins
- Advanced Computing and Information Systems, University of Florida, Gainesville, Florida 32611, USA
14
Tessarolo G, Ladle R, Rangel T, Hortal J. Temporal degradation of data limits biodiversity research. Ecol Evol 2017; 7:6863-6870. [PMID: 28904766 PMCID: PMC5587493 DOI: 10.1002/ece3.3259] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 02/16/2017] [Revised: 06/14/2017] [Accepted: 06/25/2017] [Indexed: 11/09/2022]
Abstract
Spatial and/or temporal biases in biodiversity data can directly influence the utility, comparability, and reliability of ecological and evolutionary studies. While the effects of biased spatial coverage of biodiversity data are relatively well known, temporal variation in data quality (i.e., the congruence between recorded and actual information) has received much less attention. Here, we develop a conceptual framework for understanding the influence of time on biodiversity data quality based on three main processes: (1) the natural dynamics of ecological systems—such as species turnover or local extinction; (2) periodic taxonomic revisions, and; (3) the loss of physical and metadata due to inefficient curation, accidents, or funding shortfalls. Temporal decay in data quality driven by these three processes has fundamental consequences for the usage and comparability of data collected in different time periods. Data decay can be partly ameliorated by adopting standard protocols for generation, storage, and sharing data and metadata. However, some data degradation is unavoidable due to natural variations in ecological systems. Consequently, changes in biodiversity data quality over time need be carefully assessed and, if possible, taken into account when analyzing aging datasets.
Affiliation(s)
- Geiziane Tessarolo
- Departamento de Ecologia, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil; Programa de Pós-graduação em Recursos Naturais do Cerrado, Universidade Estadual de Goiás, Anápolis, Brazil
- Richard Ladle
- ICBS, Universidade Federal de Alagoas, Maceió, Brazil; School of Geography and the Environment, University of Oxford, Oxford, UK
- Thiago Rangel
- Departamento de Ecologia, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil
- Joaquin Hortal
- Departamento de Ecologia, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, Brazil; Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales (MNCN-CSIC), Madrid, Spain
15
Cogălniceanu D, Rozylowicz L, Székely P, Samoilă C, Stănescu F, Tudor M, Székely D, Iosif R. Diversity and distribution of reptiles in Romania. Zookeys 2013:49-76. [PMID: 24146598 PMCID: PMC3800809 DOI: 10.3897/zookeys.341.5502] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/11/2013] [Accepted: 09/29/2013] [Indexed: 11/12/2022]
Abstract
The reptile fauna of Romania comprises 23 species, out of which 12 species reach here the limit of their geographic range. We compiled and updated a national database of the reptile species occurrences from a variety of sources including our own field surveys, personal communication from specialists, museum collections and the scientific literature. The occurrence records were georeferenced and stored in a geodatabase for additional analysis of their spatial patterns. The spatial analysis revealed a biased sampling effort concentrated in various protected areas, and deficient in the vast agricultural areas of the southern part of Romania. The patterns of species richness showed a higher number of species in the warmer and drier regions, and a relatively low number of species in the rest of the country. Our database provides a starting point for further analyses, and represents a reliable tool for drafting conservation plans.
Affiliation(s)
- Dan Cogălniceanu
- University Ovidius Constanţa, Faculty of Natural Sciences and Agricultural Sciences, Al. Universităţii nr. 1, corp B, 900470, Constanţa, Romania
16
Cogălniceanu D, Székely P, Samoilă C, Ruben I, Tudor M, Plăiaşu R, Stănescu F, Rozylowicz L. Diversity and distribution of amphibians in Romania. Zookeys 2013:35-57. [PMID: 23794877 PMCID: PMC3689111 DOI: 10.3897/zookeys.296.4872] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/17/2013] [Accepted: 04/22/2013] [Indexed: 11/12/2022]
Abstract
Nineteen species of amphibians inhabit Romania, 9 of which reach their range limit on this territory. Based on published occurrence reports, museum collections and our own data we compiled a national database of amphibian occurrences. We georeferenced 26779 amphibian species occurrences, and performed an analysis of their spatial patterns, checking for hotspots and patterns of species richness. The results of spatial statistic analyses supported the idea of a biased sampling for Romania, with clear hotspots of increased sampling efforts. The sampling effort is biased towards species with high detectability, protected areas, and large cities. Future sampling efforts should be focused mostly on species with a high rarity score in order to accurately map their range. Our results are an important step in achieving the long-term goals of increasing the efficiency of conservation efforts and evaluating the species range shifts under climate change scenarios.
Affiliation(s)
- Dan Cogălniceanu
- University Ovidius Constanţa, Faculty of Natural Sciences and Agricultural Sciences, Al. Universităţii 1, corp B, Constanţa 900470, Romania