1
Patten NN, Gaynor ML, Soltis DE, Soltis PS. Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs): An R package and workflow for processing biodiversity data. Applications in Plant Sciences 2024; 12:e11575. [PMID: 38638614 PMCID: PMC11022233 DOI: 10.1002/aps3.11575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 01/07/2024] [Accepted: 01/14/2024] [Indexed: 04/20/2024]
Abstract
Premise: Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs). Methods and Results: The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps. Conclusions: Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity-related concepts into the classroom via the use of herbarium specimens.
Affiliation(s)
- Natalie N. Patten
- Department of Mathematics, University of Florida, Gainesville, Florida 32611, USA
- Present address: Department of Mathematics, The Ohio State University, Columbus, Ohio 43210, USA
- Michelle L. Gaynor
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Douglas E. Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Pamela S. Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
2
Coca-de-la-Iglesia M, Gallego-Narbón A, Alonso A, Valcárcel V. High rate of species misidentification reduces the taxonomic certainty of European biodiversity databases of ivies (Hedera L.). Sci Rep 2024; 14:4876. [PMID: 38418501 PMCID: PMC10902322 DOI: 10.1038/s41598-024-54735-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 02/15/2024] [Indexed: 03/01/2024] Open
Abstract
The digitization of natural history specimens and the popularization of citizen science are creating an unprecedented availability of large amounts of biodiversity data. These biodiversity inventories can be severely affected by species misidentification, a source of taxonomic uncertainty that is rarely acknowledged in biodiversity data management. For these reasons, taxonomists debate the use of online repositories to address biological questions at the species level. Hedera L. (ivies) provides an excellent case study as it is well represented in both herbaria and online repositories with thousands of records likely to be affected by high taxonomic uncertainty. We analyze the sources and extent of taxonomic errors in the identification of the European ivy species by reviewing herbarium specimens and find a high misidentification rate (18% on average), which varies between species (maximized in H. hibernica: 55%; H. azorica: 48%; H. iberica: 36%) and regions (maximized in the UK: 38% and Spain: 27%). We find a systematic misidentification of all European ivies with H. helix behind the high misidentification rates in herbaria and warn of even higher rates in online records. We compile a spatial database to overcome the large discrepancies we observed in species distributions between online and morphologically reviewed records.
Affiliation(s)
- Marina Coca-de-la-Iglesia
- Departamento de Biología, Universidad Autónoma de Madrid, 28049, Madrid, Spain
- TRAGSATEC, Madrid, Spain
- Alejandro Alonso
- Departamento de Biología, Universidad Autónoma de Madrid, 28049, Madrid, Spain
- Virginia Valcárcel
- Departamento de Biología, Universidad Autónoma de Madrid, 28049, Madrid, Spain
- Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM), Universidad Autónoma de Madrid, 28049, Madrid, Spain
3
Specker F, Paz A, Crowther TW, Maynard DS. Treemendous: an R package for integrating taxonomic information across backbones. PeerJ 2024; 12:e16896. [PMID: 38436026 PMCID: PMC10908262 DOI: 10.7717/peerj.16896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024] Open
Abstract
Standardizing and translating species names from different databases is key to the successful integration of data sources in biodiversity research. There are numerous taxonomic name-resolution applications that implement increasingly powerful name-cleaning and matching approaches, allowing the user to resolve species relative to multiple backbones simultaneously. Yet there remains no principled approach for combining information across these underlying taxonomic backbones, complicating efforts to combine and merge species lists with inconsistent and conflicting taxonomic information. Here, we present Treemendous, an open-source software package for the R programming environment that integrates taxonomic relationships across four publicly available backbones to improve the name resolution of tree species. By mapping relationships across the backbones, this package can be used to resolve datasets with conflicting and inconsistent taxonomic origins, while ensuring the resulting species are accepted and consistent with a single reference backbone. The user can chain together different functionalities ranging from simple matching to a single backbone, to graph-based iterative matching using synonym-accepted relations across all backbones in the database. In addition, the package allows users to 'translate' one tree species list into another, streamlining the assimilation of new data into preexisting datasets or models. The package provides a flexible workflow depending on the use case, and can either be used as a stand-alone name-resolution package or in conjunction with existing packages as a final step in the name-resolution pipeline. The Treemendous package is fast and easy to use, allowing users to quickly merge different data sources by standardizing their species names according to the regularly updated database. 
By combining taxonomic information across multiple backbones, the package increases matching rates and minimizes data loss, allowing for more efficient translation of tree species datasets to aid research into forest biodiversity and tree ecology.
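The graph-based matching described in this abstract can be illustrated with a minimal Python sketch. It is not the Treemendous API (the package itself is written for R). The backbone labels `A` and `B`, the taxon names, and the `resolve` helper are all hypothetical and serve only to show how synonym-to-accepted relations from several backbones might be followed until a name accepted in a chosen reference backbone is reached.

```python
from collections import deque

# Hypothetical synonym -> accepted-name relations, one mapping per backbone.
# The names below are invented placeholders, not real taxa.
RELATIONS = {
    "A": {"Arbor secunda": "Arbor prima"},
    "B": {"Arbor tertia": "Arbor secunda"},
}

# Hypothetical sets of names each backbone treats as accepted.
ACCEPTED = {
    "A": {"Arbor prima"},
    "B": {"Arbor secunda"},
}


def resolve(name, target, relations, accepted):
    """Follow synonym relations across all backbones (breadth-first)
    until a name accepted in the target backbone is found."""
    seen = {name}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        if current in accepted[target]:
            return current
        for edges in relations.values():
            nxt = edges.get(current)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None  # no path to a name accepted in the target backbone


resolved = resolve("Arbor tertia", "A", RELATIONS, ACCEPTED)
unresolved = resolve("Arbor quarta", "A", RELATIONS, ACCEPTED)
```

In this toy data, backbone `A` alone cannot resolve "Arbor tertia", but chaining the relation from backbone `B` with the one from `A` reaches a name accepted in `A`. A name with no relations in any backbone stays unresolved rather than being guessed.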
Affiliation(s)
- Felix Specker
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Department of Biosystems Science and Engineering, ETH Zürich, Zürich, Switzerland
- Andrea Paz
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Daniel S. Maynard
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
4
Noll NW, Scherber C, Schäffler L. taxalogue: a toolkit to create comprehensive CO1 reference databases. PeerJ 2023; 11:e16253. [PMID: 38077427 PMCID: PMC10702336 DOI: 10.7717/peerj.16253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 09/18/2023] [Indexed: 12/18/2023] Open
Abstract
Background: Taxonomic identification through DNA barcodes gained considerable traction through the invention of next-generation sequencing and DNA metabarcoding. Metabarcoding allows for the simultaneous identification of thousands of organisms from bulk samples with high taxonomic resolution. However, reliable identifications can only be achieved with comprehensive and curated reference databases. Therefore, custom reference databases are often created to meet the needs of specific research questions. Due to taxonomic inconsistencies, formatting issues, and technical difficulties, building a custom reference database requires tremendous effort. Here, we present taxalogue, an easy-to-use software for creating comprehensive and customized reference databases that provide clean and taxonomically harmonized records. In combination with extensive geographical filtering options, taxalogue opens up new possibilities for generating and testing evolutionary hypotheses. Methods: taxalogue collects DNA sequences from several online sources and combines them into a reference database. Taxonomic incongruencies between the different data sources can be harmonized according to available taxonomies. Dereplication and various filtering options are available regarding sequence quality or metadata information. taxalogue is implemented in the open-source Ruby programming language, and the source code is available at https://github.com/nwnoll/taxalogue. We benchmark four reference databases by sequence identity against eight queries from different localities and trapping devices. Subsamples from each reference database were used to compare how well another one is covered. Results: taxalogue produces reference databases with the best coverage at high identities for most tested queries, enabling more accurate, reliable predictions with higher certainty than the other benchmarked reference databases.
Additionally, the performance of taxalogue is more consistent while providing good coverage for a variety of habitats, regions, and sampling methods. taxalogue simplifies the creation of reference databases and makes the process reproducible and transparent. Multiple available output formats for commonly used downstream applications facilitate the easy adoption of taxalogue in many different software pipelines. The resulting reference databases improve the taxonomic classification accuracy through high coverage of the query sequences at high identities.
Affiliation(s)
- Niklas W. Noll
- Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Christoph Scherber
- Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Livia Schäffler
- Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
5
Sandall EL, Maureaud AA, Guralnick R, McGeoch MA, Sica YV, Rogan MS, Booher DB, Edwards R, Franz N, Ingenloff K, Lucas M, Marsh CJ, McGowan J, Pinkert S, Ranipeta A, Uetz P, Wieczorek J, Jetz W. A globally integrated structure of taxonomy to support biodiversity science and conservation. Trends Ecol Evol 2023; 38:1143-1153. [PMID: 37684131 DOI: 10.1016/j.tree.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 07/30/2023] [Accepted: 08/04/2023] [Indexed: 09/10/2023]
Abstract
All aspects of biodiversity research, from taxonomy to conservation, rely on data associated with species names. Effective integration of names across multiple fields is paramount and depends on the coordination and organization of taxonomic data. We assess current efforts and find that even key applications for well-studied taxa still lack commonality in taxonomic information required for integration. We identify essential taxonomic elements from our interoperability assessment to support improved access and integration of taxonomic data. A stronger focus on these elements has the potential to involve taxonomic communities in biodiversity science and overcome broken linkages currently limiting research capacity. We encourage a community effort to democratize taxonomic expertise and language in order to facilitate maximum interoperability and integration.
Affiliation(s)
- Emily L Sandall
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA.
- Aurore A Maureaud
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA; Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ, USA.
- Robert Guralnick
- Department of Natural History, Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Melodie A McGeoch
- Securing Antarctica's Environmental Future, Department of Environment and Genetics, LaTrobe University, Melbourne, Australia
- Yanina V Sica
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Matthew S Rogan
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Douglas B Booher
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Robert Edwards
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA; Cleveland Museum of Natural History, Cleveland, OH, USA
- Nico Franz
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Kate Ingenloff
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Maisha Lucas
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Charles J Marsh
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Jennifer McGowan
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA; The Nature Conservancy, Arlington, VA, USA
- Stefan Pinkert
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA; Department of Conservation Ecology, University of Marburg, Marburg, Germany
- Ajay Ranipeta
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA
- Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- John Wieczorek
- Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Walter Jetz
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06520, USA; Center for Biodiversity & Global Change, Yale University, New Haven, CT 06520, USA; E.O. Wilson Biodiversity Foundation, Durham, NC, USA
6
Seah BKB. Paying it forward: Crowdsourcing the harmonisation and linking of taxon names and biodiversity identifiers. Biodivers Data J 2023; 11:e114076. [PMID: 38312332 PMCID: PMC10838036 DOI: 10.3897/bdj.11.e114076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 11/06/2023] [Indexed: 02/06/2024] Open
Abstract
Linking records for the same taxa between different databases is an essential step when working with biodiversity data. However, name-matching alone is error-prone, because of issues such as homonyms (unrelated taxa with the same name) and synonyms (same taxon under different names). Therefore, most projects will require some curation to ensure that taxon identifiers are correctly linked. Unfortunately, formal guidance on such curation is uncommon and these steps are often ad hoc and poorly documented, which hinders transparency and reproducibility, yet the task requires specialist knowledge and cannot be easily automated without careful validation. Here, we present a case study on linking identifiers between the GBIF and NCBI taxonomies for a species checklist. This represents a common scenario: finding published sequence data (from NCBI) for species chosen by occurrence or geographical distribution (from GBIF). Wikidata, a publicly editable knowledge base of structured data, can serve as an additional information source for identifier linking. We suggest a software toolkit for taxon name-matching and data-cleaning, describe common issues encountered during curation and propose concrete steps to address them. For example, about 2.8% of the taxa in our dataset had wrong identifiers linked on Wikidata because of errors in name-matching caused by homonyms. By correcting such errors during data-cleaning, either directly (through editing Wikidata) or indirectly (by reporting errors in GBIF or NCBI), we crowdsource the curation and contribute to community resources, thereby improving the quality of downstream analyses.
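A short Python sketch can show why name-matching alone is error-prone, as this abstract argues. It is not the toolkit proposed in the paper. The record layout, the `link_by_name` helper, and the identifiers (`G1`, `N1`, and so on) are hypothetical and are not real GBIF or NCBI keys. The homonym itself is real: Morus is both a plant genus (mulberries) and a bird genus (gannets).

```python
from collections import defaultdict

# Hypothetical records: (source, identifier, name, kingdom).
# Identifiers are invented for illustration only.
GBIF_LIKE = [
    ("gbif", "G1", "Morus", "Plantae"),
    ("gbif", "G2", "Morus", "Animalia"),
    ("gbif", "G3", "Hedera helix", "Plantae"),
]
NCBI_LIKE = [
    ("ncbi", "N1", "Morus", "Plantae"),
    ("ncbi", "N2", "Hedera helix", "Plantae"),
]


def link_by_name(left, right):
    """Link records by exact name, but set aside names with several
    candidates (possible homonyms) for manual curation."""
    by_name = defaultdict(list)
    for rec in left:
        by_name[rec[2]].append(rec)

    links, needs_review = [], []
    for rec in right:
        candidates = by_name.get(rec[2], [])
        if len(candidates) == 1:
            # Unambiguous: exactly one record carries this name.
            links.append((candidates[0][1], rec[1]))
        elif len(candidates) > 1:
            # Ambiguous: the same name points at more than one taxon.
            needs_review.append((rec[1], [c[1] for c in candidates]))
    return links, needs_review


links, needs_review = link_by_name(GBIF_LIKE, NCBI_LIKE)
```

Here only "Hedera helix" is linked automatically, while "Morus" is queued for review. Picking the first candidate would silently link a plant record to a bird, which is the kind of homonym error the abstract reports finding on Wikidata.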
Affiliation(s)
- Brandon Kwee Boon Seah
- Thünen Institute for Biodiversity, Braunschweig, Germany
7
Schellenberger Costa D, Boehnisch G, Freiberg M, Govaerts R, Grenié M, Hassler M, Kattge J, Muellner-Riehl AN, Rojas Andrés BM, Winter M, Watson M, Zizka A, Wirth C. The big four of plant taxonomy - a comparison of global checklists of vascular plant names. The New Phytologist 2023; 240:1687-1702. [PMID: 37243532 DOI: 10.1111/nph.18961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 04/12/2023] [Indexed: 05/29/2023]
Abstract
Taxonomic checklists used to verify published plant names and identify synonyms are a cornerstone of biological research. Four global authoritative checklists for vascular plants exist: Leipzig Catalogue of Vascular Plants, World Checklist of Vascular Plants, World Flora Online (successor of The Plant List, TPL), and WorldPlants. We compared these four checklists in terms of size and differences across taxa. We matched taxon names of these checklists and TPL against each other, identified differences across checklists, and evaluated the consistency of accepted names linked to individual taxon names. We assessed geographic and phylogenetic patterns of variance. All checklists differed strongly compared with TPL and provided identical information on c. 60% of plant names. Geographically, differences in checklists increased from low to high latitudes. Phylogenetically, we detected strong variability across families. A comparison of name-matching performance on taxon names submitted to the functional trait database TRY, and a check of completeness of accepted names evaluated against an independent, expert-curated checklist of the family Meliaceae, showed a similar performance across checklists. This study raises awareness on the differences in data and approach across these checklists potentially impacting analyses. We propose ideas on the way forward exploring synergies and harmonizing the four global checklists.
Affiliation(s)
- David Schellenberger Costa
- Department of Special Botany and Functional Biodiversity, Faculty of Life Sciences, University of Leipzig, Johannisallee 21-23, 04103, Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
- Gerhard Boehnisch
- Research Group Functional Biogeography, Max Planck Institute for Biogeochemistry, Hans-Knoell-Str. 10, 07745, Jena, Germany
- Martin Freiberg
- Department of Special Botany and Functional Biodiversity, Faculty of Life Sciences, University of Leipzig, Johannisallee 21-23, 04103, Leipzig, Germany
- Rafaël Govaerts
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Kew Road, Richmond, TW9 3DS, UK
- Matthias Grenié
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
- Jens Kattge
- Research Group Functional Biogeography, Max Planck Institute for Biogeochemistry, Hans-Knoell-Str. 10, 07745, Jena, Germany
- Alexandra N Muellner-Riehl
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
- Department of Molecular Evolution and Plant Systematics & Herbarium (LZ), Faculty of Life Sciences, University of Leipzig, Johannisallee 21-23, 04103, Leipzig, Germany
- Blanca M Rojas Andrés
- Departamento de Botánica y Fisiología Vegetal, Universidad de Salamanca, Ave Licenciado Méndez Nieto s/n, 37007, Salamanca, Spain
- Biobanco de ADN Vegetal, Edificio Multiusos I+D+i, Universidad de Salamanca, Calle Espejo s/n, 37007, Salamanca, Spain
- Marten Winter
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
- Mark Watson
- Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK
- Alexander Zizka
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
- Department of Biology, Philipps-University Marburg, 35043, Marburg, Germany
- Christian Wirth
- Department of Special Botany and Functional Biodiversity, Faculty of Life Sciences, University of Leipzig, Johannisallee 21-23, 04103, Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr 4, 04103, Leipzig, Germany
8
Szymura TH, Kassa H, Swacha G, Szymura M, Zając A, Kącki Z. Spatial patterns of vascular plant species richness in Poland - a data set. Sci Data 2023; 10:542. [PMID: 37596254 PMCID: PMC10439123 DOI: 10.1038/s41597-023-02446-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 08/04/2023] [Indexed: 08/20/2023] Open
Abstract
Recognition of species richness spatial patterns is important for nature conservation and theoretical studies. Inventorying species richness, especially at a larger spatial extent, is challenging; thus, different data sources are joined and harmonized to obtain a comprehensive data set. Here we present a new data set showing vascular plant species richness in Poland based on a grid of 10 × 10 km squares. The data set was created using data from two sources: the Atlas of Distribution of Vascular Plants in Poland and the Polish Vegetation Database. Using this data set, we analysed 2,160 species with taxonomical nomenclature according to the Euro+Med PlantBase checklist in 3,283 squares covering the entire territory of Poland (ca. 312,000 km²). The species were divided into groups according to their status and frequency of distribution, and the statistics for each square were obtained. For purposes of analysis, sampling bias was assessed. The data set promotes theoretical analysis on species richness and reinforces the planning of nature conservation.
Affiliation(s)
- Tomasz H Szymura
- Botanical Garden, University of Wrocław, ul. Sienkiewicza 23, 50-335, Wrocław, Poland.
- Henok Kassa
- Department of Ecology, Biogeochemistry and Environmental Protection, University of Wrocław, Kanonia 6/8, 50-328, Wrocław, Poland
- Grzegorz Swacha
- Botanical Garden, University of Wrocław, ul. Sienkiewicza 23, 50-335, Wrocław, Poland
- Magdalena Szymura
- Institute of Agroecology and Plant Production, Wrocław University of Environmental and Life Sciences, Grunwaldzki Sq. 24A, 50-363, Wrocław, Poland
- Adam Zając
- Institute of Botany, Faculty of Biology and Earth Sciences, Jagiellonian University in Kraków, Kopernika 27, 31-501, Kraków, Poland
- Zygmunt Kącki
- Botanical Garden, University of Wrocław, ul. Sienkiewicza 23, 50-335, Wrocław, Poland
9
Dillon EM, Dunne EM, Womack TM, Kouvari M, Larina E, Claytor JR, Ivkić A, Juhn M, Carmona PSM, Robson SV, Saha A, Villafaña JA, Zill ME. Challenges and directions in analytical paleobiology. Paleobiology 2023; 49:377-393. [PMID: 37809321 PMCID: PMC7615171 DOI: 10.1017/pab.2023.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Over the last 50 years, access to new data and analytical tools has expanded the study of analytical paleobiology, contributing to innovative analyses of biodiversity dynamics over Earth's history. Despite, or even spurred by, this growing availability of resources, analytical paleobiology faces deep-rooted obstacles that stem from the need for more equitable access to data and best practices to guide analyses of the fossil record. Recent progress has been accelerated by a collective push toward more collaborative, interdisciplinary, and open science, especially by early-career researchers. Here, we survey four challenges facing analytical paleobiology from an early-career perspective: (1) accounting for biases when interpreting the fossil record; (2) integrating fossil and modern biodiversity data; (3) building data science skills; and (4) increasing data accessibility and equity. We discuss recent efforts to address each challenge, highlight persisting barriers, and identify tools that have advanced analytical work. Given the inherent linkages between these challenges, we encourage discourse across disciplines to find common solutions. We also affirm the need for systemic changes that reevaluate how we conduct and share paleobiological research.
Affiliation(s)
- Erin M. Dillon
- Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, U.S.A.; Smithsonian Tropical Research Institute, Balboa, Republic of Panama
- Emma M. Dunne
- GeoZentrum Nordbayern, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany; School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
- Tom M. Womack
- School of Geography, Environment and Earth Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
- Miranta Kouvari
- Department of Earth Sciences, University College London, Gower Street, London WC1E 6BT, United Kingdom; Life Sciences Department, Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom
- Ekaterina Larina
- Jackson School of Geosciences, University of Texas, Austin, Texas 78712, U.S.A
- Jordan Ray Claytor
- Department of Biology, University of Washington, Seattle, Washington 98195, U.S.A; Burke Museum of Natural History and Culture, Seattle, Washington 98195, U.S.A
- Angelina Ivkić
- Department of Palaeontology, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
- Mark Juhn
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California 90095, U.S.A
- Pablo S. Milla Carmona
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Ciencias Geológicas, Buenos Aires C1428EGA, Argentina; Instituto de Estudios Andinos “Don Pablo Groeber” (IDEAN, UBA-CONICET), Buenos Aires C1428EGA, Argentina
- Selina Viktor Robson
- Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Anwesha Saha
- Institute of Palaeobiology, Polish Academy of Sciences, ul. Twarda 51/55, 00-818 Warsaw, Poland; Laboratory of Paleogenetics and Conservation Genetics, Centre of New Technologies (CeNT), University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland
- Jaime A. Villafaña
- Department of Palaeontology, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria; Centro de Investigación en Recursos Naturales y Sustentabilidad, Universidad Bernardo O'Higgins, Santiago 8370993, Chile
- Michelle E. Zill
- Department of Earth and Planetary Sciences, University of California Riverside, Riverside, California 92521, U.S.A
10
Sterner B, Elliott S, Gilbert EE, Franz NM. Unified and pluralistic ideals for data sharing and reuse in biodiversity. Database (Oxford) 2023; 2023:baad048. [PMID: 37465916 PMCID: PMC10354506 DOI: 10.1093/database/baad048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 05/30/2023] [Accepted: 06/27/2023] [Indexed: 07/20/2023]
Abstract
How should billions of species observations worldwide be shared and made reusable? Many biodiversity scientists assume the ideal solution is to standardize all datasets according to a single, universal classification and aggregate them into a centralized, global repository. This ideal has known practical and theoretical limitations, however, which justifies investigating alternatives. To support better community deliberation and normative evaluation, we develop a novel conceptual framework showing how different organizational models, regulative ideals and heuristic strategies are combined to form shared infrastructures supporting data reuse. The framework is anchored in a general definition of data pooling as an activity of making a taxonomically standardized body of information available for community reuse via digital infrastructure. We describe and illustrate unified and pluralistic ideals for biodiversity data pooling and show how communities may advance toward these ideals using different heuristic strategies. We present evidence for the strengths and limitations of the unification and pluralistic ideals based on systemic relationships of power, responsibility and benefit they establish among stakeholders, and we conclude the pluralistic ideal is better suited for biodiversity data.
Affiliation(s)
- Beckett Sterner
- School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
- Steve Elliott
- School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
- Edward E Gilbert
- School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
| | - Nico M Franz
- School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA
11
Spear D, van Wilgen NJ, Rebelo AG, Botha JM. Collating biodiversity occurrence data for conservation. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2023.1037282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Plant and animal checklists, with conservation status information, are fundamental for conservation management. Historical field data, more recent data of digital origin and data-sharing platforms provide useful sources for collating species locality data. However, different biodiversity datasets have different formats and inconsistent naming systems. Additionally, most digital data sources do not provide an easy option for download by protected area. Further, data-entry-ready software is not readily available for conservation organization staff with limited technical skills to collate these heterogeneous data and create distribution maps and checklists for protected areas. The insights presented here are the outcome of conceptualizing a biodiversity information system for South African National Parks. We recognize that a fundamental requirement for achieving better standardization, sharing and use of biodiversity data for conservation is capacity building, internet connectivity, national institutional data management support and collaboration. We focus on some of the issues that need to be considered for capacity building, data standardization and data support. We outline the need for using taxonomic backbones and standardizing biodiversity data and the utility of data from the Global Biodiversity Information Facility and other available sources in this process. Additionally, we make recommendations for the fields needed in relational databases for collating species data that can be used to inform conservation decisions and outline steps that can be taken to enable easier collation of biodiversity data, using South Africa as a case study.
12
Zhang J, Qian H. U.Taxonstand: An R package for standardizing scientific names of plants and animals. PLANT DIVERSITY 2023; 45:1-5. [PMID: 36876314 PMCID: PMC9975469 DOI: 10.1016/j.pld.2022.09.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 06/13/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 06/01/2023]
Abstract
The scientific names of organisms are key identifiers of plants and animals. Correctly treating scientific names is a prerequisite for biodiversity research and documentation. Here, we present an R package, 'U.Taxonstand', which can standardize and harmonize scientific names in plant and animal species lists quickly and with a high rate of matching success. Unlike most other similar R packages, each of which works with only one taxonomic database, U.Taxonstand can work with all taxonomic databases, as long as they are properly formatted. Multiple databases for plants and animals that can be directly used by U.Taxonstand, which include bryophytes, vascular plants, amphibians, birds, fishes, mammals, and reptiles, are available online. U.Taxonstand can be a very useful tool for botanists, zoologists, ecologists and biogeographers to standardize and harmonize scientific names of organisms.
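The idea of matching a species list against a properly formatted taxonomic database can be illustrated with a minimal Python sketch. This is not the U.Taxonstand implementation or its API: the function names `normalize` and `standardize_names`, the toy backbone table, and the 0.9 similarity cutoff are all assumptions made for illustration. The sketch tries an exact match first and falls back to approximate string matching from the standard library.

```python
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Collapse whitespace; capitalise the genus and lower-case the rest."""
    parts = name.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def standardize_names(names, backbone, cutoff=0.9):
    """Match each raw name against a backbone table.

    `backbone` (hypothetical format) maps every known name, accepted or
    synonym, to its accepted name. Returns (raw, accepted, match_type).
    """
    known = list(backbone)
    results = []
    for raw in names:
        cleaned = normalize(raw)
        if cleaned in backbone:
            results.append((raw, backbone[cleaned], "exact"))
            continue
        # Fall back to approximate matching for misspellings.
        close = get_close_matches(cleaned, known, n=1, cutoff=cutoff)
        if close:
            results.append((raw, backbone[close[0]], "fuzzy"))
        else:
            results.append((raw, None, "unmatched"))
    return results


# Toy backbone invented for this example; one entry is treated as a synonym.
backbone = {
    "Quercus alba": "Quercus alba",
    "Quercus robur": "Quercus robur",
    "Quercus pedunculata": "Quercus robur",
}

matched = standardize_names(
    ["quercus  ALBA", "Quercus pedunculata", "Quercus robor", "Pinus strobus"],
    backbone,
)
for row in matched:
    print(row)
```

Keeping the match type alongside each result lets a user review fuzzy and unmatched names by hand rather than accepting them silently.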
Affiliation(s)
- Jian Zhang
  - Center for Global Change and Complex Ecosystems, Zhejiang Tiantong Forest Ecosystem National Observation and Research Station, School of Ecological and Environmental Sciences, East China Normal University, 200241, Shanghai, China
- Hong Qian
  - Research and Collections Center, Illinois State Museum, 1011 East Ash Street, Springfield, IL, 62703, USA
13
Laini A, Guareschi S, Bolpagni R, Burgazzi G, Bruno D, Gutiérrez-Cánovas C, Miranda R, Mondy C, Várbíró G, Cancellario T. biomonitoR: an R package for managing ecological data and calculating biomonitoring indices. PeerJ 2022. [DOI: 10.7717/peerj.14183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
The monitoring of biological indicators is required to assess the impacts of environmental policies, compare ecosystems and guide management and conservation actions. However, the growing availability of ecological data has not been accompanied by concomitant processing tools able to facilitate data handling and analysis. Multiple common challenges limit the usefulness of biomonitoring information across ecosystems and biological groups. Biomonitoring data analysis is currently constrained by time-consuming steps for data preparation and a data processing environment with limited integration in terms of software, biological groups, and protocols. We introduce biomonitoR, a package for the R programming language that addresses technical challenges for the management of ecological data and metrics calculation. biomonitoR implements most of the biological indices currently used or proposed in different fields of ecology and water resource management. Its combination of customizable functions aims to support a transferable and comprehensive biomonitoring workflow in a user-friendly environment. biomonitoR represents a versatile toolbox with five main assets: (i) it checks taxonomic information against reference datasets allowing for customization of trait and sensitivity scores; (ii) it supports heterogeneous taxonomic resolution allowing computations at multiple taxonomic levels; (iii) it calculates multiple biological indices, including metrics for both broad and stressor-specific ecological assessments; (iv) it enables user-friendly data visualization, helping both decision-making processes and data interpretation; and (v) it allows working with an interactive web application straight from R. Overall, biomonitoR can benefit the wide biomonitoring community, including environmental private consultants, ecologists and natural resource managers.
Affiliation(s)
- Alex Laini
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
  - Department of Life Sciences and Systems Biology, University of Turin, Torino, Italy
- Simone Guareschi
  - Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas, Sevilla, Spain
  - Geography and Environment, Loughborough University, Loughborough, United Kingdom
- Rossano Bolpagni
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Gemma Burgazzi
  - Institute for Environmental Sciences, University of Koblenz-Landau, Landau, Germany
- Daniel Bruno
  - Instituto Pirenaico de Ecología (IPE), Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
- Rafael Miranda
  - Department of Environmental Biology, University of Navarra, Pamplona, Spain
- Cédric Mondy
  - French Agency for Biodiversity (OFB), Vincennes, France
- Gábor Várbíró
  - Centre for Ecological Research, Institute of Aquatic Ecology, Debrecen, Hungary
- Tommaso Cancellario
  - Department of Environmental Biology, University of Navarra, Pamplona, Spain
  - Water Research Institute, National Research Council (CNR), Verbania, Italy
14
Kopperud BT, Lidgard S, Liow LH. Enhancing georeferenced biodiversity inventories: automated information extraction from literature records reveal the gaps. PeerJ 2022; 10:e13921. [PMID: 35999848 PMCID: PMC9393005 DOI: 10.7717/peerj.13921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 07/29/2022] [Indexed: 01/19/2023]
Abstract
We use natural language processing (NLP) to retrieve location data for cheilostome bryozoan species (text-mined occurrences (TMO)) in an automated procedure. We compare these results with data combined from two major public databases (DB): the Ocean Biodiversity Information System (OBIS), and the Global Biodiversity Information Facility (GBIF). Using DB and TMO data separately and in combination, we present latitudinal species richness curves using standard estimators (Chao2 and the Jackknife) and range-through approaches. Our combined DB and TMO species richness curves quantitatively document a bimodal global latitudinal diversity gradient for extant cheilostomes for the first time, with peaks in the temperate zones. A total of 79% of the georeferenced species we retrieved from TMO (N = 1,408) and DB (N = 4,549) are non-overlapping. Despite clear indications that global location data compiled for cheilostomes should be improved with concerted effort, our study supports the view that many marine latitudinal species richness patterns deviate from the canonical latitudinal diversity gradient (LDG). Moreover, combining online biodiversity databases with automated information retrieval from the published literature is a promising avenue for expanding taxon-location datasets.
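For readers unfamiliar with the estimators named above, the following Python sketch computes Chao2 and the first-order jackknife from incidence (presence/absence) data. It uses the standard textbook formulas, not the authors' code. How the study defines a sampling unit is not stated in the abstract, so the sets below are invented placeholders. The fallback to the bias-corrected Chao2 form when no species occurs in exactly two units is a common convention and an assumption here.

```python
from collections import Counter


def incidence_counts(samples):
    """For each species, count the number of sampling units it occurs in."""
    counts = Counter()
    for unit in samples:
        counts.update(set(unit))
    return counts


def chao2(samples):
    """Chao2 richness estimate from a list of sampling units (sets of species)."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # uniques
    q2 = sum(1 for c in counts.values() if c == 2)  # duplicates
    correction = (m - 1) / m
    if q2 > 0:
        return s_obs + correction * q1 * q1 / (2 * q2)
    # Bias-corrected form, used here when there are no duplicates.
    return s_obs + correction * q1 * (q1 - 1) / (2 * (q2 + 1))


def jackknife1(samples):
    """First-order jackknife richness estimate from incidence data."""
    m = len(samples)
    if m == 0:
        return 0.0
    counts = incidence_counts(samples)
    q1 = sum(1 for c in counts.values() if c == 1)
    return len(counts) + q1 * (m - 1) / m


# Invented example: four sampling units, six observed species.
samples = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "E"}, {"A", "F"}]
print(chao2(samples))       # 6 + 0.75 * 9 / 4 = 7.6875
print(jackknife1(samples))  # 6 + 3 * 0.75 = 8.25
```

Both estimators add a correction to the observed richness that grows with the number of species seen in only one unit, which is why sparse records tend to raise the estimates.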
Affiliation(s)
- Bjørn Tore Kopperud
  - Natural History Museum, University of Oslo, Oslo, Norway
  - GeoBio-Center, Ludwig-Maximilians-Universität München, München, Germany
  - Department of Earth and Environmental Sciences, Ludwig-Maximilians-Universität München, München, Germany
- Scott Lidgard
  - Negaunee Integrative Research Center, Field Museum of Natural History, Chicago, Illinois, U.S.A.
- Lee Hsiang Liow
  - Natural History Museum, University of Oslo, Oslo, Norway
  - Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
15
Pagad S, Bisset S, Genovesi P, Groom Q, Hirsch T, Jetz W, Ranipeta A, Schigel D, Sica YV, McGeoch MA. Country Compendium of the Global Register of Introduced and Invasive Species. Sci Data 2022; 9:391. [PMID: 35810161 PMCID: PMC9271038 DOI: 10.1038/s41597-022-01514-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/25/2022] [Accepted: 06/23/2022] [Indexed: 11/10/2022]
Abstract
The Country Compendium of the Global Register of Introduced and Invasive Species (GRIIS) is a collation of data across 196 individual country checklists of alien species, along with a designation of those species with evidence of impact at a country level. The Compendium provides a baseline for monitoring the distribution and invasion status of all major taxonomic groups, and can be used for the purpose of global analyses of introduced (alien, non-native, exotic) and invasive species (invasive alien species), including regional, single and multi-species taxon assessments and comparisons. It enables exploration of gaps and inferred absences of species across countries, and also provides one means for updating individual GRIIS Checklists. The Country Compendium is, for example, instrumental, along with data on first records of introduction, for assessing and reporting on invasive alien species targets, including for the Convention on Biological Diversity and Sustainable Development Goals. The GRIIS Country Compendium provides a baseline and mechanism for tracking the spread of introduced and invasive alien species across countries globally.
Design Type(s): Data integration objective ● Observation design
Measurement Type(s): Alien species occurrence ● Evidence of impact invasive alien species assessment objective
Technology Type(s): Agent expert ● Data collation
Factor Type(s): Geographic location ● Origin / provenance ● Habitat
Sample Characteristics - Organism: Animalia ● Bacteria ● Chromista ● Fungi ● Plantae ● Protista (Protozoa) ● Viruses
Sample Characteristics - Location: Global countries
Measurement(s): Presence of invasive alien species
Technology Type(s): Literature and datasets
Factor Type(s): scientificName
Sample Characteristic - Organism: Multitaxon
Sample Characteristic - Environment: Multihabitat
Sample Characteristic - Location: Global
Affiliation(s)
- Shyama Pagad
  - University of Auckland, Auckland, New Zealand
  - IUCN SSC Invasive Species Specialist Group, Auckland, New Zealand
- Stewart Bisset
  - Department of Environment and Genetics, LaTrobe University, Melbourne, 3086, Victoria, Australia
- Piero Genovesi
  - IUCN SSC Invasive Species Specialist Group, Auckland, New Zealand
  - Institute for Environmental Protection and Research (ISPRA), Rome, Italy
- Tim Hirsch
  - Global Biodiversity Information Facility (GBIF) Secretariat, Universitetsparken 15, DK-2100, Copenhagen Ø, Denmark
- Walter Jetz
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, USA
  - Center for Biodiversity & Global Change, Yale University, New Haven, CT, USA
- Ajay Ranipeta
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, USA
  - Center for Biodiversity & Global Change, Yale University, New Haven, CT, USA
- Dmitry Schigel
  - Global Biodiversity Information Facility (GBIF) Secretariat, Universitetsparken 15, DK-2100, Copenhagen Ø, Denmark
- Yanina V Sica
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT, USA
  - Center for Biodiversity & Global Change, Yale University, New Haven, CT, USA
- Melodie A McGeoch
  - IUCN SSC Invasive Species Specialist Group, Auckland, New Zealand
  - Department of Environment and Genetics, LaTrobe University, Melbourne, 3086, Victoria, Australia
16
Seebens H, Kaplan E. DASCO: A workflow to downscale alien species checklists using occurrence records and to re-allocate species distributions across realms. NEOBIOTA 2022. [DOI: 10.3897/neobiota.74.81082] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Information about occurrences of alien species is often provided in so-called checklists, which represent lists of reported alien species in a region. In many cases, available checklists cover whole countries, which is too coarse for many analyses and limits capabilities of assessing status and trends of biological invasions. Information about point-wise occurrences is available in large quantities at online facilities such as GBIF and OBIS, which, however, do not provide information about the invasion status of individual populations. To close this gap, we here provide a semi-automated workflow called DASCO to downscale regional checklists using occurrence records obtained from GBIF and OBIS. Within the workflow, coordinate-based occurrence records for species listed in the provided regional checklists are obtained from GBIF and OBIS, and the status of being an alien population is assigned using the information in the provided checklists. In this way, information in checklists is made available at the local scale, which can then be re-allocated to any other spatial categorisation as provided by the user. In addition, habitats of species are determined to distinguish between marine, brackish, terrestrial, and freshwater species, which allows splitting the provided checklists to the respective realms and ecoregions. By using checklists of global databases, we showcase the usage of the DASCO workflow and reveal > 35 million occurrence records of alien populations in terrestrial and marine regions worldwide, which were back-transformed to terrestrial and marine regions for comparison. DASCO has the potential to be used as a basis for the widely applied species distribution models or assessments of status and trends of biological invasions at large geographic scales. The workflow is implemented in R and in full compliance with the FAIR data principles of open science.
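The core downscaling step described above, assigning alien status to a point record when its species is listed for the region containing the point, can be sketched in Python as follows. This is an illustration only and not the DASCO code, which is written in R. The `Region`, `Occurrence`, `locate` and `flag_alien_records` names are hypothetical, rectangular bounding boxes stand in for real region polygons, and all data are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A checklist region, simplified here to a bounding box (an assumption)."""
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Half-open intervals so that adjacent boxes do not overlap.
        return (self.min_lon <= lon < self.max_lon
                and self.min_lat <= lat < self.max_lat)


@dataclass(frozen=True)
class Occurrence:
    species: str
    lon: float
    lat: float


def locate(occ: Occurrence, regions) -> Optional[str]:
    """Return the name of the first region containing the record, if any."""
    for region in regions:
        if region.contains(occ.lon, occ.lat):
            return region.name
    return None


def flag_alien_records(occurrences, regions, checklist):
    """Keep records whose (species, region) pair appears in the checklist."""
    flagged = []
    for occ in occurrences:
        region = locate(occ, regions)
        if region is not None and (occ.species, region) in checklist:
            flagged.append((occ, region))
    return flagged


# Invented example data.
regions = [Region("Region A", 0, 0, 10, 10), Region("Region B", 10, 0, 20, 10)]
checklist = {("Species x", "Region A")}
occurrences = [
    Occurrence("Species x", 5.0, 5.0),   # listed as alien in Region A
    Occurrence("Species x", 15.0, 5.0),  # falls in Region B, not listed there
    Occurrence("Species y", 5.0, 5.0),   # species not on the checklist
]

alien = flag_alien_records(occurrences, regions, checklist)
print(alien)
```

Because each flagged record keeps its coordinates, the result could afterwards be grouped by any other spatial unit, which corresponds to the re-allocation step mentioned in the abstract.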