Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Patterson D, Mozzherin D, Shorthouse DP, Thessen A. Challenges with using names to link digital biodiversity information. Biodivers Data J 2016;4:e8080. [PMID: 27346955 PMCID: PMC4910497 DOI: 10.3897/bdj.4.e8080] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 02/09/2016] [Accepted: 05/19/2016] [Indexed: 01/05/2023] Open

Cited by Other Article(s)

Cho MH, Cho KH, No KT. PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing. BMC Bioinformatics 2023;24:475. [PMID: 38097955 PMCID: PMC10722791 DOI: 10.1186/s12859-023-05588-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023] Open

Seah BKB. Paying it forward: Crowdsourcing the harmonisation and linking of taxon names and biodiversity identifiers. Biodivers Data J 2023;11:e114076. [PMID: 38312332 PMCID: PMC10838036 DOI: 10.3897/bdj.11.e114076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 11/06/2023] [Indexed: 02/06/2024] Open

Brown MJM, Walker BE, Black N, Govaerts RHA, Ondo I, Turner R, Nic Lughadha E. rWCVP: a companion R package for the World Checklist of Vascular Plants. THE NEW PHYTOLOGIST 2023;240:1355-1365. [PMID: 37289204 DOI: 10.1111/nph.18919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 01/06/2023] [Indexed: 06/09/2023]

Patterson D. The scope and scale of the life sciences (‘Nature’s envelope’). RESEARCH IDEAS AND OUTCOMES 2022. [DOI: 10.3897/rio.8.e96132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open

Grenié M, Berti E, Carvajal‐Quintero J, Dädlow GML, Sagouis A, Winter M. Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13802] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]

Durso AM, Ruiz de Castañeda R, Montalcini C, Mondardini MR, Fernandez-Marques JL, Grey F, Müller MM, Uetz P, Marshall BM, Gray RJ, Smith CE, Becker D, Pingleton M, Louies J, Abegg AD, Akuboy J, Alcoba G, Daltry JC, Entiauspe-Neto OM, Freed P, de Freitas MA, Glaudas X, Huang S, Huang T, Kalki Y, Kojima Y, Laudisoit A, Limbu KP, Martínez-Fonseca JG, Mebert K, Rödel MO, Ruane S, Ruedi M, Schmitz A, Tatum SA, Tillack F, Visvanathan A, Wüster W, Bolon I. Citizen science and online data: Opportunities and challenges for snake ecology and action against snakebite. Toxicon X 2021;9-10:100071. [PMID: 34278294 PMCID: PMC8264216 DOI: 10.1016/j.toxcx.2021.100071] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/31/2021] [Revised: 06/10/2021] [Accepted: 06/15/2021] [Indexed: 12/03/2022] Open

Affiliation(s)

Andrew M. Durso Department of Biological Sciences, Florida Gulf Coast University, Ft. Myers, FL, USA Institute of Global Health, Department of Community Health and Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
Rafael Ruiz de Castañeda Institute of Global Health, Department of Community Health and Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland World Health Organization, Geneva, Switzerland
Camille Montalcini University of Bern, Bern, Switzerland
M. Rosa Mondardini Citizen Science Center Zürich (ETH Zürich and University of Zürich), Zürich, Switzerland
Jose L. Fernandez-Marques University of Geneva, Geneva, Switzerland
François Grey University of Geneva, Geneva, Switzerland
Martin M. Müller École polytechnique fédérale de Lausanne, Geneva, Switzerland
Peter Uetz The Reptile Database, Richmond, VA, USA Virginia Commonwealth University, Richmond, VA, USA
Benjamin M. Marshall Suranaree University of Technology, Nakhon Ratchasima, Thailand
Russell J. Gray R. J. Gray Ecology, New Smyrna Beach, FL, USA
Christopher E. Smith HerpMapper, St. Paul, MN, USA
Donald Becker HerpMapper, Cedar Rapids, IA, USA
Michael Pingleton HerpMapper, Champaign, IL, USA
Jose Louies Indian Snakes, Kottayam, Kerala, India
Arthur D. Abegg Instituto Butantan, São Paulo, São Paulo, Brazil University of São Paulo, São Paulo, São Paulo, Brazil
Jeannot Akuboy University of Kisangani, Kisangani, Democratic Republic of the Congo
Gabriel Alcoba University of Geneva Hospitals, Geneva, Switzerland
Jennifer C. Daltry Flora & Fauna International, Cambridge, England, UK Global Wildlife Conservation, Austin, TX, USA
Omar M. Entiauspe-Neto Universidade Federal do Rio Grande, Rio Grande, Rio Grande do Sul, Brazil
Paul Freed The Reptile Database, Richmond, VA, USA Reptile Database, Scotts Mills, OR, USA
Marco Antonio de Freitas Murici Ecological Station, Murici, Alagoas, Brazil
Xavier Glaudas University of the Witwatersrand, Johannesburg, South Africa Bangor University, Bangor, Wales, UK
Song Huang Anhui Normal University, Wuhu, Anhui, China
Tianqi Huang Rutgers University, New Brunswick, NJ, USA
Yatin Kalki Madras Crocodile Bank Trust, Mahabalipuram, Tamil Nadu, India
Yosuke Kojima Toho University, Funabashi, Japan
Anne Laudisoit EcoHealth Alliance, New York, NY, USA
Kul Prasad Limbu Tribhuvan University, Biratnagar, Nepal
José G. Martínez-Fonseca Northern Arizona University, Flagstaff, AZ, USA
Konrad Mebert Global Biology, Birr, Switzerland Institute of Development, Ecology, Conservation & Cooperation, Rome, Italy
Mark-Oliver Rödel Museum für Naturkunde - Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
Sara Ruane Rutgers University, Newark, NJ, USA
Manuel Ruedi Museum d'Histoire naturelle Geneve, Geneva, Switzerland
Andreas Schmitz Museum d'Histoire naturelle Geneve, Geneva, Switzerland
Sarah A. Tatum University of North Georgia, Dahlonega, GA, USA
Frank Tillack Museum für Naturkunde - Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
Avinash Visvanathan Friends of Snakes Society, Hyderabad, Telangana, India
Wolfgang Wüster Molecular Ecology and Fisheries Genetics Laboratory, School of Natural Sciences, Bangor University, Bangor, Wales, UK
Isabelle Bolon Institute of Global Health, Department of Community Health and Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland

Stribling JB, Leppo EW. Relationship of taxonomic error to frequency of observation. PLoS One 2020;15:e0241933. [PMID: 33180842 PMCID: PMC7660486 DOI: 10.1371/journal.pone.0241933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2020] [Accepted: 10/22/2020] [Indexed: 11/18/2022] Open

Campbell DL, Thessen AE, Ries L. A novel curation system to facilitate data integration across regional citizen science survey programs. PeerJ 2020;8:e9219. [PMID: 32821528 PMCID: PMC7395600 DOI: 10.7717/peerj.9219] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/30/2019] [Accepted: 04/28/2020] [Indexed: 11/20/2022] Open

Walton S, Livermore L, Bánki O, Cubey R, Drinkwater R, Englund M, Goble C, Groom Q, Kermorvant C, Rey I, Santos C, Scott B, Williams A, Wu Z. Landscape Analysis for the Specimen Data Refinery. RESEARCH IDEAS AND OUTCOMES 2020. [DOI: 10.3897/rio.6.e57602] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/23/2022] Open

Portik DM, Wiens JJ. SuperCRUNCH: A bioinformatics toolkit for creating and manipulating supermatrices and other large phylogenetic datasets. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13392] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]

Bioinformatics for Marine Products: An Overview of Resources, Bottlenecks, and Perspectives. Mar Drugs 2019;17:md17100576. [PMID: 31614509 PMCID: PMC6835618 DOI: 10.3390/md17100576] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/20/2019] [Revised: 10/01/2019] [Accepted: 10/02/2019] [Indexed: 12/13/2022] Open

Minelli A. The galaxy of the non-Linnaean nomenclature. HISTORY AND PHILOSOPHY OF THE LIFE SCIENCES 2019;41:31. [PMID: 31435827 DOI: 10.1007/s40656-019-0271-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/07/2019] [Accepted: 08/08/2019] [Indexed: 06/10/2023]

Abstract

Contrary to the traditional claim that needs for unambiguous communication about animal and plant species are best served by a single set of names (Linnaean nomenclature) ruled by international Codes, I suggest that a more diversified system is required, especially to cope with problems emerging from aggregation of biodiversity data in large databases. Departures from Linnaean nomenclature are sometimes intentional, but there are also other, less obvious but widespread forms of not Code-compliant grey nomenclature. A first problem is due to the circumstance that the Codes are intended to rule over the way names are applied to species and other taxonomic units, whereas users of taxonomy need names to be applied to specimens. For different reasons, it is often impossible to refer a specimen with certainty to a named species, and in those cases an open nomenclature is employed. Second, molecular taxonomy leads to the discovery of clusters of gene sequence diversity not necessarily equivalent to the species recognized and named by taxonomists. Those clusters are mostly indicated with informal names or formulas that challenge comparison between different publications or databases. In several instances, it is not even clear if a formula refers to an individual voucher specimen, or is a provisional species name. The use of non-Linnaean names and formulas must be revised and strengthened by fixing standard formats for the different kinds of objects or hypotheses and providing permanent association of 'grey names' with standardized source information such as author and year. In the context of a broad-scope revisitation of aims and scope of scientific nomenclature, it may be worth rethinking if natural objects like plant galls and lichens, although other than the 'single-entity' objects traditionally covered by biological classifications, may nevertheless deserve taxonomic names.

A dataset of egg size and shape from more than 6,700 insect species. Sci Data 2019;6:104. [PMID: 31270334 PMCID: PMC6610123 DOI: 10.1038/s41597-019-0049-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/03/2018] [Accepted: 01/25/2019] [Indexed: 12/20/2022] Open

Insect egg size and shape evolve with ecology but not developmental rate. Nature 2019;571:58-62. [PMID: 31270484 DOI: 10.1038/s41586-019-1302-4] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 11/28/2018] [Accepted: 05/14/2019] [Indexed: 12/25/2022]

Stucky BJ, Balhoff JP, Barve N, Barve V, Brenskelle L, Brush MH, Dahlem GA, Gilbert JDJ, Kawahara AY, Keller O, Lucky A, Mayhew PJ, Plotkin D, Seltmann KC, Talamas E, Vaidya G, Walls R, Yoder M, Zhang G, Guralnick R. Developing a vocabulary and ontology for modeling insect natural history data: example data, use cases, and competency questions. Biodivers Data J 2019;7:e33303. [PMID: 30918448 PMCID: PMC6426826 DOI: 10.3897/bdj.7.e33303] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Accepted: 02/28/2019] [Indexed: 11/12/2022] Open

Affiliation(s)

Brian J. Stucky Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
James P. Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, United States of America
Narayani Barve Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Vijay Barve Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Laura Brenskelle Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Matthew H. Brush Oregon Health and Science University, Portland, OR, United States of America
Gregory A Dahlem Department of Biological Sciences, Northern Kentucky University, Highland Heights, KY, United States of America
James D. J. Gilbert Department of Biological and Marine Sciences, University of Hull, Hull, United Kingdom
Akito Y. Kawahara Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
Oliver Keller Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
Andrea Lucky Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
Peter J. Mayhew Department of Biology, University of York, York, United Kingdom
David Plotkin Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Katja C. Seltmann
Elijah Talamas Florida Department of Agriculture and Consumer Services, Gainesville, FL, United States of America
Gaurav Vaidya Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Ramona Walls Bio5 and CyVerse, University of Arizona, Tucson, AZ, United States of America
Matt Yoder Species File Group, Illinois Natural History Survey, University of Illinois, Champaign, IL, United States of America
Guanyang Zhang Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
Rob Guralnick Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America

Jackson LM, Fernando PC, Hanscom JS, Balhoff JP, Mabee PM. Automated Integration of Trees and Traits: A Case Study Using Paired Fin Loss Across Teleost Fishes. Syst Biol 2018;67:559-575. [PMID: 29325126 PMCID: PMC6005059 DOI: 10.1093/sysbio/syx098] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/19/2017] [Revised: 12/15/2017] [Accepted: 12/21/2017] [Indexed: 11/24/2022] Open

Abstract

Data synthesis required for large-scale macroevolutionary studies is challenging with the current tools available for integration. Using a classic question regarding the frequency of paired fin loss in teleost fishes as a case study, we sought to create automated methods to facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins of 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins for teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein harnessed a new combined approach by utilizing data based on ontological inference, as well as phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and further reduced to 34.8% by phylogenetic data propagation. These methods allowed us to extend the data to an additional 11,293 species for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Though interpretation is limited by lack of phylogenetic resolution at the species level, it appears that following loss, both pectoral and pelvic fins were regained several (3) to many (14) times respectively. Focused investigation into putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss. 
Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.

Franz NM, Zhang C, Lee J. A logic approach to modelling nomenclatural change. Cladistics 2018;34:336-357. [PMID: 34645079 DOI: 10.1111/cla.12201] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 03/10/2017] [Indexed: 11/27/2022] Open

Parr CS, Thessen AE. Biodiversity Informatics. ECOL INFORM 2018. [DOI: 10.1007/978-3-319-59928-1_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]

Mozzherin DY, Myltsev AA, Patterson DJ. "gnparser": a powerful parser for scientific names based on Parsing Expression Grammar. BMC Bioinformatics 2017;18:279. [PMID: 28549446 PMCID: PMC5446698 DOI: 10.1186/s12859-017-1663-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/21/2016] [Accepted: 04/28/2017] [Indexed: 11/16/2022] Open

Abstract

Background

Scientific names in biology act as universal links. They allow us to cross-reference information about organisms globally. However, variations in the spelling of scientific names greatly diminish their ability to interconnect data. Such variations may include abbreviations, annotations, misspellings, etc. Authorship is a part of a scientific name and may also differ significantly. To match all possible variations of a name we need to divide them into their elements and classify each element according to its role. We refer to this as ‘parsing’ the name. Parsing categorizes a name’s elements into those that are stable and those that are prone to change. Names are matched first by combining them according to their stable elements. Matches are then refined by examining their varying elements. This two-stage process dramatically improves the number and quality of matches. It is especially useful for automatic data exchange within the context of “Big Data” in biology.
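The two-stage matching described above can be illustrated with a minimal Python sketch. It is not gnparser and does not use a Parsing Expression Grammar; the regular expressions, the function names (`split_name`, `group_by_canonical`, `refine`) and the rule of comparing only publication years in stage two are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed, highly simplified pattern (NOT gnparser's grammar): a capitalized
# genus, an optional lowercase species epithet, then everything else
# (authorship, year, annotations), which is treated as the varying part.
NAME_RE = re.compile(r"^\s*([A-Z][a-z]+)(?:\s+([a-z][a-z-]+))?\s*(.*)$")
YEAR_RE = re.compile(r"\b(?:1[5-9]\d\d|20\d\d)\b")


def split_name(name_string):
    """Split a name-string into (stable canonical part, varying remainder)."""
    match = NAME_RE.match(name_string)
    if match is None:
        return None
    genus, epithet, rest = match.groups()
    canonical = genus if epithet is None else f"{genus} {epithet}"
    return canonical, rest.strip()


def group_by_canonical(name_strings):
    """Stage 1: group name-strings that share the same stable elements."""
    groups = defaultdict(list)
    for name_string in name_strings:
        parsed = split_name(name_string)
        if parsed is not None:
            canonical, rest = parsed
            groups[canonical].append((name_string, rest))
    return dict(groups)


def refine(group):
    """Stage 2: keep pairs whose varying parts do not disagree on the year."""
    pairs = []
    for i in range(len(group)):
        for j in range(i + 1, len(group)):
            (name_a, rest_a), (name_b, rest_b) = group[i], group[j]
            years_a = set(YEAR_RE.findall(rest_a))
            years_b = set(YEAR_RE.findall(rest_b))
            # A missing year is treated as compatible; two different years are not.
            if not years_a or not years_b or years_a & years_b:
                pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    names = [
        "Homo sapiens Linnaeus, 1758",
        "Homo sapiens L.",
        "Homo sapiens Smith, 1900",
        "Pinus sylvestris L.",
    ]
    for canonical, group in group_by_canonical(names).items():
        print(canonical, refine(group))
```

Grouping on the canonical part first keeps the pairwise comparison in stage two confined to small groups, which is one plausible reason a two-stage design scales to large name lists.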

Results

We introduce Global Names Parser (gnparser). It is a Java tool written in the Scala language (a language for the Java Virtual Machine) to parse scientific names. It is based on a Parsing Expression Grammar. The parser can be applied to scientific names of any complexity. It assigns a semantic meaning (such as genus name, species epithet, rank, year of publication, authorship, annotations, etc.) to all elements of a name. It is able to work with nested structures as in the names of hybrids. gnparser performs with ≈99% accuracy and processes 30 million name-strings/hour per CPU thread. The gnparser library is compatible with Scala, Java, R, Jython, and JRuby. The parser can be used as a command line application, as a socket server, a web-app or as a RESTful HTTP-service. It is released under an open-source MIT license.

Conclusions

Global Names Parser (gnparser) is a fast, high precision tool for biodiversity informaticians and biologists working with large numbers of scientific names. It can replace expensive and error-prone manual parsing and standardization of scientific names in many situations, and can quickly enhance the interoperability of distributed biological information.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1663-3) contains supplementary material, which is available to authorized users.

Rees JA, Cranston K. Automated assembly of a reference taxonomy for phylogenetic data synthesis. Biodivers Data J 2017:e12581. [PMID: 28765728 PMCID: PMC5515096 DOI: 10.3897/bdj.5.e12581] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 03/07/2017] [Accepted: 05/12/2017] [Indexed: 12/24/2022] Open

Constructing a biodiversity terminological inventory. PLoS One 2017;12:e0175277. [PMID: 28414821 PMCID: PMC5393592 DOI: 10.1371/journal.pone.0175277] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/06/2016] [Accepted: 03/23/2017] [Indexed: 11/22/2022] Open

Abstract

The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., terms. However, a given concept may be referred to in text using various synonyms or term variants, making search systems likely to overlook documents mentioning less known variants, which are albeit relevant to a query term. Domain-specific terminological resources, which include term variants, synonyms and related terms, are thus important in supporting semantic search over large textual archives. This article describes the use of text mining methods for the automatic construction of a large-scale biodiversity term inventory. The inventory consists of names of species, amongst which naming variations are prevalent. We apply a number of distributional semantic techniques on all of the titles in the Biodiversity Heritage Library, to compute semantic similarity between species names and support the automated construction of the resource. With the construction of our biodiversity term inventory, we demonstrate that distributional semantic models are able to identify semantically similar names that are not yet recorded in existing taxonomies. Such methods can thus be used to update existing taxonomies semi-automatically by deriving semantically related taxonomic names from a text corpus and allowing expert curators to validate them. We also evaluate our inventory as a means to improve search by facilitating automatic query expansion. Specifically, we developed a visual search interface that suggests semantically related species names, which are available in our inventory but not always in other repositories, to incorporate into the search query. 
An assessment of the interface by domain experts reveals that our query expansion based on related names is useful for increasing the number of relevant documents retrieved. Its exploitation can benefit both users and developers of search engines and text mining applications.

Dietrich CH, Dmitriev DA. Insect phylogenetics in the digital age. CURRENT OPINION IN INSECT SCIENCE 2016;18:48-52. [PMID: 27939710 DOI: 10.1016/j.cois.2016.09.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2016] [Accepted: 09/21/2016] [Indexed: 06/06/2023]

Franz N, Gilbert E, Ludäscher B, Weakley A. Controlling the taxonomic variable: Taxonomic concept resolution for a southeastern United States herbarium portal. RESEARCH IDEAS AND OUTCOMES 2016. [DOI: 10.3897/rio.2.e10610] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open

Abstract

Overview. Taxonomic names are imperfect identifiers of specific and sometimes conflicting taxonomic perspectives in aggregated biodiversity data environments. The inherent ambiguities of names can be mitigated using syntactic and semantic conventions developed under the taxonomic concept approach. These include: (1) representation of taxonomic concept labels (TCLs: name sec. source) to precisely identify name usages and meanings, (2) use of parent/child relationships to assemble separate taxonomic perspectives, and (3) expert provision of Region Connection Calculus articulations (RCC–5: congruence, [inverse] inclusion, overlap, exclusion) that specify how data identified to different-sourced TCLs can be integrated. Application of these conventions greatly increases trust in biodiversity data networks, most of which promote unitary taxonomic 'syntheses' that obscure the actual diversity of expert-held views. Better design solutions allow users to control the taxonomic variable and thereby assess the robustness of their biological inferences under different perspectives. A unique constellation of prior efforts – including the powerful Symbiota collections software platform, the Euler/X multi-taxonomy alignment toolkit, and the "Weakley Flora" which entails 7,000 concepts and more than 75,000 RCC–5 articulations – provides the opportunity to build a first full-scale concept resolution service for SERNEC, the SouthEast Regional Network of Expertise and Collections, currently with 60 member herbaria and 2 million occurrence records.

Intellectual merit. We have developed a multi-dimensional, step-wise plan to transition SERNEC's data culture from name- to concept-based practices. (1) We will engage SERNEC experts through annual, regional workshops and follow-up interactions that will foster buy-in and ultimately the completion of 12 community-identified use cases. (2) We will leverage RCC–5 data from the Weakley Flora and further development of the Euler/X logic reasoning toolkit to provide comprehensive genus- to variety-level concept alignments for at least 10 major flora treatments with highest relevance to SERNEC. The visualizations and estimated > 1 billion inferred concept-to-concept relations will effectively drive specimen data integration in the transformed portal. (3) We will expand Symbiota's taxonomy and occurrence schemas and related user interfaces to support the new concept data, including novel batch and map-based specimen determination modules, with easy output options in Darwin Core Archive format. (4) Through combinations of the new technology, enlisted taxonomic expertise, and SERNEC's large image resources, we will upgrade minimally 80% of all SERNEC specimen identifications from names to the narrowest suitable TCLs, or add "uncertainty" flags to specimens needing further study. (5) We will utilize the novel tools and data to demonstrate how controlling for the taxonomic variable in 12 use cases variously drives the outcomes of evolutionary, ecological, and conservation-based research hypotheses.

Broader impacts. Our project is focused on just one herbarium network, but the potential impact is as wide as Darwin Core or even comparative biology. We believe that trust in networked biodiversity data depends on open and dynamic system designs, allowing expert access and resolution of multiple conflicting views that reflect the complex realities of ongoing taxonomic research. Taking well over 1 million SERNEC records from name- to TCL-resolution will show that "big" specimen data can pass the credibility threshold needed to validate the substantive data mobilization investment. We will mentor one postdoctoral researcher (UNC), two Ph.D. students (ASU, UIUC), and at least 15 undergraduate students (ASU). Each of our workshops will capacitate 10-15 SERNEC experts, who in turn can recruit colleagues and students at their home collections. We will incorporate the project theme and use cases into undergraduate courses taught at six institutions and reaching an estimated 300-500 students annually (10-40% minority students). At each institution, project members will make a systematic effort to recruit new students from underrepresented groups. Our group's leadership of Symbiota (with close ties to iDigBio), SERNEC, and local biodiversity projects and centers will further promote the new data culture. We will create a feature story "Where do plant species occur?" for ASU's popular "Ask A Biologist" website, and a series of undergraduate student-led "How-To" videos that illustrate the use case workflows, including the creation of multi-taxonomy alignments.
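
The five RCC–5 articulations named in the abstract (congruence, inclusion, inverse inclusion, overlap, exclusion) can be illustrated with a short Python sketch. It rests on an assumption that is not in the source: each taxonomic concept is modelled as a plain set of member identifiers, whereas the abstract describes articulations as expert-provided and reasoned over with Euler/X. The function name `rcc5`, the reading of "inclusion" as "the first concept is properly contained in the second", and the example labels and members are hypothetical.

```python
def rcc5(concept_a, concept_b):
    """Return the RCC-5 relation of concept_a to concept_b.

    Assumption for illustration: a concept is a non-empty set of member
    identifiers (for example specimens or child concepts).
    """
    a, b = set(concept_a), set(concept_b)
    if not a or not b:
        raise ValueError("RCC-5 relations are defined here for non-empty concepts")
    if a == b:
        return "congruence"
    if a < b:  # a is a proper subset of b
        return "inclusion"
    if a > b:  # a is a proper superset of b
        return "inverse inclusion"
    if a & b:  # some shared members, neither contains the other
        return "overlap"
    return "exclusion"


if __name__ == "__main__":
    # Hypothetical taxonomic concept labels in the "name sec. source" pattern.
    concepts = {
        "Aus bus sec. Author 2001": {"s1", "s2", "s3"},
        "Aus bus sec. Author 2015": {"s1", "s2"},
        "Aus cus sec. Author 2015": {"s3", "s4"},
    }
    labels = list(concepts)
    for i, left in enumerate(labels):
        for right in labels[i + 1:]:
            print(f"{left} | {rcc5(concepts[left], concepts[right])} | {right}")
```

The example shows why the same name under two sources ("Aus bus" in the two hypothetical treatments) need not be congruent, which is the ambiguity that concept labels are meant to expose.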