Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Penev L, Lyal CH, Weitzman A, Morse DR, King D, Sautter G, Georgiev T, Morris RA, Catapano T, Agosti D. XML schemas and mark-up practices of taxonomic literature. Zookeys 2011:89-116. [PMID: 22207808 PMCID: PMC3234433 DOI: 10.3897/zookeys.150.2213] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 10/07/2011] [Accepted: 11/23/2011] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Girón JC, Tarasov S, González Montaña LA, Matentzoglu N, Smith AD, Koch M, Boudinot BE, Bouchard P, Burks R, Vogt L, Yoder M, Osumi-Sutherland D, Friedrich F, Beutel RG, Mikó I. Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeleto-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics. Syst Biol 2023;72:1084-1100. [PMID: 37094905 DOI: 10.1093/sysbio/syad025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 04/17/2023] [Accepted: 04/21/2023] [Indexed: 04/26/2023] Open

Abstract

The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS) which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multispecies anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms, extending, and linking the AISM to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced.
The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to: 1) use controlled vocabularies and create semiautomated computer-parsable insect morphological descriptions; 2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and 3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies.

Affiliation(s)

Jennifer C Girón Department of Entomology, Purdue University, West Lafayette, IN, USA; Natural Science Research Laboratory, Museum of Texas Tech University, Lubbock, TX, USA
Sergei Tarasov Finnish Museum of Natural History, University of Helsinki, Pohjoinen Rautatiekatu 13, FI-00014 Helsinki, Finland
Luis Antonio González Montaña Facultad de Ciencias Básicas e Ingeniería, Universidad de los Llanos, Villavicencio, Meta, Colombia
Nicolas Matentzoglu Semanticly Ltd., London, UK
Aaron D Smith Department of Entomology, Purdue University, West Lafayette, IN, USA
Markus Koch Institute of Evolutionary Biology and Ecology, University of Bonn, An der Immenburg 1, 53121 Bonn, Germany
Brendon E Boudinot Department of Entomology & Nematology, University of California, Davis, One Shields Ave, CA, USA; Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany; Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington DC, USA
Patrice Bouchard Biodiversity and Bioresources, Canadian National Collection of Insects, Arachnids and Nematodes, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, K1A 0C6, Canada
Roger Burks Entomology Department, University of California, Riverside, 900 University Ave. Riverside, CA, USA
Lars Vogt TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167 Hannover, Germany
Matthew Yoder Illinois Natural History Survey, University of Illinois, Champaign, IL, USA
David Osumi-Sutherland European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
Frank Friedrich Institut für Zell- und Systembiologie der Tiere, Universität Hamburg, Martin-Luther-King-Platz 3, 20146, Hamburg, Germany
Rolf G Beutel Institut für Zoologie und Evolutionsforschung, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
István Mikó Department of Biological Sciences, University of New Hampshire, Durham, NH, USA

Agosti D, Benichou L, Addink W, Arvanitidis C, Catapano T, Cochrane G, Dillen M, Döring M, Georgiev T, Gérard I, Groom Q, Kishor P, Kroh A, Kvaček J, Mergen P, Mietchen D, Pauperio J, Sautter G, Penev L. Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e97374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open

Dimitrova M, Senderov VE, Georgiev T, Zhelezov G, Penev L. Infrastructure and Population of the OpenBiodiv Biodiversity Knowledge Graph. Biodivers Data J 2021;9:e67671. [PMID: 34690512 PMCID: PMC8486731 DOI: 10.3897/bdj.9.e67671] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Accepted: 09/08/2021] [Indexed: 11/12/2022] Open

Lücking A, Driller C, Stoeckel M, Abrami G, Pachzelt A, Mehler A. Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology. Lang Resour Eval 2021. [DOI: 10.1007/s10579-021-09553-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Kõljalg U, Nilsson HR, Schigel D, Tedersoo L, Larsson KH, May TW, Taylor AFS, Jeppesen TS, Frøslev TG, Lindahl BD, Põldmaa K, Saar I, Suija A, Savchenko A, Yatsiuk I, Adojaan K, Ivanov F, Piirmann T, Pöhönen R, Zirk A, Abarenkov K. The Taxon Hypothesis Paradigm-On the Unambiguous Detection and Communication of Taxa. Microorganisms 2020;8:E1910. [PMID: 33266327 PMCID: PMC7760934 DOI: 10.3390/microorganisms8121910] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 10/31/2020] [Accepted: 11/24/2020] [Indexed: 12/27/2022] Open

Abstract

Here, we describe the taxon hypothesis (TH) paradigm, which covers the construction, identification, and communication of taxa as datasets. Defining taxa as datasets of individuals and their traits will make taxon identification and most importantly communication of taxa precise and reproducible. This will allow datasets with standardized and atomized traits to be used digitally in identification pipelines and communicated through persistent identifiers. Such datasets are particularly useful in the context of formally undescribed or even physically undiscovered species if data such as sequences from samples of environmental DNA (eDNA) are available. Implementing the TH paradigm will to some extent remove the impediment to hastily discover and formally describe all extant species in that the TH paradigm allows discovery and communication of new species and other taxa also in the absence of formal descriptions. The TH datasets can be connected to a taxonomic backbone providing access to the vast information associated with the tree of life. In parallel to the description of the TH paradigm, we demonstrate how it is implemented in the UNITE digital taxon communication system. UNITE TH datasets include rich data on individuals and their rDNA ITS sequences. These datasets are equipped with digital object identifiers (DOI) that serve to fix their identity in our communication. All datasets are also connected to a GBIF taxonomic backbone. Researchers processing their eDNA samples using UNITE datasets will, thus, be able to publish their findings as taxon occurrences in the GBIF data portal. UNITE species hypothesis (species level THs) datasets are increasingly utilized in taxon identification pipelines and even formally undescribed species can be identified and communicated by using UNITE. The TH paradigm seeks to achieve unambiguous, unique, and traceable communication of taxa and their properties at any level of the tree of life. 
It offers a rapid way to discover and communicate undescribed species in identification pipelines and data portals before they are lost to the sixth mass extinction.

Affiliation(s)

Urmas Kõljalg Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia; Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Henrik R. Nilsson Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
Dmitry Schigel Global Biodiversity Information Facility, 2100 Copenhagen, Denmark
Leho Tedersoo Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Karl-Henrik Larsson Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
Tom W. May Royal Botanic Gardens Victoria, Birdwood Ave, Melbourne, Victoria 3004, Australia
Andy F. S. Taylor The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, UK; Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, St Machar Drive, Aberdeen AB24 3UU, UK
Thomas Stjernegaard Jeppesen Global Biodiversity Information Facility, 2100 Copenhagen, Denmark
Tobias Guldberg Frøslev GLOBE Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 København, Denmark
Björn D. Lindahl Systematic Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
Kadri Põldmaa Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia; Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Irja Saar Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Ave Suija Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia; Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Anton Savchenko Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Iryna Yatsiuk Institute of Ecology and Earth Sciences, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Kristjan Adojaan Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Filipp Ivanov Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Timo Piirmann Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Raivo Pöhönen Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Allan Zirk Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia
Kessy Abarenkov Natural History Museum, University of Tartu, 14a Ravila, 50411 Tartu, Estonia

Rivera-Quiroz FA, Petcharad B, Miller JA. Mining data from legacy taxonomic literature and application for sampling spiders of the Teutamus group (Araneae; Liocranidae) in Southeast Asia. Sci Rep 2020;10:15787. [PMID: 32978432 PMCID: PMC7519673 DOI: 10.1038/s41598-020-72549-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2020] [Accepted: 09/02/2020] [Indexed: 11/12/2022] Open

Abstract

Taxonomic literature contains information about virtually every known species on Earth. In many cases, all that is known about a taxon is contained in this kind of literature, particularly for the most diverse and understudied groups. Taxonomic publications in the aggregate have documented a vast amount of specimen data. Among other things, these data constitute evidence of the existence of a particular taxon within a spatial and temporal context. When knowledge about a particular taxonomic group is rudimentary, investigators motivated to contribute new knowledge can use legacy records to guide them in their search for new specimens in the field. However, these legacy data are in the form of unstructured text, making it difficult to extract and analyze without a human interpreter. Here, we used a combination of semi-automatic tools to extract and categorize specimen data from taxonomic literature of one family of ground spiders (Liocranidae). We tested the application of these data on fieldwork optimization, using the relative abundance of adult specimens reported in literature as a proxy to find the best times and places for collecting the species (Teutamus politus) and its relatives (Teutamus group, TG) within Southeast Asia. Based on these analyses we decided to collect in three provinces in Thailand during the months of June and August. With our approach, we were able to collect more specimens of T. politus (188 specimens, 95 adults) than all the previous records in literature combined (102 specimens). Our approach was also effective for sampling other representatives of the TG, yielding at least one representative of every TG genus previously reported for Thailand. In total, our samples contributed 231 specimens (134 adults) to the 351 specimens previously reported in the literature for this country.
Our results exemplify one application of mined literature data that allows investigators to more efficiently allocate effort and resources for the study of neglected, endangered, or interesting taxa and geographic areas. Furthermore, the integrative workflow demonstrated here shares specimen data with global online resources like Plazi and GBIF, meaning that others can freely reuse these data and contribute to them in the future. The contributions of the present study represent an increase of more than 35% on the taxonomic coverage of the TG in GBIF based on the number of species. Also, our extracted data represents 72% of the occurrences now available through GBIF for the TG and more than 85% of occurrences of T. politus. Taxonomic literature is a key source of undigitized biodiversity data for taxonomic groups that are underrepresented in the current biodiversity data sphere. Mobilizing these data is key to understanding and protecting some of the less well-known domains of biodiversity.

OpenBiodiv: A Knowledge Graph for Literature-Extracted Linked Open Data in Biodiversity Science. Publications 2019. [DOI: 10.3390/publications7020038] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022] Open

Faulwetter S, Pafilis E, Fanini L, Bailly N, Agosti D, Arvanitidis C, Boicenco L, Catapano T, Claus S, Dekeyzer S, Georgiev T, Legaki A, Mavraki D, Oulas A, Papastefanou G, Penev L, Sautter G, Schigel D, Senderov V, Teaca A, Tsompanou M. EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases. Research Ideas and Outcomes 2016. [DOI: 10.3897/rio.2.e10445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open

Faulwetter S, Pafilis E, Fanini L, Bailly N, Agosti D, Arvanitidis C, Boicenco L, Catapano T, Claus S, Dekeyzer S, Georgiev T, Legaki A, Mavraki D, Oulas A, Papastefanou G, Penev L, Sautter G, Schigel D, Senderov V, Teaca A, Tsompanou M. EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases. Research Ideas and Outcomes 2016. [DOI: 10.3897/rio.2.e9774] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open

Lyal CHC. Digitising legacy zoological taxonomic literature: Processes, products and using the output. Zookeys 2016:189-206. [PMID: 26877659 PMCID: PMC4741221 DOI: 10.3897/zookeys.550.9702] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/26/2015] [Accepted: 03/26/2015] [Indexed: 02/02/2023] Open

Abstract

By digitising legacy taxonomic literature using XML mark-up the contents become accessible to other taxonomic and nomenclatural information systems. Appropriate schemas need to be interoperable with other sectorial schemas, atomise to appropriate content elements and carry appropriate metadata to, for example, enable algorithmic assessment of availability of a name under the Code. Legacy (and new) literature delivered in this fashion will become part of a global taxonomic resource from which users can extract tailored content to meet their particular needs, be they nomenclatural, taxonomic, faunistic or other.

To date, most digitisation of taxonomic literature has led to a more or less simple digital copy of a paper original – the output of the many efforts has effectively been an electronic copy of a traditional library. While this has increased accessibility of publications through internet access, the means by which many scientific papers are indexed and located is much the same as with traditional libraries. OCR and born-digital papers allow use of web search engines to locate instances of taxon names and other terms, but OCR efficiency in recognising taxonomic names is still relatively poor, people’s ability to use search engines effectively is mixed, and many papers cannot be searched directly. Instead of building digital analogues of traditional publications, we should consider what properties we require of future taxonomic information access. Ideally the content of each new digital publication should be accessible in the context of all previous published data, and the user able to retrieve nomenclatural, taxonomic and other data / information in the form required without having to scan all of the original papers and extract target content manually. This opens the door to dynamic linking of new content with extant systems: automatic population and updating of taxonomic catalogues, ZooBank and faunal lists, all descriptions of a taxon and its children instantly accessible with a single search, comparison of classifications used in different publications, and so on. A means to do this is through marking up content into XML, and the more atomised the mark-up the greater the possibilities for data retrieval and integration. Mark-up requires XML that accommodates the required content elements and is interoperable with other XML schemas, and there are now several written to do this, particularly TaxPub, taxonX and taXMLit, the last of these being the most atomised. We now need to automate this process as far as possible. 
Manual and automatic data and information retrieval is demonstrated by projects such as INOTAXA and Plazi. As we move to creating and using taxonomic products through the power of the internet, we need to ensure the output, while satisfying in its production the requirements of the Code, is fit for purpose in the future.

Senderov V, Penev L. The Open Biodiversity Knowledge Management System in Scholarly Publishing. Research Ideas and Outcomes 2016. [DOI: 10.3897/rio.2.e7757] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 01/03/2023] Open

Miller JA, Agosti D, Penev L, Sautter G, Georgiev T, Catapano T, Patterson D, King D, Pereira S, Vos RA, Sierra S. Integrating and visualizing primary data from prospective and legacy taxonomic literature. Biodivers Data J 2015;3:e5063. [PMID: 26023286 PMCID: PMC4442254 DOI: 10.3897/bdj.3.e5063] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/09/2015] [Accepted: 05/06/2015] [Indexed: 11/24/2022] Open

Abstract

Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data, can extract structured primary biodiversity data which can be aggregated with and jointly queried with data from other Darwin Core-compatible sources, and show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge using XML structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.

Erwin T, Stoev P, Georgiev T, Penev L. ZooKeys 500: traditions and innovations hand-in-hand servicing our taxonomic community. Zookeys 2015:1-8. [PMID: 25987868 PMCID: PMC4432237 DOI: 10.3897/zookeys.500.9844] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/21/2015] [Accepted: 04/22/2015] [Indexed: 11/22/2022] Open

Miller JA, Georgiev T, Stoev P, Sautter G, Penev L. Corrected data re-harvested: curating literature in the era of networked biodiversity informatics. Biodivers Data J 2015:e4552. [PMID: 25632264 PMCID: PMC4304254 DOI: 10.3897/bdj.3.e4552] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/20/2015] [Accepted: 01/21/2015] [Indexed: 11/12/2022] Open

Federhen S. Type material in the NCBI Taxonomy Database. Nucleic Acids Res 2014;43:D1086-98. [PMID: 25398905 DOI: 10.1093/nar/gku1127] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 11/12/2022] Open

Liew TS, Vermeulen JJ, Marzuki MEB, Schilthuizen M. A cybertaxonomic revision of the micro-landsnail genus Plectostoma Adam (Mollusca, Caenogastropoda, Diplommatinidae), from Peninsular Malaysia, Sumatra and Indochina. Zookeys 2014:1-107. [PMID: 24715783 PMCID: PMC3974427 DOI: 10.3897/zookeys.393.6717] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 11/29/2013] [Accepted: 02/27/2014] [Indexed: 11/12/2022] Open

Henle K, Bell S, Brotons L, Clobert J, Evans D, Goerg C, Grodzinska-Jurcak M, Gruber B, Haila Y, Henry PY, Huth A, Julliard R, Keil P, Kleyer M, Kotze DJ, Kunin W, Lengyel S, Lin YP, Loyau A, Luck G, Magnuson W, Margules C, Matsinos Y, May P, Sousa-Pinto I, Possingham H, Potts S, Ring I, Pryke J, Samways M, Saunders D, Schmeller D, Simila J, Sommer S, Steffan-Dewenter I, Stoev P, Sykes M, Tóthmérész B, Yam R, Tzanopoulos J, Penev L. Nature Conservation – a new dimension in Open Access publishing bridging science and application. Nature Conservation 2012. [DOI: 10.3897/natureconservation.1.3081] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]

Remsen D, Knapp S, Georgiev T, Stoev P, Penev L. From text to structured data: Converting a word-processed floristic checklist into Darwin Core Archive format. PhytoKeys 2012;9:1-13. [PMID: 22371687 PMCID: PMC3281575 DOI: 10.3897/phytokeys.9.2770] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/10/2012] [Accepted: 01/27/2012] [Indexed: 05/24/2023]

Berendsohn WG, Güntsch A, Hoffmann N, Kohlbecker A, Luther K, Müller A. Biodiversity information platforms: From standards to interoperability. Zookeys 2011:71-87. [PMID: 22207807 PMCID: PMC3234432 DOI: 10.3897/zookeys.150.2166] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 09/29/2011] [Accepted: 11/23/2011] [Indexed: 11/19/2022] Open

Abstract

One of the most serious bottlenecks in the scientific workflows of biodiversity sciences is the need to integrate data from different sources, software applications, and services for analysis, visualisation and publication. For more than a quarter of a century the TDWG Biodiversity Information Standards organisation has had a central role in defining and promoting data standards and protocols supporting interoperability between disparate and locally distributed systems. Although often not sufficiently recognized, TDWG standards are the foundation of many popular Biodiversity Informatics applications and infrastructures ranging from small desktop software solutions to large scale international data networks. However, individual scientists and groups of collaborating scientists have difficulties in fully exploiting the potential of standards that are often notoriously complex, lack non-technical documentation, and use different representations and underlying technologies. In the last few years, a series of initiatives such as Scratchpads, the EDIT Platform for Cybertaxonomy, and biowikifarm have started to implement and set up virtual work platforms for biodiversity sciences which shield their users from the complexity of the underlying standards. Apart from being practical work-horses for numerous working processes related to biodiversity sciences, they can be seen as information brokers mediating information between multiple data standards and protocols. The ViBRANT project will further strengthen the flexibility and power of virtual biodiversity working platforms by building software interfaces between them, thus facilitating essential information flows needed for comprehensive data exchange, data indexing, web-publication, and versioning. This work will make an important contribution to the shaping of an international, interoperable, and user-oriented biodiversity information infrastructure.

Smith VS, Penev L. Collaborative electronic infrastructures to accelerate taxonomic research. Zookeys 2011:1-3. [PMID: 22207803 PMCID: PMC3234428 DOI: 10.3897/zookeys.150.2458] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/27/2011] [Accepted: 11/28/2011] [Indexed: 11/30/2022] Open

Johnson NF. A collaborative, integrated and electronic future for taxonomy. Invertebr Syst 2011. [DOI: 10.1071/is11052] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]