1
Takano A, Cole TCH, Konagai H. A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER. Sci Rep 2024; 14:112. [PMID: 38167449 PMCID: PMC10761843 DOI: 10.1038/s41598-023-50179-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
Digital extraction of label data from natural history specimens along with more efficient procedures of data entry and processing is essential for improving documentation and global information availability. Herbaria have made great advances in this direction lately. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we have been able to make further advancements towards fully automatic extraction of label data from herbarium specimen images. This system can be developed and run on a consumer grade desktop computer with standard specifications, and can also be applied to extracting label data from diverse kinds of natural history specimens, such as those in entomological collections. This system can facilitate the digitization and publication of natural history museum specimens around the world.
Affiliation(s)
- Atsuko Takano
  - Institute of Natural Science and Environment, University of Hyogo/The Museum of Nature and Human Activities, Hyogo, 6 Chome, Yayoigaoka, Sanda, Hyogo, 669-1546, Japan
- Theodor C H Cole
  - Institute of Biology, Dahlem Center of Plant Sciences, Freie Universität Berlin, Altensteinstrasse 6, 14195, Berlin, Germany
- Hajime Konagai
  - Functions Tales, Shimogamo-Honmachi 19-1-101, Sakyo-ku, Kyoto, 606-0862, Japan
2
de Koning K, Broekhuijsen J, Kühn I, Ovaskainen O, Taubert F, Endresen D, Schigel D, Grimm V. Digital twins: dynamic model-data fusion for ecology. Trends Ecol Evol 2023; 38:916-926. [PMID: 37208222 DOI: 10.1016/j.tree.2023.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 04/17/2023] [Accepted: 04/18/2023] [Indexed: 05/21/2023]
Abstract
Digital twins (DTs) are an emerging phenomenon in the public and private sectors as a new tool to monitor and understand systems and processes. DTs have the potential to change the status quo in ecology as part of its digital transformation. However, it is important to avoid misguided developments by managing expectations about DTs. We stress that DTs are not just big models of everything, containing big data and machine learning. Rather, the strength of DTs is in combining data, models, and domain knowledge, and their continuous alignment with the real world. We suggest that researchers and stakeholders exercise caution in DT development, keeping in mind that many of the strengths and challenges of computational modelling in ecology also apply to DTs.
Affiliation(s)
- Koen de Koning
  - Wageningen University and Research, Environmental Systems Analysis Group, P.O. Box 47, 6700 AA Wageningen, The Netherlands
- Jeroen Broekhuijsen
  - Nederlandse organisatie voor toegepast natuurwetenschappelijk onderzoek - TNO, Department of Monitoring & Control Services, Eemsgolaan 3, 9727 DW Groningen, The Netherlands
- Ingolf Kühn
  - Helmholtz Centre for Environmental Research - UFZ, Department of Community Ecology, Theodor-Lieser-Strasse 4, 06120 Halle, Germany; Martin Luther University Halle-Wittenberg, Institute for Biology/Geobotany & Botanical Garden, Große Steinstraße 79/80, 06108 Halle, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany
- Otso Ovaskainen
  - Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35 (Survontie 9C), FI-40014 Jyväskylä, Finland; Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 65, Helsinki 00014, Finland; Department of Biology, Centre for Biodiversity Dynamics, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Franziska Taubert
  - Helmholtz Centre for Environmental Research - UFZ, Department of Ecological Modelling, Permoserstr. 15, 04318 Leipzig, Germany
- Dag Endresen
  - University of Oslo, Natural History Museum, Sars gate 1, NO-0562 Oslo, Norway
- Dmitry Schigel
  - Global Biodiversity Information Facility - GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
- Volker Grimm
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany; Helmholtz Centre for Environmental Research - UFZ, Department of Ecological Modelling, Permoserstr. 15, 04318 Leipzig, Germany; University of Potsdam, Plant Ecology and Nature Conservation, Am Mühlenberg 3, 14476 Potsdam, Germany
3
Braker EM. Phototank setup and focus stack imaging method for reptile and amphibian specimens (Amphibia, Reptilia). Zookeys 2022; 1134:185-210. [PMID: 36761107 PMCID: PMC9836466 DOI: 10.3897/zookeys.1134.96103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
Fluid-preserved reptile and amphibian specimens are challenging to photograph with traditional methods due to their complex three-dimensional forms and reflective surfaces when removed from solution. An effective approach to counteract these issues involves combining focus stack photography with the use of a photo immersion tank. Imaging specimens beneath a layer of preservative fluid eliminates glare and risk of specimen desiccation, while focus stacking produces sharp detail through merging multiple photographs taken at successive focal steps to create a composite image with an extended depth of field. This paper describes the wet imaging components and focus stack photography workflow developed while conducting a large-scale digitization project for targeted reptile and amphibian specimens housed in the University of Colorado Museum of Natural History Herpetology Collection. This methodology can be implemented in other collections settings and adapted for use with fluid-preserved specimen types across the Tree of Life to generate high-quality, taxonomically informative images for use in documenting biodiversity, remote examination of fine traits, inclusion in publications, and educational applications.
Affiliation(s)
- Emily M. Braker
  - Vertebrate Zoology, University of Colorado Museum of Natural History, UCB 265, Boulder CO 80309, USA
4
Vaidya G, Cellinese N, Lapp H. A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx). PeerJ 2022; 10:e12618. [PMID: 35186448 PMCID: PMC8855714 DOI: 10.7717/peerj.12618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2021] [Accepted: 11/18/2021] [Indexed: 01/06/2023] [Open Access]
Abstract
To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
Affiliation(s)
- Gaurav Vaidya
  - Renaissance Computing Institute (RENCI), University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America; Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
- Nico Cellinese
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America; Informatics Institute, University of Florida, Gainesville, FL, United States of America
- Hilmar Lapp
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
5
Parker-Allie F, Pando F, Telenius A, Ganglo JC, Vélez D, Gibbons MJ, Talavan A, Raymond M, Russell L, Talukdar G, Vargas M, Radji R, Koivula H, Heughebaert A, Endresen D, Amariles-García D, Osawa T. Towards a Post-Graduate Level Curriculum for Biodiversity Informatics. Perspectives from the Global Biodiversity Information Facility (GBIF) Community. Biodivers Data J 2021; 9:e68010. [PMID: 34720633 PMCID: PMC8516826 DOI: 10.3897/bdj.9.e68010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 09/06/2021] [Indexed: 11/12/2022] [Open Access]
Abstract
Biodiversity informatics is a new and evolving field, requiring efforts to develop capacity and a curriculum for this field of science. The main objective was to summarise the level of activity and the efforts towards developing biodiversity informatics curricula, for work-based training and/or academic teaching at universities, taking place within the Global Biodiversity Information Facility (GBIF) countries and its associated network. A survey approach was used to identify existing capacities and resources within the network. Most of GBIF Nodes survey respondents (80%) are engaged in onsite training activities, with a focus on work-based professionals, mostly researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use, to enable the accessibility of analogue and digital biological data that currently reside as scattered datasets. An initial assessment of academic teaching activities highlighted that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics, including programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool to help build capacity in many countries. In terms of the potential in the Nodes network, 60% expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network have been highlighted and a working curriculum framework has been defined.
Affiliation(s)
- Fatima Parker-Allie
  - South African National Biodiversity Institute, Cape Town, South Africa
- Francisco Pando
  - Real Jardin Botanico - CSIC, Madrid, Spain
- Anders Telenius
  - Swedish Museum of Natural History, Stockholm, Sweden
- Jean C Ganglo
  - Laboratoire des Sciences Forestières, Faculté des Sciences Agronomiques, Université d'Abomey-Calavi, Abomey-Calavi, Benin
- Danny Vélez
  - Instituto Alexander von Humboldt, Bogotá, Colombia
- Mark John Gibbons
  - University of Western Cape, Cape Town, South Africa
- Alberto Talavan
  - Independent Consultant, Copenhagen, Denmark
- Laura Russell
  - GBIF, Copenhagen, Denmark; University of Kansas, KU Biodiversity Institute, Lawrence, United States of America; VertNet, Lawrence, United States of America
- Gautam Talukdar
  - Wildlife Institute of India, Dehradun, India
- Manuel Vargas
  - Instituto Nacional de Biodiversidad, Santo Domingo de Heredia, Costa Rica
- Raoufou Radji
  - University of Lome, Lome, Togo
- Hanna Koivula
  - CSC - IT Centre for Science, Espoo, Finland
- André Heughebaert
  - Belgian Biodiversity Platform, Bruxelles, Belgium
- Dag Endresen
  - University of Oslo, Oslo, Norway; GBIF Norway, Oslo, Norway
- Daniel Amariles-García
  - International Center for Tropical Agriculture, Cali, Colombia
- Takeshi Osawa
  - Tokyo Metropolitan University, Tokyo, Japan
6
Monfils AK, Krimmel ER, Linton DL, Marsico TD, Morris AB, Ruhfel BR. Collections Education: The Extended Specimen and Data Acumen. Bioscience 2021; 72:177-188. [PMID: 35145351 PMCID: PMC8824687 DOI: 10.1093/biosci/biab109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022] [Open Access]
Abstract
Biodiversity scientists must be fluent across disciplines; they must possess the quantitative, computational, and data skills necessary for working with large, complex data sets, and they must have foundational skills and content knowledge from ecology, evolution, taxonomy, and systematics. To effectively train the emerging workforce, we must teach science as we conduct science and embrace emerging concepts of data acumen alongside the knowledge, tools, and techniques foundational to organismal biology. We present an open education resource that updates the traditional plant collection exercise to incorporate best practices in twenty-first century collecting and to contextualize the activities that build data acumen. Students exposed to this resource gained skills and content knowledge in plant taxonomy and systematics, as well as a nuanced understanding of collections-based data resources. We discuss the importance of the extended specimen in fostering scientific discovery and reinforcing foundational concepts in biodiversity science, taxonomy, and systematics.
Affiliation(s)
- Anna K Monfils
  - Central Michigan University, Mount Pleasant, Michigan, United States
- Erica R Krimmel
  - Florida State University, Tallahassee, Florida, United States
- Debra L Linton
  - Central Michigan University, Mount Pleasant, Michigan, United States
- Ashley B Morris
  - Furman University, Greenville, South Carolina, United States
- Brad R Ruhfel
  - University of Michigan, Ann Arbor, Michigan, United States
7
Affiliation(s)
- E Sally Chang
  - Postdoctoral researcher in evolutionary genomics, a member of the SSE Public Policy Committee and the current chair of the SSE Graduate Student Advisory Council