1
|
Sterner B, Witteveen J, Franz N. Coordinating dissent as an alternative to consensus classification: insights from systematics for bio-ontologies. History and Philosophy of the Life Sciences 2020; 42:8. [PMID: 32030540 DOI: 10.1007/s40656-020-0300-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/10/2019] [Accepted: 01/17/2020] [Indexed: 06/10/2023]
Abstract
The collection and classification of data into meaningful categories is a key step in the process of knowledge making. In the life sciences, the design of data discovery and integration tools has relied on the premise that a formal classificatory system for expressing a body of data should be grounded in consensus definitions for classifications. On this approach, exemplified by the realist program of the Open Biomedical Ontologies Foundry, progress is maximized by grounding the representation and aggregation of data on settled knowledge. We argue that historical practices in systematic biology provide an important and overlooked alternative approach to classifying and disseminating data, based on a principle of coordinative rather than definitional consensus. Systematists have developed a robust system for referring to taxonomic entities that can deliver high quality data discovery and integration without invoking consensus about reality or "settled" science.
Affiliation(s)
- Beckett Sterner
- School of Life Sciences, Arizona State University, Tempe, USA
- Joeri Witteveen
- Department of Science Education, Section for History and Philosophy of Science, University of Copenhagen, Copenhagen, Denmark
- Nico Franz
- School of Life Sciences, Arizona State University, Tempe, USA
|
2
|
Franz NM, Musher LJ, Brown JW, Yu S, Ludäscher B. Verbalizing phylogenomic conflict: Representation of node congruence across competing reconstructions of the neoavian explosion. PLoS Comput Biol 2019; 15:e1006493. [PMID: 30768597 PMCID: PMC6395011 DOI: 10.1371/journal.pcbi.1006493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/13/2017] [Revised: 02/28/2019] [Accepted: 09/10/2018] [Indexed: 11/24/2022]
Abstract
Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed phylogenomic hypotheses, is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments.
Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.
Affiliation(s)
- Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Lukas J. Musher
- Richard Gilder Graduate School and Department of Ornithology, American Museum of Natural History, New York, New York, United States of America
- Joseph W. Brown
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Shizhuo Yu
- Department of Computer Science, University of California at Davis, Davis, California, United States of America
- Bertram Ludäscher
- School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
|
3
|
Cui H, Macklin JA, Sachs J, Reznicek A, Starr J, Ford B, Penev L, Chen HL. Incentivising use of structured language in biological descriptions: Author-driven phenotype data and ontology production. Biodivers Data J 2018; 6:e29616. [PMID: 30473620 PMCID: PMC6235995 DOI: 10.3897/bdj.6.e29616] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/07/2018] [Accepted: 10/23/2018] [Indexed: 01/17/2023]
Abstract
Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future and, most importantly, 3) It does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof of concept project on this idea was funded by NSF ABI in July 2017. 
We seek readers' input or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.
Affiliation(s)
- Hong Cui
- University of Arizona, Tucson, United States of America
- James A. Macklin
- Agriculture and Agri-Food Canada, Ottawa, Canada
- Joel Sachs
- Agriculture and Agri-Food Canada, Ottawa, Canada
- Anton Reznicek
- University of Michigan, Ann Arbor, United States of America
- Julian Starr
- University of Ottawa, Ottawa, Canada
- Bruce Ford
- University of Manitoba, Winnipeg, Canada
- Lyubomir Penev
- Pensoft Publishers & Bulgarian Academy of Sciences, Sofia, Bulgaria
- Hsin-Liang Chen
- University of Massachusetts at Boston, Boston, United States of America
|
4
|
Jackson LM, Fernando PC, Hanscom JS, Balhoff JP, Mabee PM. Automated Integration of Trees and Traits: A Case Study Using Paired Fin Loss Across Teleost Fishes. Syst Biol 2018; 67:559-575. [PMID: 29325126 PMCID: PMC6005059 DOI: 10.1093/sysbio/syx098] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/19/2017] [Revised: 12/15/2017] [Accepted: 12/21/2017] [Indexed: 11/24/2022]
Abstract
Data synthesis required for large-scale macroevolutionary studies is challenging with the current tools available for integration. Using a classic question regarding the frequency of paired fin loss in teleost fishes as a case study, we sought to create automated methods to facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins of 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins for teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein harnessed a new combined approach by utilizing data based on ontological inference, as well as phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and further reduced to 34.8% by phylogenetic data propagation. These methods allowed us to extend the data to an additional 11,293 species for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Though interpretation is limited by lack of phylogenetic resolution at the species level, it appears that following loss, both pectoral and pelvic fins were regained several (3) to many (14) times respectively. Focused investigation into putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss. 
Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.
Affiliation(s)
- Laura M Jackson
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- Pasan C Fernando
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- Josh S Hanscom
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive Suite 540, Chapel Hill, NC 27517, USA
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
|
5
|
Franz NM, Zhang C, Lee J. A logic approach to modelling nomenclatural change. Cladistics 2018; 34:336-357. [PMID: 34645079 DOI: 10.1111/cla.12201] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 03/10/2017] [Indexed: 11/27/2022]
Abstract
We utilize an Answer Set Programming (ASP) approach to show that the principles of nomenclature are tractable in computational logic. To this end we design a hypothetical, 20 nomenclatural taxon use case, with starting conditions that embody several overarching principles of the International Code of Zoological Nomenclature, including Binomial Nomenclature, Priority, Coordination, Homonymy, Typification and the structural requirement of Gender Agreement. The use case ending conditions are triggered by the reinterpretation of the diagnostic features of one of 12 type specimens anchoring the corresponding species-level epithets. Permutations of this child-to-parent reassignment action lead to 36 alternative scenarios, where each scenario requires a set of 1-14 logically contingent nomenclatural emendations. We show that an ASP transition system approach can correctly infer the Code-mandated changes for each scenario, and visually output the ending conditions. The results provide a foundation for further developing logic-based nomenclatural change optimization and validation services, which could be applied in global nomenclatural registries. More generally, logic explorations of nomenclatural and taxonomic change scenarios provide a novel means of assessing design biases inherent in the principles of nomenclature, and can therefore inform the design of future, big data-compatible identifier systems that recognize and mitigate these constraints.
Affiliation(s)
- Nico M Franz
- School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
- Chao Zhang
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, PO Box 878809, Tempe, AZ, 85287-8809, USA
- Joohyung Lee
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, PO Box 878809, Tempe, AZ, 85287-8809, USA
|
6
|
Korshunova T, Martynov A, Bakken T, Evertsen J, Fletcher K, Mudianta IW, Saito H, Lundin K, Schrödl M, Picton B. Polyphyly of the traditional family Flabellinidae affects a major group of Nudibranchia: aeolidacean taxonomic reassessment with descriptions of several new families, genera, and species (Mollusca, Gastropoda). Zookeys 2017; 717:1-139. [PMID: 29391848 PMCID: PMC5784208 DOI: 10.3897/zookeys.717.21885] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 10/27/2017] [Accepted: 11/08/2017] [Indexed: 12/29/2022]
Abstract
The Flabellinidae, a heterogeneous assembly of supposedly plesiomorphic to very derived sea slug groups, have not yet been addressed by integrative studies. Here novel material of rarely seen Arctic taxa as well as North Atlantic, North and South Pacific, and tropical Indo-West Pacific flabellinid species is investigated morpho-anatomically and with multi-locus markers (partial COI, 16S rDNA, 28S rDNA and H3) which were generated and analysed in a comprehensive aeolid taxon sampling. It was found that the current family Flabellinidae is polyphyletic and its phylogeny and taxonomic patterns cannot be understood without considering members from all the Aeolidacean families and, based on a robust phylogenetic hypothesis, morpho-anatomical evolution of aeolids is more complex than suspected in earlier works and requires reclassification of the taxon. Morphological diversity of Flabellinidae is corroborated by molecular divergence rates and supports establishing three new families (Apataidae fam. n., Flabellinopsidae fam. n., Samlidae fam. n.), 16 new genera, 13 new species, and two new subspecies among the former Flabellinidae. Two families, namely Coryphellidae and Paracoryphellidae, are restored and traditional Flabellinidae is considerably restricted. The distinctness of the recently described family Unidentiidae is confirmed by both morphological and molecular data. Several species complexes among all ex-"Flabellinidae" lineages are recognised using both morphological and molecular data. The present study shows that Facelinidae and Aeolidiidae, together with traditional "Tergipedidae", deeply divide traditional "Flabellinidae." Diagnoses for all aeolidacean families are therefore provided and additionally two new non-flabellinid families (Abronicidae fam. n. and Murmaniidae fam. n.) within traditional tergipedids are established to accommodate molecular and morphological disparity. To address relationships and disparity, we propose a new family system for aeolids. 
Here the aeolidacean species are classified into at least 102 genera and 24 families. Operational rules for integration of morphological and molecular data for taxonomy are suggested.
Affiliation(s)
- Tatiana Korshunova
- Koltzov Institute of Developmental Biology, RAS, 26 Vavilova Str., 119334 Moscow, Russia
- Zoological Museum, Moscow State University, Bolshaya Nikitskaya Str. 6, 125009 Moscow, Russia
- Alexander Martynov
- Zoological Museum, Moscow State University, Bolshaya Nikitskaya Str. 6, 125009 Moscow, Russia
- Torkild Bakken
- NTNU University Museum, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jussi Evertsen
- NTNU University Museum, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Hiroshi Saito
- National Museum of Nature and Science, Amakubo 4-1-1, Tsukuba, Japan
- Kennet Lundin
- Gothenburg Natural History Museum, Box 7283, S-40235, Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, S-40530, Gothenburg, Sweden
- Michael Schrödl
- Zoologische Staatssammlung München, Münchhausenstr. 21, D-81247 München, Germany
- Bernard Picton
- National Museums Northern Ireland, Holywood, Northern Ireland, United Kingdom
- Queen’s University, Belfast, Northern Ireland, United Kingdom
|
7
|
Franz N, Zhang G. Three new species of entimine weevils in Early Miocene amber from the Dominican Republic (Coleoptera: Curculionidae). Biodivers Data J 2017; 5:e10469. [PMID: 28325975 PMCID: PMC5345054 DOI: 10.3897/bdj.5.e10469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/12/2016] [Accepted: 11/29/2016] [Indexed: 11/24/2022]
Abstract
BACKGROUND Using syntactic and semantic conventions of the taxonomic concept approach (Franz et al. 2015), we describe three newly recognized fossil broad-nosed weevils (Coleoptera: Curculionidae: Entiminae) preserved in Early Miocene amber (ca. 20.4-16.0 mya) from the Dominican Republic: Scelianoma compacta sp. n. sec. Franz & Zhang (2017) (henceforth abbreviated as [FZ2017]), Tropirhinus palpebratus sp. n. [FZ2017], and Diaprepes anticus sp. n. [FZ2017]. The taxonomic assignment of the amber inclusions is grounded in a preceding phylogenetic analysis by Franz (2012). As many as 88 of the 143 therein identified characters were coded for the fossils, whose traits are largely congruent with those present in extant congeners while also differing in ways that justify their new nomenclatural and taxonomic status. NEW INFORMATION We present detailed images, descriptions, and phylogenetically informed diagnoses for the three new species-level entities, along with logically consistent Region Connection Calculus (RCC-5) alignments of the amended genus-level classifications for Scelianoma Franz and Girón 2009 [FZ2017], Tropirhinus Schoenherr 1823 [FZ2017], and Diaprepes Schoenherr 1823 [FZ2017], in relation to 2-4 preceding classifications published in 1982-2012. The description of Scelianoma compacta [FZ2017] from Hispaniola is indicative of a more widespread historical range of Scelianoma [FZ2017] than reflected in the extant, southwestern Puerto Rican Scelianoma elydimorpha Franz and Girón 2009 sec. Franz and Girón (2009). The presence of Diaprepes anticus [FZ2017] in Hispaniola during the Early Miocene suggests an eastward directed process of island colonization and likely speciation of members of Diaprepes [FZ2017], given that most extant relatives occur throughout the Lesser Antilles.
The herein presented data will facilitate more reliable reconstructions of historical biogeographic processes thought to have played a prominent role in the diversification of the West Indian and Neotropical mainland broad-nosed weevil lineages.
Affiliation(s)
- Nico Franz
- Arizona State University, Tempe, United States of America
- Guanyang Zhang
- Arizona State University, Tempe, United States of America
|
8
|
Ytow N. Taxonaut: an application software for comparative display of multiple taxonomies with a use case of GBIF Species API. Biodivers Data J 2016; 4:e9787. [PMID: 27932916 PMCID: PMC5136681 DOI: 10.3897/bdj.4.e9787] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/07/2016] [Accepted: 09/23/2016] [Indexed: 11/12/2022]
Abstract
BACKGROUND The Species API of the Global Biodiversity Information Facility (GBIF) provides public access to taxonomic data aggregated from multiple data sources. Each data source follows its own classification which can be inconsistent with classifications from other sources. Even with a reference classification e.g. the GBIF Backbone taxonomy, a comprehensive method to compare classifications in the data aggregation is essential, especially for non-expert users. NEW INFORMATION A Java application was developed to compare multiple taxonomies graphically using classification data acquired from GBIF's ChecklistBank via the GBIF Species API. It uses a table to display taxonomies where each column represents a taxonomy under comparison, with an aligner column to organise taxa by name. Each cell contains the name of a taxon if the classification in that column contains the name. Each column also has a cell showing the hierarchy of the taxonomy by a folder metaphor where taxa are aligned and synchronised in the aligner column. A set of those comparative tables shows taxa categorised by relationship between taxonomies. The result set is also available as tables in an Excel format file.
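The name-aligned comparison table described in this abstract can be illustrated with a minimal Python sketch. It is not the Taxonaut Java application and it does not call the GBIF Species API; the function name `align_by_name` and the sample taxon names are hypothetical, and the sketch only shows the idea of one column per taxonomy plus an aligner column keyed by name.

```python
from typing import Dict, List


def align_by_name(taxonomies: Dict[str, List[str]]) -> List[List[str]]:
    """Build a comparison table: an aligner column plus one column per taxonomy.

    A cell holds the taxon name if that taxonomy contains the name,
    and an empty string otherwise.
    """
    # Precompute a set per taxonomy for fast membership tests.
    name_sets = {label: set(names) for label, names in taxonomies.items()}
    # The aligner column lists every name seen in any taxonomy, sorted.
    all_names = sorted(set().union(*name_sets.values())) if name_sets else []

    rows = [["aligner"] + list(taxonomies)]  # header row
    for name in all_names:
        cells = [name if name in name_sets[label] else "" for label in taxonomies]
        rows.append([name] + cells)
    return rows


if __name__ == "__main__":
    # Hypothetical checklists; real data would come from a taxonomic source.
    table = align_by_name({
        "Checklist A": ["Aus alpha", "Aus beta"],
        "Checklist B": ["Aus beta", "Aus gamma"],
    })
    for row in table:
        print("\t".join(row))
```

Matching on name strings alone cannot tell whether two sources mean the same thing by a shared name, which is the limitation that the concept-based approaches in the other abstracts on this page address.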
Affiliation(s)
- Nozomi Ytow
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8572, Japan
|
9
|
Franz N, Gilbert E, Ludäscher B, Weakley A. Controlling the taxonomic variable: Taxonomic concept resolution for a southeastern United States herbarium portal. Research Ideas and Outcomes 2016. [DOI: 10.3897/rio.2.e10610] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
Overview. Taxonomic names are imperfect identifiers of specific and sometimes conflicting taxonomic perspectives in aggregated biodiversity data environments. The inherent ambiguities of names can be mitigated using syntactic and semantic conventions developed under the taxonomic concept approach. These include: (1) representation of taxonomic concept labels (TCLs: name sec. source) to precisely identify name usages and meanings, (2) use of parent/child relationships to assemble separate taxonomic perspectives, and (3) expert provision of Region Connection Calculus articulations (RCC–5: congruence, [inverse] inclusion, overlap, exclusion) that specify how data identified to different-sourced TCLs can be integrated. Application of these conventions greatly increases trust in biodiversity data networks, most of which promote unitary taxonomic 'syntheses' that obscure the actual diversity of expert-held views. Better design solutions allow users to control the taxonomic variable and thereby assess the robustness of their biological inferences under different perspectives. A unique constellation of prior efforts – including the powerful Symbiota collections software platform, the Euler/X multi-taxonomy alignment toolkit, and the "Weakley Flora" which entails 7,000 concepts and more than 75,000 RCC–5 articulations – provides the opportunity to build a first full-scale concept resolution service for SERNEC, the SouthEast Regional Network of Expertise and Collections, currently with 60 member herbaria and 2 million occurrence records.
Intellectual merit. We have developed a multi-dimensional, step-wise plan to transition SERNEC's data culture from name- to concept-based practices. (1) We will engage SERNEC experts through annual, regional workshops and follow-up interactions that will foster buy-in and ultimately the completion of 12 community-identified use cases. (2) We will leverage RCC–5 data from the Weakley Flora and further development of the Euler/X logic reasoning toolkit to provide comprehensive genus- to variety-level concept alignments for at least 10 major flora treatments with highest relevance to SERNEC. The visualizations and estimated > 1 billion inferred concept-to-concept relations will effectively drive specimen data integration in the transformed portal. (3) We will expand Symbiota's taxonomy and occurrence schemas and related user interfaces to support the new concept data, including novel batch and map-based specimen determination modules, with easy output options in Darwin Core Archive format. (4) Through combinations of the new technology, enlisted taxonomic expertise, and SERNEC's large image resources, we will upgrade at least 80% of all SERNEC specimen identifications from names to the narrowest suitable TCLs, or add "uncertainty" flags to specimens needing further study. (5) We will utilize the novel tools and data to demonstrate how controlling for the taxonomic variable in 12 use cases variously drives the outcomes of evolutionary, ecological, and conservation-based research hypotheses.
Broader impacts. Our project is focused on just one herbarium network, but the potential impact is as wide as Darwin Core or even comparative biology. We believe that trust in networked biodiversity data depends on open and dynamic system designs, allowing expert access and resolution of multiple conflicting views that reflect the complex realities of ongoing taxonomic research. Taking well over 1 million SERNEC records from name- to TCL-resolution will show that "big" specimen data can pass the credibility threshold needed to validate the substantive data mobilization investment. We will mentor one postdoctoral researcher (UNC), two Ph.D. students (ASU, UIUC), and at least 15 undergraduate students (ASU). Each of our workshops will capacitate 10-15 SERNEC experts, who in turn can recruit colleagues and students at their home collections. We will incorporate the project theme and use cases into undergraduate courses taught at six institutions and reaching an estimated 300-500 students annually (10-40% minority students). At each institution, project members will make a systematic effort to recruit new students from underrepresented groups. Our group's leadership of Symbiota (with close ties to iDigBio), SERNEC, and local biodiversity projects and centers will further promote the new data culture. We will create a feature story "Where do plant species occur?" for ASU's popular "Ask A Biologist" website, and a series of undergraduate student-led "How-To" videos that illustrate the use case workflows, including the creation of multi-taxonomy alignments.
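The taxonomic concept label convention mentioned in this abstract (TCL: name sec. source) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TCL` type and `parse_tcl` function are hypothetical names, not part of Symbiota or Euler/X, and the parser assumes the first occurrence of " sec. " separates the name from the source.

```python
from typing import NamedTuple


class TCL(NamedTuple):
    """A taxonomic concept label: a name as used according to (sec.) a source."""
    name: str
    source: str


def parse_tcl(label: str) -> TCL:
    """Split 'name sec. source' at the first ' sec. ' separator."""
    name, sep, source = label.partition(" sec. ")
    name, source = name.strip(), source.strip()
    if not sep or not name or not source:
        raise ValueError(f"not a 'name sec. source' label: {label!r}")
    return TCL(name, source)


if __name__ == "__main__":
    # Labels taken from abstracts elsewhere on this page.
    a = parse_tcl("Primates sec. Groves (1993)")
    b = parse_tcl("Primates sec. Groves (2005)")
    # Same name, different sources: two distinct concepts.
    print(a.name == b.name, a == b)
```

Treating the pair (name, source) as the identifier, and not the name alone, is what lets two usages of the same name be kept apart and then related to each other explicitly.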
|
10
|
Affiliation(s)
- Marco Sigovini
- CNR – National Research Council of Italy ISMAR – Marine Sciences Institute Arsenale Tesa 104 Castello 2737/F I‐30122 Venice Italy
- Erica Keppel
- CNR – National Research Council of Italy ISMAR – Marine Sciences Institute Arsenale Tesa 104 Castello 2737/F I‐30122 Venice Italy
- Smithsonian Environmental Research Center (SERC) 647, Contees Wharf Road Edgewater MD 21037 USA
- Davide Tagliapietra
- CNR – National Research Council of Italy ISMAR – Marine Sciences Institute Arsenale Tesa 104 Castello 2737/F I‐30122 Venice Italy
|
11
|
Franz NM, Pier NM, Reeder DM, Chen M, Yu S, Kianmajd P, Bowers S, Ludäscher B. Two Influential Primate Classifications Logically Aligned. Syst Biol 2016; 65:561-82. [PMID: 27009895 PMCID: PMC4911943 DOI: 10.1093/sysbio/syw023] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 02/16/2015] [Revised: 03/11/2016] [Accepted: 03/17/2016] [Indexed: 01/02/2023]
Abstract
Classifications and phylogenies of perceived natural entities change in the light of new evidence. Taxonomic changes, translated into Code-compliant names, frequently lead to name:meaning dissociations across succeeding treatments. Classification standards such as the Mammal Species of the World (MSW) may experience significant levels of taxonomic change from one edition to the next, with potential costs to long-term, large-scale information integration. This circumstance challenges the biodiversity and phylogenetic data communities to express taxonomic congruence and incongruence in ways that both humans and machines can process, that is, to logically represent taxonomic alignments across multiple classifications. We demonstrate that such alignments are feasible for two classifications of primates corresponding to the second and third MSW editions. Our approach has three main components: (i) use of taxonomic concept labels, that is name sec. author (where sec. means according to), to assemble each concept hierarchy separately via parent/child relationships; (ii) articulation of select concepts across the two hierarchies with user-provided Region Connection Calculus (RCC-5) relationships; and (iii) the use of an Answer Set Programming toolkit to infer and visualize logically consistent alignments of these input constraints. Our use case entails the Primates sec. Groves (1993; MSW2: 317 taxonomic concepts; 233 at the species level) and Primates sec. Groves (2005; MSW3: 483 taxonomic concepts; 376 at the species level). Using 402 RCC-5 input articulations, the reasoning process yields a single, consistent alignment and 153,111 Maximally Informative Relations that constitute a comprehensive meaning resolution map for every concept pair in the Primates sec. MSW2/MSW3.
The complete alignment, and various partitions thereof, facilitate quantitative analyses of name:meaning dissociation, revealing that nearly one in three taxonomic names are not reliable across treatments, in the sense of the same name identifying congruent taxonomic meanings. The RCC-5 alignment approach is potentially widely applicable in systematics and can achieve scalable, precise resolution of semantically evolving name usages in synthetic, next-generation biodiversity, and phylogeny data platforms.
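The five RCC-5 relations named in this abstract (congruence, inclusion, inverse inclusion, overlap, exclusion) can be illustrated with a small Python sketch. It makes a strong simplifying assumption: each concept is modeled as a plain set of member identifiers, whereas the study relies on expert-provided articulations and Answer Set Programming reasoning. The `RCC5` enum, the `rcc5` function, and the sample member sets are hypothetical.

```python
from enum import Enum
from typing import AbstractSet, Hashable


class RCC5(Enum):
    CONGRUENT = "=="       # both concepts have exactly the same members
    INCLUDES = ">"         # first concept properly includes the second
    INCLUDED_IN = "<"      # first concept is properly included in the second
    OVERLAPS = "><"        # some shared members, neither includes the other
    DISJOINT = "!"         # no shared members (exclusion)


def rcc5(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> RCC5:
    """Return the RCC-5 relation between two non-empty member sets."""
    if not a or not b:
        raise ValueError("RCC-5 regions are assumed to be non-empty")
    if a == b:
        return RCC5.CONGRUENT
    if a > b:  # proper superset
        return RCC5.INCLUDES
    if a < b:  # proper subset
        return RCC5.INCLUDED_IN
    if a & b:
        return RCC5.OVERLAPS
    return RCC5.DISJOINT


if __name__ == "__main__":
    # Hypothetical member sets standing in for two usages of the same name.
    earlier = frozenset({"sp1", "sp2", "sp3"})
    later = frozenset({"sp2", "sp3", "sp4"})
    print(rcc5(earlier, later).name)
```

Exactly one of the five relations holds for any pair of non-empty sets, which is why a complete alignment can assign a single maximally informative relation to every concept pair.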
Affiliation(s)
- Nico M Franz
- School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287, USA;
| | - Naomi M Pier
- School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287, USA
| | - Deeann M Reeder
- Department of Biology, Bucknell University, 1 Dent Drive, Lewisburg, PA 17837, USA
| | - Mingmin Chen
- Department of Computer Science, 2063 Kemper Hall, 1 Shields Avenue, University of California at Davis, CA 95616, USA
| | - Shizhuo Yu
- Department of Computer Science, 2063 Kemper Hall, 1 Shields Avenue, University of California at Davis, CA 95616, USA
| | - Parisa Kianmajd
- Department of Computer Science, 2063 Kemper Hall, 1 Shields Avenue, University of California at Davis, CA 95616, USA
| | - Shawn Bowers
- Department of Computer Science, 502 East Boone Avenue, AD Box 26, Gonzaga University, Spokane, WA 99258, USA
| | - Bertram Ludäscher
- Graduate School of Library and Information Science, 510 East Daniel Street, University of Illinois at Urbana-Champaign, Champaign, IL 61820
| |
|
12
|
Patterson D, Mozzherin D, Shorthouse DP, Thessen A. Challenges with using names to link digital biodiversity information. Biodivers Data J 2016; 4:e8080. [PMID: 27346955 PMCID: PMC4910497 DOI: 10.3897/bdj.4.e8080] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/09/2016] [Accepted: 05/19/2016] [Indexed: 01/05/2023] Open
Affiliation(s)
| | - Dmitry Mozzherin
- Illinois Natural History Survey, Champaign, IL, United States of America
| | | | - Anne Thessen
- The Data Detektiv, Waltham, United States of America
- The Ronin Institute for Independent Scholarship, Montclair, United States of America
| |
|
13
|
Thessen AE, Bunker DE, Buttigieg PL, Cooper LD, Dahdul WM, Domisch S, Franz NM, Jaiswal P, Lawrence-Dill CJ, Midford PE, Mungall CJ, Ramírez MJ, Specht CD, Vogt L, Vos RA, Walls RL, White JW, Zhang G, Deans AR, Huala E, Lewis SE, Mabee PM. Emerging semantics to link phenotype and environment. PeerJ 2015; 3:e1470. [PMID: 26713234 PMCID: PMC4690371 DOI: 10.7717/peerj.1470] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/15/2015] [Accepted: 11/12/2015] [Indexed: 11/20/2022] Open
Abstract
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the extensible observation ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
Affiliation(s)
- Anne E. Thessen
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States
- The Data Detektiv, Waltham, MA, United States
| | - Daniel E. Bunker
- Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, United States
| | - Pier Luigi Buttigieg
- HGF-MPG Group for Deep Sea Ecology and Technology, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
| | - Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, United States
| | - Sami Domisch
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
| | - Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Carolyn J. Lawrence-Dill
- Departments of Genetics, Development and Cell Biology and Agronomy, Iowa State University, Ames, IA, United States
| | | | | | - Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales–CONICET, Buenos Aires, Argentina
| | - Chelsea D. Specht
- Departments of Plant and Microbial Biology & Integrative Biology, University of California, Berkeley, CA, United States
| | - Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany
| | | | - Ramona L. Walls
- iPlant Collaborative, University of Arizona, Tucson, AZ, United States
| | - Jeffrey W. White
- US Arid Land Agricultural Research Center, United States Department of Agriculture—ARS, Maricopa, AZ, United States
| | - Guanyang Zhang
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, PA, United States
| | - Eva Huala
- Phoenix Bioinformatics, Redwood City, CA, United States
| | - Suzanna E. Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, United States
| |
|
14
|
Jansen MA, Franz NM. Phylogenetic revision of Minyomerus Horn, 1876 sec. Jansen & Franz, 2015 (Coleoptera, Curculionidae) using taxonomic concept annotations and alignments. Zookeys 2015; 528:1-133. [PMID: 26692791 PMCID: PMC4668883 DOI: 10.3897/zookeys.528.6001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/03/2015] [Accepted: 10/03/2015] [Indexed: 11/12/2022] Open
Abstract
This contribution adopts the taxonomic concept annotation and alignment approach. Accordingly, and where indicated, previous and newly inferred meanings of taxonomic names are individuated according to one specific source. Articulations among these concepts and pairwise, logically consistent alignments of original and revisionary classifications are also provided, in addition to conventional nomenclatural provenance information. A phylogenetic revision of the broad-nosed weevil genera Minyomerus Horn, 1876 sec. O'Brien & Wibmer (1982), and Piscatopus Sleeper, 1960 sec. O'Brien & Wibmer (1982) (Curculionidae [non-focal]: Entiminae [non-focal]: Tanymecini [non-focal]) is presented. Prior to this study, Minyomerus sec. O'Brien & Wibmer (1982) contained seven species, whereas the monotypic Piscatopus sec. O'Brien & Wibmer (1982) was comprised solely of Piscatopus griseus Sleeper, 1960 sec. O'Brien & Wibmer (1982). We thoroughly redescribe these recognized species-level entities and furthermore describe ten species as new to science: Minyomerus bulbifrons sec. Jansen & Franz (2015) (henceforth: [JF2015]), sp. n., Minyomerus aeriballux [JF2015], sp. n., Minyomerus cracens [JF2015], sp. n., Minyomerus gravivultus [JF2015], sp. n., Minyomerus imberbus [JF2015], sp. n., Minyomerus reburrus [JF2015], sp. n., Minyomerus politus [JF2015], sp. n., Minyomerus puticulatus [JF2015], sp. n., Minyomerus rutellirostris [JF2015], sp. n., and Minyomerus trisetosus [JF2015], sp. n. A cladistic analysis using 46 morphological characters of 22 terminal taxa (5/17 outgroup/ingroup) yielded a single most-parsimonious cladogram (L = 82, CI = 65, RI = 82). The analysis strongly supports the monophyly of Minyomerus [JF2015] with eight unreversed synapomorphies, and places Piscatopus griseus sec. O'Brien & Wibmer (1982) within the genus as sister to Minyomerus rutellirostris [JF2015]. Accordingly, Piscatopus sec. Sleeper (1960), syn. n. is changed to junior synonymy of Minyomerus [JF2015], and its sole member Piscatopus griseus sec. Sleeper (1960) is moved to Minyomerus [JF2015] as Minyomerus griseus [JF2015], comb. n. In addition, the formerly designated type Minyomerus innocuus Horn, 1876 sec. Pierce (1913), syn. n. is changed to junior synonymy of Minyomerus microps (Say, 1831) [JF2015] which has priority. The genus is widespread throughout western North America, ranging from Canada to Mexico and Baja California. Apparent patterns of interspecific diversity of exterior and genitalic morphology, varying host plant ranges, overlapping and widely extending species distributions, suggest an early origin for Minyomerus [JF2015], with a diversification that likely followed the development of North American desert biomes. Three species in the genus - i.e., Minyomerus languidus Horn, 1876 [JF2015], Minyomerus microps [JF2015], and Minyomerus trisetosus [JF2015] - are putatively considered parthenogenetic.
Affiliation(s)
- M. Andrew Jansen
- School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287-4501
| | - Nico M. Franz
- School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287-4501
| |
|