1
Ribeiro TM, Espíndola A. Integrated phylogenomic approaches in insect systematics. Curr Opin Insect Sci 2024; 61:101150. [PMID: 38061460 DOI: 10.1016/j.cois.2023.101150] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/09/2023] [Revised: 11/16/2023] [Accepted: 11/25/2023] [Indexed: 12/29/2023]
Abstract
The increased accessibility of genomic and imaging methods, together with improved access to ecological, spatial, and other natural history data, is allowing insect systematics to grow and to answer central evolutionary and taxonomic questions. Today, integrated studies in insect phylogenomics and systematics are combining natural history, behavior, developmental biology, morphology, fossils, geographic range data, and ecological interactions. This integration is helping to clarify evolutionary relationships and to reveal the role these factors play in insect evolution. Future work should continue to build on these advances, seeking to further increase open-access databasing and support for natural history research, and to expand its analytical palette.
Affiliation(s)
- Taís Ma Ribeiro
- Department of Entomology, University of Maryland, 4112 Plant Sciences Building, 4291 Fieldhouse Dr., College Park, MD 20742-4454, USA
- Anahí Espíndola
- Department of Entomology, University of Maryland, 4112 Plant Sciences Building, 4291 Fieldhouse Dr., College Park, MD 20742-4454, USA
2
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data to train supervised models that annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partly because of difficulties in handling technical differences between datasets and partly because they do not consider the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which uses various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and in detecting unknown cell types across different species, platforms, and tissues.
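The abstract does not spell out scPML's architecture, so the Python sketch below only illustrates the general idea it names: partitioning each cell's gene features by pathway into separate views and building one cell-cell similarity graph per view. The pathway names, gene sets, the use of cosine similarity, and the k-nearest-neighbour rule are all assumptions chosen for illustration, not details of the published model.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = cells, columns = the genes of one pathway view


def pathway_views(expr: Dict[str, List[float]],
                  pathways: Dict[str, List[str]]) -> Dict[str, Matrix]:
    """Split a gene -> per-cell expression table into one cell-by-gene matrix per pathway."""
    n_cells = len(next(iter(expr.values())))
    views: Dict[str, Matrix] = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in expr]  # drop pathway genes absent from the data
        if present:  # a pathway with no measured genes yields no view
            views[name] = [[expr[g][c] for g in present] for c in range(n_cells)]
    return views


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def knn_graph(view: Matrix, k: int) -> List[List[int]]:
    """For each cell, the indices of its k most similar other cells within one view."""
    graph = []
    for i, cell in enumerate(view):
        sims = [(cosine(cell, other), j) for j, other in enumerate(view) if j != i]
        sims.sort(key=lambda s: -s[0])  # most similar first; ties keep index order
        graph.append([j for _, j in sims[:k]])
    return graph


# Hypothetical toy data: 3 genes measured in 4 cells, and 3 hypothetical pathways.
expr = {"A": [1.0, 0.9, 0.0, 0.1], "B": [0.0, 0.1, 1.0, 0.9], "C": [1.0, 1.0, 1.0, 1.0]}
pathways = {"P1": ["A", "B"], "P2": ["C", "D"], "P3": ["Z"]}
views = pathway_views(expr, pathways)
print({name: knn_graph(v, k=1) for name, v in views.items()})
```

Each view yields its own graph, so a downstream multi-view learner could combine graphs built from different pathways; how scPML itself combines them is not stated in the abstract.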
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Wei-Lin Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
3
Lopes A, Carbonera J, Schmidt D, Garcia L, Rodrigues F, Abel M. Using terms and informal definitions to classify domain entities into top-level ontology concepts: An approach based on language models. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
4
Ngai J, Kalter M, Byrd JB, Racz R, He Y. Ontology-Based Classification and Analysis of Adverse Events Associated With the Usage of Chloroquine and Hydroxychloroquine. Front Pharmacol 2022; 13:812338. [PMID: 35401219 PMCID: PMC8983871 DOI: 10.3389/fphar.2022.812338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/10/2021] [Accepted: 03/07/2022] [Indexed: 12/20/2022]
Abstract
Multiple methodologies have been developed to identify and predict adverse events (AEs); however, many of these methods do not consider how patient population characteristics, such as disease, age, and gender, affect the AEs observed. In this study, we evaluated the utility of collecting and analyzing AE data related to hydroxychloroquine (HCQ) and chloroquine (CQ) from US Prescribing Information (USPIs, also called drug product labels or package inserts), the FDA Adverse Event Reporting System (FAERS), and peer-reviewed literature from PubMed/EMBASE, followed by AE classification and modeling using the Ontology of Adverse Events (OAE). Our USPI analysis showed that CQ and HCQ AE profiles were similar, although HCQ was reported to be associated with fewer types of cardiovascular, nervous system, and musculoskeletal AEs. According to EMBASE literature mining, CQ and HCQ were associated with QT prolongation (primarily when treating COVID-19), heart arrhythmias, development of torsades de pointes, and retinopathy (primarily when treating lupus). The FAERS data were analyzed using the proportional reporting ratio (PRR), the chi-square test, and minimal case number filtering, followed by OAE classification. HCQ was associated with 63 significant AEs (including 21 cardiovascular AEs) for COVID-19 patients and 120 significant AEs (including 12 cardiovascular AEs) for lupus patients, supporting the hypothesis that the disease being treated affects the type and number of CQ/HCQ AEs that manifest. Using an HCQ AE patient example reported in the literature, we also ontologically modeled how an AE occurs and which factors (e.g., age, biological sex, and medical history) are involved in its formation. The methodology developed in this study can be applied to other drugs and indications to better identify patient populations that are particularly vulnerable to AEs.
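A minimal Python sketch of the FAERS screening steps named above: a proportional reporting ratio, a chi-square test, and a minimal case count, followed by rolling surviving signals up an ontology hierarchy. The cut-offs (PRR of at least 2, chi-square of at least 4, at least 3 cases) are commonly used disproportionality thresholds assumed here, not necessarily the study's settings, and the small is_a map is a hypothetical stand-in for OAE with invented labels.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass
class Table2x2:
    a: int  # reports with the drug AND the adverse event (AE)
    b: int  # reports with the drug and any other AE
    c: int  # reports with other drugs AND the AE (assumed > 0)
    d: int  # reports with other drugs and any other AE


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def chi_square(t: Table2x2) -> float:
    """Pearson chi-square for a 2x2 table, without Yates continuity correction."""
    n = t.a + t.b + t.c + t.d
    num = n * (t.a * t.d - t.b * t.c) ** 2
    den = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return num / den


def is_signal(t: Table2x2, min_prr: float = 2.0, min_chi2: float = 4.0,
              min_cases: int = 3) -> bool:
    # Illustrative thresholds (assumption), checked case count first.
    return t.a >= min_cases and prr(t) >= min_prr and chi_square(t) >= min_chi2


# Hypothetical is_a hierarchy standing in for OAE; labels are invented.
PARENT: Dict[str, str] = {
    "QT prolongation": "cardiac arrhythmia",
    "torsades de pointes": "cardiac arrhythmia",
    "cardiac arrhythmia": "cardiovascular AE",
    "retinopathy": "eye AE",
}


def top_category(term: str) -> str:
    """Walk up is_a links until reaching a term with no parent."""
    while term in PARENT:
        term = PARENT[term]
    return term


def signals_by_category(tables: Dict[str, Table2x2]) -> Counter:
    """Count significant AEs per top-level category."""
    return Counter(top_category(ae) for ae, t in tables.items() if is_signal(t))


# Hypothetical counts for one drug versus all other drugs in a report database.
tables = {
    "QT prolongation": Table2x2(a=20, b=80, c=100, d=9800),
    "torsades de pointes": Table2x2(a=5, b=95, c=10, d=9890),
    "retinopathy": Table2x2(a=2, b=98, c=100, d=9800),  # fails the minimal case filter
}
print(signals_by_category(tables))
```

With the hypothetical QT-prolongation row (20 of 100 reports for the drug versus 100 of 9,900 for all other drugs), `prr` gives 19.8 and the event passes all three filters. Rolling signals up an is_a hierarchy is one way to obtain per-category counts like the cardiovascular totals reported above.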
Affiliation(s)
- Jamie Ngai
- College of Pharmacy, University of Michigan, Ann Arbor, MI, United States
- Madison Kalter
- College of Literature, Science, and Arts, University of Michigan, Ann Arbor, MI, United States
- James Brian Byrd
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Medical School, Ann Arbor, MI, United States
- Rebecca Racz
- Division of Applied Regulatory Science, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, United States
- Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, United States
5
Porto DS, Dahdul WM, Lapp H, Balhoff JP, Vision TJ, Mabee PM, Uyeda J. Assessing Bayesian Phylogenetic Information Content of Morphological Data Using Knowledge from Anatomy Ontologies. Syst Biol 2022; 71:1290-1306. [PMID: 35285502 PMCID: PMC9558846 DOI: 10.1093/sysbio/syac022] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/03/2021] [Revised: 02/09/2022] [Accepted: 03/05/2022] [Indexed: 11/18/2022]
Abstract
Morphology remains a primary source of phylogenetic information for many groups of organisms, and the only one for most fossil taxa. Organismal anatomy is not a collection of randomly assembled and independent “parts”, but instead a set of dependent and hierarchically nested entities resulting from ontogeny and phylogeny. How do we make sense of these dependent and at times redundant characters? One promising approach is using ontologies—structured controlled vocabularies that summarize knowledge about different properties of anatomical entities, including developmental and structural dependencies. Here, we assess whether evolutionary patterns can explain the proximity of ontology-annotated characters within an ontology. To do so, we measure phylogenetic information across characters and evaluate if it matches the hierarchical structure given by ontological knowledge—in much the same way as across-species diversity structure is given by phylogeny. We implement an approach to evaluate the Bayesian phylogenetic information (BPI) content and phylogenetic dissonance among ontology-annotated anatomical data subsets. We applied this to data sets representing two disparate animal groups: bees (Hexapoda: Hymenoptera: Apoidea, 209 chars) and characiform fishes (Actinopterygii: Ostariophysi: Characiformes, 463 chars). For bees, we find that BPI is not substantially explained by anatomy since dissonance is often high among morphologically related anatomical entities. For fishes, we find substantial information for two clusters of anatomical entities instantiating concepts from the jaws and branchial arch bones, but among-subset information decreases and dissonance increases substantially moving to higher-level subsets in the ontology. We further applied our approach to address particular evolutionary hypotheses with an example of morphological evolution in miniature fishes. 
While we show that phylogenetic information does match ontology structure for some anatomical entities, additional relationships and processes, such as convergence, likely play a substantial role in explaining BPI and dissonance, and merit future investigation. Our work demonstrates how complex morphological data sets can be interrogated with ontologies, allowing one to assess how information is spread hierarchically across anatomical concepts, how congruent this information is, and what sorts of processes may help explain it: phylogeny, development, or convergence. [Apidae; Bayesian phylogenetic information; Ostariophysi; Phenoscape; phylogenetic dissonance; semantic similarity.]
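The abstract relies on two quantities, Bayesian phylogenetic information (BPI) and dissonance among character subsets. As a rough Python sketch, assuming information is the drop in entropy from a uniform prior over unrooted topologies to the posterior, and dissonance is the entropy of the pooled (averaged) subset posteriors minus their mean entropy, both can be computed from posterior topology samples. Counting whole-topology sample frequencies is a simplification chosen here; the published approach may estimate these entropies differently. The subset names and topologies are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def n_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary topologies, (2n - 5)!!, for n >= 3 taxa."""
    count = 1
    for k in range(3, n_taxa + 1):
        count *= 2 * k - 5
    return count


def posterior_from_samples(samples: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Estimate posterior topology probabilities from MCMC sample frequencies."""
    counts = Counter(samples)
    return {topo: c / len(samples) for topo, c in counts.items()}


def information(n_taxa: int, posterior: Dict[Hashable, float]) -> float:
    """Prior entropy (uniform over topologies) minus posterior entropy."""
    return math.log(n_unrooted_topologies(n_taxa)) - entropy(posterior.values())


def dissonance(posteriors: List[Dict[Hashable, float]]) -> float:
    """Entropy of the averaged posterior minus the mean entropy of the subset posteriors."""
    k = len(posteriors)
    topologies = set().union(*posteriors)  # union of the dicts' keys
    pooled = [sum(p.get(t, 0.0) for p in posteriors) / k for t in topologies]
    mean_subset_entropy = sum(entropy(p.values()) for p in posteriors) / k
    return entropy(pooled) - mean_subset_entropy


# Hypothetical posteriors for two anatomical subsets in a 4-taxon problem.
jaws = posterior_from_samples(["((A,B),(C,D))"] * 90 + ["((A,C),(B,D))"] * 10)
arches = posterior_from_samples(["((A,C),(B,D))"] * 100)
print(information(4, jaws), dissonance([jaws, arches]))
```

Subsets that agree leave the pooled distribution as concentrated as the individual ones, so dissonance is near zero; subsets that favour different topologies spread the pooled distribution and raise it.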
Affiliation(s)
- Diego S Porto
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061, USA
- Wasila M Dahdul
- UCI Libraries, University of California, Irvine, Irvine, CA 92623, USA
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- Hilmar Lapp
- Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
- Todd J Vision
- Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- Battelle, National Ecological Observatory Network, Boulder, CO 80301, USA
- Josef Uyeda
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061, USA
6
Cui H, Ford B, Starr J, Reznicek A, Zhang L, Macklin JA. Authors’ attitude toward adopting a new workflow to improve the computability of phenotype publications. Database (Oxford) 2022; 2022:baac001. [PMID: 35106535 PMCID: PMC9278328 DOI: 10.1093/database/baac001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2021] [Revised: 11/24/2021] [Accepted: 01/10/2022] [Indexed: 11/13/2022]
Abstract
Critical to answering large-scale questions in biology is the integration of knowledge from different disciplines into a coherent, computable whole. Controlled vocabularies such as ontologies represent a clear path toward this goal. Using survey questionnaires, we examined the attitudes of biologists toward adopting controlled vocabularies in phenotype publications. Our questions cover current experience and overall attitude with controlled vocabularies, the awareness of the issues around ambiguity and inconsistency in phenotype descriptions and post-publication professional data curation, the preferred solutions and the effort and desired rewards for adopting a new authoring workflow. Results suggest that although the existence of controlled vocabularies is widespread, their use is not common. A majority of respondents (74%) are frustrated with ambiguity in phenotypic descriptions, and there is a strong agreement (mean agreement score 4.21 out of 5) that author curation would better reflect the original meaning of phenotype data. Moreover, the vast majority (85%) of researchers would try a new authoring workflow if resultant data were more consistent and less ambiguous. Even more respondents (93%) suggested that they would try and possibly adopt a new authoring workflow if it required 5% additional effort as compared to normal, but higher rates resulted in a steep decline in likely adoption rates. Among the four different types of rewards, two types of citations were the most desired incentives for authors to produce computable data. Overall, our results suggest the adoption of a new authoring workflow would be accelerated by a user-friendly and efficient software-authoring tool, an increased awareness of the challenges text ambiguity creates for external curators and an elevated appreciation of the benefits of controlled vocabularies.
Affiliation(s)
- Hong Cui
- School of Information, University of Arizona, 1103 E. Second Street, Tucson, AZ 85705, USA
- Bruce Ford
- Department of Biological Sciences, University of Manitoba, 50 Sifton Road, Winnipeg, MB R3T 2N2, Canada
- Julian Starr
- Department of Biology, University of Ottawa, 30 Marie Curie Road, Ottawa, ON K1N 6N5, Canada
- Anton Reznicek
- SLA Herbarium, University of Michigan, 3600 Varsity Drive #1046, Ann Arbor, MI 48019, USA
- Limin Zhang
- School of Information, University of Arizona, 1103 E. Second Street, Tucson, AZ 85705, USA
- James A Macklin
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
7
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022]
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and by representing data and metadata as semantic graphs. Here, we discuss two different semantic graph approaches to representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem is becoming available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows every individual part and property mentioned in the description to be identified in a knowledge graph. This technical difference has substantial practical consequences that significantly affect the overall usability of empirical data.
The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
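To make the ABox/TBox contrast concrete, here is a small Python sketch that uses plain (subject, predicate, object) tuples rather than an RDF library. All `ex:` identifiers are hypothetical placeholders for real ontology terms and instance IRIs. In the ABox form, the head and its length each get their own individual, so a measured value and its provenance attach directly and can be retrieved by following asserted triples; the TBox form states the same phenotype only as a class expression.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ABox: every part and quality mentioned in the description is an
# individual with its own identifier, so data and metadata attach to it directly.
ABOX: List[Triple] = [
    ("ex:specimen_1", "rdf:type", "ex:Organism"),
    ("ex:specimen_1", "ex:has_part", "ex:head_1"),
    ("ex:head_1", "rdf:type", "ex:Head"),
    ("ex:head_1", "ex:has_quality", "ex:length_1"),
    ("ex:length_1", "rdf:type", "ex:Length"),
    ("ex:length_1", "ex:value_mm", "2.1"),
    ("ex:length_1", "ex:measured_by", "ex:curator_7"),  # provenance on this very quality
]

# TBox-style counterpart (Manchester-like syntax, illustrative only): there is no
# identifier for *this* head or *this* length, so the 2.1 mm value and its
# provenance have nowhere to attach without additional modelling.
TBOX_CLASS = "ex:Organism and (ex:has_part some (ex:Head and (ex:has_quality some ex:Length)))"


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def is_a(graph: List[Triple], node: str, cls: str) -> bool:
    """True if the node is asserted (not inferred) to be an instance of cls."""
    return cls in objects(graph, node, "rdf:type")


def quality_values(graph: List[Triple], whole: str, part_cls: str,
                   quality_cls: str, value_pred: str) -> List[str]:
    """Follow has_part -> has_quality links using asserted triples only (no reasoner)."""
    return [
        value
        for part in objects(graph, whole, "ex:has_part")
        if is_a(graph, part, part_cls)
        for q in objects(graph, part, "ex:has_quality")
        if is_a(graph, q, quality_cls)
        for value in objects(graph, q, value_pred)
    ]


print(quality_values(ABOX, "ex:specimen_1", "ex:Head", "ex:Length", "ex:value_mm"))
```

Answering "which organisms have some head with some length" against the TBox form would typically need an OWL reasoner, which corresponds to the entailment-regime cost the abstract mentions; the ABox lookup above needs only asserted triples.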
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
8
Porto DS, Almeida EAB, Pennell MW. Investigating Morphological Complexes Using Informational Dissonance and Bayes Factors: A Case Study in Corbiculate Bees. Syst Biol 2021; 70:295-306. [PMID: 32722788 PMCID: PMC7882150 DOI: 10.1093/sysbio/syaa059] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/22/2019] [Revised: 07/16/2020] [Accepted: 07/17/2020] [Indexed: 11/22/2022]
Abstract
It is widely recognized that different regions of a genome often have different evolutionary histories and that ignoring this variation when estimating phylogenies can be misleading. However, the extent to which this is also true for morphological data is still largely unknown. Discordance among morphological traits might plausibly arise due to either variable convergent selection pressures or else phenomena such as hemiplasy. Here, we investigate patterns of discordance among 282 morphological characters, which we scored for 50 bee species particularly targeting corbiculate bees, a group that includes the well-known eusocial honeybees and bumblebees. As a starting point for selecting the most meaningful partitions in the data, we grouped characters as morphological modules, highly integrated trait complexes that as a result of developmental constraints or coordinated selection we expect to share an evolutionary history and trajectory. In order to assess conflict and coherence across and within these morphological modules, we used recently developed approaches for computing Bayesian phylogenetic information allied with model comparisons using Bayes factors. We found that despite considerable conflict among morphological complexes, accounting for among-character and among-partition rate variation with individual gamma distributions, rate multipliers, and linked branch lengths can lead to coherent phylogenetic inference using morphological data. We suggest that evaluating information content and dissonance among partitions is a useful step in estimating phylogenies from morphological data, just as it is with molecular data. Furthermore, we argue that adopting emerging approaches for investigating dissonance in genomic datasets may provide new insights into the integration and evolution of anatomical complexes. [Apidae; entropy; morphological modules; phenotypic integration; phylogenetic information.]
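Once each partitioning scheme's log marginal likelihood has been estimated (for example by stepping-stone sampling; the estimator is an assumption here), the Bayes-factor comparison described above reduces to simple arithmetic. The Python sketch below ranks hypothetical partition models, computes 2 ln BF for the best model against the runner-up, and labels it with the Kass and Raftery (1995) evidence categories. The model names and likelihood values are invented for illustration.

```python
from typing import Dict, Tuple

# Upper bounds of the Kass & Raftery (1995) categories on the 2 ln(BF) scale.
KASS_RAFTERY = [
    (2.0, "not worth more than a bare mention"),
    (6.0, "positive"),
    (10.0, "strong"),
]


def two_ln_bf(log_ml_1: float, log_ml_2: float) -> float:
    """2 ln BF_12 from natural-log marginal likelihoods of models 1 and 2."""
    return 2.0 * (log_ml_1 - log_ml_2)


def interpret(stat: float) -> str:
    """Map a 2 ln BF value favouring the better model to a verbal category."""
    for upper, label in KASS_RAFTERY:
        if stat < upper:
            return label
    return "very strong"


def best_vs_runner_up(log_mls: Dict[str, float]) -> Tuple[str, str, float, str]:
    """Rank models by log marginal likelihood and compare the top two."""
    ranked = sorted(log_mls.items(), key=lambda kv: kv[1], reverse=True)
    (best, l1), (second, l2) = ranked[0], ranked[1]
    stat = two_ln_bf(l1, l2)
    return best, second, stat, interpret(stat)


# Hypothetical log marginal likelihoods for three partitioning schemes.
models = {
    "unpartitioned": -5260.0,
    "modules, shared gamma": -5230.4,
    "modules, own gamma + rate multipliers": -5201.7,
}
print(best_vs_runner_up(models))
```

Reporting evidence on the 2 ln BF scale keeps the comparison on the same footing as a likelihood-ratio deviance, which is why the Kass and Raftery thresholds are usually stated for that quantity.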
Affiliation(s)
- Diego S Porto
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, 926 West Campus Drive, Blacksburg, VA 24061 USA
- Eduardo A B Almeida
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto (FFCLRP), Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
- Matthew W Pennell
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver BC V6T 1Z4, Canada
9
Chan L, Vasilevsky N, Thessen A, McMurry J, Haendel M. The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research. Database (Oxford) 2021; 2021:baab003. [PMID: 33494105 PMCID: PMC7833928 DOI: 10.1093/database/baab003] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/19/2020] [Revised: 12/18/2020] [Accepted: 01/07/2021] [Indexed: 12/14/2022]
Abstract
Informatics has become an essential component of research in the past few decades, capitalizing on the efficiency and power of computation to improve the knowledge gained from increasing quantities and types of data. While other fields of research such as genomics are well represented in informatics resources, nutrition remains underrepresented. Nutrition is one of the most integral components of human life, and it impacts individuals far beyond just nutrient provisions. For example, nutrition plays a role in cultural practices, interpersonal relationships and body image. Despite this, integrated computational investigations have been limited due to challenges within nutrition informatics (nutri-informatics) and nutrition data. The purpose of this review is to describe the landscape of nutri-informatics resources available for use in computational nutrition research and clinical utilization. In particular, we will focus on the application of biomedical ontologies and their potential to improve the standardization and interoperability of nutrition terminologies and relationships between nutrition and other biomedical disciplines such as disease and phenomics. Additionally, we will highlight challenges currently faced by the nutri-informatics community including experimental design, data aggregation and the roles scientific journals and primary nutrition researchers play in facilitating data reuse and successful computational research. Finally, we will conclude with a call to action to create and follow community standards regarding standardization of language, documentation specifications and requirements for data reuse. With the continued movement toward community standards of this kind, the entire nutrition research community can transition toward greater usage of Findability, Accessibility, Interoperability and Reusability principles and in turn more transparent science.
Affiliation(s)
- Lauren Chan
- College of Public Health and Human Sciences, Oregon State University, 101 Milam Hall, Corvallis, OR 97331, USA
- Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, 3181 SW Sam Jackson Park Rd SN4N, Portland, OR 97239, USA
- Anne Thessen
- Environmental and Molecular Toxicology Department, Oregon State University, 1007 Ag & Life Sciences Building, Corvallis, OR 97331, USA
- Julie McMurry
- College of Public Health and Human Sciences, Oregon State University, 101 Milam Hall, Corvallis, OR 97331, USA
- Melissa Haendel
- Oregon Clinical and Translational Research Institute, Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, 3181 SW Sam Jackson Park Rd SN4N, Portland, OR 97239, USA
- Environmental and Molecular Toxicology Department, Oregon State University, 1007 Ag & Life Sciences Building, Corvallis, OR 97331, USA
10
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
11
Cui H, Zhang L, Ford B, Cheng HL, Macklin JA, Reznicek A, Starr J. Measurement Recorder: developing a useful tool for making species descriptions that produces computable phenotypes. Database (Oxford) 2020; 2020:baaa079. [PMID: 33216896 PMCID: PMC7678789 DOI: 10.1093/database/baaa079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/07/2020] [Revised: 08/24/2020] [Accepted: 08/27/2020] [Indexed: 12/31/2022]
Abstract
To use published phenotype information in computational analyses, there have been efforts to convert descriptions of phenotype characters from human languages to ontologized statements. This postpublication curation process is not only slow and costly but also burdened with significant intercurator variation (including curator-author variation), due to different interpretations of a character by various individuals. This problem is inherent in any human-based intellectual activity. To address it, having authors make scientific publications semantically clear (i.e. computable) at the time of publication is a critical step if we are to avoid postpublication curation. To help authors efficiently produce species phenotypes as computable data, we are experimenting with an author-driven ontology development approach and developing and evaluating a series of ontology-aware software modules that would create publishable species descriptions that are readily usable in scientific computations. The first software module prototype, Measurement Recorder, which assists authors in defining continuous measurements, is reported in this paper. Two usability studies of the software were conducted with 22 undergraduate students majoring in information science and 32 in biology. Results suggest that participants can use Measurement Recorder without training and find it easy to use after limited practice. Participants also appreciate the semantic enhancement features. Measurement Recorder's character reuse features improve character convergence among participants by 48% and have the potential to further reduce user errors in defining characters. A set of software design issues was also identified and then corrected. Measurement Recorder enables authors to record measurements in a semantically clear manner and enriches the phenotype ontology along the way.
Future work includes representing the semantic data as Resource Description Framework (RDF) knowledge graphs and characterizing the division of work between authors as domain knowledge providers and ontology engineers as knowledge formalizers in this new author-driven ontology development approach.
Affiliation(s)
- Hong Cui
- School of Information, University of Arizona, Tucson, AZ 85705, USA
- Limin Zhang
- School of Information, University of Arizona, Tucson, AZ 85705, USA
- Bruce Ford
- Department of Biological Sciences, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- Hsin-Liang Cheng
- Curtis Laws Wilson Library, Missouri University of Science and Technology, Rolla, MO 65409, USA
- James A Macklin
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada
- Anton Reznicek
- LSA Herbarium, University of Michigan, Ann Arbor, MI 48019, USA
- Julian Starr
- Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
12
Thessen AE, Walls RL, Vogt L, Singer J, Warren R, Buttigieg PL, Balhoff JP, Mungall CJ, McGuinness DL, Stucky BJ, Yoder MJ, Haendel MA. Transforming the study of organisms: Phenomic data models and knowledge bases. PLoS Comput Biol 2020; 16:e1008376. [PMID: 33232313 PMCID: PMC7685442 DOI: 10.1371/journal.pcbi.1008376] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Affiliation(s)
- Anne E. Thessen
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Ronin Institute for Independent Scholarship, Monclair, New Jersey, United States of America
- Ramona L. Walls
- Bio5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
- Pier Luigi Buttigieg
- Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
- Matthew J. Yoder
- Illinois Natural History Survey, Champaign, Illinois, United States of America
- Melissa A. Haendel
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
13
Mabee PM, Balhoff JP, Dahdul WM, Lapp H, Mungall CJ, Vision TJ. A Logical Model of Homology for Comparative Biology. Syst Biol 2020; 69:345-362. [PMID: 31596473 PMCID: PMC7672696 DOI: 10.1093/sysbio/syz067] [Received: 05/09/2019] [Revised: 09/20/2019] [Accepted: 09/26/2019]
Abstract
There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.].
Affiliation(s)
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
- Wasila M Dahdul
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- Hilmar Lapp
- Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Todd J Vision
- Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3280, USA
14
The Spider Anatomy Ontology (SPD)—A Versatile Tool to Link Anatomy with Cross-Disciplinary Data. DIVERSITY 2019. [DOI: 10.3390/d11100202]
Abstract
Spiders are a diverse group with a high eco-morphological diversity, which complicates anatomical descriptions especially with regard to its terminology. New terms are constantly proposed, and definitions and limits of anatomical concepts are regularly updated. Therefore, it is often challenging to find the correct terms, even for trained scientists, especially when the terminology has obstacles such as synonyms, disputed definitions, ambiguities, or homonyms. Here, we present the Spider Anatomy Ontology (SPD), which we developed combining the functionality of a glossary (a controlled defined vocabulary) with a network of formalized relations between terms that can be used to compute inferences. The SPD follows the guidelines of the Open Biomedical Ontologies and is available through the NCBO BioPortal (ver. 1.1). It consists of 757 valid terms and definitions, is rooted with the Common Anatomy Reference Ontology (CARO), and has cross references to other ontologies, especially of arthropods. The SPD offers a wealth of anatomical knowledge that can be used as a resource for any scientific study as, for example, to link images to phylogenetic datasets, compute structural complexity over phylogenies, and produce ancestral ontologies. By using a common reference in a standardized way, the SPD will help bridge diverse disciplines, such as genomics, taxonomy, systematics, evolution, ecology, and behavior.
15
Sluys R. The evolutionary terrestrialization of planarian flatworms (Platyhelminthes, Tricladida, Geoplanidae): a review and research programme. ZOOSYST EVOL 2019. [DOI: 10.3897/zse.95.38727]
Abstract
The terrestrialization of animal life from aquatic ancestors is a key transition during the history of life. Planarian flatworms form an ideal group of model organisms to study this colonization of the land because they have freshwater, marine, and terrestrial representatives. The widespread occurrence of terrestrial flatworms is a testament to their remarkable success occupying a new niche on land. This lineage of terrestrial worms provides a unique glimpse of an evolutionary pathway by which a group of early divergent aquatic, invertebrate metazoans has moved onto land. Land flatworms are among the first groups of animals to have evolved terrestrial adaptations and to have extensively radiated. Study of this terrestrialization process and the anatomical key innovations facilitating their colonization of the land will contribute greatly to our understanding of this important step in Metazoan history. The context and scientific background are reviewed regarding the evolutionary terrestrialization of land flatworms. Furthermore, a framework of a research programme is sketched, which has as its main objective to test hypotheses on the evolution of land planarians, specifically whether particular anatomical and physiological key innovations have contributed to their evolutionarily successful terrestrial colonization and radiation. In this context special attention is paid to the respiration in aquatic and terrestrial planarians. The research programme depends on a comprehensive phylogenetic analysis of all major taxa of the land flatworms on the basis of both molecular and anatomical data. The data sets should be analyzed phylogenetically with a suite of phylogenetic inference methods. Building on such robust reconstructions, it will be possible to study associations between key innovations and the evolutionary terrestrialization process.
16
Ferris E, Abegglen LM, Schiffman JD, Gregg C. Accelerated Evolution in Distinctive Species Reveals Candidate Elements for Clinically Relevant Traits, Including Mutation and Cancer Resistance. Cell Rep 2019. [PMID: 29514101 PMCID: PMC6294302 DOI: 10.1016/j.celrep.2018.02.008]
Abstract
The identity of most functional elements in the mammalian genome and the phenotypes they impact are unclear. Here, we perform a genomewide comparative analysis of patterns of accelerated evolution in species with highly distinctive traits to discover candidate functional elements for clinically important phenotypes. We identify accelerated regions (ARs) in the elephant, hibernating bat, orca, dolphin, naked mole rat, and thirteen-lined ground squirrel lineages in mammalian conserved regions, uncovering ~33,000 elements that bind hundreds of different regulatory proteins in humans and mice. ARs in the elephant, the largest land mammal, are uniquely enriched near elephant DNA damage response genes. The genomic hotspot for elephant ARs is the E3 ligase subunit of the Fanconi anemia complex, a master regulator of DNA repair. Additionally, ARs in the six species are associated with specific human clinical phenotypes that have apparent concordance with overt traits in each species.
Affiliation(s)
- Elliott Ferris
- Department of Neurobiology and Anatomy, University of Utah, Salt Lake City, UT 84132-3401, USA
- Lisa M Abegglen
- Department of Pediatrics, University of Utah, Salt Lake City, UT 84132-3401, USA; Huntsman Cancer Institute, Salt Lake City, UT, USA
- Joshua D Schiffman
- Department of Pediatrics, University of Utah, Salt Lake City, UT 84132-3401, USA; Department of Oncological Sciences, University of Utah, Salt Lake City, UT 84132-3401, USA; Huntsman Cancer Institute, Salt Lake City, UT, USA
- Christopher Gregg
- Department of Neurobiology and Anatomy, University of Utah, Salt Lake City, UT 84132-3401, USA; Department of Human Genetics, University of Utah, Salt Lake City, UT 84132-3401, USA; New York Stem Cell Foundation, New York, NY, USA.
17
Cui H, Macklin JA, Sachs J, Reznicek A, Starr J, Ford B, Penev L, Chen HL. Incentivising use of structured language in biological descriptions: Author-driven phenotype data and ontology production. Biodivers Data J 2018; 6:e29616. [PMID: 30473620 PMCID: PMC6235995 DOI: 10.3897/bdj.6.e29616] [Received: 09/07/2018] [Accepted: 10/23/2018]
Abstract
Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof of concept project on this idea was funded by NSF ABI in July 2017.
We seek readers' input or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.
Affiliation(s)
- Hong Cui
- University of Arizona, Tucson, United States of America
- James A. Macklin
- Agriculture and Agri-Food Canada, Ottawa, Canada
- Joel Sachs
- Agriculture and Agri-Food Canada, Ottawa, Canada
- Anton Reznicek
- University of Michigan, Ann Arbor, United States of America
- Julian Starr
- University of Ottawa, Ottawa, Canada
- Bruce Ford
- University of Manitoba, Winnipeg, Canada
- Lyubomir Penev
- Pensoft Publishers & Bulgarian Academy of Sciences, Sofia, Bulgaria
- Hsin-Liang Chen
- University of Massachusetts at Boston, Boston, United States of America
18
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Received: 01/01/2017] [Revised: 02/05/2017]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
19
Jonquet C, Toulet A, Dutta B, Emonet V. Harnessing the Power of Unified Metadata in an Ontology Repository: The Case of AgroPortal. JOURNAL ON DATA SEMANTICS 2018. [DOI: 10.1007/s13740-018-0091-5]
20
Muñoz-Fuentes V, Cacheiro P, Meehan TF, Aguilar-Pimentel JA, Brown SDM, Flenniken AM, Flicek P, Galli A, Mashhadi HH, Hrabě de Angelis M, Kim JK, Lloyd KCK, McKerlie C, Morgan H, Murray SA, Nutter LMJ, Reilly PT, Seavitt JR, Seong JK, Simon M, Wardle-Jones H, Mallon AM, Smedley D, Parkinson HE. The International Mouse Phenotyping Consortium (IMPC): a functional catalogue of the mammalian genome that informs conservation. CONSERV GENET 2018; 19:995-1005. [PMID: 30100824 PMCID: PMC6061128 DOI: 10.1007/s10592-018-1072-9] [Received: 12/05/2017] [Accepted: 05/03/2018]
Abstract
The International Mouse Phenotyping Consortium (IMPC) is building a catalogue of mammalian gene function by producing and phenotyping a knockout mouse line for every protein-coding gene. To date, the IMPC has generated and characterised 5186 mutant lines. One-third of the lines have been found to be non-viable and over 300 new mouse models of human disease have been identified thus far. While current bioinformatics efforts are focused on translating results to better understand human disease processes, IMPC data also aids understanding genetic function and processes in other species. Here we show, using gorilla genomic data, how genes essential to development in mice can be used to help assess the potentially deleterious impact of gene variants in other species. This type of analysis could be used to select optimal breeders in endangered species to maintain or increase fitness and avoid variants associated with impaired-health phenotypes or loss-of-function mutations in genes of critical importance. We also show, using selected examples from various mammal species, how IMPC data can aid in the identification of candidate genes for studying a condition of interest, deliver information about the mechanisms involved, or support predictions for the function of genes that may play a role in adaptation. With genotyping costs decreasing and the continued improvements of bioinformatics tools, the analyses we demonstrate can be routinely applied.
Affiliation(s)
- Violeta Muñoz-Fuentes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Pilar Cacheiro
- Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Juan Antonio Aguilar-Pimentel
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Steve D. M. Brown
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
- Ann M. Flenniken
- The Centre for Phenogenomics, Toronto, ON M5T 3H7 Canada
- Mount Sinai Hospital, Toronto, ON M5G 1X5 Canada
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Hamed Haseli Mashhadi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Martin Hrabě de Angelis
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- School of Life Science Weihenstephan, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany
- Jong Kyoung Kim
- Department of New Biology, DGIST, Daegu, 42988 Republic of Korea
- K. C. Kent Lloyd
- Mouse Biology Program, University of California, Davis, CA 95618 USA
- Colin McKerlie
- The Centre for Phenogenomics, Toronto, ON M5T 3H7 Canada
- Mount Sinai Hospital, Toronto, ON M5G 1X5 Canada
- The Hospital for Sick Children, Toronto, ON M5G 1X8 Canada
- Hugh Morgan
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
- Lauryl M. J. Nutter
- The Centre for Phenogenomics, Toronto, ON M5T 3H7 Canada
- The Hospital for Sick Children, Toronto, ON M5G 1X8 Canada
- Patrick T. Reilly
- PHENOMIN-iCS, 1 Rue Laurent Fries, 67404 Illkirch Cedex, Alsace France
- John R. Seavitt
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Je Kyung Seong
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Interdisciplinary Program for Bioinformatics and Program for Cancer Biology, Seoul National University, Seoul, Republic of Korea
- Michelle Simon
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
- Ann-Marie Mallon
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
- Damian Smedley
- Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- Helen E. Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- the IMPC consortium
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Clinical Pharmacology, William Harvey Research Institute, School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
- German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
- Medical Research Council Harwell Institute (Mammalian Genetics Unit and Mary Lyon Centre), Harwell, Oxfordshire OX11 0RD UK
- The Centre for Phenogenomics, Toronto, ON M5T 3H7 Canada
- Mount Sinai Hospital, Toronto, ON M5G 1X5 Canada
- Wellcome Trust Sanger Institute, Cambridge, CB10 1SA UK
- German Center for Diabetes Research (DZD), Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- School of Life Science Weihenstephan, Technische Universität München, Alte Akademie 8, 85354 Freising, Germany
- Department of New Biology, DGIST, Daegu, 42988 Republic of Korea
- Mouse Biology Program, University of California, Davis, CA 95618 USA
- The Hospital for Sick Children, Toronto, ON M5G 1X8 Canada
- The Jackson Laboratory, Bar Harbor, ME 04609 USA
- PHENOMIN-iCS, 1 Rue Laurent Fries, 67404 Illkirch Cedex, Alsace France
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Laboratory of Developmental Biology and Genomics, College of Veterinary Medicine, Interdisciplinary Program for Bioinformatics and Program for Cancer Biology, Seoul National University, Seoul, Republic of Korea
21
Dahdul W, Manda P, Cui H, Balhoff JP, Dececchi TA, Ibrahim N, Lapp H, Vision T, Mabee PM. Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems. Database (Oxford) 2018; 2018:5255130. [PMID: 30576485 PMCID: PMC6301375 DOI: 10.1093/database/bay110] [Received: 05/14/2018] [Revised: 08/22/2018] [Accepted: 09/24/2018]
Abstract
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis on phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations.
These findings point toward ways to better design software to augment human curators and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
Affiliation(s)
- Prashanti Manda
- University of North Carolina at Greensboro, Greensboro, NC, USA
- Hong Cui
- University of Arizona, Tucson, AZ, USA
- James P Balhoff
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- T Alexander Dececchi
- University of South Dakota, Vermillion, SD, USA
- Current affiliation: University of Pittsburgh at Johnstown, Johnstown, PA, USA
- Nizar Ibrahim
- University of Chicago, Chicago, IL, USA
- Current affiliation: University of Detroit Mercy, Detroit, MI, USA & University of Portsmouth, Portsmouth, UK
- Todd Vision
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
22
23
Wipfler B, Pohl H, Yavorskaya MI, Beutel RG. A review of methods for analysing insect structures - the role of morphology in the age of phylogenomics. CURRENT OPINION IN INSECT SCIENCE 2016; 18:60-68. [PMID: 27939712 DOI: 10.1016/j.cois.2016.09.004] [Received: 07/11/2016] [Accepted: 09/26/2016]
Abstract
Techniques currently used in insect morphology are outlined briefly. Scanning electron microscopy (SEM) and microphotography are used mainly for documenting external features, the former providing more information on tiny surface structures and the latter on coloration, transparency and degree of sclerotization. A broad spectrum of methods is now available for anatomical studies: histological serial sections, confocal laser scanning microscopy (CLSM), light-sheet fluorescence microscopy (LSFM), serial block-face scanning electron microscopy (SBFSEM), dual beam scanning electron microscopy (FIB-SEM), nuclear magnetic resonance imaging (NMRI), and μ-computed tomography (micro-CT). The use of SBFSEM and FIB-SEM is restricted to extremely small samples. NMRI is used mainly in in vivo studies. Micro-computed tomography, in combination with computer-based reconstruction, has greatly accelerated the acquisition of high quality data in a phylogenetic context. Morphology will continue to play a vital role in phylogenetic and evolutionary investigations. It provides independent data for checking the plausibility of molecular phylogenies and is the only source of information for placing extinct taxa. It is the necessary basis for reconstructing character evolution on the phenotypic level and for developing complex evolutionary scenarios. Computer-based anatomical ontologies are an additional future perspective of morphological work.
Affiliation(s)
- Benjamin Wipfler
- Entomology Group, Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem Museum, Friedrich-Schiller-Universität Jena, 07743 Jena, Germany
- Hans Pohl
- Entomology Group, Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem Museum, Friedrich-Schiller-Universität Jena, 07743 Jena, Germany
- Margarita I Yavorskaya
- Entomology Group, Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem Museum, Friedrich-Schiller-Universität Jena, 07743 Jena, Germany
- Rolf G Beutel
- Entomology Group, Institut für Spezielle Zoologie und Evolutionsbiologie mit Phyletischem Museum, Friedrich-Schiller-Universität Jena, 07743 Jena, Germany.
24
Stöhr S, Martynov A. Paedomorphosis as an Evolutionary Driving Force: Insights from Deep-Sea Brittle Stars. PLoS One 2016; 11:e0164562. [PMID: 27806039 PMCID: PMC5091845 DOI: 10.1371/journal.pone.0164562] [Received: 04/25/2016] [Accepted: 09/05/2016]
Abstract
Heterochronic development has been proposed to have played an important role in the evolution of echinoderms. In the class Ophiuroidea, paedomorphosis (retention of juvenile characters into adulthood) has been documented in the families Ophiuridae and Ophiolepididae but has not been investigated on a broader taxonomic scale. Historical errors, confusing juvenile stages with paedomorphic species, show the difficulties in correctly identifying the effects of heterochrony on development and evolution. This study presents a detailed analysis of 40 species with morphologies showing various degrees of juvenile appearance in late ontogeny. They are compared to a range of early ontogenetic stages from paedomorphic and non-paedomorphic species. Both quantitative and qualitative measurements are taken and analysed. The results suggest that strongly paedomorphic species are usually larger than other species at comparable developmental stage. The findings support recent notions of polyphyletic origin of the families Ophiuridae and Ophiolepididae. The importance of paedomorphosis and its correct recognition for the practice of taxonomy and phylogeny are emphasized.
Affiliation(s)
- Sabine Stöhr
- Swedish Museum of Natural History, Department of Zoology, Stockholm, Sweden
25
Dececchi TA, Mabee PM, Blackburn DC. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices. PLoS One 2016; 11:e0155680. [PMID: 27191170 PMCID: PMC4871461 DOI: 10.1371/journal.pone.0155680] [Received: 01/27/2016] [Accepted: 05/03/2016]
Abstract
Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.
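The overlap comparison described above can be illustrated with a minimal Python sketch. It assumes that overlap is scored with the Jaccard index (shared entities divided by all entities). The abstract does not state which metric the authors used. The entity lists are invented for illustration and are not data from the study.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical anatomical entities for one taxon (illustrative only, not study data).
monograph_entities = {"humerus", "radius", "ulna", "entepicondyle", "ectepicondyle"}
matrix_entities = {"humerus", "radius", "ulna", "digit"}

shared = monograph_entities & matrix_entities
print(sorted(shared))  # ['humerus', 'radius', 'ulna']
print(round(jaccard(monograph_entities, matrix_entities), 3))  # 0.5
# Entities found only in the matrix show it is not simply a subset of the monograph.
print(sorted(matrix_entities - monograph_entities))  # ['digit']
```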
Affiliation(s)
- T. Alex Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- David C. Blackburn
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
|
26
|
Blank CE, Cui H, Moore LR, Walls RL. MicrO: an ontology of phenotypic and metabolic characters, assays, and culture media found in prokaryotic taxonomic descriptions. J Biomed Semantics 2016; 7:18. [PMID: 27076900 PMCID: PMC4830071 DOI: 10.1186/s13326-016-0060-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/03/2015] [Accepted: 04/02/2016] [Indexed: 12/03/2022]
Abstract
Background: MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology that would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most notably the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature. Results: MicrO currently has ~14,550 classes (~2,550 of which are new, the remainder being microbiologically relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at http://purl.obolibrary.org/obo/MicrO.owl and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be submitted through MicrO’s issue tracker on GitHub. We designed MicrO to support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology.
Conclusions: By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we intend MicrO to be a powerful new tool to increase the computing power of bioinformatics tools such as the automated text mining of prokaryotic taxonomic descriptions using natural language processing. We also intend MicrO to support the development of new bioinformatics tools that aim to develop new connections between microbial phenotypes and genotypes (i.e., the gene content in genomes). Future ontology development will include incorporation of pathogenic phenotypes and prokaryotic habitats.
Affiliation(s)
- Carrine E Blank
- Department of Geosciences, University of Montana, Missoula, MT 59812 USA
- Hong Cui
- School of Information, University of Arizona, Tucson, AZ 85719 USA
- Lisa R Moore
- Department of Biological Sciences, University of Southern Maine, Portland, ME 04104 USA
|
27
|
Druzinsky RE, Balhoff JP, Crompton AW, Done J, German RZ, Haendel MA, Herrel A, Herring SW, Lapp H, Mabee PM, Muller HM, Mungall CJ, Sternberg PW, Van Auken K, Vinyard CJ, Williams SH, Wall CE. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals. PLoS One 2016; 11:e0149102. [PMID: 26870952 PMCID: PMC4752357 DOI: 10.1371/journal.pone.0149102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/14/2015] [Accepted: 01/27/2016] [Indexed: 01/27/2023]
Abstract
Background: In recent years, large bibliographic databases have made much of the published literature of biology available for searching. However, the search engines integrated into these databases have limited capabilities for text-based bibliographic searches. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Development and Testing of the Ontology: Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO was developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly available online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Results and Significance: Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not.
Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
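The ontology-aware search described above can be sketched in a few lines of Python. A query for a general class also returns documents annotated with any of its subclasses, which a plain keyword search would miss. The hierarchy, term labels and document annotations below are hypothetical and are not taken from the MFMO or Textpresso. The sketch shows only subclass expansion, not the definitions based on attachments and innervation that the authors describe.

```python
from collections import deque

# Hypothetical is_a edges (child -> parents); labels are illustrative, not MFMO terms or IDs.
IS_A = {
    "masseter": ["jaw-closing muscle"],
    "temporalis": ["jaw-closing muscle"],
    "medial pterygoid": ["jaw-closing muscle"],
    "digastric": ["jaw-opening muscle"],
    "jaw-closing muscle": ["feeding muscle"],
    "jaw-opening muscle": ["feeding muscle"],
}

# Hypothetical corpus: document id -> anatomy terms annotated in that document.
DOCS = {
    "doc1": {"masseter"},
    "doc2": {"medial pterygoid"},
    "doc3": {"digastric"},
}


def descendants(term: str) -> set[str]:
    """Return the term plus every class that reaches it through is_a links (breadth-first)."""
    children: dict[str, list[str]] = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def search(term: str) -> list[str]:
    """Return ids of documents annotated with the query term or any of its subclasses."""
    wanted = descendants(term)
    return sorted(doc for doc, terms in DOCS.items() if terms & wanted)


# No document mentions "jaw-closing muscle" literally, yet two are returned.
print(search("jaw-closing muscle"))
```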
Affiliation(s)
- Robert E. Druzinsky
- Department of Oral Biology, University of Illinois at Chicago, Chicago, Illinois, United States of America
- James P. Balhoff
- RTI International, Research Triangle Park, North Carolina, United States of America
- Alfred W. Crompton
- Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- James Done
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Rebecca Z. German
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
- Melissa A. Haendel
- Oregon Health and Science University, Portland, Oregon, United States of America
- Anthony Herrel
- Département d’Ecologie et de Gestion de la Biodiversité, Museum National d’Histoire Naturelle, Paris, France
- Susan W. Herring
- University of Washington, Department of Orthodontics, Seattle, Washington, United States of America
- Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
- Hans-Michael Muller
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Christopher J. Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Paul W. Sternberg
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Howard Hughes Medical Institute, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Kimberly Van Auken
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Christopher J. Vinyard
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
- Susan H. Williams
- Department of Biomedical Sciences, Ohio University Heritage College of Osteopathic Medicine, Athens, Ohio, United States of America
- Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
|
28
|
|
29
|
Thessen AE, Bunker DE, Buttigieg PL, Cooper LD, Dahdul WM, Domisch S, Franz NM, Jaiswal P, Lawrence-Dill CJ, Midford PE, Mungall CJ, Ramírez MJ, Specht CD, Vogt L, Vos RA, Walls RL, White JW, Zhang G, Deans AR, Huala E, Lewis SE, Mabee PM. Emerging semantics to link phenotype and environment. PeerJ 2015; 3:e1470. [PMID: 26713234 PMCID: PMC4690371 DOI: 10.7717/peerj.1470] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/15/2015] [Accepted: 11/12/2015] [Indexed: 11/20/2022]
Abstract
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the extensible observation ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
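The Python sketch below gives a rough illustration of linking a phenotype to its environment through an observation model. It records an organism measurement in the context of an environmental measurement. The class and field names are loosely modelled on OBOE's observation, entity, measurement, characteristic and standard concepts. They are simplified assumptions, not the OBOE schema or the data models proposed in the paper, and all values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    characteristic: str  # quality being measured (hypothetical free-text label)
    value: float
    standard: str        # unit or measurement standard


@dataclass
class Observation:
    entity: str  # thing observed: an organism, anatomical part, or habitat
    measurements: list[Measurement] = field(default_factory=list)
    # Observations that provide the context in which this one was made.
    context: list["Observation"] = field(default_factory=list)


def environment_values(obs: Observation, characteristic: str) -> list[float]:
    """Collect values of one characteristic from the context observations of obs."""
    return [
        m.value
        for ctx in obs.context
        for m in ctx.measurements
        if m.characteristic == characteristic
    ]


# Hypothetical record: a fish's body length observed in the context of a stream's temperature.
stream = Observation("stream", [Measurement("temperature", 14.5, "degree Celsius")])
fish = Observation("fish specimen", [Measurement("length", 62.0, "millimetre")], context=[stream])

print(environment_values(fish, "temperature"))
```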
Affiliation(s)
- Anne E. Thessen
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States
- The Data Detektiv, Waltham, MA, United States
- Daniel E. Bunker
- Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, United States
- Pier Luigi Buttigieg
- HGF-MPG Group for Deep Sea Ecology and Technology, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, United States
- Sami Domisch
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Carolyn J. Lawrence-Dill
- Departments of Genetics, Development and Cell Biology and Agronomy, Iowa State University, Ames, IA, United States
- Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales–CONICET, Buenos Aires, Argentina
- Chelsea D. Specht
- Departments of Plant and Microbial Biology & Integrative Biology, University of California, Berkeley, CA, United States
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany
- Ramona L. Walls
- iPlant Collaborative, University of Arizona, Tucson, AZ, United States
- Jeffrey W. White
- US Arid Land Agricultural Research Center, United States Department of Agriculture-ARS, Maricopa, AZ, United States
- Guanyang Zhang
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, PA, United States
- Eva Huala
- Phoenix Bioinformatics, Redwood City, CA, United States
- Suzanna E. Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, United States
|
30
|
Edmunds RC, Su B, Balhoff JP, Eames BF, Dahdul WM, Lapp H, Lundberg JG, Vision TJ, Dunham RA, Mabee PM, Westerfield M. Phenoscape: Identifying Candidate Genes for Evolutionary Phenotypes. Mol Biol Evol 2015; 33:13-24. [PMID: 26500251 PMCID: PMC4693980 DOI: 10.1093/molbev/msv223] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 02/06/2023]
Abstract
Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology.
Affiliation(s)
- Baofeng Su
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University
- B Frank Eames
- Department of Anatomy and Cell Biology, University of Saskatchewan, Saskatoon, SK, Canada
- Wasila M Dahdul
- National Evolutionary Synthesis Center, Durham, NC
- Department of Biology, University of South Dakota
- Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, NC
- John G Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Philadelphia, PA
- Todd J Vision
- National Evolutionary Synthesis Center, Durham, NC
- Department of Biology, University of North Carolina, Chapel Hill
- Rex A Dunham
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University
|
31
|
Manda P, Balhoff JP, Lapp H, Mabee P, Vision TJ. Using the phenoscape knowledgebase to relate genetic perturbations to phenotypic evolution. Genesis 2015. [PMID: 26220875 DOI: 10.1002/dvg.22878] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/03/2023]
Abstract
The abundance of phenotypic diversity among species can enrich our knowledge of development and genetics beyond the limits of variation that can be observed in model organisms. The Phenoscape Knowledgebase (KB) is designed to enable exploration and discovery of phenotypic variation among species. Because phenotypes in the KB are annotated using standard ontologies, evolutionary phenotypes can be compared with phenotypes from genetic perturbations in model organisms. To illustrate the power of this approach, we review the use of the KB to find taxa showing evolutionary variation similar to that of a query gene. Matches are made between the full set of phenotypes described for a gene and an evolutionary profile, the latter of which is defined as the set of phenotypes that are variable among the daughters of any node on the taxonomic tree. Phenoscape's semantic similarity interface allows the user to assess the statistical significance of each match and flags matches that may only result from differences in annotation coverage between genetic and evolutionary studies. Tools such as this will help meet the challenge of relating the growing volume of genetic knowledge in model organisms to the diversity of phenotypes in nature. The Phenoscape KB is available at http://kb.phenoscape.org.
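The notion of an evolutionary profile can be made concrete with a small Python sketch. For one node, it collects the characters whose states differ among the daughter taxa. It then scores how well a gene's phenotype set matches that profile. Phenoscape's own matching uses ontology-based semantic similarity. The exact-match Jaccard score used here is a simplified stand-in, and the taxa, characters and states are hypothetical.

```python
# Hypothetical character states for the daughter taxa of one node: taxon -> {character: state}.
DAUGHTERS = {
    "taxon A": {"basihyal": "present", "scales": "present", "fin rays": "many"},
    "taxon B": {"basihyal": "absent", "scales": "absent", "fin rays": "many"},
}


def evolutionary_profile(daughters: dict[str, dict[str, str]]) -> set[str]:
    """Return the characters whose recorded states differ among a node's daughter taxa."""
    characters = set().union(*(states.keys() for states in daughters.values()))
    # Characters missing in some daughters are compared only across the daughters that record them.
    return {
        char
        for char in characters
        if len({states[char] for states in daughters.values() if char in states}) > 1
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Simplified stand-in for semantic similarity: exact-match set overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical phenotypes of a mutant gene, expressed as the characters they affect.
gene_phenotypes = {"basihyal", "pectoral fin"}

profile = evolutionary_profile(DAUGHTERS)
print(sorted(profile))
print(round(jaccard(gene_phenotypes, profile), 3))
```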
Affiliation(s)
- Prashanti Manda
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina
- US National Evolutionary Synthesis Center, Durham, North Carolina
- James P Balhoff
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina
- US National Evolutionary Synthesis Center, Durham, North Carolina
- Hilmar Lapp
- US National Evolutionary Synthesis Center, Durham, North Carolina
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
- Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota
- Todd J Vision
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina
- US National Evolutionary Synthesis Center, Durham, North Carolina
|
32
|
Ruzicka L, Bradford YM, Frazer K, Howe DG, Paddock H, Ramachandran S, Singer A, Toro S, Van Slyke CE, Eagle AE, Fashena D, Kalita P, Knight J, Mani P, Martin R, Moxon SAT, Pich C, Schaper K, Shao X, Westerfield M. ZFIN, The zebrafish model organism database: Updates and new directions. Genesis 2015; 53:498-509. [PMID: 26097180 DOI: 10.1002/dvg.22868] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 04/10/2015] [Revised: 06/16/2015] [Accepted: 06/17/2015] [Indexed: 12/19/2022]
Abstract
The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for genetic and genomic data from zebrafish (Danio rerio) research. ZFIN staff curate detailed information about genes, mutants, genotypes, reporter lines, sequences, constructs, antibodies, knockdown reagents, expression patterns, phenotypes, gene product function, and orthology from publications. Researchers can submit mutant, transgenic, expression, and phenotype data directly to ZFIN and use the ZFIN Community Wiki to share antibody and protocol information. Data can be accessed through topic-specific searches, a new site-wide search, and the data-mining resource ZebrafishMine (http://zebrafishmine.org). Data download and web service options are also available. ZFIN collaborates with major bioinformatics organizations to verify and integrate genomic sequence data, provide nomenclature support, establish reciprocal links, and participate in the development of standardized structured vocabularies (ontologies) used for data annotation and searching. ZFIN-curated gene, function, expression, and phenotype data are available for comparative exploration at several multi-species resources. The use of zebrafish as a model for human disease is increasing. ZFIN is supporting this growing area with three major projects: adding easy access to computed orthology data from gene pages, curating details of the gene expression pattern changes in mutant fish, and curating zebrafish models of human diseases.
Affiliation(s)
- Ken Frazer
- ZFIN, 5291 University of Oregon, Eugene, Oregon
- Amy Singer
- ZFIN, 5291 University of Oregon, Eugene, Oregon
- Prita Mani
- ZFIN, 5291 University of Oregon, Eugene, Oregon
- Ryan Martin
- ZFIN, 5291 University of Oregon, Eugene, Oregon
- Xiang Shao
- ZFIN, 5291 University of Oregon, Eugene, Oregon
|
33
|
Bouchard F. Understanding Colonial Traits Using Symbiosis Research and Ecosystem Ecology. ACTA ACUST UNITED AC 2015. [DOI: 10.1162/biot.2009.4.3.240] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/04/2022]
|
34
|
Collier N, Oellrich A, Groza T. Toward knowledge support for analysis and interpretation of complex traits. Genome Biol 2015; 14:214. [PMID: 24079802 PMCID: PMC4053827 DOI: 10.1186/gb-2013-14-9-214] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/02/2022]
Abstract
The systematic description of complex traits, from the organism to the cellular level, is important for hypothesis generation about underlying disease mechanisms. We discuss how intelligent algorithms might provide support, leading to faster throughput.
|
35
|
Thacker RW, Díaz MC, Kerner A, Vignes-Lebbe R, Segerdell E, Haendel MA, Mungall CJ. The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. J Biomed Semantics 2014; 5:39. [PMID: 25276334 PMCID: PMC4177528 DOI: 10.1186/2041-1480-5-39] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/01/2013] [Accepted: 07/22/2014] [Indexed: 12/31/2022]
Abstract
Background: Porifera (sponges) are ancient basal metazoans that lack organs. They provide insight into key evolutionary transitions, such as the emergence of multicellularity and the nervous system. In addition, their ability to synthesize unusual compounds offers potential biotechnical applications. However, much of the knowledge of these organisms has not previously been codified in a machine-readable way using modern web standards. Results: The Porifera Ontology is intended as a standardized coding system for sponge anatomical features currently used in systematics. The ontology is available from http://purl.obolibrary.org/obo/poro.owl, or from the project homepage http://porifera-ontology.googlecode.com/. The version referred to in this manuscript is permanently available from http://purl.obolibrary.org/obo/poro/releases/2014-03-06/. Conclusions: By standardizing character representations, we hope to facilitate more rapid description and identification of sponge taxa, to allow integration with other evolutionary database systems, and to perform character mapping across the major clades of sponges to better understand the evolution of morphological features. Future applications of the ontology will focus on creating (1) ontology-based species descriptions; (2) taxonomic keys that use the nested terms of the ontology to more quickly facilitate species identifications; and (3) methods to map anatomical characters onto molecular phylogenies of sponges. In addition to modern taxa, the ontology is being extended to include features of fossil taxa.
Affiliation(s)
- Robert W Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, USA
- Adeline Kerner
- CR2P, UMR 7207 CNRS-MNHN-UPMC, Département Histoire de la Terre, Muséum National d'Histoire Naturelle, Bâtiment de Géologie, CP48, 57 rue Cuvier, 75005 Paris, France
- Régine Vignes-Lebbe
- CR2P, UMR 7207 CNRS-MNHN-UPMC, Département Histoire de la Terre, Muséum National d'Histoire Naturelle, Bâtiment de Géologie, CP48, 57 rue Cuvier, 75005 Paris, France
- Erik Segerdell
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, USA
|
36
|
Ramírez MJ, Michalik P. Calculating structural complexity in phylogenies using ancestral ontologies. Cladistics 2014; 30:635-649. [DOI: 10.1111/cla.12075] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Accepted: 02/20/2014] [Indexed: 01/29/2023]
Affiliation(s)
- Martín J. Ramírez
- Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” - CONICET; Av. Angel Gallardo 470 C1405DJR Buenos Aires Argentina
- Peter Michalik
- Zoologisches Institut und Museum; Ernst-Moritz-Arndt-Universität; J.-S.-Bach-Str. 11/12 D-17489 Greifswald Germany
|
37
|
Affiliation(s)
- Stefan Richter
- Allgemeine & Spezielle Zoologie; Institut für Biowissenschaften; Universität Rostock; Rostock Germany
- Christian S. Wirkner
- Allgemeine & Spezielle Zoologie; Institut für Biowissenschaften; Universität Rostock; Rostock Germany
|
38
|
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023]
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
- Melissa Haendel
- OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
- Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
39
|
Van Slyke CE, Bradford YM, Westerfield M, Haendel MA. The zebrafish anatomy and stage ontologies: representing the anatomy and development of Danio rerio. J Biomed Semantics 2014; 5:12. [PMID: 24568621 PMCID: PMC3944782 DOI: 10.1186/2041-1480-5-12] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 06/19/2013] [Accepted: 02/07/2014] [Indexed: 01/07/2023]
Abstract
Background: The Zebrafish Anatomy Ontology (ZFA) is an OBO Foundry ontology that is used in conjunction with the Zebrafish Stage Ontology (ZFS) to describe the gross and cellular anatomy and development of the zebrafish, Danio rerio, from single cell zygote to adult. The zebrafish model organism database (ZFIN) uses the ZFA and ZFS to annotate phenotype and gene expression data from the primary literature and from contributed data sets. Results: The ZFA models anatomy and development with a subclass hierarchy, a partonomy, and a developmental hierarchy and with relationships to the ZFS that define the stages during which each anatomical entity exists. The ZFA and ZFS are developed utilizing OBO Foundry principles to ensure orthogonality, accessibility, and interoperability. The ZFA has 2860 classes representing a diversity of anatomical structures from different anatomical systems and from different stages of development. Conclusions: The ZFA describes zebrafish anatomy and development semantically for the purposes of annotating gene expression and anatomical phenotypes. The ontology and the data have been used by other resources to perform cross-species queries of gene expression and phenotype data, providing insights into genetic relationships, morphological evolution, and models of human disease.
40
Paul R, Groza T, Hunter J, Zankl A. Inferring characteristic phenotypes via class association rule mining in the bone dysplasia domain. J Biomed Inform 2013; 48:73-83. [PMID: 24333481 DOI: 10.1016/j.jbi.2013.12.001] [Received: 08/06/2013] [Revised: 11/01/2013] [Accepted: 12/01/2013]
Abstract
Finding, capturing and describing characteristic features is a key aspect of disorder definition, diagnosis and management. This process is particularly challenging for rare disorders, because both data and expertise are sparse. From a computational perspective, finding characteristic features raises additional major challenges, such as formulating a computationally tractable definition, devising appropriate inference algorithms and defining sound validation mechanisms. In this paper we address each of these problems in the context of the skeletal dysplasia domain. We propose a clear definition of characteristic phenotypes, experiment with a novel class association rule mining algorithm, and discuss the lessons learned from both automatic and human-based validation of our approach.
Affiliation(s)
- Razan Paul
- School of ITEE, The University of Queensland, Australia.
- Tudor Groza
- School of ITEE, The University of Queensland, Australia.
- Jane Hunter
- School of ITEE, The University of Queensland, Australia.
- Andreas Zankl
- Bone Dysplasia Research Group, UQ Centre for Clinical Research (UQCCR), The University of Queensland, Australia; Genetic Health Queensland, Royal Brisbane and Women's Hospital, Herston, Australia.
41
Jensen M, Cox AP, Chaudhry N, Ng M, Sule D, Duncan W, Ray P, Weinstock-Guttman B, Smith B, Ruttenberg A, Szigeti K, Diehl AD. The neurological disease ontology. J Biomed Semantics 2013; 4:42. [PMID: 24314207 PMCID: PMC4028878 DOI: 10.1186/2041-1480-4-42] [Received: 06/22/2013] [Accepted: 11/29/2013]
Abstract
BACKGROUND: We are developing the Neurological Disease Ontology (ND) to provide a framework for representing aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for representing entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer's disease, multiple sclerosis, and stroke.
DESCRIPTION: ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms 'disease', 'diagnosis', 'disease course', and 'disorder'. We have imported and use over 700 classes from related ontology efforts, including the Foundational Model of Anatomy, the Ontology for Biomedical Investigations, and the Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms also have logical definitions. Current development has focused on establishing the upper-level structure of the ND hierarchy and on representing Alzheimer's disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology, along with a discussion list and an issue tracker.
CONCLUSION: ND seeks to provide a formal foundation for representing clinical and research data pertaining to neurological diseases. ND will enable its users to connect data robustly with related data annotated using other terminologies and ontologies in the biomedical domain.
Affiliation(s)
- Mark Jensen
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Alexander P Cox
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Naveed Chaudhry
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Marcus Ng
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Donat Sule
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- William Duncan
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Patrick Ray
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Bianca Weinstock-Guttman
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Barry Smith
- Department of Philosophy, University at Buffalo, 135 Park Hall, Buffalo, NY 14260, USA
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Alan Ruttenberg
- Department of Oral Diagnostic Sciences, University at Buffalo School of Dental Medicine, 355 Squire Hall, Buffalo, NY 14214, USA
- Kinga Szigeti
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
- Alexander D Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, 701 Ellicott Street, Buffalo, NY 14203, USA
42
Oellrich A, Grabmüller C, Rebholz-Schuhmann D. Automatically transforming pre- to post-composed phenotypes: EQ-lising HPO and MP. J Biomed Semantics 2013; 4:29. [PMID: 24131519 PMCID: PMC4016257 DOI: 10.1186/2041-1480-4-29] [Received: 01/20/2013] [Accepted: 04/12/2013]
Abstract
Background: Large-scale mutagenesis projects are under way to improve our understanding of the pathology, and subsequently the treatment, of diseases. Such projects not only record the genotype but also report phenotype descriptions of the genetically modified organisms under investigation. Thus far, phenotype data are stored in species-specific databases that lack coherence and interoperability in their phenotype representations. One proposed way to overcome this lack of integration is the use of Entity-Quality (EQ) statements. However, a reliable automated transformation of the phenotype annotations in these databases into EQ statements is still missing.
Results: Here, we report on our ongoing efforts to develop a method (called EQ-liser) for the automated generation of EQ representations from phenotype ontology concept labels. We implemented the method in a prototype and applied it to a subset of Mammalian Phenotype Ontology (MP) and Human Phenotype Ontology (HPO) concepts. For MP, we identified the correct EQ representation in over 52% of structure and process phenotypes. However, applying the EQ-liser prototype to HPO yielded a correct EQ representation in only 13.3% of the investigated cases.
Conclusions: By applying the prototype to two phenotype ontologies, we were able to identify common patterns of mistakes in generating EQ representations. Correcting these mistakes will pave the way to a species-independent solution for automatically deriving EQ representations from phenotype ontology concept labels. Furthermore, we identified inconsistencies in the existing manually defined EQ representations of current phenotype ontologies. Correcting these inconsistencies will improve the quality of the manually defined EQ statements.
Affiliation(s)
- Anika Oellrich
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
43
Collier N, Tran MV, Le HQ, Ha QT, Oellrich A, Rebholz-Schuhmann D. Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking. PLoS One 2013; 8:e72965. [PMID: 24155869 PMCID: PMC3796529 DOI: 10.1371/journal.pone.0072965] [Received: 02/11/2013] [Accepted: 07/15/2013]
Abstract
The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for biomedical text mining. Any progress will support knowledge discovery and linkage to other resources. However, because phenotype descriptions vary widely, a number of challenges remain in identifying and semantically normalising them before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. We systematically study how to combine sequence labels from these modules, as well as the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts related to auto-immune diseases and cited by the Online Mendelian Inheritance in Man (OMIM) database. Using partial matching, the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantic resources. We observed an advantage of SVM-based learning-to-rank for sequence label combination over maximum entropy and a priority-list approach. The results indicate that the identification of simple entity types such as chemicals and genes is robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether, we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.
Affiliation(s)
- Nigel Collier
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
- National Institute of Informatics, Tokyo, Japan
- Mai-vu Tran
- National Institute of Informatics, Tokyo, Japan
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
- Hoang-quynh Le
- National Institute of Informatics, Tokyo, Japan
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
- Quang-Thuy Ha
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
- Anika Oellrich
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Dietrich Rebholz-Schuhmann
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
44
Druzinsky R, Mungall C, Haendel M, Lapp H, Mabee P. What is an anatomy ontology? Anat Rec (Hoboken) 2013; 296:1797-9. [DOI: 10.1002/ar.22805] [Received: 05/07/2013] [Revised: 06/18/2013] [Accepted: 08/05/2013]
Affiliation(s)
- Robert Druzinsky
- Department of Oral Biology, College of Dentistry, University of Illinois, Chicago
- Christopher Mungall
- Department of Genome Dynamics, Lawrence Berkeley Laboratory, Berkeley, California
- Melissa Haendel
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon
- Hilmar Lapp
- National Evolutionary Synthesis Center (NESCent), Durham, North Carolina
- Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota
45
Balhoff JP, Mikó I, Yoder MJ, Mullins PL, Deans AR. A semantic model for species description applied to the ensign wasps (Hymenoptera: Evaniidae) of New Caledonia. Syst Biol 2013; 62:639-59. [PMID: 23652347 PMCID: PMC3739881 DOI: 10.1093/sysbio/syt028] [Received: 10/24/2012] [Revised: 02/14/2013] [Accepted: 04/23/2013]
Abstract
Taxonomic descriptions are unparalleled sources of knowledge of life's phenotypic diversity. As natural language prose, these data sets are largely refractory to computation and integration with other sources of phenotypic data. By formalizing taxonomic descriptions using ontology-based semantic representation, we aim to increase the reusability and computability of taxonomists' primary data. Here, we present a revision of the ensign wasp (Hymenoptera: Evaniidae) fauna of New Caledonia using this new model for species description. Descriptive matrices, specimen data, and taxonomic nomenclature are gathered in a unified Web-based application, mx, then exported as both traditional taxonomic treatments and semantic statements using the OWL Web Ontology Language. Character:character-state combinations are then annotated following the entity-quality phenotype model, originally developed to represent mutant model organism phenotype data; concepts of anatomy are drawn from the Hymenoptera Anatomy Ontology and linked to phenotype descriptors from the Phenotypic Quality Ontology. The resulting set of semantic statements is provided in Resource Description Framework format. Applying the model to real data, that is, specimens, taxonomic names, diagnoses, descriptions, and redescriptions, provides us with a foundation to discuss limitations and potential benefits such as automated data integration and reasoner-driven queries. Four species of ensign wasp are now known to occur in New Caledonia: Szepligetella levipetiolata, Szepligetella deercreeki Deans and Mikó sp. nov., Szepligetella irwini Deans and Mikó sp. nov., and the nearly cosmopolitan Evania appendigaster. A fifth species, Szepligetella sericea, including Szepligetella impressa, syn. nov., has not yet been collected in New Caledonia but can be found on islands throughout the Pacific and so is included in the diagnostic key.
Affiliation(s)
- James P. Balhoff
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- István Mikó
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Matthew J. Yoder
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Patricia L. Mullins
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Andrew R. Deans
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
46
Abstract
Motivation: To provide consistent, computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.
Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated, effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.
Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).
Contact: mah79@cam.ac.uk or vw253@cam.ac.uk
Affiliation(s)
- Midori A Harris
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
47
Vogt L, Nickel M, Jenner RA, Deans AR. The need for data standards in zoomorphology. J Morphol 2013; 274:793-808. [PMID: 23508988 DOI: 10.1002/jmor.20138] [Received: 07/08/2012] [Revised: 12/10/2012] [Accepted: 01/18/2013]
Abstract
eScience is a new approach to research that focuses on data mining and exploration rather than on data generation or simulation. This approach is arguably a driving force for scientific progress, and it requires data to be openly available, easily accessible via the Internet, and mutually compatible. eScience relies on modern standards for reporting and documenting data and metadata. Here, we suggest the necessary components (i.e., content, concept, nomenclature, format) of such standards in the context of zoomorphology. We document the need for data repositories to prevent data loss, and describe how publication practice is currently changing with the emergence of dynamic publications and the publication of digital datasets. We then demonstrate that in zoomorphology the scientific record is still limited to the published literature and that zoomorphological data are usually not accessible through data repositories. The underlying problem is that zoomorphology lacks standards for data and metadata; as a consequence, it cannot participate in eScience. We argue that standardizing morphological data requires (i) a standardized framework for anatomical terminologies and (ii) a formalized method of description that makes computer-parsable morphological data communicable, compatible, and comparable. The role of controlled vocabularies (e.g., ontologies) in developing such terminologies and methods of description is discussed, especially in the context of data annotation and semantic enhancement of publications. Finally, we introduce the International Consortium for Zoomorphology Standards, a working group that is open to everyone and whose aim is to stimulate and synthesize dialog about standards. The Consortium's ultimate goal is to assist the zoomorphology community in developing modern data and metadata standards, including anatomy ontologies, thereby facilitating the participation of zoomorphology in eScience.
Affiliation(s)
- Lars Vogt
- Abteilung Zoologie und Evolutionsbiologie, Institut für Evolutionsbiologie und Ökologie, Fachgruppe Biologie, Universität Bonn; An der Immenburg 1, Bonn D-53121, Germany.
48
Groza T, Hunter J, Zankl A. Mining skeletal phenotype descriptions from scientific literature. PLoS One 2013; 8:e55656. [PMID: 23409017 PMCID: PMC3568099 DOI: 10.1371/journal.pone.0055656] [Received: 10/01/2012] [Accepted: 12/28/2012]
Abstract
Phenotype descriptions are important for our understanding of genetics, as they enable the computation and analysis of a wide range of issues related to the genetic and developmental bases of correlated characters. The literature contains a wealth of such phenotype descriptions, usually reported as free-text entries similar to typical clinical summaries. In this paper, we focus on creating and making available an annotated corpus of skeletal phenotype descriptions. In addition, we present and evaluate a hybrid machine learning approach for mining phenotype descriptions from free text. Our hybrid approach uses an ensemble of four classifiers and experiments with several aggregation techniques. The best-scoring technique achieves an F1 score of 71.52%, which is close to the state of the art in other domains where training data are abundant. Finally, we discuss the influence of the features chosen for the model on the overall performance of the method.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, Australia.
49
Groza T, Hunter J, Zankl A. Decomposing phenotype descriptions for the human skeletal phenome. Biomedical Informatics Insights 2013; 6:1-14. [PMID: 23440304 PMCID: PMC3572876 DOI: 10.4137/bii.s10729]
Abstract
Over the last few years, a significant amount of research has been performed on the ontology-based formalization of phenotype descriptions. The intrinsic value and knowledge captured within such descriptions can only be expressed by taking advantage of their inner structure, which implicitly combines qualities and anatomical entities. We present a meta-model (the Phenotype Fragment Ontology) and a processing pipeline that together enable the automatic decomposition and conceptualization of phenotype descriptions for the human skeletal phenome. We use this approach to showcase the usefulness of the generic concept of phenotype decomposition through an experimental study on all skeletal phenotype concepts defined in the Human Phenotype Ontology.
Affiliation(s)
- Tudor Groza
- School of ITEE, The University of Queensland, Australia
50
Dahdul WM, Balhoff JP, Blackburn DC, Diehl AD, Haendel MA, Hall BK, Lapp H, Lundberg JG, Mungall CJ, Ringwald M, Segerdell E, Van Slyke CE, Vickaryous MK, Westerfield M, Mabee PM. A unified anatomy ontology of the vertebrate skeletal system. PLoS One 2012; 7:e51070. [PMID: 23251424 PMCID: PMC3519498 DOI: 10.1371/journal.pone.0051070] [Received: 05/23/2012] [Accepted: 10/30/2012]
Abstract
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.
Affiliation(s)
- Wasila M Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, USA.