1
|
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing amount of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
Collapse
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
| |
Collapse
|
2
|
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022] Open
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Collapse
Affiliation(s)
| | | | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, King Abdullah University of Science and Technology, Thuwal
| |
Collapse
|
3
|
Liu J, Wang Z. Diverse array-designed modes of combination therapies in Fangjiomics. Acta Pharmacol Sin 2015; 36:680-8. [PMID: 25864646 PMCID: PMC4594182 DOI: 10.1038/aps.2014.125] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2014] [Accepted: 10/30/2014] [Indexed: 12/11/2022] Open
Abstract
In line with the complexity of disease networks, diverse combination therapies have been demonstrated potential in the treatment of different patients with complex diseases in a personal combination profile. However, the identification of rational, compatible and effective drug combinations remains an ongoing challenge. Based on a holistic theory integrated with reductionism, Fangjiomics systematically develops multiple modes of array-designed combination therapies. We define diverse "magic shotgun" vertical, horizontal, focusing, siege and dynamic arrays according to different spatiotemporal distributions of hits on targets, pathways and networks. Through these multiple adaptive modes for treating complex diseases, Fangjiomics may help to identify rational drug combinations with synergistic or additive efficacy but reduced adverse side effects that reverse complex diseases by reconstructing or rewiring multiple targets, pathways and networks. Such a novel paradigm for combination therapies may allow us to achieve more precise treatments by developing phenotype-driven quantitative multi-scale modeling for rational drug combinations.
Collapse
Affiliation(s)
- Jun Liu
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Zhong Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
| |
Collapse
|
4
|
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023] Open
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
| | - Melissa Haendel
- OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
| | - Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
5
|
Collier N, Tran MV, Le HQ, Ha QT, Oellrich A, Rebholz-Schuhmann D. Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking. PLoS One 2013; 8:e72965. [PMID: 24155869 PMCID: PMC3796529 DOI: 10.1371/journal.pone.0072965] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2013] [Accepted: 07/15/2013] [Indexed: 11/19/2022] Open
Abstract
The identification of phenotype descriptions in the scientific literature, case reports and patient records is a rewarding task for bio-medical text mining. Any progress will support knowledge discovery and linkage to other resources. However because of their wide variation a number of challenges still remain in terms of their identification and semantic normalisation before they can be fully exploited for research purposes. This paper presents novel techniques for identifying potential complex phenotype mentions by exploiting a hybrid model based on machine learning, rules and dictionary matching. A systematic study is made of how to combine sequence labels from these modules as well as the merits of various ontological resources. We evaluated our approach on a subset of Medline abstracts cited by the Online Mendelian Inheritance of Man database related to auto-immune diseases. Using partial matching the best micro-averaged F-score for phenotypes and five other entity classes was 79.9%. A best performance of 75.3% was achieved for phenotype candidates using all semantics resources. We observed the advantage of using SVM-based learn-to-rank for sequence label combination over maximum entropy and a priority list approach. The results indicate that the identification of simple entity types such as chemicals and genes are robustly supported by single semantic resources, whereas phenotypes require combinations. Altogether we conclude that our approach coped well with the compositional structure of phenotypes in the auto-immune domain.
Collapse
Affiliation(s)
- Nigel Collier
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
- National Institute of Informatics, Tokyo, Japan
- * E-mail:
| | - Mai-vu Tran
- National Institute of Informatics, Tokyo, Japan
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
| | - Hoang-quynh Le
- National Institute of Informatics, Tokyo, Japan
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
| | - Quang-Thuy Ha
- Knowledge Technology Laboratory, University of Engineering and Technology - VNU, Hanoi, Vietnam
| | - Anika Oellrich
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Dietrich Rebholz-Schuhmann
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
| |
Collapse
|
6
|
Schofield PN, Sundberg JP, Sundberg BA, McKerlie C, Gkoutos GV. The mouse pathology ontology, MPATH; structure and applications. J Biomed Semantics 2013; 4:18. [PMID: 24033988 PMCID: PMC3851164 DOI: 10.1186/2041-1480-4-18] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2013] [Accepted: 08/19/2013] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND The capture and use of disease-related anatomic pathology data for both model organism phenotyping and human clinical practice requires a relatively simple nomenclature and coding system that can be integrated into data collection platforms (such as computerized medical record-keeping systems) to enable the pathologist to rapidly screen and accurately record observations. The MPATH ontology was originally constructed in 2,000 by a committee of pathologists for the annotation of rodent histopathology images, but is now widely used for coding and analysis of disease and phenotype data for rodents, humans and zebrafish. CONSTRUCTION AND CONTENT MPATH is divided into two main branches describing pathological processes and structures based on traditional histopathological principles. It does not aim to include definitive diagnoses, which would generally be regarded as disease concepts. It contains 888 core pathology terms in an almost exclusively is_a hierarchy nine layers deep. Currently, 86% of the terms have textual definitions and contain relationships as well as logical axioms to other ontologies such the Gene Ontology. APPLICATION AND UTILITY MPATH was originally devised for the annotation of histopathological images from mice but is now being used much more widely in the recording of diagnostic and phenotypic data from both mice and humans, and in the construction of logical definitions for phenotype and disease ontologies. We discuss the use of MPATH to generate cross-products with qualifiers derived from a subset of the Phenotype and Trait Ontology (PATO) and its application to large-scale high-throughput phenotyping studies. MPATH provides a largely species-agnostic ontology for the descriptions of anatomic pathology, which can be applied to most amniotes and is now finding extensive use in species other than mice. It enables investigators to interrogate large datasets at a variety of depths, use semantic analysis to identify the relations between diseases in different species and integrate pathology data with other data types, such as pharmacogenomics.
Collapse
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK.
| | | | | | | | | |
Collapse
|
7
|
Sojic A, Kutz O. Open biomedical pluralism: formalising knowledge about breast cancer phenotypes. J Biomed Semantics 2012; 3 Suppl 2:S3. [PMID: 23046572 PMCID: PMC3448532 DOI: 10.1186/2041-1480-3-s2-s3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
We demonstrate a heterogeneity of representation types for breast cancer phenotypes and stress that the characterisation of a tumour phenotype often includes parameters that go beyond the representation of a corresponding empirically observed tumour, thus reflecting significant functional features of the phenotypes as well as epistemic interests that drive the modes of representation. Accordingly, the represented features of cancer phenotypes function as epistemic vehicles aiding various classifications, explanations, and predictions. In order to clarify how the plurality of epistemic motivations can be integrated on a formal level, we give a distinction between six categories of human agents as individuals and groups focused around particular epistemic interests. We analyse the corresponding impact of these groups and individuals on representation types, mapping and reasoning scenarios. Respecting the plurality of representations, related formalisms, expressivities and aims, as they are found across diverse scientific communities, we argue for a pluralistic ontology integration. Moreover, we discuss and illustrate to what extent such a pluralistic integration is supported by the distributed ontology language DOL, a meta-language for heterogeneous ontology representation that is currently under standardisation as ISO WD 17347 within the OntoIOp (Ontology Integration and Interoperability) activity of ISO/TC 37/SC 3. We particularly illustrate how DOL supports representations of parthood on various levels of logical expressivity, mapping of terms, merging of ontologies, as well as non-monotonic extensions based on circumscription allowing a transparent formal modelling of the normal/abnormal distinction in phenotypes.
Collapse
Affiliation(s)
- Aleksandra Sojic
- European School of Molecular Medicine; European Institute of Oncology; University of Milan; Milan, Italy.
| | | |
Collapse
|
8
|
Abstract
Ontologies are now pervasive in biomedicine, where they serve as a means to standardize terminology, to enable access to domain knowledge, to verify data consistency and to facilitate integrative analyses over heterogeneous biomedical data. For this purpose, research on biomedical ontologies applies theories and methods from diverse disciplines such as information management, knowledge representation, cognitive science, linguistics and philosophy. Depending on the desired applications in which ontologies are being applied, the evaluation of research in biomedical ontologies must follow different strategies. Here, we provide a classification of research problems in which ontologies are being applied, focusing on the use of ontologies in basic and translational research, and we demonstrate how research results in biomedical ontologies can be evaluated. The evaluation strategies depend on the desired application and measure the success of using an ontology for a particular biomedical problem. For many applications, the success can be quantified, thereby facilitating the objective evaluation and comparison of research in biomedical ontology. The objective, quantifiable comparison of research results based on scientific applications opens up the possibility for systematically improving the utility of ontologies in biomedical research.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, SY23 3DB, UK.
| | | | | |
Collapse
|
9
|
Schofield PN, Hoehndorf R, Gkoutos GV. Mouse genetic and phenotypic resources for human genetics. Hum Mutat 2012; 33:826-36. [PMID: 22422677 DOI: 10.1002/humu.22077] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium, are generating resources on an unprecedented scale, which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and embryonic stem cells from these large-scale programmes, the use of these resources to help prioritize and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.
Collapse
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom.
| | | | | |
Collapse
|
10
|
Gkoutos GV, Schofield PN, Hoehndorf R. Computational tools for comparative phenomics: the role and promise of ontologies. Mamm Genome 2012; 23:669-79. [PMID: 22814867 DOI: 10.1007/s00335-012-9404-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2012] [Accepted: 05/21/2012] [Indexed: 11/28/2022]
Abstract
A major aim of the biological sciences is to gain an understanding of human physiology and disease. One important step towards such a goal is the discovery of the function of genes that will lead to a better understanding of the physiology and pathophysiology of organisms, which will ultimately lead to better diagnosis and therapy. Our increasing ability to phenotypically characterise genetic variants of model organisms coupled with systematic and hypothesis-driven mutagenesis is resulting in a wealth of information that could potentially provide insight into the functions of all genes in an organism. The challenge we are now facing is to develop computational methods that can integrate and analyse such data. The introduction of formal ontologies that make their semantics explicit and accessible to automated reasoning provides the tantalizing possibility of standardizing biomedical knowledge allowing for novel, powerful queries that bridge multiple domains, disciplines, species, and levels of granularity. We review recent computational approaches that facilitate the integration of experimental data from model organisms with clinical observations in humans. These methods foster novel cross-species analysis approaches, thereby enabling comparative phenomics and leading to the potential of translating basic discoveries from the model systems into diagnostic and therapeutic advances at the clinical level.
Collapse
Affiliation(s)
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.
| | | | | |
Collapse
|
11
|
Shimoyama M, Nigam R, McIntosh LS, Nagarajan R, Rice T, Rao DC, Dwinell MR. Three ontologies to define phenotype measurement data. Front Genet 2012; 3:87. [PMID: 22654893 PMCID: PMC3361058 DOI: 10.3389/fgene.2012.00087] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2012] [Accepted: 04/30/2012] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND There is an increasing need to integrate phenotype measurement data across studies for both human studies and those involving model organisms. Current practices allow researchers to access only those data involved in a single experiment or multiple experiments utilizing the same protocol. RESULTS Three ontologies were created: Clinical Measurement Ontology, Measurement Method Ontology and Experimental Condition Ontology. These ontologies provided the framework for integration of rat phenotype data from multiple studies into a single resource as well as facilitated data integration from multiple human epidemiological studies into a centralized repository. CONCLUSION An ontology based framework for phenotype measurement data affords the ability to successfully integrate vital phenotype data into critical resources, regardless of underlying technological structures allowing the user to easily query and retrieve data from multiple studies.
Collapse
Affiliation(s)
- Mary Shimoyama
- Human and Molecular Genetics Center, Medical College of Wisconsin Milwaukee, WI, USA.
| | | | | | | | | | | | | |
Collapse
|
12
|
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biol 2012; 13:R5. [PMID: 22293552 PMCID: PMC3334586 DOI: 10.1186/gb-2012-13-1-r5] [Citation(s) in RCA: 395] [Impact Index Per Article: 32.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2011] [Accepted: 01/31/2012] [Indexed: 01/20/2023] Open
Abstract
We present Uberon, an integrated cross-species ontology consisting of over 6,500 classes representing a variety of anatomical entities, organized according to traditional anatomical classification criteria. The ontology represents structures in a species-neutral way and includes extensive associations to existing species-centric anatomical ontologies, allowing integration of model organism and human data. Uberon provides a necessary bridge between anatomical structures in different taxa for cross-species inference. It uses novel methods for representing taxonomic variation, and has proved to be essential for translational phenotype analyses. Uberon is available at http://uberon.org
Collapse
Affiliation(s)
- Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cycltotron Road MS 64-121, Berkeley, CA 94720, USA.
| | | | | | | | | |
Collapse
|
13
|
Köhler S, Bauer S, Mungall CJ, Carletti G, Smith CL, Schofield P, Gkoutos GV, Robinson PN. Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 2011; 12:418. [PMID: 22032770 PMCID: PMC3224779 DOI: 10.1186/1471-2105-12-418] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2011] [Accepted: 10/27/2011] [Indexed: 01/16/2023] Open
Abstract
Background Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly being adopted as a method for quality control as well as for facilitating interoperability and data integration. Results We show how automated reasoning over logical definitions of ontology terms can be used to improve ontology structure. We provide the Java software package GULO (Getting an Understanding of LOgical definitions), which allows fast and easy evaluation for any kind of logically decomposed ontology by generating a composite OWL ontology from appropriate subsets of the referenced ontologies and comparing the inferred relationships with the relationships asserted in the target ontology. As a case study we show how to use GULO to evaluate the logical definitions that have been developed for the Mammalian Phenotype Ontology (MPO). Conclusions Logical definitions of terms from biomedical ontologies represent an important resource for error and disagreement detection. GULO gives ontology curators a fast and simple tool for validation of their work.
Collapse
Affiliation(s)
- Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
| | | | | | | | | | | | | | | |
Collapse
|
14
|
Schofield PN, Sundberg JP, Hoehndorf R, Gkoutos GV. New approaches to the representation and analysis of phenotype knowledge in human diseases and their animal models. Brief Funct Genomics 2011; 10:258-65. [PMID: 21987712 PMCID: PMC3189694 DOI: 10.1093/bfgp/elr031] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The systematic investigation of the phenotypes associated with genotypes in model organisms holds the promise of revealing genotype-phenotype relations directly and without additional, intermediate inferences. Large-scale projects are now underway to catalog the complete phenome of a species, notably the mouse. With the increasing amount of phenotype information becoming available, a major challenge that biology faces today is the systematic analysis of this information and the translation of research results across species and into an improved understanding of human disease. The challenge is to integrate and combine phenotype descriptions within a species and to systematically relate them to phenotype descriptions in other species, in order to form a comprehensive understanding of the relations between those phenotypes and the genotypes involved in human disease. We distinguish between two major approaches for comparative phenotype analyses: the first relies on evolutionary relations to bridge the species gap, while the other approach compares phenotypes directly. In particular, the direct comparison of phenotypes relies heavily on the quality and coherence of phenotype and disease databases. We discuss major achievements and future challenges for these databases in light of their potential to contribute to the understanding of the molecular mechanisms underlying human disease. In particular, we discuss how the use of ontologies and automated reasoning can significantly contribute to the analysis of phenotypes and demonstrate their potential for enabling translational research.
Collapse
|
15
|
Hoehndorf R, Dumontier M, Gennari JH, Wimalaratne S, de Bono B, Cook DL, Gkoutos GV. Integrating systems biology models and biomedical ontologies. BMC SYSTEMS BIOLOGY 2011; 5:124. [PMID: 21835028 PMCID: PMC3170340 DOI: 10.1186/1752-0509-5-124] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/06/2011] [Accepted: 08/11/2011] [Indexed: 01/30/2023]
Abstract
BACKGROUND Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. RESULTS We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. CONCLUSIONS We establish an information flow between biomedical ontologies and biosimulation models and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
| | - Michel Dumontier
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
| | - John H Gennari
- Biomedical & Health Informatics, Department of Medical Education and Biomedical Informatics, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA
| | - Sarala Wimalaratne
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Bernard de Bono
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Daniel L Cook
- Department of Physiology & Biophysics, University of Washington, 1705 NE Pacific Street, Box 357290, Seattle, Washington 98195, USA
- Department of Biological Structure, University of Washington, 1959 NE Pacific Street, Box 357420, Seattle, Washington 98195, USA
| | - Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
| |
Collapse
|
16
|
Hoehndorf R, Dumontier M, Oellrich A, Rebholz-Schuhmann D, Schofield PN, Gkoutos GV. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning. PLoS One 2011; 6:e22006. [PMID: 21789201 PMCID: PMC3138764 DOI: 10.1371/journal.pone.0022006] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2011] [Accepted: 06/12/2011] [Indexed: 11/18/2022] Open
Abstract
Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Michel Dumontier
- Department of Biology, Institute of Biochemistry and School of Computer Science, Carleton University, Ottawa, Ontario, Canada
| | - Anika Oellrich
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
| | | | - Paul N. Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | | |
Collapse
|
17
|
Hoehndorf R, Dumontier M, Oellrich A, Wimalaratne S, Rebholz-Schuhmann D, Schofield P, Gkoutos GV. A common layer of interoperability for biomedical ontologies based on OWL EL. Bioinformatics 2011; 27:1001-8. [PMID: 21343142 DOI: 10.1093/bioinformatics/btr058] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies is being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not facilitate data integration and interoperability yet, since reasoning over these ontologies is very complex and cannot be performed efficiently or is even impossible. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and achieve the goal of genuine interoperability between ontologies. RESULTS We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain of reasoning indicated by the significant reduction of processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses. AVAILABILITY AND IMPLEMENTATION The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont.
Collapse
|
18
|
Hoehndorf R, Oellrich A, Rebholz-Schuhmann D. Interoperability between phenotype and anatomy ontologies. Bioinformatics 2010; 26:3112-8. [PMID: 20971987 PMCID: PMC2995119 DOI: 10.1093/bioinformatics/btq578] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Phenotypic information is important for the analysis of the molecular mechanisms underlying disease. A formal ontological representation of phenotypic information can help to identify, interpret and infer phenotypic traits based on experimental findings. The methods that are currently used to represent data and information about phenotypes fail to make the semantics of the phenotypic trait explicit and do not interoperate with ontologies of anatomy and other domains. Therefore, valuable resources for the analysis of phenotype studies remain unconnected and inaccessible to automated analysis and reasoning. RESULTS We provide a framework to formalize phenotypic descriptions and make their semantics explicit. Based on this formalization, we provide the means to integrate phenotypic descriptions with ontologies of other domains, in particular anatomy and physiology. We demonstrate how our framework leads to the capability to represent disease phenotypes, perform powerful queries that were not possible before and infer additional knowledge. AVAILABILITY http://bioonto.de/pmwiki.php/Main/PheneOntology.
Collapse
Affiliation(s)
- Robert Hoehndorf
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| | | | | |
Collapse
|
19
|
Schofield PN, Gkoutos GV, Gruenberger M, Sundberg JP, Hancock JM. Phenotype ontologies for mouse and man: bridging the semantic gap. Dis Model Mech 2010; 3:281-9. [PMID: 20427557 DOI: 10.1242/dmm.002790] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
A major challenge of the post-genomic era is coding phenotype data from humans and model organisms such as the mouse, to permit the meaningful translation of phenotype descriptions between species. This ability is essential if we are to facilitate phenotype-driven gene function discovery and empower comparative pathobiology. Here, we review the current state of the art for phenotype and disease description in mice and humans, and discuss ways in which the semantic gap between coding systems might be bridged to facilitate the discovery and exploitation of new mouse models of human diseases.
Collapse
Affiliation(s)
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
| | | | | | | | | |
Collapse
|
20
|
Gkoutos GV, Green ECJ, Mallon AM, Blake A, Greenaway S, Hancock JM, Davidson D. Ontologies for the description of mouse phenotypes. Comp Funct Genomics 2010; 5:545-51. [PMID: 18629136 PMCID: PMC2447424 DOI: 10.1002/cfg.430] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2004] [Accepted: 10/18/2004] [Indexed: 11/12/2022] Open
Abstract
Ontologies are becoming increasingly important for the efficient storage, retrieval
and mining of biological data. The description of phenotypes using ontologies is a
particularly complex problem. We outline a schema that can be used to describe
phenotypes by combining orthologous axiomatic ontologies. We also describe tools for
storing, browsing and searching such complex ontologies. Central to this approach is
that assays (protocols for measuring phenotypic characters) describe what has been
measured as well as how this was done, allowing assays to link individual organisms to
ontologies describing phenotypes. We have evaluated this approach by automatically
annotating data on 600 000 mutant mice phenotypes using the SHIRPA protocol. We
believe this approach will enable the flexible, extensible and detailed description of
phenotypes from any organism.
Collapse
Affiliation(s)
- G V Gkoutos
- MRC Mammalian Genetics Unit, Harwell, Oxfordshire OX11 0RD, UK.
| | | | | | | | | | | | | |
Collapse
|
21
|
Mungall CJ. Obol: integrating language and meaning in bio-ontologies. Comp Funct Genomics 2010; 5:509-20. [PMID: 18629143 PMCID: PMC2447432 DOI: 10.1002/cfg.435] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2004] [Revised: 11/02/2004] [Accepted: 11/03/2004] [Indexed: 11/29/2022] Open
Abstract
Ontologies are intended to capture and formalize a domain of knowledge. The
ontologies comprising the Open Biological Ontologies (OBO) project, which includes
the Gene Ontology (GO), are formalizations of various domains of biological
knowledge. Ontologies within OBO typically lack computable definitions that serve to
differentiate a term from other similar terms. The computer is unable to determine the
meaning of a term, which presents problems for tools such as automated reasoners.
Reasoners can be of enormous benefit in managing a complex ontology. OBO term
names frequently implicitly encode the kind of definitions that can be used by
computational tools, such as automated reasoners. The definitions encoded in the
names are not easily amenable to computation, because the names are ostensibly
natural language phrases designed for human users. These names are highly regular
in their grammar, and can thus be treated as valid sentences in some formal or
computable language.With a description of the rules underlying this formal language,
term names can be parsed to derive computable definitions, which can then be
reasoned over. This paper describes the effort to elucidate that language, called Obol,
and the attempts to reason over the resulting definitions. The current implementation
finds unique non-trivial definitions for around half of the terms in the GO, and
has been used to find 223 missing relationships, which have since been added to
the ontology. Obol has utility as an ontology maintenance tool, and as a means of
generating computable definitions for a whole ontology. The software is available under an open-source license from: http://www.fruitfly.
org/~cjm/obol. Supplementary material for this article can be found at: http://www.
interscience.wiley.com/jpages/1531-6912/suppmat.
Collapse
Affiliation(s)
- Christopher J Mungall
- HHMI, Department of Molecular and Cellular Biology, 142 Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA.
| |
Collapse
|
22
|
Baker EJ, Jay JJ, Philip VM, Zhang Y, Li Z, Kirova R, Langston MA, Chesler EJ. Ontological Discovery Environment: a system for integrating gene-phenotype associations. Genomics 2009; 94:377-87. [PMID: 19733230 DOI: 10.1016/j.ygeno.2009.08.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2009] [Revised: 08/19/2009] [Accepted: 08/27/2009] [Indexed: 10/20/2022]
Abstract
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental systems or species domain.
Collapse
Affiliation(s)
- Erich J Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
| | | | | | | | | | | | | | | |
Collapse
|
23
|
Hancock JM, Mallon AM, Beck T, Gkoutos GV, Mungall C, Schofield PN. Mouse, man, and meaning: bridging the semantics of mouse phenotype and human disease. Mamm Genome 2009; 20:457-61. [PMID: 19649761 PMCID: PMC2759022 DOI: 10.1007/s00335-009-9208-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2009] [Accepted: 07/10/2009] [Indexed: 12/16/2022]
Abstract
Now that the laboratory mouse genome is sequenced and the annotation of its gene content is improving, the next major challenge is the annotation of the phenotypic associations of mouse genes. This requires the development of systematic phenotyping pipelines that use standardized phenotyping procedures which allow comparison across laboratories. It also requires the development of a sophisticated informatics infrastructure for the description and interchange of phenotype data. Here we focus on the current state of the art in the description of data produced by systematic phenotyping approaches using ontologies, in particular, the EQ (Entity-Quality) approach, and what developments are required to facilitate the linking of phenotypic descriptions of mutant mice to human diseases.
Collapse
Affiliation(s)
- John M Hancock
- Bioinformatics Group, MRC Harwell, Harwell, Oxfordshire, OX11 0RD, UK.
| | | | | | | | | | | |
Collapse
|
24
|
Abstract
EuroPhenome (http://www.europhenome.org) and EMPReSS (http://empress.har.mrc.ac.uk/) form an integrated resource to provide access to data and procedures for mouse phenotyping. EMPReSS describes 96 Standard Operating Procedures for mouse phenotyping. EuroPhenome contains data resulting from carrying out EMPReSS protocols on four inbred laboratory mouse strains. As well as web interfaces, both resources support web services to enable integration with other mouse phenotyping and functional genetics resources, and are committed to initiatives to improve integration of mouse phenotype databases. EuroPhenome will be the repository for a recently initiated effort to carry out large-scale phenotyping on a large number of knockout mouse lines (EUMODIC).
Collapse
Affiliation(s)
- Ann-Marie Mallon
- MRC Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK.
| | | | | |
Collapse
|
25
|
Friedman C, Borlawsky T, Shagina L, Xing HR, Lussier YA. Bio-Ontology and text: bridging the modeling gap. Bioinformatics 2006; 22:2421-9. [PMID: 16870928 PMCID: PMC2879055 DOI: 10.1093/bioinformatics/btl405] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text. RESULTS A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86-93%; n = 268). Moreover, PGschema can be expressed automatically in an XML format using natural language techniques to process the text. To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes). AVAILABILITY http://zellig.cpmc.columbia.edu/PGschema
Collapse
Affiliation(s)
- Carol Friedman
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
| | | | | | | | | |
Collapse
|
26
|
Bao L, Wei L, Peirce JL, Homayouni R, Li H, Zhou M, Chen H, Lu L, Williams RW, Pfeffer LM, Goldowitz D, Cui Y. Combining gene expression QTL mapping and phenotypic spectrum analysis to uncover gene regulatory relationships. Mamm Genome 2006; 17:575-83. [PMID: 16783639 DOI: 10.1007/s00335-005-0172-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2005] [Accepted: 02/21/2006] [Indexed: 01/17/2023]
Abstract
Gene expression QTL (eQTL) mapping can suggest candidate regulatory relationships between genes. Recent advances in mammalian phenotype annotation such as mammalian phenotype ontology (MPO) enable systematic analysis of the phenotypic spectrum subserved by many genes. In this study we combined eQTL mapping and phenotypic spectrum analysis to predict gene regulatory relationships. Five pairs of genes with similar phenotypic effects and potential regulatory relationships suggested by eQTL mapping were identified. Lines of evidence supporting some of the predicted regulatory relationships were obtained from biological literature. A particularly notable example is that promoter sequence analysis and real-time PCR assays support the predicted regulation of protein kinase C epsilon (Prkce) by cAMP responsive element binding protein 1 (Creb1). Our results show that the combination of gene eQTL mapping and phenotypic spectrum analysis may provide a valuable approach to uncovering gene regulatory relations underlying mammalian phenotypes.
Collapse
Affiliation(s)
- Lei Bao
- Department of Molecular Sciences, University of Tennessee Health Science Center, 858 Madison Avenue, Memphis, TN 38163, USA
| | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
27
|
Köhler J, Munn K, Rüegg A, Skusa A, Smith B. Quality control for terms and definitions in ontologies and taxonomies. BMC Bioinformatics 2006; 7:212. [PMID: 16623942 PMCID: PMC1482721 DOI: 10.1186/1471-2105-7-212] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2005] [Accepted: 04/19/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way. RESULTS We present computational methods that automatically identify terms and definitions which are defined in a circular or unintelligible way. We further demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies we were able to propose alternative synonyms and definitions for some of these problematic terms. This allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO. CONCLUSION Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. Further, they are well suited to assist ontology curators in drawing their attention to those terms that are ill-defined. We have further shown the limitations of ontology mapping and alignment in assisting ontology curators in rectifying problems, thus pointing to the need for manual curation.
Collapse
Affiliation(s)
- Jacob Köhler
- Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, UK
| | - Katherine Munn
- Institute for Formal Ontology and Medical Information Science, Saarland University, Saarbrücken, Germany
| | | | - Andre Skusa
- Technical Faculty, Bielefeld University, Bielefeld, Germany
| | - Barry Smith
- Department of Philosophy, University at Buffalo, NY, USA
- Institute for Formal Ontology and Medical Information Science, Saarland University, Saarbrücken, Germany
| |
Collapse
|
28
|
Abstract
Computational resources are now using the tissue names of the major model organisms so that tissue-associated data can be archived in and retrieved from databases on the basis of developing and adult anatomy. For this to be done, the set of tissues in that organism (its anatome) has to be organized in a way that is computer-comprehensible. Indeed, such formalization is a necessary part of what is becoming known as systems biology, in which explanations of high-level biological phenomena are not only sought in terms of lower-level events, but are articulated within a computational framework. Lists of tissue names alone, however, turn out to be inadequate for this formalization because tissue organization is essentially hierarchical and thus cannot easily be put into tables, the natural format of relational databases. The solution now adopted is to organize the anatomy of each organism as a hierarchy of tissue names and linking relationships (e.g. the tibia is PART OF the leg, the tibia IS-A bone) within what are known as ontologies. In these, a unique ID is assigned to each tissue and this can be used within, for example, gene-expression databases to link data to tissue organization, and also used to query other data sources (interoperability), while inferences about the anatomy can be made within the ontology on the basis of the relationships. There are now about 15 such anatomical ontologies, many of which are linked to organism databases; these ontologies are now publicly available at the Open Biological Ontologies website (http://obo.sourceforge.net) from where they can be freely downloaded and viewed using standard tools. This review considers how anatomy is formalized within ontologies, together with the problems that have had to be solved for this to be done. It is suggested that the appropriate term for the analysis, computer formulation and use of the anatome is anatomics.
Collapse
|
29
|
Gkoutos GV, Green ECJ, Mallon AM, Hancock JM, Davidson D. Using ontologies to describe mouse phenotypes. Genome Biol 2004; 6:R8. [PMID: 15642100 PMCID: PMC549069 DOI: 10.1186/gb-2004-6-1-r8] [Citation(s) in RCA: 168] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2004] [Revised: 11/11/2004] [Accepted: 12/06/2004] [Indexed: 11/29/2022] Open
Abstract
By combining ontologies from different sources the authors developed a novel approach to describing phenotypes of mutant mice in a standard, structured manner. The mouse is an important model of human genetic disease. Describing phenotypes of mutant mice in a standard, structured manner that will facilitate data mining is a major challenge for bioinformatics. Here we describe a novel, compositional approach to this problem which combines core ontologies from a variety of sources. This produces a framework with greater flexibility, power and economy than previous approaches. We discuss some of the issues this approach raises.
Collapse
Affiliation(s)
- Georgios V Gkoutos
- Bioinformatics Group, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK
| | - Eain CJ Green
- Bioinformatics Group, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK
| | - Ann-Marie Mallon
- Bioinformatics Group, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK
| | - John M Hancock
- Bioinformatics Group, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK
| | | |
Collapse
|
30
|
Appels R, Bellgard M. Looking through genomics--from the editors. Funct Integr Genomics 2004; 5:1-3. [PMID: 15580368 DOI: 10.1007/s10142-004-0129-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|