1
|
Zheng L, Min H, Chen Y, Keloth V, Geller J, Perl Y, Hripcsak G. Outlier concepts auditing methodology for a large family of biomedical ontologies. BMC Med Inform Decis Mak 2020; 20:296. [PMID: 33319713 PMCID: PMC7737254 DOI: 10.1186/s12911-020-01311-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2020] [Accepted: 10/28/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Summarization networks are compact summaries of ontologies. The "Big Picture" view offered by summarization networks enables to identify sets of concepts that are more likely to have errors than control concepts. For ontologies that have outgoing lateral relationships, we have developed the "partial-area taxonomy" summarization network. Prior research has identified one kind of outlier concepts, concepts of small partials-areas within partial-area taxonomies. Previously we have shown that the small partial-area technique works successfully for four ontologies (or their hierarchies). METHODS To improve the Quality Assurance (QA) scalability, a family-based QA framework, where one QA technique is potentially applicable to a whole family of ontologies with similar structural features, was developed. The 373 ontologies hosted at the NCBO BioPortal in 2015 were classified into a collection of families based on structural features. A meta-ontology represents this family collection, including one family of ontologies having outgoing lateral relationships. The process of updating the current meta-ontology is described. To conclude that one QA technique is applicable for at least half of the members for a family F, this technique should be demonstrated as successful for six out of six ontologies in F. We describe a hypothesis setting the condition required for a technique to be successful for a given ontology. The process of a study to demonstrate such success is described. This paper intends to prove the scalability of the small partial-area technique. RESULTS We first updated the meta-ontology classifying 566 BioPortal ontologies. There were 371 ontologies in the family with outgoing lateral relationships. We demonstrated the success of the small partial-area technique for two ontology hierarchies which belong to this family, SNOMED CT's Specimen hierarchy and NCIt's Gene hierarchy. Together with the four previous ontologies from the same family, we fulfilled the "six out of six" condition required to show the scalability for the whole family. CONCLUSIONS We have shown that the small partial-area technique can be potentially successful for the family of ontologies with outgoing lateral relationships in BioPortal, thus improve the scalability of this QA technique.
Collapse
Affiliation(s)
- Ling Zheng
- Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, 07764, USA.
| | - Hua Min
- Department of Health Administration and Policy, George Mason University, Fairfax, VA, 22030, USA
| | - Yan Chen
- CIS Department, Borough of Manhattan Community College, CUNY, New York, NY, 10007, USA
| | - Vipina Keloth
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - James Geller
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - Yehoshua Perl
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, 10032, USA
| |
Collapse
|
2
|
Zheng L, Chen Y, Min H, Hildebrand PL, Liu H, Halper M, Geller J, de Coronado S, Perl Y. Missing lateral relationships in top-level concepts of an ontology. BMC Med Inform Decis Mak 2020; 20:305. [PMID: 33319709 PMCID: PMC7737264 DOI: 10.1186/s12911-020-01319-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored. METHODS The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied. RESULTS Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT's Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings. CONCLUSIONS Quality assurance is a critical part of an ontology's lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt's Biological Process hierarchy and SNOMED CT's Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.
Collapse
Affiliation(s)
- Ling Zheng
- Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, 07764, USA.
| | - Yan Chen
- CIS Department, Borough of Manhattan Community College, CUNY, New York, NY, 10007, USA
| | - Hua Min
- Department of Health Administration and Policy, George Mason University, Fairfax, VA, 22030, USA
| | | | - Hao Liu
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - Michael Halper
- Department of Informatics, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - James Geller
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| | - Sherri de Coronado
- National Cancer Institute, Center for Biomedical Informatics and Information Technology, National Institutes of Health, Rockville, MD, 20850, USA
| | - Yehoshua Perl
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
| |
Collapse
|
3
|
Haendel MA, McMurry JA, Relevo R, Mungall CJ, Robinson PN, Chute CG. A Census of Disease Ontologies. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013459] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exist to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: ( a) search, retrieval, and annotation of knowledge; ( b) data integration and analysis; ( c) clinical decision support; and ( d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.
Collapse
Affiliation(s)
- Melissa A. Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon 97239, USA
- Linus Pauling Institute, Oregon State University, Corvallis, Oregon 97331, USA
| | - Julie A. McMurry
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon 97239, USA
| | - Rose Relevo
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon 97239, USA
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
| | | | - Christopher G. Chute
- School of Medicine, School of Public Health, and School of Nursing, Johns Hopkins University, Baltimore, Maryland 21205, USA
| |
Collapse
|
4
|
Amith M, He Z, Bian J, Lossio-Ventura JA, Tao C. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. J Biomed Inform 2018; 80:1-13. [PMID: 29462669 PMCID: PMC5882531 DOI: 10.1016/j.jbi.2018.02.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2017] [Revised: 02/12/2018] [Accepted: 02/16/2018] [Indexed: 11/26/2022]
Abstract
With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play a more and more important role in knowledge representation and management, data integration, natural language processing, as well as decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods in the existing literature to create an integrated taxonomy. Further, to understand the ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal-the largest repository for biomedical ontologies-and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. It is our anticipation that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies.
Collapse
Affiliation(s)
- Muhammad Amith
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Zhe He
- School of Information, Florida State University, Tallahassee, FL, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
| | | | - Cui Tao
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
| |
Collapse
|
5
|
Taxonomy-Based Approaches to Quality Assurance of Ontologies. JOURNAL OF HEALTHCARE ENGINEERING 2017; 2017:3495723. [PMID: 29158885 PMCID: PMC5660792 DOI: 10.1155/2017/3495723] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/07/2017] [Accepted: 08/06/2017] [Indexed: 11/17/2022]
Abstract
Ontologies are important components of health information management systems. As such, the quality of their content is of paramount importance. It has been proven to be practical to develop quality assurance (QA) methodologies based on automated identification of sets of concepts expected to have higher likelihood of errors. Four kinds of such sets (called QA-sets) organized around the themes of complex and uncommonly modeled concepts are introduced. A survey of different methodologies based on these QA-sets and the results of applying them to various ontologies are presented. Overall, following these approaches leads to higher QA yields and better utilization of QA personnel. The formulation of additional QA-set methodologies will further enhance the suite of available ontology QA tools.
Collapse
|