1
Zheng L, Perl Y, He Y. Big knowledge visualization of the COVID-19 CIDO ontology evolution. BMC Med Inform Decis Mak 2023; 23:88. [PMID: 37161560 PMCID: PMC10169115 DOI: 10.1186/s12911-023-02184-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 04/20/2023] [Indexed: 05/11/2023] Open
Abstract
BACKGROUND: The extensive international research on medications and vaccines for the devastating COVID-19 pandemic requires a standard reference ontology. Among the current COVID-19 ontologies, the Coronavirus Infectious Disease Ontology (CIDO) is the largest. Furthermore, it continues to grow with frequent releases. Researchers using CIDO as a reference ontology need a quick update on the content added in a recent release to know how relevant the new concepts are to their research needs. Although CIDO is only a medium-sized ontology, it is still a large knowledge base, posing a challenge for a user interested in obtaining the "big picture" of content changes between releases. Both a theoretical framework and a proper visualization are required to provide such a "big picture".
METHODS: The child-of-based layout of the weighted aggregate partial-area taxonomy summarization network (WAT) provides a convenient "big picture" visualization of the content of an ontology. In this paper we address the "big picture" of content changes between two releases of an ontology. We introduce a new DIFF framework named Diff Weighted Aggregate Taxonomy (DWAT) to display the differences between the WATs of two releases of an ontology. We use a layered approach, which consists first of a DWAT of major subjects in CIDO, and then drills down into a major subject of interest in the top-level DWAT to obtain a DWAT of secondary subjects and even further refined layers.
RESULTS: A visualization of the Diff Weighted Aggregate Taxonomy is demonstrated on the CIDO ontology. The evolution of CIDO between 2020 and 2022 is demonstrated from two perspectives. Drilling down for a DWAT of secondary subject networks is also demonstrated. We illustrate how the DWAT of CIDO provides insight into its evolution.
CONCLUSIONS: The new Diff Weighted Aggregate Taxonomy enables a layered approach to view the "big picture" of the changes in the content between two releases of an ontology.
Affiliation(s)
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
2
Amith M, He Z, Bian J, Lossio-Ventura JA, Tao C. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. J Biomed Inform 2018; 80:1-13. [PMID: 29462669 PMCID: PMC5882531 DOI: 10.1016/j.jbi.2018.02.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 08/31/2017] [Revised: 02/12/2018] [Accepted: 02/16/2018] [Indexed: 11/26/2022]
Abstract
With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play an increasingly important role in knowledge representation and management, data integration, natural language processing, as well as decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods in the existing literature to create an integrated taxonomy. Further, to understand the ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal (the largest repository for biomedical ontologies) and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. We anticipate that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies.
Affiliation(s)
- Muhammad Amith
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhe He
  - School of Information, Florida State University, Tallahassee, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
3
Zheng L, Yumak H, Chen L, Ochs C, Geller J, Kapusnik-Uner J, Perl Y. Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology. J Biomed Inform 2017; 73:30-42. [PMID: 28723580 DOI: 10.1016/j.jbi.2017.07.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/27/2017] [Revised: 07/13/2017] [Accepted: 07/14/2017] [Indexed: 02/04/2023]
Abstract
The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of action, dosage form and physiological effects. Within NDF-RT such information is represented using tens of thousands of roles connecting drugs to classifications. In previous studies, we have introduced various kinds of Abstraction Networks to summarize the content and structure of terminologies in order to facilitate their visual comprehension, and support quality assurance of terminologies. However, these previous kinds of Abstraction Networks are not appropriate for summarizing the NDF-RT classification hierarchies, due to its unique structure. In this paper, we present the novel Ingredient Abstraction Network (IAbN) to summarize, visualize and support the audit of NDF-RT's Chemical Ingredients hierarchy and its associated drugs. A common theme in our quality assurance framework is to use characterizations of sets of concepts, revealed by the Abstraction Network structure, to capture concepts, the modeling of which is more complex than for other concepts. For the IAbN, we characterize drug ingredient concepts as more complex if they belong to IAbN groups with multiple parent groups. We show that such concepts have a statistically significantly higher rate of errors than a control sample and identify two especially common patterns of errors.
Affiliation(s)
- Ling Zheng
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States
- Hasan Yumak
  - BMCC, CUNY, New York, NY 10007, United States
- Ling Chen
  - BMCC, CUNY, New York, NY 10007, United States
- Christopher Ochs
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States
4
An empirical analysis of ontology reuse in BioPortal. J Biomed Inform 2017; 71:165-177. [PMID: 28583809 DOI: 10.1016/j.jbi.2017.05.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/07/2017] [Revised: 05/27/2017] [Accepted: 05/29/2017] [Indexed: 01/16/2023]
Abstract
Biomedical ontologies often reuse content (i.e., classes and properties) from other ontologies. Content reuse enables a consistent representation of a domain and reusing content can save an ontology author significant time and effort. Prior studies have investigated the existence of reused terms among the ontologies in the NCBO BioPortal, but as of yet there has not been a study investigating how the ontologies in BioPortal utilize reused content in the modeling of their own content. In this study we investigate how 355 ontologies hosted in the NCBO BioPortal reuse content from other ontologies for the purposes of creating new ontology content. We identified 197 ontologies that reuse content. Among these ontologies, 108 utilize reused classes in the modeling of their own classes and 116 utilize reused properties in class restrictions. Current utilization of reuse and quality issues related to reuse are discussed.
5
Ochs C, Case JT, Perl Y. Analyzing structural changes in SNOMED CT's Bacterial infectious diseases using a visual semantic delta. J Biomed Inform 2017; 67:101-116. [PMID: 28215561 DOI: 10.1016/j.jbi.2017.02.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/29/2016] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/23/2022]
Abstract
Thousands of changes are applied to SNOMED CT's concepts during each release cycle. These changes are the result of efforts to improve or expand the coverage of health domains in the terminology. Understanding which concepts changed, how they changed, and the overall impact of a set of changes is important for editors and end users. Each SNOMED CT release comes with delta files, which identify all of the individual additions and removals of concepts and relationships. These files typically contain tens of thousands of individual entries, overwhelming users. They also do not identify the editorial processes that were applied to individual concepts and they do not capture the overall impact of a set of changes on a subhierarchy of concepts. In this paper we introduce a methodology and accompanying software tool called a SNOMED CT Visual Semantic Delta ("semantic delta" for short) to enable a comprehensive review of changes in SNOMED CT. The semantic delta displays a graphical list of editing operations that provides semantics and context to the additions and removals in the delta files. However, there may still be thousands of editing operations applied to a set of concepts. To address this issue, a semantic delta includes a visual summary of changes that affected sets of structurally and semantically similar concepts. The software tool for creating semantic deltas offers views of various granularities, allowing a user to control how much change information they view. In this tool a user can select a set of structurally and semantically similar concepts and review the editing operations that affected their modeling. The semantic delta methodology is demonstrated on SNOMED CT's Bacterial infectious disease subhierarchy, which has undergone a significant remodeling effort over the last two years.
Affiliation(s)
- Christopher Ochs
  - Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- James T Case
  - National Library of Medicine/National Institutes of Health, Bethesda, MD 20894, USA
- Yehoshua Perl
  - Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
6
Ochs C, Case JT, Perl Y. Tracking the Remodeling of SNOMED CT's Bacterial Infectious Diseases. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:974-983. [PMID: 28269894 PMCID: PMC5333319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
SNOMED CT's content undergoes many changes from one release to the next. Over the last year SNOMED CT's Bacterial infectious disease subhierarchy has undergone significant editing to bring consistent modeling to its concepts. In this paper we analyze the stated and inferred structural modifications that affected the Bacterial infectious disease subhierarchy between the Jan 2015 and Jan 2016 SNOMED CT releases using a two-phased approach. First, we introduce a methodology for creating a human readable list of changes. Next, we utilize partial-area taxonomies, which are compact summaries of SNOMED CT's content and structure, to identify the "big picture" changes that occurred in the subhierarchy. We illustrate how partial-area taxonomies can be used to help identify groups of concepts that were affected by these editing operations and the nature of these changes. Modeling issues identified using our two-phase methodology are discussed.
7
Modelling Medications for Public Health Research. Online J Public Health Inform 2017; 8:e190. [PMID: 28149446 PMCID: PMC5266755 DOI: 10.5210/ojphi.v8i2.6809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Most patients with chronic disease are prescribed multiple medications, which are recorded in their personal health records. This is rich information for clinical public health researchers but also a challenge to analyse. This paper describes the method that was undertaken within the Public Health Research Data Management System (PHReDMS) to map medication data retrieved from individual patient health records for population health researchers' use. The PHReDMS manages clinical, health service, community and survey research data within a secure web environment that allows for data sharing amongst researchers. The PHReDMS is currently used by researchers to answer a broad range of questions, including monitoring of prescription patterns in different population groups and geographic areas with high incidence/prevalence of chronic renal, cardiovascular, metabolic and mental health issues. In this paper, we present the general notion of an abstraction network, a higher-level network that sits above a terminology and offers a compact and more easily understandable view of its content. We demonstrate the utilisation of abstraction network methodology to examine medication data from electronic medical records to allow a compact and more easily understandable view of its content.
8
Groß A, Pruski C, Rahm E. Evolution of biomedical ontologies and mappings: Overview of recent approaches. Comput Struct Biotechnol J 2016; 14:333-40. [PMID: 27642503 PMCID: PMC5018063 DOI: 10.1016/j.csbj.2016.08.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/02/2016] [Revised: 08/19/2016] [Accepted: 08/23/2016] [Indexed: 11/16/2022] Open
Abstract
Biomedical ontologies are heavily used to annotate data, and different ontologies are often interlinked by ontology mappings. These ontology-based mappings and annotations are used in many applications and analysis tasks. Since biomedical ontologies are continuously updated, dependent artifacts can become outdated and need to undergo evolution as well. Hence there is a need for largely automated approaches to keep ontology-based mappings up-to-date in the presence of evolving ontologies. In this article, we survey current approaches and novel directions in the context of ontology and mapping evolution. We will discuss requirements for mapping adaptation and provide a comprehensive overview of existing approaches. We will further identify open challenges and outline ideas for future developments.
Affiliation(s)
- Anika Groß
  - Institute of Computer Science, Universität Leipzig, P.O. Box 100920, 04009 Leipzig, Germany
- Cédric Pruski
  - Luxembourg Institute of Science and Technology, 5 Avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg
- Erhard Rahm
  - Institute of Computer Science, Universität Leipzig, P.O. Box 100920, 04009 Leipzig, Germany
9
Ochs C, Geller J, Perl Y, Musen MA. A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. J Biomed Inform 2016; 62:90-105. [PMID: 27345947 DOI: 10.1016/j.jbi.2016.06.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/07/2016] [Revised: 06/02/2016] [Accepted: 06/22/2016] [Indexed: 11/27/2022]
Abstract
Software tools play a critical role in the development and maintenance of biomedical ontologies. One important task that is difficult without software tools is ontology quality assurance. In previous work, we have introduced different kinds of abstraction networks to provide a theoretical foundation for ontology quality assurance tools. Abstraction networks summarize the structure and content of ontologies. One kind of abstraction network that we have used repeatedly to support ontology quality assurance is the partial-area taxonomy. It summarizes structurally and semantically similar concepts within an ontology. However, the use of partial-area taxonomies was ad hoc and not generalizable. In this paper, we describe the Ontology Abstraction Framework (OAF), a unified framework and software system for deriving, visualizing, and exploring partial-area taxonomy abstraction networks. The OAF includes support for various ontology representations (e.g., OWL and SNOMED CT's relational format). A Protégé plugin for deriving "live partial-area taxonomies" is demonstrated.
Affiliation(s)
- Christopher Ochs
  - Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- James Geller
  - Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Yehoshua Perl
  - Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
10
11
Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. J Biomed Inform 2016; 61:63-76. [PMID: 26988001 DOI: 10.1016/j.jbi.2016.03.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/17/2015] [Revised: 02/05/2016] [Accepted: 03/04/2016] [Indexed: 11/22/2022]
Abstract
An Abstraction Network is a compact summary of an ontology's structure and content. In previous research, we showed that Abstraction Networks support quality assurance (QA) of biomedical ontologies. The development of an Abstraction Network and its associated QA methodologies, however, is a labor-intensive process that previously was applicable only to one ontology at a time. To improve the efficiency of the Abstraction-Network-based QA methodology, we introduced a QA framework that uses uniform Abstraction Network derivation techniques and QA methodologies that are applicable to whole families of structurally similar ontologies. For the family-based framework to be successful, it is necessary to develop a method for classifying ontologies into structurally similar families. We now describe a structural meta-ontology that classifies ontologies according to certain structural features that are commonly used in the modeling of ontologies (e.g., object properties) and that are important for Abstraction Network derivation. Each class of the structural meta-ontology represents a family of ontologies with identical structural features, indicating which types of Abstraction Networks and QA methodologies are potentially applicable to all of the ontologies in the family. We derive a collection of 81 families, corresponding to classes of the structural meta-ontology, that enable a flexible, streamlined family-based QA methodology, offering multiple choices for classifying an ontology. The structure of 373 ontologies from the NCBO BioPortal is analyzed and each ontology is classified into multiple families modeled by the structural meta-ontology.
12
Thessen AE, Bunker DE, Buttigieg PL, Cooper LD, Dahdul WM, Domisch S, Franz NM, Jaiswal P, Lawrence-Dill CJ, Midford PE, Mungall CJ, Ramírez MJ, Specht CD, Vogt L, Vos RA, Walls RL, White JW, Zhang G, Deans AR, Huala E, Lewis SE, Mabee PM. Emerging semantics to link phenotype and environment. PeerJ 2015; 3:e1470. [PMID: 26713234 PMCID: PMC4690371 DOI: 10.7717/peerj.1470] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/15/2015] [Accepted: 11/12/2015] [Indexed: 11/20/2022] Open
Abstract
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the Extensible Observation Ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
Affiliation(s)
- Anne E. Thessen
  - Ronin Institute for Independent Scholarship, Montclair, NJ, United States
  - The Data Detektiv, Waltham, MA, United States
- Daniel E. Bunker
  - Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, United States
- Pier Luigi Buttigieg
  - HGF-MPG Group for Deep Sea Ecology and Technology, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar-und Meeresforschung, Bremerhaven, Germany
- Laurel D. Cooper
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Wasila M. Dahdul
  - Department of Biology, University of South Dakota, Vermillion, SD, United States
- Sami Domisch
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Nico M. Franz
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Carolyn J. Lawrence-Dill
  - Departments of Genetics, Development and Cell Biology and Agronomy, Iowa State University, Ames, IA, United States
- Martín J. Ramírez
  - Division of Arachnology, Museo Argentino de Ciencias Naturales–CONICET, Buenos Aires, Argentina
- Chelsea D. Specht
  - Departments of Plant and Microbial Biology & Integrative Biology, University of California, Berkeley, CA, United States
- Lars Vogt
  - Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany
- Ramona L. Walls
  - iPlant Collaborative, University of Arizona, Tucson, AZ, United States
- Jeffrey W. White
  - US Arid Land Agricultural Research Center, United States Department of Agriculture-ARS, Maricopa, AZ, United States
- Guanyang Zhang
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Andrew R. Deans
  - Department of Entomology, Pennsylvania State University, University Park, PA, United States
- Eva Huala
  - Phoenix Bioinformatics, Redwood City, CA, United States
- Suzanna E. Lewis
  - Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Paula M. Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, United States