1. Dirkson A, Verberne S, van Oortmerssen G, Gelderblom H, Kraaij W. How do others cope? Extracting coping strategies for adverse drug events from social media. J Biomed Inform 2023; 139:104228. [PMID: 36309197 DOI: 10.1016/j.jbi.2022.104228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 09/09/2022] [Accepted: 10/09/2022] [Indexed: 02/16/2023]
Abstract
Patients advise their peers in online support groups on how to cope with their illness in daily life. To date, no efforts have been made to automatically extract recommended coping strategies from online patient discussion groups. We introduce this new task, which poses a number of challenges including complex, long entities, a large long-tailed label space, and cross-document relations. We present an initial ontology for coping strategies as a starting point for future research on coping strategies, and the first end-to-end pipeline for extracting coping strategies for side effects. We also compared two possible computational solutions for this novel and highly challenging task: multi-label classification and named entity recognition (NER) with entity linking (EL). We evaluated our methods on the discussion forum from the Facebook group of the worldwide patient support organization 'GIST support international' (GSI); GIST support international donated the data to us. We found that coping strategy extraction is difficult and that both methods attain limited performance (measured with F1 score) on held-out test sets; multi-label classification outperforms NER+EL (F1=0.220 vs F1=0.155). An inspection of the multi-label classification output revealed that for some of the incorrect predictions, the reference label is close to the predicted label in the ontology (e.g. the predicted label 'juice' instead of the more specific reference label 'grapefruit juice'). Performance increased to F1=0.498 when we evaluated at a coarser level of the ontology. We conclude that our pipeline can be used in a semi-automatic setting, in interaction with domain experts, to discover coping strategies for side effects from a patient forum. For example, we found that patients recommend ginger tea for nausea and magnesium and potassium supplements for cramps. This information can be used as input for patient surveys or clinical studies.
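The effect of evaluating at a coarser ontology level can be illustrated with a minimal sketch. The toy ontology, the depth-based coarsening, and the function names below are assumptions made for illustration; they are not the authors' ontology or evaluation code.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology (child -> parent); not the ontology from the paper.
PARENT: Dict[str, str] = {
    "grapefruit juice": "juice",
    "ginger tea": "tea",
    "juice": "drink",
    "tea": "drink",
}


def ancestor_at_depth(label: str, depth: int) -> str:
    """Map a label to its ancestor at the given depth (root has depth 0).

    Labels that already sit at or above that depth are returned unchanged.
    """
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # now ordered root -> label
    return path[min(depth, len(path) - 1)]


def coarsen(label_sets: List[Set[str]], depth: int) -> List[Set[str]]:
    """Coarsen every label in every document's label set."""
    return [{ancestor_at_depth(l, depth) for l in labels} for labels in label_sets]


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-document label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two toy posts: the first prediction is a near miss ('juice' vs 'grapefruit juice').
gold = [{"grapefruit juice"}, {"ginger tea"}]
pred = [{"juice"}, {"ginger tea"}]

fine_f1 = micro_f1(gold, pred)
coarse_f1 = micro_f1(coarsen(gold, 1), coarsen(pred, 1))
```

In this toy case the near miss counts as an error at the fine level but as a match once both labels are mapped to depth 1, which mirrors the kind of gain the abstract reports when moving to a coarser level.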
Affiliation(s)
- Anne Dirkson
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, Netherlands.
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, Netherlands.
- Gerard van Oortmerssen
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, Netherlands.
- Hans Gelderblom
  - Department of Medical Oncology, Leiden University Medical Centre, Albinusdreef 2, 2333 ZA Leiden, Netherlands.
- Wessel Kraaij
  - Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, Netherlands.
2. Zheng L, Chen Y, Min H, Hildebrand PL, Liu H, Halper M, Geller J, de Coronado S, Perl Y. Missing lateral relationships in top-level concepts of an ontology. BMC Med Inform Decis Mak 2020; 20:305. [PMID: 33319709 PMCID: PMC7737264 DOI: 10.1186/s12911-020-01319-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored.
METHODS: The area taxonomy and the subtaxonomy are deployed to help reveal concepts that have a high likelihood of exhibiting missing relationship errors. A specific top-level grouping unit found within the area taxonomy and subtaxonomy, when deemed to be anomalous, is used as an indicator that missing relationship errors are likely to be found among certain concepts. Two hypotheses pertaining to the effectiveness of our Quality Assurance approach are studied.
RESULTS: Our Quality Assurance methodology was applied to the Biological Process hierarchy of the National Cancer Institute thesaurus (NCIt) and SNOMED CT's Eye/vision finding subhierarchy within its Clinical finding hierarchy. Many missing relationship errors were discovered and confirmed in our analysis. For both test-bed hierarchies, our Quality Assurance methodology yielded a statistically significantly higher number of concepts with missing relationship errors in comparison to a control sample of concepts. Two hypotheses are confirmed by these findings.
CONCLUSIONS: Quality assurance is a critical part of an ontology's lifecycle, and automated or semi-automated tools for supporting this process are invaluable. We introduced a Quality Assurance methodology targeted at missing relationship errors. Its successful application to the NCIt's Biological Process hierarchy and SNOMED CT's Eye/vision finding subhierarchy indicates that it can be a useful addition to the arsenal of tools available to ontology maintenance personnel.
Affiliation(s)
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, 07764, USA.
- Yan Chen
  - CIS Department, Borough of Manhattan Community College, CUNY, New York, NY, 10007, USA
- Hua Min
  - Department of Health Administration and Policy, George Mason University, Fairfax, VA, 22030, USA
- Hao Liu
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- Michael Halper
  - Department of Informatics, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- Sherri de Coronado
  - National Cancer Institute, Center for Biomedical Informatics and Information Technology, National Institutes of Health, Rockville, MD, 20850, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, USA
3. Liu H, Perl Y, Geller J. Concept placement using BERT trained by transforming and summarizing biomedical ontology structure. J Biomed Inform 2020; 112:103607. [PMID: 33098987 DOI: 10.1016/j.jbi.2020.103607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2020] [Revised: 09/07/2020] [Accepted: 10/17/2020] [Indexed: 11/17/2022]
Abstract
The comprehensive modeling and hierarchical positioning of a new concept in an ontology heavily relies on its set of proper subsumption relationships (IS-As) to other concepts. Identifying a concept's IS-A relationships is a laborious task requiring curators to have both domain knowledge and terminology skills. In this work, we propose a method to automatically predict the presence of IS-A relationships between a new concept and pre-existing concepts based on the language representation model BERT. This method converts the neighborhood network of a concept into "sentences" and harnesses BERT's Next Sentence Prediction (NSP) capability of predicting the adjacency of two sentences. To augment our method's performance, we refined the training data by employing an ontology summarization technique. We trained our model with the two largest hierarchies of the SNOMED CT 2017 July release and applied it to predicting the parents of new concepts added in the SNOMED CT 2018 January release. The results showed that our method achieved an average F1 score of 0.88, and the average Recall score improves slightly from 0.94 to 0.96 by using the ontology summarization technique.
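A rough sketch of how an ontology neighborhood might be turned into sentence pairs for a next-sentence-style classifier follows. The abstract does not specify the encoding, so the sentence template, the toy IS-A edges, and the function names here are all assumptions; no BERT model is called.

```python
from typing import Dict, List, Tuple

# Hypothetical IS-A edges (child -> parents); concept names are illustrative only.
IS_A: Dict[str, List[str]] = {
    "Bacterial pneumonia": ["Pneumonia", "Bacterial infection"],
    "Pneumonia": ["Lung disease"],
    "Bacterial infection": ["Infectious disease"],
    "Lung disease": [],
    "Infectious disease": [],
}


def to_sentence(concept: str) -> str:
    """Serialize a concept together with its parents as one pseudo-sentence.

    The "X is a Y" template is an assumption, not the encoding used in the paper.
    """
    parents = IS_A.get(concept, [])
    if not parents:
        return concept
    return f"{concept} is a {', '.join(parents)}"


def build_pairs(
    new_concept: str, candidates: List[str], true_parents: List[str]
) -> List[Tuple[str, str, int]]:
    """Create (sentence A, sentence B, label) triples.

    Label 1 means the candidate is a parent of the new concept, so the two
    sentences would be treated as adjacent; label 0 means they are not.
    """
    return [
        (new_concept, to_sentence(candidate), int(candidate in true_parents))
        for candidate in candidates
    ]


pairs = build_pairs(
    "Viral pneumonia",
    candidates=["Pneumonia", "Lung disease"],
    true_parents=["Pneumonia"],
)
```

Triples of this shape could then be fed to any sentence-pair classifier; which pairs serve as negatives and how much of the neighborhood each sentence includes are design choices the abstract leaves open.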
Affiliation(s)
- Hao Liu
  - Dept of Computer Science, NJIT, Newark, NJ, USA.
4. Agrawal A, Qazi K. Detecting modeling inconsistencies in SNOMED CT using a machine learning technique. Methods 2020; 179:111-118. [PMID: 32442671 DOI: 10.1016/j.ymeth.2020.05.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/15/2020] [Revised: 05/07/2020] [Accepted: 05/18/2020] [Indexed: 11/19/2022]
Abstract
SNOMED CT is a comprehensive and evolving clinical reference terminology that has been widely adopted as a common vocabulary to promote interoperability between Electronic Health Records. Owing to its importance in healthcare, quality assurance is an integral part of the lifecycle of SNOMED CT. Manual auditing of every concept in SNOMED CT is difficult and labor intensive, and identifying inconsistencies in the modeling of concepts without any context can be challenging. Algorithmic techniques are needed to identify modeling inconsistencies, if any, in SNOMED CT. This study proposes a context-based, machine learning quality assurance technique to identify concepts in SNOMED CT that may be in need of auditing. The Clinical Finding and the Procedure hierarchies are used as a testbed to check the efficacy of the method. Results of auditing show that the method identified inconsistencies in 72% of the concept pairs that were deemed inconsistent by the algorithm. The method is shown to be effective both in maximizing the yield of correction and in providing a context to identify the inconsistencies. Such methods, along with SNOMED International's own efforts, can greatly help reduce inconsistencies in SNOMED CT.
Affiliation(s)
- Ankur Agrawal
  - Department of Computer Science, Manhattan College, NY, USA.
5. Agrawal A. Evaluating lexical similarity and modeling discrepancies in the procedure hierarchy of SNOMED CT. BMC Med Inform Decis Mak 2018; 18:88. [PMID: 30537959 PMCID: PMC6290591 DOI: 10.1186/s12911-018-0673-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
BACKGROUND: SNOMED CT is a standardized and comprehensive clinical terminology that is used in Electronic Health Records to capture, store and access clinical data of patients. Studies have, however, shown that there are inconsistencies inherent in the modeling of concepts in SNOMED CT that can have an impact on its usage to record clinical data and in clinical decision-making tools.
METHODS: A lexical approach for identifying concepts with a high likelihood of inconsistencies in their structural modeling in SNOMED CT is discussed and assessed. The approach groups two or more lexically similar concepts and compares their modeling in order to identify inconsistencies. A sample of 50 sets is randomly picked from the Procedure hierarchy of SNOMED CT and evaluated for inconsistencies.
RESULTS: Of the 50 randomly picked sets, 58% are found to exhibit one or more concepts with inconsistencies. In terms of concepts, 29% of the 146 concepts are found to exhibit one or more inconsistencies.
CONCLUSIONS: The assessment of the sample concepts shows that SNOMED CT is not free from inconsistencies, which may affect its use in clinical care and decision support systems. The proposed methodology is found to be effective in identifying areas of SNOMED CT that may be in need of quality assessment.
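The idea of comparing the modeling of lexically similar concepts can be sketched as follows. The similarity criterion used here (names of equal length that differ in exactly one word), the toy concepts, and the attribute sets are assumptions for illustration; the abstract does not state the exact criterion used in the study.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical concepts: name -> attribute relationship types used in its modeling.
CONCEPTS: Dict[str, Set[str]] = {
    "Biopsy of left kidney": {"Method", "Procedure site"},
    "Biopsy of right kidney": {"Method"},
    "Excision of liver": {"Method", "Procedure site"},
}


def differ_by_one_word(a: str, b: str) -> bool:
    """True if both names have the same number of words and differ in exactly one."""
    words_a, words_b = a.lower().split(), b.lower().split()
    if len(words_a) != len(words_b):
        return False
    return sum(x != y for x, y in zip(words_a, words_b)) == 1


def candidate_inconsistencies(
    concepts: Dict[str, Set[str]]
) -> List[Tuple[str, str, Set[str]]]:
    """Flag lexically similar pairs whose attribute types are not identical.

    The third element is the set of attribute types present in only one concept.
    """
    flagged = []
    for a, b in combinations(sorted(concepts), 2):
        if differ_by_one_word(a, b) and concepts[a] != concepts[b]:
            flagged.append((a, b, concepts[a] ^ concepts[b]))
    return flagged


flagged = candidate_inconsistencies(CONCEPTS)
```

Flagged pairs are only candidates for review: a difference in modeling between similar names may be legitimate, so a human auditor would still need to confirm each one.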
Affiliation(s)
- Ankur Agrawal
  - Department of Computer Science, Manhattan College, New York, NY, USA.
6. Zheng L, Liu H, Perl Y, Geller J, Ochs C, Case JT. Overlapping Complex Concepts Have More Commission Errors, Especially in Intensive Terminology Auditing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:1157-1166. [PMID: 30815158 PMCID: PMC6371375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
SNOMED CT is a large, complex and widely-used terminology. Auditing is part of the life cycle of terminologies. A review of terminologies' content can identify two error categories: commission errors, such as an incorrect parent or attribute relationship, indicating errors in a concept's modeling, and omission errors, such as missing a parent or attribute relationship, representing incomplete modeling of a concept. According to our experience, terminology curators are mostly interested in commission errors. In recent years, a long-term remodeling project has addressed modeling issues in SNOMED CT's Infectious disease and Congenital disease subhierarchies. In this longitudinal study, we investigated a posteriori the efficacy of complex concepts, called overlapping concepts, to identify commission errors during intensive auditing periods and during maintenance periods over several releases. The algorithmic implication is that when auditing resources are scarce, a methodology of auditing first, or only, the overlapping concepts will obtain a higher auditing yield.
Affiliation(s)
- Ling Zheng
  - Monmouth University, West Long Branch, NJ, US
- Hao Liu
  - New Jersey Institute of Technology, Newark, NJ, US
- James Geller
  - New Jersey Institute of Technology, Newark, NJ, US
7. Quality assurance of biomedical terminologies and ontologies. J Biomed Inform 2018; 86:106-108. [PMID: 30205171 DOI: 10.1016/j.jbi.2018.09.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/04/2018] [Accepted: 09/07/2018] [Indexed: 11/22/2022]
8. Amith M, He Z, Bian J, Lossio-Ventura JA, Tao C. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities. J Biomed Inform 2018; 80:1-13. [PMID: 29462669 PMCID: PMC5882531 DOI: 10.1016/j.jbi.2018.02.010] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 08/31/2017] [Revised: 02/12/2018] [Accepted: 02/16/2018] [Indexed: 11/26/2022]
Abstract
With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play an increasingly important role in knowledge representation and management, data integration, natural language processing, as well as decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods in the existing literature to create an integrated taxonomy. Further, to understand the ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal (the largest repository for biomedical ontologies) and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. We anticipate that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies.
Affiliation(s)
- Muhammad Amith
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhe He
  - School of Information, Florida State University, Tallahassee, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
9. Zheng L, Yumak H, Chen L, Ochs C, Geller J, Kapusnik-Uner J, Perl Y. Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology. J Biomed Inform 2017; 73:30-42. [PMID: 28723580 DOI: 10.1016/j.jbi.2017.07.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/27/2017] [Revised: 07/13/2017] [Accepted: 07/14/2017] [Indexed: 02/04/2023]
Abstract
The National Drug File - Reference Terminology (NDF-RT) is a large and complex drug terminology consisting of several classification hierarchies on top of an extensive collection of drug concepts. These hierarchies provide important information about clinical drugs, e.g., their chemical ingredients, mechanisms of action, dosage form and physiological effects. Within NDF-RT such information is represented using tens of thousands of roles connecting drugs to classifications. In previous studies, we have introduced various kinds of Abstraction Networks to summarize the content and structure of terminologies in order to facilitate their visual comprehension, and support quality assurance of terminologies. However, these previous kinds of Abstraction Networks are not appropriate for summarizing the NDF-RT classification hierarchies, due to its unique structure. In this paper, we present the novel Ingredient Abstraction Network (IAbN) to summarize, visualize and support the audit of NDF-RT's Chemical Ingredients hierarchy and its associated drugs. A common theme in our quality assurance framework is to use characterizations of sets of concepts, revealed by the Abstraction Network structure, to capture concepts, the modeling of which is more complex than for other concepts. For the IAbN, we characterize drug ingredient concepts as more complex if they belong to IAbN groups with multiple parent groups. We show that such concepts have a statistically significantly higher rate of errors than a control sample and identify two especially common patterns of errors.
Affiliation(s)
- Ling Zheng
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States.
- Hasan Yumak
  - BMCC, CUNY, New York, NY 10007, United States.
- Ling Chen
  - BMCC, CUNY, New York, NY 10007, United States.
- Christopher Ochs
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States.
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States.
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, United States.
10. An empirical analysis of ontology reuse in BioPortal. J Biomed Inform 2017; 71:165-177. [PMID: 28583809 DOI: 10.1016/j.jbi.2017.05.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/07/2017] [Revised: 05/27/2017] [Accepted: 05/29/2017] [Indexed: 01/16/2023]
Abstract
Biomedical ontologies often reuse content (i.e., classes and properties) from other ontologies. Content reuse enables a consistent representation of a domain and reusing content can save an ontology author significant time and effort. Prior studies have investigated the existence of reused terms among the ontologies in the NCBO BioPortal, but as of yet there has not been a study investigating how the ontologies in BioPortal utilize reused content in the modeling of their own content. In this study we investigate how 355 ontologies hosted in the NCBO BioPortal reuse content from other ontologies for the purposes of creating new ontology content. We identified 197 ontologies that reuse content. Among these ontologies, 108 utilize reused classes in the modeling of their own classes and 116 utilize reused properties in class restrictions. Current utilization of reuse and quality issues related to reuse are discussed.
11. Nissim N, Shahar Y, Elovici Y, Hripcsak G, Moskovitch R. Inter-labeler and intra-labeler variability of condition severity classification models using active and passive learning methods. Artif Intell Med 2017; 81:12-32. [PMID: 28456512 DOI: 10.1016/j.artmed.2017.03.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/01/2017] [Accepted: 03/03/2017] [Indexed: 01/20/2023]
Abstract
BACKGROUND AND OBJECTIVES: Labeling instances by domain experts for classification is often time consuming and expensive. To reduce such labeling efforts, we had proposed the application of active learning (AL) methods, introduced our CAESAR-ALE framework for classifying the severity of clinical conditions, and shown its significant reduction of labeling efforts. The use of any of three AL methods (one well known [SVM-Margin], and two that we introduced [Exploitation and Combination_XA]) significantly reduced (by 48% to 64%) condition labeling efforts, compared to standard passive (random instance-selection) SVM learning. Furthermore, our new AL methods achieved maximal accuracy using 12% fewer labeled cases than the SVM-Margin AL method. However, because labelers have varying levels of expertise, a major issue associated with learning methods, and AL methods in particular, is how best to use the labeling provided by a committee of labelers. First, we wanted to know, based on the labelers' learning curves, whether using AL methods (versus standard passive learning methods) has an effect on the intra-labeler variability (within the learning curve of each labeler) and inter-labeler variability (among the learning curves of different labelers). Then, we wanted to examine the effect of learning (either passively or actively) from the labels created by the majority consensus of a group of labelers.
METHODS: We used our CAESAR-ALE framework for classifying the severity of clinical conditions, the three AL methods and the passive learning method, as mentioned above, to induce the classification models. We used a dataset of 516 clinical conditions and their severity labeling, represented by features aggregated from the medical records of 1.9 million patients treated at Columbia University Medical Center. We analyzed the variance of the classification performance within (intra-labeler), and especially among (inter-labeler), the classification models that were induced by using the labels provided by seven labelers. We also compared the performance of the passive and active learning models when using the consensus label.
RESULTS: The AL methods produced, for the models induced from each labeler, smoother intra-labeler learning curves during the training phase, compared to the models produced when using the passive learning method. The mean standard deviation of the learning curves of the three AL methods over all labelers (mean: 0.0379; range: 0.0182 to 0.0496) was significantly lower (p=0.049) than the intra-labeler standard deviation when using the passive learning method (mean: 0.0484; range: 0.0275 to 0.0724). Using the AL methods resulted in a lower mean inter-labeler AUC standard deviation among the AUC values of the labelers' different models during the training phase, compared to the variance of the induced models' AUC values when using passive learning. The inter-labeler AUC standard deviation using the passive learning method (0.039) was almost twice as high as the inter-labeler standard deviation using our two new AL methods (0.02 and 0.019, respectively). The SVM-Margin AL method resulted in an inter-labeler standard deviation (0.029) that was higher by almost 50% than that of our two AL methods. The difference in the inter-labeler standard deviation between the passive learning method and the SVM-Margin learning method was significant (p=0.042). The difference between the SVM-Margin and Exploitation methods was insignificant (p=0.29), as was the difference between the Combination_XA and Exploitation methods (p=0.67). Finally, using the consensus label led to a learning curve that had a higher mean intra-labeler variance, but resulted eventually in an AUC that was at least as high as the AUC achieved using the gold standard label and that was always higher than the expected mean AUC of a randomly selected labeler, regardless of the choice of learning method (including a passive learning method). Using a paired t-test, the difference between the intra-labeler AUC standard deviation when using the consensus label, versus that value when using the other two labeling strategies, was significant only when using the passive learning method (p=0.014), but not when using any of the three AL methods.
CONCLUSIONS: The use of AL methods (a) reduces intra-labeler variability in the performance of the induced models during the training phase, and thus reduces the risk of halting the process at a local minimum that is significantly different in performance from the rest of the learned models; and (b) reduces inter-labeler performance variance, and thus reduces the dependence on the use of a particular labeler. In addition, the use of a consensus label, agreed upon by a rather uneven group of labelers, might be at least as good as using the gold standard labeler, who might not be available, and certainly better than randomly selecting one of the group's individual labelers. Finally, using the AL methods when provided with the consensus label reduced the intra-labeler AUC variance during the learning phase, compared to using passive learning.
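The contrast between margin-based active selection and passive (random) selection can be sketched in a few lines. This is a generic illustration of the margin criterion, assuming decision scores from some already-trained linear classifier; the function names and scores are hypothetical, and the Exploitation and Combination_XA methods from the abstract are not reproduced here.

```python
import random
from typing import List, Sequence


def select_by_margin(decision_scores: Sequence[float], k: int) -> List[int]:
    """Pick the k unlabeled instances closest to the decision boundary.

    `decision_scores` are signed distances from the separating hyperplane;
    a small absolute value means the classifier is least certain.
    """
    ranked = sorted(range(len(decision_scores)), key=lambda i: abs(decision_scores[i]))
    return ranked[:k]


def select_randomly(n_unlabeled: int, k: int, seed: int = 0) -> List[int]:
    """Passive learning baseline: pick k instances uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(n_unlabeled), k)


# Hypothetical scores for five unlabeled conditions.
scores = [2.1, -0.1, 0.8, -1.5, 0.05]
to_label_actively = select_by_margin(scores, k=2)
to_label_passively = select_randomly(len(scores), k=2)
```

In each round, the selected instances would be sent to a labeler, added to the training set, and the classifier retrained before the next selection.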
Affiliation(s)
- Nir Nissim
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Yuval Shahar
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Yuval Elovici
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Robert Moskovitch
  - Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Department of Biomedical Informatics, Columbia University, New York, NY, USA
12. Ochs C, Case JT, Perl Y. Analyzing structural changes in SNOMED CT's Bacterial infectious diseases using a visual semantic delta. J Biomed Inform 2017; 67:101-116. [PMID: 28215561 DOI: 10.1016/j.jbi.2017.02.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/29/2016] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/23/2022]
Abstract
Thousands of changes are applied to SNOMED CT's concepts during each release cycle. These changes are the result of efforts to improve or expand the coverage of health domains in the terminology. Understanding which concepts changed, how they changed, and the overall impact of a set of changes is important for editors and end users. Each SNOMED CT release comes with delta files, which identify all of the individual additions and removals of concepts and relationships. These files typically contain tens of thousands of individual entries, overwhelming users. They also do not identify the editorial processes that were applied to individual concepts and they do not capture the overall impact of a set of changes on a subhierarchy of concepts. In this paper we introduce a methodology and accompanying software tool called a SNOMED CT Visual Semantic Delta ("semantic delta" for short) to enable a comprehensive review of changes in SNOMED CT. The semantic delta displays a graphical list of editing operations that provides semantics and context to the additions and removals in the delta files. However, there may still be thousands of editing operations applied to a set of concepts. To address this issue, a semantic delta includes a visual summary of changes that affected sets of structurally and semantically similar concepts. The software tool for creating semantic deltas offers views of various granularities, allowing a user to control how much change information they view. In this tool a user can select a set of structurally and semantically similar concepts and review the editing operations that affected their modeling. The semantic delta methodology is demonstrated on SNOMED CT's Bacterial infectious disease subhierarchy, which has undergone a significant remodeling effort over the last two years.
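The step from raw additions and removals to per-concept editing operations can be sketched as below. The row layout is a simplified, hypothetical stand-in rather than the actual SNOMED CT delta file format, and the three operation labels are assumptions, not the operation vocabulary of the semantic delta tool.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Simplified, hypothetical delta row: (action, source concept, relationship type, target).
Row = Tuple[str, str, str, str]


def summarize_delta(rows: List[Row]) -> Dict[str, List[str]]:
    """Group raw add/remove rows into per-concept editing operations."""
    added: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    removed: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for action, source, rel_type, target in rows:
        bucket = added if action == "add" else removed
        bucket[(source, rel_type)].add(target)

    operations: Dict[str, List[str]] = defaultdict(list)
    for key in sorted(set(added) | set(removed)):
        source, rel_type = key
        new = sorted(added.get(key, set()))
        old = sorted(removed.get(key, set()))
        if new and old:
            # A removal and an addition of the same type read as one replacement.
            operations[source].append(f"{rel_type} changed: {old} -> {new}")
        elif new:
            operations[source].append(f"{rel_type} added: {new}")
        else:
            operations[source].append(f"{rel_type} removed: {old}")
    return dict(operations)


rows: List[Row] = [
    ("remove", "Bacterial pneumonia", "IS-A", "Pneumonia"),
    ("add", "Bacterial pneumonia", "IS-A", "Infective pneumonia"),
    ("add", "Bacterial pneumonia", "Causative agent", "Bacteria"),
]
summary = summarize_delta(rows)
```

Pairing a removal with an addition of the same relationship type is what turns two unrelated-looking delta entries into a single, readable "parent changed" operation for the reviewer.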
Affiliation(s)
- Christopher Ochs
  - Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA.
- James T Case
  - National Library of Medicine/National Institutes of Health, Bethesda, MD 20894, USA
- Yehoshua Perl
  - Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
13. Nissim N, Boland MR, Tatonetti NP, Elovici Y, Hripcsak G, Shahar Y, Moskovitch R. Improving condition severity classification with an efficient active learning based framework. J Biomed Inform 2016; 61:44-54. [PMID: 27016383 DOI: 10.1016/j.jbi.2016.03.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/29/2015] [Revised: 01/31/2016] [Accepted: 03/21/2016] [Indexed: 02/07/2023]
Abstract
Classification of condition severity can be useful for discriminating among sets of conditions or phenotypes, for example when prioritizing patient care or for other healthcare purposes. Electronic Health Records (EHRs) represent a rich source of labeled information that can be harnessed for severity classification. The labeling of EHRs is expensive and in many cases requires employing professionals with high level of expertise. In this study, we demonstrate the use of Active Learning (AL) techniques to decrease expert labeling efforts. We employ three AL methods and demonstrate their ability to reduce labeling efforts while effectively discriminating condition severity. We incorporate three AL methods into a new framework based on the original CAESAR (Classification Approach for Extracting Severity Automatically from Electronic Health Records) framework to create the Active Learning Enhancement framework (CAESAR-ALE). We applied CAESAR-ALE to a dataset containing 516 conditions of varying severity levels that were manually labeled by seven experts. Our dataset, called the "CAESAR dataset," was created from the medical records of 1.9 million patients treated at Columbia University Medical Center (CUMC). All three AL methods decreased labelers' efforts compared to the learning methods applied by the original CAESER framework in which the classifier was trained on the entire set of conditions; depending on the AL strategy used in the current study, the reduction ranged from 48% to 64% that can result in significant savings, both in time and money. As for the PPV (precision) measure, CAESAR-ALE achieved more than 13% absolute improvement in the predictive capabilities of the framework when classifying conditions as severe. These results demonstrate the potential of AL methods to decrease the labeling efforts of medical experts, while increasing accuracy given the same (or even a smaller) number of acquired conditions. 
We also demonstrated that the methods included in the CAESAR-ALE framework (Exploitation and Combination_XA) are more robust to the use of human labelers with different levels of professional expertise.
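As a rough illustration of how an AL loop decides which conditions to send to an expert, the sketch below implements plain uncertainty sampling: it picks the conditions whose predicted probability of being severe is closest to 0.5. This is a generic baseline chosen for illustration, not the Exploitation or Combination_XA strategies named in the abstract; the function name, condition identifiers, and probabilities are all hypothetical.

```python
def select_most_uncertain(severity_probs, k):
    """Return the k condition ids whose predicted probability is closest to 0.5.

    severity_probs: dict mapping a condition id to the classifier's predicted
    probability that the condition is severe (hypothetical input format).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Smaller distance from 0.5 means the classifier is less certain.
    ranked = sorted(severity_probs, key=lambda cid: abs(severity_probs[cid] - 0.5))
    return ranked[:k]


# Hypothetical predictions for four unlabeled conditions.
example_probs = {"c1": 0.90, "c2": 0.55, "c3": 0.10, "c4": 0.48}
to_label = select_most_uncertain(example_probs, k=2)
print(to_label)
```

In a full loop, the selected conditions would be labeled by an expert, added to the training set, and the classifier retrained before the next selection round.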
Affiliation(s)
- Nir Nissim
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Malware Lab, Cyber Security Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Yuval Elovici
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA
- Yuval Shahar
- Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Robert Moskovitch
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA; Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics, Columbia University, New York, NY, USA.
|
14
|
Chandar P, Yaman A, Hoxha J, He Z, Weng C. Similarity-Based Recommendation of New Concepts to a Terminology. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:386-395. [PMID: 26958170 PMCID: PMC4765685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Terminologies can suffer from poor concept coverage due to delays in addition of new concepts. This study tests a similarity-based approach to recommending concepts from a text corpus to a terminology. Our approach involves extraction of candidate concepts from a given text corpus, which are represented using a set of features. The model learns the important features to characterize a concept and recommends new concepts to a terminology. Further, we propose a cost-effective evaluation methodology to estimate the effectiveness of terminology enrichment methods. To test our methodology, we use the clinical trial eligibility criteria free-text as an example text corpus to recommend concepts for SNOMED CT. We computed precision at various rank intervals to measure the performance of the methods. Results indicate that our automated algorithm is an effective method for concept recommendation.
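Precision at a rank cutoff, the evaluation measure mentioned in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the candidate concepts, and the set of accepted concepts are hypothetical and are not taken from the study.

```python
def precision_at_k(ranked_candidates, accepted, k):
    """Fraction of the top-k ranked candidates that are in the accepted set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_candidates[:k]
    hits = sum(1 for candidate in top_k if candidate in accepted)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


# Hypothetical ranking of candidate concepts and hypothetical reviewer judgments.
ranking = ["concept_a", "concept_b", "concept_c", "concept_d"]
judged_good = {"concept_a", "concept_c"}
for cutoff in (1, 2, 4):
    print(cutoff, precision_at_k(ranking, judged_good, cutoff))
```

Reporting the measure at several cutoffs shows whether good recommendations are concentrated near the top of the ranking.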
Affiliation(s)
- Praveen Chandar
- Department of Biomedical Informatics, Columbia University, New York, NY USA
- Anil Yaman
- Department of Biomedical Informatics, Columbia University, New York, NY USA
- Julia Hoxha
- Department of Biomedical Informatics, Columbia University, New York, NY USA
- Zhe He
- Department of Biomedical Informatics, Columbia University, New York, NY USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY USA
|
15
|
Kim HH, Lee SY, Baik SY, Kim JH. MELLO: Medical lifelog ontology for data terms from self-tracking and lifelog devices. Int J Med Inform 2015; 84:1099-110. [PMID: 26383495 DOI: 10.1016/j.ijmedinf.2015.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/12/2015] [Revised: 08/06/2015] [Accepted: 08/11/2015] [Indexed: 11/27/2022]
Abstract
OBJECTIVE The increasing use of health self-tracking devices is making the integration of heterogeneous data and shared decision-making more challenging. Computational analysis of lifelog data has been hampered by the lack of semantic and syntactic consistency among lifelog terms and related ontologies. Medical lifelog ontology (MELLO) was developed by identifying lifelog concepts and relationships between concepts, and it provides clear definitions by following ontology development methods. MELLO aims to support the classification and semantic mapping of lifelog data from diverse health self-tracking devices. METHODS MELLO was developed using the General Formal Ontology method with a manual iterative process comprising five steps: (1) defining the scope of lifelog data, (2) identifying lifelog concepts, (3) assigning relationships among MELLO concepts, (4) developing MELLO properties (e.g., synonyms, preferred terms, and definitions) for each MELLO concept, and (5) evaluating representative layers of the ontology content. An evaluation was performed by classifying 11 devices into 3 classes by subjects, and performing pairwise comparisons of lifelog terms among 5 devices in each class as measured using the Jaccard similarity index. RESULTS MELLO represents a comprehensive knowledge base of 1998 lifelog concepts, with 4996 synonyms for 1211 (61%) concepts and 1395 definitions for 926 (46%) concepts. The MELLO Browser and MELLO Mapper provide convenient access to MELLO and support annotating non-standard proprietary terms with it (http://mello.snubi.org/). MELLO covers 88.1% of lifelog terms from 11 health self-tracking devices and uses simple string matching to match semantically similar terms provided by various devices that are not yet integrated. The results from the comparisons of Jaccard similarities between simple string matching and MELLO matching revealed 2.5-, 2.2-, and 5.7-fold increases for the physical activity, body measure, and sleep classes, respectively.
CONCLUSIONS MELLO is the first ontology for representing health-related lifelog data with rich contents including definitions, synonyms, and semantic relationships. MELLO fills the semantic gap between heterogeneous lifelog terms that are generated by diverse health self-tracking devices. The unified representation of lifelog terms facilitated by MELLO can help describe an individual's lifestyle and environmental factors, which can be included with user-generated data for clinical research and thereby enhance data integration and sharing.
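The Jaccard comparison described in the abstract can be illustrated with a small sketch. It contrasts exact string matching with matching after terms are mapped to a shared preferred term. The device term lists and the synonym table are hypothetical examples, not MELLO content, and the mapping step is only an assumption about how ontology-based matching might be approximated.

```python
def jaccard(terms_a, terms_b):
    """Jaccard similarity index: |A intersect B| / |A union B|."""
    set_a, set_b = set(terms_a), set(terms_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty term sets
    return len(set_a & set_b) / len(union)


def normalize(terms, preferred_term):
    """Replace each term with its preferred term when one is known."""
    return {preferred_term.get(term, term) for term in terms}


# Hypothetical lifelog terms exposed by two devices.
device_a = {"steps", "sleep time", "weight"}
device_b = {"step count", "sleep duration", "weight"}

# Hypothetical synonym table standing in for an ontology lookup.
synonyms = {"step count": "steps", "sleep duration": "sleep time"}

print(jaccard(device_a, device_b))
print(jaccard(normalize(device_a, synonyms), normalize(device_b, synonyms)))
```

With these toy inputs the similarity rises once synonyms are resolved, which is the kind of effect the fold increases in the abstract quantify.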
Affiliation(s)
- Hye Hyeon Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, South Korea
- Soo Youn Lee
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, South Korea
- Su Youn Baik
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, South Korea
- Ju Han Kim
- Seoul National University Biomedical Informatics (SNUBI), Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, South Korea; Systems Biomedical Informatics National Core Research Center (SBI-NCRC), Seoul National University College of Medicine, Seoul 110799, South Korea.
|
16
|
Wei D, Helen Gu H, Perl Y, Halper M, Ochs C, Elhanan G, Chen Y. Structural measures to track the evolution of SNOMED CT hierarchies. J Biomed Inform 2015; 57:278-87. [PMID: 26260003 DOI: 10.1016/j.jbi.2015.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/31/2015] [Revised: 08/01/2015] [Accepted: 08/01/2015] [Indexed: 11/28/2022]
Abstract
The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is an extensive reference terminology with an attendant amount of complexity. It has been updated continuously and revisions have been released semi-annually to meet users' needs and to reflect the results of quality assurance (QA) activities. Two measures based on structural features are proposed to track the effects of both natural terminology growth and QA activities based on aspects of the complexity of SNOMED CT. These two measures, called the structural density measure and accumulated structural measure, are derived based on two abstraction networks, the area taxonomy and the partial-area taxonomy. The measures derive from attribute relationship distributions and various concept groupings that are associated with the abstraction networks. They are used to track the trends in the complexity of structures as SNOMED CT changes over time. The measures were calculated for consecutive releases of five SNOMED CT hierarchies, including the Specimen hierarchy. The structural density measure shows that natural growth tends to move a hierarchy's structure toward a more complex state, whereas the accumulated structural measure shows that QA processes tend to move a hierarchy's structure toward a less complex state. It is also observed that both the structural density and accumulated structural measures are useful tools to track the evolution of an entire SNOMED CT hierarchy and reveal internal concept migration within it.
Affiliation(s)
- Duo Wei
- Computer Science and Information Systems-BUSN, Stockton University, Galloway, NJ 08205, United States.
- Huanying Helen Gu
- Computer Science Dept., New York Institute of Technology, New York, NY 10023, United States
- Yehoshua Perl
- Computer Science Dept., New Jersey Institute of Technology, Newark, NJ 07102, United States
- Michael Halper
- Information Technology Dept., New Jersey Institute of Technology, Newark, NJ 07102, United States
- Christopher Ochs
- Computer Science Dept., New Jersey Institute of Technology, Newark, NJ 07102, United States
- Gai Elhanan
- Computer Science Dept., New Jersey Institute of Technology, Newark, NJ 07102, United States; Halfpenny Technologies Inc., Blue Bell, PA 19422, United States
- Yan Chen
- Computer Information Systems Dept., BMCC, CUNY, New York, NY 10007, United States
|
17
|
He Z, Geller J, Chen Y. A comparative analysis of the density of the SNOMED CT conceptual content for semantic harmonization. Artif Intell Med 2015; 64:29-40. [PMID: 25890688 DOI: 10.1016/j.artmed.2015.03.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/09/2014] [Revised: 03/20/2015] [Accepted: 03/25/2015] [Indexed: 11/17/2022]
Abstract
OBJECTIVES Medical terminologies vary in the amount of concept information (the "density") represented, even in the same sub-domains. This causes problems in terminology mapping, semantic harmonization and terminology integration. Moreover, complex clinical scenarios need to be encoded by a medical terminology with comprehensive content. SNOMED Clinical Terms (SNOMED CT), a leading clinical terminology, was reported to lack concepts and synonyms, problems that cannot be fully alleviated by using post-coordination. Therefore, a scalable solution is needed to enrich the conceptual content of SNOMED CT. We are developing a structure-based, algorithmic method to identify potential concepts for enriching the conceptual content of SNOMED CT and to support semantic harmonization of SNOMED CT with selected other Unified Medical Language System (UMLS) terminologies. METHODS We first identified a subset of English terminologies in the UMLS that have 'PAR' relationship labeled with 'IS_A' and over 10% overlap with one or more of the 19 hierarchies of SNOMED CT. We call these "reference terminologies" and we note that our use of this name is different from the standard use. Next, we defined a set of topological patterns across pairs of terminologies, with SNOMED CT being one terminology in each pair and the other being one of the reference terminologies. We then explored how often these topological patterns appear between SNOMED CT and each reference terminology, and how to interpret them. RESULTS Four viable reference terminologies were identified. Large density differences between terminologies were found. Expected interpretations of these differences were indeed observed, as follows. A random sample of 299 instances of special topological patterns ("2:3 and 3:2 trapezoids") showed that 39.1% and 59.5% of analyzed concepts in SNOMED CT and in a reference terminology, respectively, were deemed to be alternative classifications of the same conceptual content. 
In 30.5% and 17.6% of the cases, it was found that intermediate concepts could be imported into SNOMED CT or into the reference terminology, respectively, to enhance their conceptual content, if approved by a human curator. Other cases included synonymy and errors in one of the terminologies. CONCLUSION These results show that structure-based algorithmic methods can be used to identify potential concepts to enrich SNOMED CT and the four reference terminologies. The comparative analysis has the future potential of supporting terminology authoring by suggesting new content to improve content coverage and semantic harmonization between terminologies.
Affiliation(s)
- Zhe He
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
- James Geller
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Yan Chen
- Department of Computer Information Systems, Borough of Manhattan Community College, City University New York, New York, NY 10007, USA
|
18
|
An Active Learning Framework for Efficient Condition Severity Classification. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
19
|
Ochs C, Geller J, Perl Y, Chen Y, Xu J, Min H, Case JT, Wei Z. Scalable quality assurance for large SNOMED CT hierarchies using subject-based subtaxonomies. J Am Med Inform Assoc 2014; 22:507-18. [PMID: 25336594 DOI: 10.1136/amiajnl-2014-003151] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 07/13/2014] [Accepted: 09/27/2014] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE Standards terminologies may be large and complex, making their quality assurance challenging. Some terminology quality assurance (TQA) methodologies are based on abstraction networks (AbNs), compact terminology summaries. We have tested AbNs and the performance of related TQA methodologies on small terminology hierarchies. However, some standards terminologies, for example, SNOMED, are composed of very large hierarchies. Scaling AbN TQA techniques to such hierarchies poses a significant challenge. We present a scalable subject-based approach for AbN TQA. METHODS An innovative technique is presented for scaling TQA by creating a new kind of subject-based AbN called a subtaxonomy for large hierarchies. New hypotheses about concentrations of erroneous concepts within the AbN are introduced to guide scalable TQA. RESULTS We test the TQA methodology for a subject-based subtaxonomy for the Bleeding subhierarchy in SNOMED's large Clinical finding hierarchy. To test the error concentration hypotheses, three domain experts reviewed a sample of 300 concepts. A consensus-based evaluation identified 87 erroneous concepts. The subtaxonomy-based TQA methodology was shown to uncover statistically significantly more erroneous concepts when compared to a control sample. DISCUSSION The scalability of TQA methodologies is a challenge for large standards systems like SNOMED. We demonstrated innovative subject-based TQA techniques by identifying groups of concepts with a higher likelihood of having errors within the subtaxonomy. Scalability is achieved by reviewing a large hierarchy by subject. CONCLUSIONS An innovative methodology for scaling the derivation of AbNs and a TQA methodology was shown to perform successfully for the largest hierarchy of SNOMED.
Affiliation(s)
- Christopher Ochs
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- James Geller
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- Yehoshua Perl
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- Yan Chen
- Computer Information Systems Department, BMCC, CUNY, New York, New York, USA
- Junchuan Xu
- Division of Knowledge Informatics, NYU, New York, New York, USA
- Hua Min
- Department of Health Administration and Policy, George Mason University, Fairfax, Virginia, USA
- Zhi Wei
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
|
20
|
Ochs C, Geller J, Perl Y, Chen Y, Agrawal A, Case JT, Hripcsak G. A tribal abstraction network for SNOMED CT target hierarchies without attribute relationships. J Am Med Inform Assoc 2014; 22:628-39. [PMID: 25332354 DOI: 10.1136/amiajnl-2014-003173] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/18/2014] [Accepted: 09/20/2014] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE Large and complex terminologies, such as Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT), are prone to errors and inconsistencies. Abstraction networks are compact summarizations of the content and structure of a terminology. Abstraction networks have been shown to support terminology quality assurance. In this paper, we introduce an abstraction network derivation methodology which can be applied to SNOMED CT target hierarchies whose classes are defined using only hierarchical relationships (ie, without attribute relationships) and similar description-logic-based terminologies. METHODS We introduce the tribal abstraction network (TAN), based on the notion of a tribe, that is, a subhierarchy rooted at a child of a hierarchy root, assuming only the existence of concepts with multiple parents. The TAN summarizes a hierarchy that does not have attribute relationships using sets of concepts, called tribal units, that belong to exactly the same multiple tribes. Tribal units are further divided into refined tribal units which contain closely related concepts. A quality assurance methodology that utilizes TAN summarizations is introduced. RESULTS A TAN is derived for the Observable entity hierarchy of SNOMED CT, summarizing its content. A TAN-based quality assurance review of the concepts of the hierarchy is performed, and erroneous concepts are shown to appear more frequently in large refined tribal units than in small refined tribal units. Furthermore, more erroneous concepts appear in large refined tribal units of more tribes than of fewer tribes. CONCLUSIONS In this paper we introduce the TAN for summarizing SNOMED CT target hierarchies. A TAN was derived for the Observable entity hierarchy of SNOMED CT. A quality assurance methodology utilizing the TAN was introduced and demonstrated.
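The grouping of concepts into tribal units can be sketched from the definitions in the abstract alone: a tribe is the subhierarchy under one child of the root, and a tribal unit collects the concepts that belong to exactly the same set of two or more tribes. The code below is a simplified reading of those definitions, not the authors' implementation; the function names and the toy hierarchy are hypothetical, and the further split into refined tribal units is not attempted.

```python
from collections import defaultdict


def tribe_sets(parents, root):
    """Map each non-root concept to the frozenset of tribes it belongs to.

    parents: dict mapping a concept to the set of its IS-A parents.
    A tribe is identified by the child of the root at which it is rooted.
    """
    memo = {}

    def tribes_of(concept):
        if concept in memo:
            return memo[concept]
        found = set()
        for parent in parents.get(concept, ()):
            if parent == root:
                found.add(concept)  # a child of the root is a tribe root
            else:
                found |= tribes_of(parent)  # inherit tribes from ancestors
        memo[concept] = frozenset(found)
        return memo[concept]

    return {concept: tribes_of(concept) for concept in parents if concept != root}


def tribal_units(parents, root):
    """Group concepts that share exactly the same set of multiple tribes."""
    units = defaultdict(set)
    for concept, tribes in tribe_sets(parents, root).items():
        if len(tribes) > 1:  # only concepts in the overlap of several tribes
            units[tribes].add(concept)
    return dict(units)


# Hypothetical toy hierarchy with three tribes rooted at A, B, and C.
toy_parents = {
    "root": set(),
    "A": {"root"}, "B": {"root"}, "C": {"root"},
    "x": {"A"},
    "y": {"A", "B"},
    "z": {"A", "B"},
    "w": {"y", "C"},
}
for tribes, members in tribal_units(toy_parents, "root").items():
    print(sorted(tribes), sorted(members))
```

The recursive lookup is adequate for a small example; a very deep hierarchy would call for an iterative traversal in topological order instead.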
Affiliation(s)
- Christopher Ochs
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- James Geller
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- Yehoshua Perl
- Computer Science Department, New Jersey Institute of Technology, Newark, New Jersey, USA
- Yan Chen
- Computer Information Systems Department, BMCC, CUNY, New York, New York, USA
- Ankur Agrawal
- Department of Computer Science, Manhattan College, Riverdale, New York, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
|
21
|
|
22
|
Kim TY, Hardiker N, Coenen A. Inter-terminology mapping of nursing problems. J Biomed Inform 2014; 49:213-20. [PMID: 24632297 DOI: 10.1016/j.jbi.2014.03.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/25/2013] [Revised: 02/28/2014] [Accepted: 03/01/2014] [Indexed: 11/27/2022]
Abstract
PURPOSE The purpose of this study was to determine the degree of overlap between the International Classification for Nursing Practice (ICNP®) and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT), with a specific focus on nursing problems, as a first step towards harmonization of content between the two terminologies. METHODS Work within this study was divided across two ICNP subsets. The first subset (n=238) was made up of ICNP diagnosis/outcome concepts that had been included in previous experimental mapping activities with Clinical Care Classification (CCC) and NANDA-International (NANDA-I). These ICNP concepts and their equivalent concepts within CCC and NANDA-I were used within the Unified Medical Language System (UMLS) framework to derive automatically candidate mappings to SNOMED-CT for validation by two reviewers. The second subset (n=565) included all other ICNP diagnosis/outcome concepts plus those concepts from the first subset where the candidate mappings were rejected. Mappings from the second subset to SNOMED-CT were manually identified independently by the same two reviewers. Differences between the reviewers were resolved through discussion. The observed agreement between the two reviewers was calculated along with the inter-rater reliability using Cohen's Kappa (κ). RESULTS For the first semi-automated mapping, according to the two reviewers the great majority of ICNP concepts (91.6%) correctly mapped to SNOMED-CT in UMLS. There was a good level of agreement between the reviewers in this part of the exercise (κ=0.7). For the second manual mapping, nearly two-thirds of ICNP concepts (61.4%) could not be mapped to any SNOMED-CT concept. There was only a moderate level of agreement between the reviewers (κ=0.45). While most of the mappings were one-to-one mappings, there were ambiguities in both terminologies which led to difficulties. 
The absence of mappings was due to a large extent to differences in content coverage, although lexical variations and semantic differences also played a part. CONCLUSIONS This study demonstrated a degree of overlap between ICNP and SNOMED-CT; it also identified significant differences in content coverage. The results from the semi-automated mapping were encouraging, particularly for 'older' ICNP content. The results from the manual mapping were less favorable suggesting a need for further enhancement of both terminologies, content development within SNOMED-CT and further research on mechanisms for harmonization.
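Cohen's kappa, the inter-rater reliability statistic reported in the abstract, corrects the observed agreement between two reviewers for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two lists of categorical judgments. The reviewer decisions are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def cohens_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two raters who judged the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(ratings_1)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    labels = set(counts_1) | set(counts_2)
    expected = sum(counts_1[label] * counts_2[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # convention chosen here: both raters used one identical label
    return (observed - expected) / (1 - expected)


# Hypothetical accept/reject decisions on ten candidate mappings.
reviewer_1 = ["accept"] * 6 + ["reject"] * 4
reviewer_2 = ["accept"] * 5 + ["reject"] * 4 + ["accept"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

In this toy case the reviewers agree on 8 of 10 items, but kappa is noticeably lower than 0.8 because part of that agreement would be expected by chance.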
Affiliation(s)
- Tae Youn Kim
- Betty Irene Moore School Nursing, University of California Davis, 4610 X Street, Sacramento, CA 95817, USA.
- Nicholas Hardiker
- School of Nursing, Midwifery & Social Work, University of Salford, Mary Seacole Building, Greater Manchester M5 4WT, UK
- Amy Coenen
- College of Nursing, University of Wisconsin-Milwaukee, 1921 E. Hartford Avenue, P.O. Box 413, Milwaukee, WI 53201, USA
|
23
|
Agrawal A, Perl Y, Chen Y, Elhanan G, Liu M. Identifying inconsistencies in SNOMED CT problem lists using structural indicators. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:17-26. [PMID: 24551319 PMCID: PMC3900119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
The National Library of Medicine has published the CORE and the VA/KP problem lists to facilitate the usage of SNOMED CT for encoding diagnoses and clinical data of patients in electronic health records. Therefore, it is essential for the content of the problem lists to be as accurate and consistent as possible. This study assesses the effectiveness of using a concept's word length and number of parents, two structural indicators for measuring concept complexity, to identify inconsistencies with high probability. The method is able to isolate concepts with an expected likelihood of over 40% of being erroneous. A structural indicator for concepts which is able to identify 52% of the examined concepts as having errors in synonyms is also presented. The results demonstrate that the concepts in problem lists are not free of inconsistencies and further quality assurance is needed to improve the quality of these concepts.
Affiliation(s)
- Yan Chen
- Borough of Manhattan Community College, New York, NY
- Mei Liu
- New Jersey Institute of Technology, Newark, NJ
|
24
|
Ochs C, Perl Y, Geller J, Halper M, Gu H, Chen Y, Elhanan G. Scalability of abstraction-network-based quality assurance to large SNOMED hierarchies. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:1071-1080. [PMID: 24551393 PMCID: PMC3900129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Abstraction networks are compact summarizations of terminologies used to support orientation and terminology quality assurance (TQA). Area taxonomies and partial-area taxonomies are abstraction networks that have been successfully employed in support of TQA of small SNOMED CT hierarchies. However, nearly half of SNOMED CT's concepts are in the large Procedure and Clinical Finding hierarchies. Abstraction network derivation methodologies applied to those hierarchies resulted in taxonomies that were too large to effectively support TQA. A methodology for deriving sub-taxonomies from large taxonomies is presented, and the resultant smaller abstraction networks are shown to facilitate TQA, allowing for the scaling of our taxonomy-based TQA regimen to large hierarchies. Specifically, sub-taxonomies are derived for the Procedure hierarchy and a review for errors and inconsistencies is performed. Concepts are divided into groups within the sub-taxonomy framework, and it is shown that small groups are statistically more likely to harbor erroneous and inconsistent concepts than large groups.
Affiliation(s)
- Huanying Gu
- New York Institute of Technology, New York, NY
- Gai Elhanan
- New Jersey Institute of Technology, Newark, NJ
|
25
|
Agrawal A, He Z, Perl Y, Wei D, Halper M, Elhanan G, Chen Y. The readiness of SNOMED problem list concepts for meaningful use of electronic health records. Artif Intell Med 2013; 58:73-80. [PMID: 23602702 DOI: 10.1016/j.artmed.2013.03.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 05/04/2012] [Revised: 03/05/2013] [Accepted: 03/17/2013] [Indexed: 11/24/2022]
Abstract
OBJECTIVE By 2015, SNOMED CT (SCT) will become the USA's standard for encoding diagnoses and problem lists in electronic health records (EHRs). To facilitate this effort, the National Library of Medicine has published the "SCT Clinical Observations Recording and Encoding" and the "Veterans Health Administration and Kaiser Permanente" problem lists (collectively, the "PL"). The PL is studied in regard to its readiness to support meaningful use of EHRs. In particular, we wish to determine if inconsistencies appearing in SCT, in general, occur as frequently in the PL, and whether further quality-assurance (QA) efforts on the PL are required. METHODS AND MATERIALS A study is conducted where two random samples of SCT concepts are compared. The first consists of concepts strictly from the PL and the second contains general SCT concepts distributed proportionally to the PL's in terms of their hierarchies. Each sample is analyzed for its percentage of primitive concepts and for frequency of modeling errors of various severity levels as quality measures. A simple structural indicator, namely, the number of parents, is suggested to locate high likelihood inconsistencies in hierarchical relationships. The effectiveness of this indicator is evaluated. RESULTS PL concepts are found to be slightly better than other concepts in the respective SCT hierarchies with regards to the quality measure of the percentage of primitive concepts and the frequency of modeling errors. There were 58% primitive concepts in the PL sample versus 62% in the control sample. The structural indicator of number of parents is shown to be statistically significant in its ability to identify concepts having a higher likelihood of inconsistencies in their hierarchical relationships. The absolute number of errors in the group of concepts having 1-3 parents was shown to be significantly lower than that for concepts with 4-6 parents and those with 7 or more parents based on Chi-squared analyses. 
CONCLUSION PL concepts suffer from the same issues as general SCT concepts, although to a slightly lesser extent, and do require further QA efforts to promote meaningful use of EHRs. To support such efforts, a structural indicator is shown to effectively ferret out potentially problematic concepts where those QA efforts should be focused.
Affiliation(s)
- Ankur Agrawal
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA.
|
26
|
Geller J, Ochs C, Perl Y, Xu J. New abstraction networks and a new visualization tool in support of auditing the SNOMED CT content. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:237-246. [PMID: 23304293 PMCID: PMC3540556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Medical terminologies are large and complex. Frequently, errors are hidden in this complexity. Our objective is to find such errors, which can be aided by deriving abstraction networks from a large terminology. Abstraction networks preserve important features but eliminate many minor details, which are often not useful for identifying errors. Providing visualizations for such abstraction networks aids auditors by allowing them to quickly focus on elements of interest within a terminology. Previously we introduced area taxonomies and partial area taxonomies for SNOMED CT. In this paper, two advanced, novel kinds of abstraction networks, the relationship-constrained partial area subtaxonomy and the root-constrained partial area subtaxonomy are defined and their benefits are demonstrated. We also describe BLUSNO, an innovative software tool for quickly generating and visualizing these SNOMED CT abstraction networks. BLUSNO is a dynamic, interactive system that provides quick access to well organized information about SNOMED CT.
Affiliation(s)
- James Geller
- New Jersey Institute of Technology, Newark, NJ, USA
|
27
|
Lee D, Cornet R, Lau F, de Keizer N. A survey of SNOMED CT implementations. J Biomed Inform 2012; 46:87-96. [PMID: 23041717 PMCID: PMC7185627 DOI: 10.1016/j.jbi.2012.09.006] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 06/13/2012] [Revised: 09/14/2012] [Accepted: 09/15/2012] [Indexed: 11/21/2022]
Abstract
The Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) has been designated as the recommended clinical reference terminology for use in clinical information systems around the world and is reported to be used in over 50 countries. However, there are still few implementation details. This study examined the implementation of SNOMED CT in terms of design, use and maintenance issues involved in 13 healthcare organisations across eight countries through a series of interviews with 14 individuals. While a great deal of effort has been spent on developing and refining SNOMED CT, there is still much work ahead to bring SNOMED CT into routine clinical use.
Affiliation(s)
- Dennis Lee
- School of Health Information Science, University of Victoria, Victoria, BC, Canada.
|
28
|
Abstract
Clinical research informatics is the rapidly evolving sub-discipline within biomedical informatics that focuses on developing new informatics theories, tools, and solutions to accelerate the full translational continuum: basic research to clinical trials (T1), clinical trials to academic health center practice (T2), diffusion and implementation to community practice (T3), and ‘real world’ outcomes (T4). We present a conceptual model based on an informatics-enabled clinical research workflow, integration across heterogeneous data sources, and core informatics tools and platforms. We use this conceptual model to highlight 18 new articles in the JAMIA special issue on clinical research informatics.
Affiliation(s)
- Michael G Kahn
- Department of Pediatrics, University of Colorado, Aurora, Colorado 80045, USA.
|
29
|
He Z, Halper M, Perl Y, Elhanan G. Clinical Clarity versus Terminological Order - The Readiness of SNOMED CT Concept Descriptors for Primary Care. MIX-HS'12 : PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MANAGING INTEROPERABILITY AND COMPLEXITY IN HEALTH SYSTEMS OCTOBER 29, 2012, MAUI, HAWAII, USA. INTERNATIONAL WORKSHOP ON MANAGING INTEROPERABILITY AND COMPLEXITY IN HEALTH SY... 2012; 2012:1-6. [PMID: 26870837 DOI: 10.1145/2389672.2389674] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]
Abstract
As SNOMED usage becomes more ingrained within applications, its range of concept descriptors, and particularly its synonym adequacy, becomes more important. A simulated clinical scenario involving various term-based concept searches is used to assess whether SNOMED's concept descriptors provide sufficient differentiation to enable possible concept selection between similar terms. Four random samples from different SNOMED concept populations are utilized. Of particular interest are concepts mapped duplicately into UMLS concepts due to shared term patterns. While overall synonym problems are rare (1%), some concept populations exhibited a high rate of potential problems for clinical use (17-62%). The vast majority of issues are due to SNOMED's inherent structure and fine granularity. Many findings hint at a lack of clear delineation between reference and interface terminological qualities. Closer attention should be given to practical clinical use-case scenarios. Reducing SNOMED's structural complexity may alleviate many of the described findings and encourage clinical adoption.
Affiliation(s)
- Zhe He
- Computer Science Dept., NJIT Newark, NJ 07102 1-973-596-2867
- Michael Halper
- Information Technology Department, NJIT Newark, NJ 07102 1-973-596-5752
- Yehoshua Perl
- Computer Science Dept., NJIT Newark, NJ 07102 1-973-596-2867
- Gai Elhanan
- Halfpenny Technologies, Inc. Blue Bell, PA 19422 1-347-443-9741
|