1
|
Natural Language Processing to extract SNOMED-CT codes from pathological reports. Pathologica 2023; 115:318-324. [PMID: 38180139 PMCID: PMC10767798 DOI: 10.32074/1591-951x-952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024] Open
Abstract
Objective The use of standardized structured reports (SSR) and suitable terminologies like SNOMED-CT can enhance data retrieval and analysis, fostering large-scale studies and collaboration. However, the still large prevalence of narrative reports in our laboratories warrants alternative and automated labeling approaches. In this project, natural language processing (NLP) methods were used to assign SNOMED-CT codes to structured and unstructured reports from an Italian Digital Pathology Department. Methods Two NLP-based automatic coding systems (support vector machine, SVM, and long short-term memory, LSTM) were trained and applied to a series of narrative reports. Results The 1163 cases were tested with both algorithms, showing good performance in terms of accuracy, precision, recall, and F1 score, with SVM performing slightly better than LSTM (0.84, 0.87, 0.83, 0.82 vs 0.83, 0.85, 0.83, 0.82, respectively). The integration of an explainability step allowed identification of important terms and groups of words, enabling fine-tuning that balances semantic meaning and model performance. Conclusions AI tools allow the automatic SNOMED-CT labeling of pathology archives, providing a retrospective fix to the widespread lack of organization of narrative reports.
|
2
|
The suitability of UMLS and SNOMED-CT for encoding outcome concepts. J Am Med Inform Assoc 2023; 30:1895-1903. [PMID: 37615994 PMCID: PMC10654851 DOI: 10.1093/jamia/ocad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/14/2023] [Accepted: 08/02/2023] [Indexed: 08/25/2023] Open
Abstract
OBJECTIVE Outcomes are important clinical study information. Despite progress in automated extraction of PICO (Population, Intervention, Comparison, and Outcome) entities from PubMed, rarely are these entities encoded by standard terminology to achieve semantic interoperability. This study aims to evaluate the suitability of the Unified Medical Language System (UMLS) and SNOMED-CT in encoding outcome concepts in randomized controlled trial (RCT) abstracts. MATERIALS AND METHODS We iteratively developed and validated an outcome annotation guideline and manually annotated clinically significant outcome entities in the Results and Conclusions sections of 500 randomly selected RCT abstracts on PubMed. The extracted outcomes were fully, partially, or not mapped to the UMLS via MetaMap based on established heuristics. Manual UMLS browser search was performed for select unmapped outcome entities to further differentiate between UMLS and MetaMap errors. RESULTS Only 44% of 2617 outcome concepts were fully covered in the UMLS, among which 67% were complex concepts that required the combination of 2 or more UMLS concepts to represent them. SNOMED-CT was present as a source in 61% of the fully mapped outcomes. DISCUSSION Domains such as Metabolism and Nutrition, and Infections and Infectious Diseases need expanded outcome concept coverage in the UMLS and MetaMap. Future work is warranted to similarly assess the terminology coverage for P, I, C entities. CONCLUSION Computational representation of clinical outcomes is important for clinical evidence extraction and appraisal and yet faces challenges from the inherent complexity and lack of coverage of these concepts in UMLS and SNOMED-CT, as demonstrated in this study.
|
3
|
Mapping Exposome Derived Phenotypes into SNOMED Codes. Stud Health Technol Inform 2023; 302:1073-1074. [PMID: 37203585 DOI: 10.3233/shti230351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
Human phenotypes define the healthy or diseased status of an individual, and they arise from the complex interactions between environmental and genetic factors. The whole set of human exposures constitutes the human exposome. These exposures have multiple sources, including physical and socioeconomic factors. In this manuscript we have used text mining techniques to retrieve 1295 and 1903 Human Phenotype Ontology (HPO) terms associated with these exposome factors, and we have subsequently mapped 83% and 90% of the HPO terms, respectively, into SNOMED as a clinically actionable code. We have developed a proof-of-concept approach to facilitate the integration of exposomic and clinical data.
|
4
|
Targeting stopwords for quality assurance of SNOMED-CT. Int J Med Inform 2022; 167:104870. [PMID: 36148752 DOI: 10.1016/j.ijmedinf.2022.104870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 09/08/2022] [Accepted: 09/12/2022] [Indexed: 11/23/2022]
Abstract
OBJECTIVE We assess the potential of exploiting stopwords in biomedical concept names to complete the logical definitions of concepts that are not sufficiently defined. METHODS Concepts containing stopwords are selected from the Disorder hierarchy of Systematized NOmenclature of MEDicine (SNOMED-CT). SNOMED-CT consists of two types of concepts: Fully Defined (FD) concepts which are sufficiently defined and Partially Defined (PD) concepts which are not sufficiently defined. In this work, FD concepts containing stopwords are treated as a source of ground truth to complete the definitions of, lexically and semantically similar, PD concepts. FD and PD concepts are lexically and semantically analysed to create sample-sets. Mandatory attribute-relationships are calculated by using an intersection-set logic for each FD sample-set. PD sample-sets are audited against this mandatory attribute-relationship template to identify inconsistencies in modelling styles and potentially missing attribute-relationships. RESULTS Lexical and semantic patterns around 11 stopwords were analysed. 26 sample-sets were extracted for the 11 stopwords. Mandatory attribute-relationships were identified for 24 of the 26 sample-sets. The method identified 62.5% - 72.22% of the PD concepts, containing the stopwords in and due to, to be inconsistent in their modelling style and potentially missing at least one attribute-relationship according to the created template.
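The intersection-set step described in this abstract can be illustrated with a minimal Python sketch. It assumes each concept is reduced to the set of attribute names used in its definition; the concept names, the sample data, and the helper names `mandatory_attributes` and `audit` are hypothetical and are not taken from the paper.

```python
def mandatory_attributes(fd_sample_set):
    """Return the attributes shared by every Fully Defined (FD) concept."""
    attribute_sets = [set(attrs) for attrs in fd_sample_set.values()]
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


def audit(pd_sample_set, template):
    """Map each Partially Defined (PD) concept to the template attributes it lacks."""
    report = {}
    for name, attrs in pd_sample_set.items():
        missing = template - set(attrs)
        if missing:
            report[name] = sorted(missing)
    return report


# Hypothetical sample-sets; the attribute names are only illustrative.
fd = {
    "fd_concept_1": {"Finding site", "Associated morphology", "Due to"},
    "fd_concept_2": {"Finding site", "Due to"},
    "fd_concept_3": {"Finding site", "Due to", "Clinical course"},
}
pd_concepts = {
    "pd_concept_1": {"Finding site"},
    "pd_concept_2": {"Finding site", "Due to"},
}

template = mandatory_attributes(fd)
flagged = audit(pd_concepts, template)
print(sorted(template))  # ['Due to', 'Finding site']
print(flagged)           # {'pd_concept_1': ['Due to']}
```

In this toy run only `pd_concept_1` is flagged, because it lacks an attribute that every FD concept in the sample-set carries.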
|
5
|
Evaluation and Challenges of Medical Procedure Data Harmonization to SNOMED-CT for Observational Research. Stud Health Technol Inform 2022; 294:405-406. [PMID: 35612106 DOI: 10.3233/shti220484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
The relevance of health data research on real world data (RWD) is increasing. To prepare national RWD for international research, harmonization with standard terminologies is required. In this paper, we evaluate to what extent the German OPS vocabulary in OHDSI covers codes present in RWD and mappings to SNOMED-CT. The evaluation identified a mapping gap of 21.1% in the RWD set.
|
6
|
Aggregations of Substance in Virtual Drug Models Based on ISO/CEN Standards for Identification of Medicinal Products (IDMP). Stud Health Technol Inform 2022; 294:377-381. [PMID: 35612100 DOI: 10.3233/shti220478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In this study, the representation of chemical substances in IDMP is reviewed, with an exploration of aggregation levels for substances used in the virtual drug data models of RxNorm, SNOMED-CT, ATC/INN, and the Belgian SAM database, for products with a single substance and combinations of substances. The active moiety and available solid-state forms are explored for diclofenac, amoxicillin, carbamazepine, and amlodipine, with regard to their representation in coding systems such as WHODrug, SMS, UNII, CAS, and SNOMED-CT. By counting the number of medicinal products in Belgium for amlodipine at each level of aggregation, concepts for a grouper of substances and two levels of grouper of medicinal products are illustrated. Recommendations are made for the further development of IDMP and its link to international drug classifications.
|
7
|
HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of the Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL linearly scales regarding the dimension of the common ancestor subgraph regardless of the ontology size. 
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
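The abstract does not spell out how AncSPL works, so the following Python sketch is only one plausible reading of "ancestors-based": a shortest-path search that is restricted to the subgraph formed by the ancestors of the two concepts. It is not the HESML implementation (which is a Java library), it uses an unweighted breadth-first search instead of Dijkstra's algorithm, and the toy taxonomy and function names are hypothetical.

```python
from collections import deque


def ancestors(node, parents):
    """Return the node plus all its ancestors; `parents` maps child -> parents."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def ancestor_restricted_path_length(a, b, parents):
    """Edge count of the shortest a-b path that only visits ancestors of a or b."""
    allowed = ancestors(a, parents) | ancestors(b, parents)
    # Build undirected adjacency over the ancestor subgraph only.
    neighbours = {node: set() for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            neighbours[child].add(parent)
            neighbours[parent].add(child)
    distance = {a: 0}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return distance[current]
        for nxt in neighbours[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return None  # no connecting path inside the ancestor subgraph


# Hypothetical toy taxonomy.
toy_parents = {"A": ["p"], "B": ["p"], "p": ["root"], "C": ["root"]}
print(ancestor_restricted_path_length("A", "B", toy_parents))  # 2
print(ancestor_restricted_path_length("A", "C", toy_parents))  # 3
```

Because the search never leaves the ancestor subgraph, its cost depends on the size of that subgraph and not on the size of the whole ontology. The result is an approximation whenever a shorter path exists through shared descendants.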
|
8
|
Combining word embeddings to extract chemical and drug entities in biomedical literature. BMC Bioinformatics 2021; 22:599. [PMID: 34920708 PMCID: PMC8684055 DOI: 10.1186/s12859-021-04188-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2021] [Accepted: 05/12/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving the access and integration of information from unstructured data such as biomedical literature. METHODS In this paper we evaluate two important tasks in NLP: the named entity recognition (NER) and Entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings in order to improve the results obtained in the PharmaCoNER challenge. RESULTS For the NER task we present a neural network composed of BiLSTM with a CRF sequential layer where different word embeddings are combined as an input to the architecture. A hybrid method combining supervised and unsupervised models is used for the concept indexing task. In the supervised model, we use the training set to find previously trained concepts, and the unsupervised model is based on a 6-step architecture. This architecture uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code. CONCLUSION On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature. We achieved results of 91.41% for precision, 90.14% for recall, and 90.77% for F1-score using micro-averaging. On the other hand, our indexing system achieves a 92.67% F1-score, 92.44% for recall, and 92.91% for precision. With these results in a final ranking, we would be in the first position.
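The dictionary-plus-Levenshtein idea mentioned in this abstract can be sketched in Python as follows. This is not the paper's 6-step architecture: the lookup order, the `max_distance` threshold, the synonym entries, and the placeholder identifiers such as `CODE-001` are all assumptions made for illustration (they are not real SNOMED-CT codes).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign_code(mention, synonyms, max_distance=2):
    """Exact dictionary lookup first, then the closest synonym within max_distance."""
    key = mention.strip().lower()
    if key in synonyms:
        return synonyms[key]
    best = min(synonyms, key=lambda term: levenshtein(key, term), default=None)
    if best is not None and levenshtein(key, best) <= max_distance:
        return synonyms[best]
    return None


# Hypothetical synonym dictionary with placeholder codes.
SYNONYMS = {
    "paracetamol": "CODE-001",
    "acetaminofen": "CODE-001",
    "ibuprofeno": "CODE-002",
}

print(assign_code("Paracetamol", SYNONYMS))  # CODE-001 (exact match)
print(assign_code("ibuprofen", SYNONYMS))    # CODE-002 (edit distance 1)
print(assign_code("amoxicilina", SYNONYMS))  # None (no close synonym)
```

The threshold trades recall for precision: a larger `max_distance` tolerates more spelling variation but raises the risk of assigning a code for a different substance.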
|
9
|
Preparing Laboratories for Interconnected Health Care. Diagnostics (Basel) 2021; 11:diagnostics11081487. [PMID: 34441421 PMCID: PMC8391810 DOI: 10.3390/diagnostics11081487] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 07/26/2021] [Accepted: 07/28/2021] [Indexed: 12/01/2022] Open
Abstract
In an increasingly interconnected health care system, laboratory medicine can facilitate diagnosis and treatment of patients effectively. This article describes necessary changes and points to potential challenges on a technical, content, and organizational level. As a technical precondition, electronic laboratory reports have to become machine-readable and interpretable. Terminologies such as Logical Observation Identifiers Names and Codes (LOINC), Nomenclature for Properties and Units (NPU), Unified Code for Units of Measure (UCUM), and SNOMED-CT can lead to the necessary semantic interoperability. Even if only single “atomized” results of the whole report are extracted, the necessary information for correct interpretation must be available. Therefore, interpretive comments, e.g., concerns about an increased measurement uncertainty must be electronically attached to every affected measurement result. Standardization of laboratory analyses with traceable standards and reference materials will enable knowledge transfer and safe interpretation of laboratory analyses from multiple laboratories. In an interconnected health care system, laboratories should strive to transform themselves into a data hub that not only receives samples but also extensive information about the patient. On that basis, they can return measurement results enriched with high-quality interpretive comments tailored to the individual patient and unlock the full potential of laboratory medicine.
|
10
|
The Inadequacy of Coding Nomenclature to Represent the Timeline of a Disease (Like Diabetes). J Diabetes Sci Technol 2020; 14:978-979. [PMID: 32522033 PMCID: PMC7753851 DOI: 10.1177/1932296820929674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
11
|
Building an I2B2-Based Population Repository for Clinical Research. Stud Health Technol Inform 2020; 270:78-82. [PMID: 32570350 DOI: 10.3233/shti200126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The present work provides a real-world case of the connection process of a hospital, 12 de Octubre University Hospital in Spain, to the TriNetX research network, transforming a compilation of disparate sources into a single harmonized repository which is automatically refreshed every day. It describes the different integration phases: terminology core datasets, specialized sources and eventually automatic refreshment. It also explains the work performed on semantic normalization of the involved clinical terminologies; as well as the resulting benefits the InSite platform services have enabled in the form of research opportunities for the hospital.
|
12
|
Data Integration into OMOP CDM for Heterogeneous Clinical Data Collections via HL7 FHIR Bundles and XSLT. Stud Health Technol Inform 2020; 270:138-142. [PMID: 32570362 DOI: 10.3233/shti200138] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Data integration is an important task in medical informatics and strongly affects the value gained from existing health information data. These tasks are usually implemented as extract, transform, and load (ETL) processes. By introducing HL7 FHIR as an intermediate format, our aim was to integrate heterogeneous data from a German pulmonary hypertension registry into an OMOP Common Data Model. First, domain knowledge experts defined a common parameter set, which was subsequently mapped to standardized terminologies like LOINC or SNOMED-CT. Data was extracted as HL7 FHIR Bundles to be transformed to OMOP CDM by using XSLT. We successfully transformed the majority of data elements to the OMOP CDM in a feasible time.
|
13
|
Defining a Standardized Information Model for Multi-Source Representation of Breast Cancer Data. Stud Health Technol Inform 2020; 270:1243-1244. [PMID: 32570600 DOI: 10.3233/shti200383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
This work aims to define a standardized information model for representation of multiple data sources in breast cancer. A set of data elements has been identified using ICHOM Breast Cancer as the minimum data set and adapting it to the needs of Hospital Universitario 12 de Octubre. With this, an information model has been defined according to ISO 13606 and SNOMED CT standards.
|
14
|
Abstract
The objective of this study was to determine how well a subset of SNODENT, specifically designed for general dentistry, meets the needs of dental practitioners. Participants were asked to locate their written diagnosis for tooth conditions among the SNODENT terminology uploaded into an electronic dental record. Investigators found that 65% of providers’ original written diagnoses were in “agreement” with their selected SNODENT dental diagnostic subset concept(s).
|
15
|
Role of Nursing Informatics in Implementation of SNOMED-CT in India. Stud Health Technol Inform 2019; 264:1718-1719. [PMID: 31438309 DOI: 10.3233/shti190613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The SNOMED-CT project under the Ministry of Health and Family Welfare has been operational at AIIMS since July 2016. A team of nurses was recruited under the SNOMED project and actively works on integrating the existing EHR with SNOMED-CT, monitoring, training of users, auditing the data, refset creation, and development of the National Drug Database. This paper emphasizes the role of Nursing Informatics in the implementation of the SNOMED-CT project in India as well as in any other country.
|
16
|
Abstract
BACKGROUND A Cardiac-centered Frailty Ontology can be an important foundation for using NLP to assess patient frailty. Frailty is an important consideration when making patient treatment decisions, particularly in older adults, those with a cardiac diagnosis, or when major surgery is a consideration. Clinicians often report patient's frailty in progress notes and other documentation. Frailty is recorded in many different ways in patient records and many different validated frailty-measuring instruments are available, with little consistency across instruments. We specifically explored concepts relevant to decisions regarding cardiac interventions. We based our work on text found in a large corpus of clinical notes from the Department of Veterans Affairs (VA) national Electronic Health Record (EHR) database. RESULTS The full ontology has 156 concepts, with 246 terms. It includes 86 concepts we expect to find in clinical documents, with 12 qualifier values. The remaining 58 concepts represent hierarchical groups (e.g., physical function findings). Our top-level class is clinical finding, which has children clinical history finding, instrument finding, and physical examination finding, reflecting the OGMS definition of clinical finding. Instrument finding is any score found for the existing frailty instruments. Within our ontology, we used SNOMED-CT concepts where possible. Some of the 86 concepts we expect to find in clinical documents are associated with the properties like ability interpretation. The concept ability to walk can either be able, assisted or unable. Each concept-property level pairing gets a different frailty score. Each scored concept received three scores: a frailty score, a relevance to cardiac decisions score, and a likelihood of resolving after the recommended intervention score. The ontology includes the relationship between scores from ten frailty instruments and frailty as assessed using ontology concepts. 
It also included rules for mapping ontology elements to instrument items for three common frailty assessment instruments. Ontology elements are used in two clinical NLP systems. CONCLUSIONS We developed and validated a Cardiac-centered Frailty Ontology, which is a machine-interoperable description of frailty that reflects all the areas that clinicians consider when deciding which cardiac intervention will best serve the patient as well as frailty indications generally relevant to medical decisions. The ontology owl file is available on Bioportal at http://bioportal.bioontology.org/ontologies/CCFO .
|
17
|
Disease mentions in airport and hospital geolocations expose dominance of news events for disease concerns. J Biomed Semantics 2018; 9:18. [PMID: 29895320 PMCID: PMC5996486 DOI: 10.1186/s13326-018-0186-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/17/2017] [Accepted: 05/25/2018] [Indexed: 11/26/2022] Open
Abstract
Background In recent years, Twitter has been applied to monitor diseases through its facility to monitor users’ comments and concerns in real time. The analysis of tweets for disease mentions should reflect not only user-specific concerns but also disease outbreaks. This requires the use of standard terminological resources and can be focused on selected geographic locations. In our study, we differentiate between hospital and airport locations to better distinguish disease outbreaks from background mentions of disease concerns. Results Our analysis covers all geolocated tweets over a 6-month period, uses SNOMED-CT as a standard medical terminology, and explores language patterns (as well as MetaMap) to identify mentions of diseases in reference to the geolocation of tweets. Contrary to our expectation, hospital and airport geolocations are not suitable to collect significant portions of tweets concerned with disease outcomes. Overall, geolocated tweets exposed a large number of messages commenting on disease-related news articles. Furthermore, the geolocated messages exposed an over-representation of non-communicable diseases in contrast to infectious diseases. Conclusions Our findings suggest that disease mentions on Twitter serve not only to share personal statements but also to share concerns about news articles. In particular, our assumption about the relevance of hospital and airport geolocations for an increased frequency of disease mentions has not been met. To further address the linguistic cues, we propose the study of health forums to understand how a change in medium affects the language applied by the users. Finally, our research on language use may provide essential clues to distinguish complementary trends in the use of language on Twitter when analysing health-related topics.
|
18
|
Abstract
A number of strategies have been published to accelerate the use of electronic health records in caring for patients across the UK. These visions of 'eHealth' have a common requirement for robust interoperability between different systems with the use of appropriate information and data standards. SNOMED CT, a comprehensive terminology that NHS England intends to adopt across all care settings by 2020, is a key component of these standards but there is currently limited experience in its use in live clinical settings. Within NHS Wales, an electronic patient record system has been developed since 2009 with a focus on a core generic clinical information model built using SNOMED CT. Our experience is that SNOMED CT is a usable and clinician-friendly terminology but that its size and scope must be considered during implementation.
|
19
|
|
20
|
Frequency analysis of medical concepts in clinical trials and their coverage in MeSH and SNOMED-CT. Methods Inf Med 2014; 54:83-92. [PMID: 25346408 DOI: 10.3414/me14-01-0046] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/22/2014] [Accepted: 10/05/2014] [Indexed: 11/09/2022]
Abstract
BACKGROUND Eligibility criteria (EC) of clinical trials play a key role in selecting appropriate study candidates and in the validity of the outcome of a clinical trial. However, in most cases EC are provided in unstandardised ways such as free text, which raises significant challenges for machine-readability. OBJECTIVES To establish a list of the most frequent medical concepts in clinical trials with semantic annotations. This concept list contributes to standardisation of EC and identifies relevant data items in electronic health records (EHRs) for clinical research. The coverage of the list in two major clinical vocabularies, MeSH and SNOMED-CT, will be assessed. METHODS Four hundred and twenty-five clinical trials conducted between 2000 and 2011 at a German university hospital were analysed. 6671 EC were manually annotated by a medical coder using Concept Unique Identifiers (CUIs) provided by the Unified Medical Language System. Two physicians performed a semi-automatic CUI code revision. Concept frequency was analysed and clusters of concepts were manually identified. A binomial significance test was applied to quantify coverage differences of the most frequent concepts in MeSH and SNOMED-CT. RESULTS Based on manual medical coding of 425 clinical trials, 7588 concepts were identified, of which 5236 were distinct. A top 100 list containing the 101 most frequent medical concepts was established. The concepts of this list cover 25% of all concept occurrences in all analysed clinical trials. This list reveals six missing entries in SNOMED-CT and 12 in MeSH. The median EC frequency per trial has increased throughout the trial years (2000-2005: 8 EC/trial, 2011: 14 EC/trial). CONCLUSIONS Relatively few concepts cover one quarter of concept occurrences that represent EC in recent studies. Therefore, these concepts can serve as candidate data elements for integration into EHRs to optimise patient recruitment in clinical research.
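The coverage figure reported in this abstract is the share of all concept occurrences accounted for by the most frequent concepts. A minimal Python sketch of that calculation is given below; the concept identifiers and counts are invented for illustration and the function name `top_k_coverage` is hypothetical.

```python
from collections import Counter


def top_k_coverage(occurrences, k):
    """Fraction of all occurrences accounted for by the k most frequent concepts."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Hypothetical annotations: one concept identifier per annotated occurrence.
occurrences = ["C_A"] * 5 + ["C_B"] * 3 + ["C_C"] + ["C_D"]

print(top_k_coverage(occurrences, 2))  # 0.8
```

Here the two most frequent concepts account for 8 of the 10 occurrences, so the coverage is 0.8.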
|
21
|
Formalizing MedDRA to support semantic reasoning on adverse drug reaction terms. J Biomed Inform 2014; 49:282-91. [PMID: 24680984 DOI: 10.1016/j.jbi.2014.03.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 11/04/2013] [Revised: 03/10/2014] [Accepted: 03/16/2014] [Indexed: 11/27/2022]
Abstract
Although MedDRA has obvious advantages over previous terminologies for coding adverse drug reactions and discovering potential signals using data mining techniques, its terminological organization constrains users to search terms according to predefined categories. Adding formal definitions to MedDRA would allow retrieval of terms according to a case definition that may correspond to novel categories that are not currently available in the terminology. To achieve semantic reasoning with MedDRA, we have associated formal definitions with MedDRA terms in an OWL file named OntoADR, which is the result of our first step towards providing an "ontologized" version of MedDRA. MedDRA's original five-level hierarchy was converted into a subsumption tree, and formal definitions of MedDRA terms were designed using several methods: mappings to SNOMED-CT, semi-automatic definition algorithms, or fully manual definition. This article presents the main steps of the OntoADR design process, its structure and content, and discusses the problems and limits raised by this attempt to "ontologize" MedDRA.
|
22
|
Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. J Biomed Inform 2013; 48:54-65. [PMID: 24316051 DOI: 10.1016/j.jbi.2013.11.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 01/23/2013] [Revised: 08/16/2013] [Accepted: 11/17/2013] [Indexed: 11/16/2022]
Abstract
Rapid, automated determination of the mapping of free text phrases to pre-defined concepts could assist in the annotation of clinical notes and increase the speed of natural language processing systems. The aim of this study was to design and evaluate a token-order-specific naïve Bayes-based machine learning system (RapTAT) to predict associations between phrases and concepts. Performance was assessed using a reference standard generated from 2860 VA discharge summaries containing 567,520 phrases that had been mapped to 12,056 distinct Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts by the MCVS natural language processing system. It was also assessed on the manually annotated 2010 i2b2 challenge data. Performance was established with regard to precision, recall, and F-measure for each of the concepts within the VA documents using bootstrapping. Within that corpus, concepts identified by MCVS were broadly distributed throughout SNOMED CT, and the token-order-specific language model achieved better performance based on precision, recall, and F-measure (0.95±0.15, 0.96±0.16, and 0.95±0.16, respectively; mean±SD) than the bag-of-words based, naïve Bayes model (0.64±0.45, 0.61±0.46, and 0.60±0.45, respectively) that has previously been used for concept mapping. Precision, recall, and F-measure on the i2b2 test set were 92.9%, 85.9%, and 89.2%, respectively, using the token-order-specific model. RapTAT required just 7.2 ms to map all phrases within a single discharge summary, and mapping rate did not decrease as the number of processed documents increased. The high performance attained by the tool in terms of both accuracy and speed was encouraging, and the mapping rate should be sufficient to support near-real-time, interactive annotation of medical narratives. These results demonstrate the feasibility of rapidly and accurately mapping phrases to a wide range of medical concepts based on a token-order-specific naïve Bayes model and machine learning.
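The contrast between a token-order-specific and a bag-of-words naïve Bayes model can be illustrated with a minimal sketch. This is not the RapTAT implementation: the class name `PhraseNaiveBayes`, the Laplace smoothing, the concept labels, and the toy training phrases are all assumptions made for illustration. With `positional=True` each token is conditioned on its position in the phrase; with `positional=False` position is ignored.

```python
import math
from collections import Counter, defaultdict


class PhraseNaiveBayes:
    """Toy naive Bayes mapper from short phrases to concept labels (hypothetical)."""

    def __init__(self, positional=True, alpha=1.0):
        self.positional = positional      # True: token-order-specific features
        self.alpha = alpha                # Laplace smoothing constant (assumed)
        self.concept_counts = Counter()   # number of training phrases per concept
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def _features(self, phrase):
        tokens = phrase.lower().split()
        if self.positional:
            # Feature is (position, token), so word order matters.
            return [(i, tok) for i, tok in enumerate(tokens)]
        # Bag-of-words: every token shares the same dummy position.
        return [(0, tok) for tok in tokens]

    def fit(self, pairs):
        for phrase, concept in pairs:
            self.concept_counts[concept] += 1
            for feat in self._features(phrase):
                self.feature_counts[concept][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, phrase):
        total = sum(self.concept_counts.values())
        best_concept, best_score = None, -math.inf
        for concept, n in self.concept_counts.items():
            counts = self.feature_counts[concept]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(n / total)
            for feat in self._features(phrase):
                score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_concept, best_score = concept, score
        return best_concept


# Toy training data; the labels are placeholders, not SNOMED CT identifiers.
training = [
    ("chest pain", "CHEST_PAIN"),
    ("pain in chest", "CHEST_PAIN"),
    ("pain medication", "ANALGESIC"),
    ("pain relief medication", "ANALGESIC"),
]
model = PhraseNaiveBayes(positional=True).fit(training)
print(model.predict("chest pain"))
print(model.predict("pain medication"))
```

Because both variants only accumulate counts at training time and sum a few log probabilities per phrase at prediction time, a model of this family stays cheap to apply, which is in keeping with the mapping speed the abstract reports.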
|
23
|
A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain. J Biomed Inform 2013; 48:38-53. [PMID: 24269894 DOI: 10.1016/j.jbi.2013.11.006] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 07/16/2013] [Revised: 11/06/2013] [Accepted: 11/09/2013] [Indexed: 10/26/2022]
Abstract
Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with unambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined in recent years. Nevertheless, some critical questions remain: which measure should be defined or chosen for a concrete application? Are some measures that appear different a priori in fact equivalent? To shed light on these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences, and to propose bridges between their theoretical bases. By demonstrating that groups of measures are just particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The application of the proposed framework and its practical usefulness are underlined by an empirical analysis of hundreds of semantic measures in a biomedical context.
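The idea that several measures are instantiations of one parameterized function can be sketched with the Tversky ratio model applied to the ancestor sets of two concepts. This is a generic illustration, not the framework defined in the paper: the function names, the toy `PARENTS` hierarchy, and the choice of ancestor sets as concept features are assumptions. Setting both weights to 1 gives the Jaccard index, and setting both to 0.5 gives the Dice coefficient.

```python
def ancestors(concept, parents):
    """Return the concept together with all of its ancestors in the hierarchy."""
    seen = {concept}
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ratio_similarity(a, b, parents, alpha, beta):
    """Tversky ratio model over ancestor sets, weighted by alpha and beta."""
    feats_a = ancestors(a, parents)
    feats_b = ancestors(b, parents)
    common = len(feats_a & feats_b)
    denom = common + alpha * len(feats_a - feats_b) + beta * len(feats_b - feats_a)
    return common / denom if denom else 0.0


def jaccard(a, b, parents):
    # alpha = beta = 1 reduces the ratio model to |A ∩ B| / |A ∪ B|.
    return ratio_similarity(a, b, parents, alpha=1.0, beta=1.0)


def dice(a, b, parents):
    # alpha = beta = 0.5 reduces it to 2|A ∩ B| / (|A| + |B|).
    return ratio_similarity(a, b, parents, alpha=0.5, beta=0.5)


# Hypothetical toy hierarchy (child -> list of parents), not a real ontology.
PARENTS = {
    "pneumonia": ["respiratory_infection"],
    "influenza": ["respiratory_infection"],
    "respiratory_infection": ["infection"],
    "infection": ["disease"],
    "melanoma": ["cancer"],
    "cancer": ["disease"],
}

print(jaccard("pneumonia", "influenza", PARENTS))  # 3 shared of 5 total: 0.6
print(dice("pneumonia", "influenza", PARENTS))     # 2*3 / (4 + 4): 0.75
print(jaccard("pneumonia", "melanoma", PARENTS))   # only "disease" is shared
```

Writing the measures this way makes their relationship explicit: two measures that look different in the literature may differ only in the values passed for `alpha` and `beta`.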
|
24
|
Medical image retrieval: past and present. Healthc Inform Res 2012; 18:3-9. [PMID: 22509468 PMCID: PMC3324753 DOI: 10.4258/hir.2012.18.1.3] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 01/25/2012] [Revised: 03/24/2012] [Accepted: 03/26/2012] [Indexed: 11/23/2022] Open
Abstract
With the widespread dissemination of picture archiving and communication systems (PACSs) in hospitals, the amount of imaging data is rapidly increasing. Effective image retrieval systems are required to manage these large and complex image databases. The authors reviewed the past development and the present state of medical image retrieval systems, including text-based and content-based systems. To provide a more effective image retrieval service, intelligent content-based retrieval systems combined with semantic systems are required.
|
25
|
Review of semantically interoperable electronic health records for ubiquitous healthcare. Healthc Inform Res 2010; 16:1-5. [PMID: 21818417 PMCID: PMC3089838 DOI: 10.4258/hir.2010.16.1.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/17/2010] [Accepted: 03/19/2010] [Indexed: 11/23/2022] Open
Abstract
In order to provide more effective and personalized healthcare services to patients and healthcare professionals, intelligent active knowledge management and reasoning systems with semantic interoperability are needed. Technological developments have changed ubiquitous healthcare, making it more semantically interoperable and individual patient-based; however, these methodologies also have limitations. Based upon an extensive review of the international literature, this paper describes two technological approaches to semantically interoperable electronic health records for ubiquitous healthcare data management, the ontology-based model and the information model (or openEHR archetype model), and their link to standard terminologies such as SNOMED-CT.
|
26
|
Policy agenda for the next decade: creating a path for graceful evolution and harmonized classifications and terminologies used for encoding health information in electronic environments. PERSPECTIVES IN HEALTH INFORMATION MANAGEMENT 2009; 6:1c. [PMID: 20169015 PMCID: PMC2804458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Health information management (HIM) professionals' involvement with disease classification and nomenclature in the United States can be traced back to the early 20th century. In 1914, Grace Whiting Myers, the founder of the association known today as the American Health Information Management Association (AHIMA), served on the Committee on Uniform Nomenclature, which developed a disease classification system based upon etiological groupings. The profession's expertise and leadership in the collection, classification, and reporting of health data have continued since then. For example, in the early 1960s, another HIM professional (a medical record librarian) served as the associate editor of the fifth edition of the Standard Nomenclature of Disease (SNDO), a forerunner of the widely used clinical terminology, Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). During the same period, the medical record professionals working in hospitals throughout the country were responsible for manually collecting and reporting disease and procedure information from medical records using SNDO. Because coded data have played a pivotal role in the ability to record and share health information through the years, creating the appropriate policy framework for the graceful evolution and harmonization of classification systems and clinical terminologies is essential.
|