Reference Citation Analysis

Searched Name: Unified Medical Language System

Total Articles: 38 (from Reference Citation Analysis)
Article PDFs: 11
Cited by > 0: 26

Zhou H, Austin R, Lu SC, Silverman GM, Zhou Y, Kilicoglu H, Xu H, Zhang R. Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition. J Am Med Inform Assoc 2024;31:426-434. [PMID: 37952122 PMCID: PMC10797266 DOI: 10.1093/jamia/ocad216] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Revised: 10/20/2023] [Accepted: 11/08/2023] [Indexed: 11/14/2023]

Newbury A, Liu H, Idnay B, Weng C. The suitability of UMLS and SNOMED-CT for encoding outcome concepts. J Am Med Inform Assoc 2023;30:1895-1903. [PMID: 37615994 PMCID: PMC10654851 DOI: 10.1093/jamia/ocad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/14/2023] [Accepted: 08/02/2023] [Indexed: 08/25/2023]

Chen A, Huang R, Wu E, Han R, Wen J, Li Q, Zhang Z, Shen B. The Generation of a Lung Cancer Health Factor Distribution Using Patient Graphs Constructed From Electronic Medical Records: Retrospective Study. J Med Internet Res 2022;24:e40361. [PMID: 36427233 PMCID: PMC9736747 DOI: 10.2196/40361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 09/09/2022] [Accepted: 10/25/2022] [Indexed: 11/27/2022]

Abstract

BACKGROUND

Electronic medical records (EMRs) of patients with lung cancer (LC) capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening for LC.

OBJECTIVE

We aimed to generate an integrated biomedical graph from EMR data and Unified Medical Language System (UMLS) ontology for LC, and to generate an LC health factor distribution from a hospital EMR of approximately 1 million patients.

METHODS

The data were collected from 2 sets of 1397 patients each, one with and one without LC. A patient-centered health factor graph was plotted with 108,000 standardized data points, and a graph database was generated to integrate the graphs of patient health factors and the UMLS ontology. With the patient graph, we calculated the connection delta ratio (CDR) for each of the health factors to measure the relative strength of the factor's relationship to LC.

RESULTS

The patient graph had 93,000 relations between the 2794 patient nodes and 650 factor nodes. An LC graph with 187 related biomedical concepts and 188 horizontal biomedical relations was plotted and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors resulted in graphical representations of relationships between patients and factors, while searches using any patient presented the patient's health factors from the EMR and the LC knowledge graph (KG) from the UMLS in the same graph. Sorting the health factors by CDR in descending order generated a distribution of health factors for LC. The top 70 CDR-ranked factors of disease, symptom, medical history, observation, and laboratory test categories were verified to be concordant with those found in the literature.

CONCLUSIONS

By collecting standardized data on thousands of patients with and without LC from the EMR, it was possible to generate a hospital-wide, patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS KG for LC, enabling hospitals to bring continuously updated, international standard biomedical KGs from the UMLS into clinical use. CDR analysis of the graph of patients with LC generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were concordant with the literature. The resulting distribution of LC health factors can be used to help personalize risk evaluation and preventive screening recommendations.

Güngör B, Deppenwiese N, Mang JM, Toddenroth D. Analysis of the Representation of Frequent Clinical Attributes in the Unified Medical Language System. Stud Health Technol Inform 2022;299:217-222. [PMID: 36325866 DOI: 10.3233/shti220987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]

Nguyen V, Bodenreider O. Adding an Attention Layer Improves the Performance of a Neural Network Architecture for Synonymy Prediction in the UMLS Metathesaurus. Stud Health Technol Inform 2022;290:116-119. [PMID: 35672982 PMCID: PMC9484765 DOI: 10.3233/shti220043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Ulrich H, Uzunova H, Handels H, Ingenerf J. Proposal of Semantic Annotation for German Metadata Using Bidirectional Recurrent Neural Networks. Stud Health Technol Inform 2022;294:357-361. [PMID: 35612096 DOI: 10.3233/shti220474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Humphreys BL, Tuttle MS. Something New and Different: The Unified Medical Language System. Stud Health Technol Inform 2022;288:100-112. [PMID: 35102832 DOI: 10.3233/shti210985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Abdollahi M, Gao X, Mei Y, Ghosh S, Li J, Narag M. Substituting clinical features using synthetic medical phrases: Medical text data augmentation techniques. Artif Intell Med 2021;120:102167. [PMID: 34629150 DOI: 10.1016/j.artmed.2021.102167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Revised: 09/02/2021] [Accepted: 09/03/2021] [Indexed: 11/22/2022]

Jing X. The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis. JMIR Med Inform 2021;9:e20675. [PMID: 34236337 PMCID: PMC8433943 DOI: 10.2196/20675] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/25/2020] [Revised: 11/25/2020] [Accepted: 07/02/2021] [Indexed: 01/22/2023]

Abstract

BACKGROUND

The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications.

OBJECTIVE

Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. This review and analysis therefore provides an overview of the UMLS and a comprehensive account of how it has been used in English-language peer-reviewed publications over the last 30 years.

METHODS

PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following the grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.

RESULTS

A total of 943 publications were included in the final analysis. Moreover, 32 publications were categorized into 2 categories; hence the total number of publications before duplicates are removed is 975. After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%).

CONCLUSIONS

The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
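
The theme counts reported in the RESULTS paragraph of this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the abstract, while the function name `theme_percentages` and the variable names are made up for this example.

```python
# Theme counts as reported in the abstract's RESULTS paragraph.
theme_counts = {
    "natural language processing": 230,
    "information retrieval": 125,
    "terminology study": 90,
    "ontology and modeling": 80,
    "medical subdomains": 76,
    "other language studies": 53,
    "artificial intelligence tools and applications": 46,
    "patient care": 35,
    "data mining and knowledge discovery": 25,
    "medical education": 20,
    "degree-related theses": 13,
    "digital library": 5,
    "the UMLS itself": 150,
    "the UMLS for other purposes": 27,
}

INCLUDED = 943            # publications in the final analysis
DOUBLE_CATEGORIZED = 32   # publications assigned to 2 categories
total = INCLUDED + DOUBLE_CATEGORIZED  # denominator used in the abstract


def theme_percentages(counts, denominator):
    """Return each theme's share of the denominator, rounded to 1 decimal."""
    return {theme: round(100 * n / denominator, 1) for theme, n in counts.items()}


percentages = theme_percentages(theme_counts, total)

# The three most common themes, by raw count.
top_three = sorted(theme_counts, key=theme_counts.get, reverse=True)[:3]

print(total, sum(theme_counts.values()))
print(top_three)
```

The category counts sum to the same 975 used as the denominator, and the three largest counts match the themes named in the CONCLUSIONS paragraph.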

Newman-Griffis D, Divita G, Desmet B, Zirikly A, Rosé CP, Fosler-Lussier E. Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets. J Am Med Inform Assoc 2021;28:516-532. [PMID: 33319905 DOI: 10.1093/jamia/ocaa269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 09/13/2020] [Accepted: 11/17/2020] [Indexed: 12/18/2022]

Abstract

OBJECTIVES

Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.

MATERIALS AND METHODS

We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language.

RESULTS

We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena.

DISCUSSION

Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.

CONCLUSIONS

Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
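
The share of ambiguous strings described in this abstract could be measured along the following lines. This is a minimal sketch and not the authors' method: it assumes a string counts as ambiguous when it is annotated with more than one concept, it assumes case-insensitive matching, and the concept identifiers (`X1`, `X2`, ...) are hypothetical placeholders rather than real UMLS identifiers.

```python
from collections import defaultdict


def ambiguous_fraction(annotations):
    """Fraction of distinct strings annotated with more than one concept.

    annotations: iterable of (string, concept_id) pairs.
    Strings are lowercased before grouping (an assumption of this sketch).
    """
    concepts_by_string = defaultdict(set)
    for text, concept in annotations:
        concepts_by_string[text.lower()].add(concept)
    if not concepts_by_string:
        return 0.0
    ambiguous = sum(1 for ids in concepts_by_string.values() if len(ids) > 1)
    return ambiguous / len(concepts_by_string)


# Toy annotations with hypothetical concept identifiers.
toy_annotations = [
    ("cold", "X1"),   # e.g. the illness sense
    ("cold", "X2"),   # e.g. the temperature sense
    ("Cold", "X1"),
    ("fever", "X3"),
    ("cough", "X4"),
]

print(ambiguous_fraction(toy_annotations))
```

Running the same function over a dataset and over a full vocabulary would give the two figures being compared, observed ambiguity versus potential ambiguity.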

Chang E, Mostafa J. The use of SNOMED CT, 2013-2020: a literature review. J Am Med Inform Assoc 2021;28:2017-2026. [PMID: 34151978 DOI: 10.1093/jamia/ocab084] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/05/2021] [Revised: 03/30/2021] [Accepted: 04/26/2021] [Indexed: 11/12/2022]

Kang T, Perotte A, Tang Y, Ta C, Weng C. UMLS-based data augmentation for natural language processing of clinical research literature. J Am Med Inform Assoc 2021;28:812-823. [PMID: 33367705 DOI: 10.1093/jamia/ocaa309] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/09/2020] [Accepted: 11/23/2020] [Indexed: 01/17/2023]

Tran TTT, Nghiem SV, Le VT, Quan TT, Nguyen V, Yip HY, Bodenreider O. Siamese KG-LSTM: A deep learning model for enriching UMLS Metathesaurus synonymy. Int Conf Knowl Syst Eng 2020;2020:281-286. [PMID: 36277606 PMCID: PMC9584311 DOI: 10.1109/kse50997.2020.9287797] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]

Xu S, Xu D, Wen L, Zhu C, Yang Y, Han S, Guan P. Integrating Unified Medical Language System and Kleinberg's Burst Detection Algorithm into Research Topics of Medications for Post-Traumatic Stress Disorder. Drug Des Devel Ther 2020;14:3899-3913. [PMID: 33061296 PMCID: PMC7522601 DOI: 10.2147/dddt.s270379] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/02/2020] [Accepted: 09/07/2020] [Indexed: 11/23/2022]

Abstract

Background

The treatment of post-traumatic stress disorder (PTSD) has long been a challenge because the symptoms of PTSD are multifaceted. PTSD is primarily treated with psychotherapy, medication, or a combination of the two. The present study was designed to analyze the literature on medications for PTSD, to explore high-frequency common drugs and low-frequency burst drugs using a burst detection algorithm combined with the Unified Medical Language System (UMLS), and to provide references for developing new drugs for PTSD.

Methods

Publications related to medications for PTSD from 2010 to 2019 were identified through PubMed, Web of Science Core Collection, and BIOSIS Previews. SemRep and the SemRep semantic result processing system were used to extract the set of drug concepts with a therapeutic relationship, according to the semantic relationships of the UMLS. Kleinberg’s burst detection algorithm, run in a Java-based program, was applied to calculate the burst weight index of drug concepts. These concepts were sorted according to frequency and burst weight index.

Results

Four hundred and fifty-nine treatment-related drug concepts were extracted. The drug with the highest burst weight index was “Psilocybine”, a hallucinogen, which was more likely to be a hotspot for the pharmacotherapy of PTSD. The highest frequency concept was “prazosin”, which was more likely to be the focus of research in the medications for PTSD.

Conclusion

The present study assessed the medication-related literature on PTSD treatment, providing a framework for a burst word detection-based method, a baseline of information for future research, and a new attempt at the discovery of textual knowledge. Bibliometric analysis based on the burst detection algorithm combined with the UMLS has shown a certain feasibility in amplifying the microscopic changes of a specific research direction in a field; it can also be applied to other diseases and used to explore trends in various disciplines.

Zheng F, Shi J, Yang Y, Zheng WJ, Cui L. A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System. J Am Med Inform Assoc 2020;27:1568-1575. [PMID: 32918476 PMCID: PMC7566369 DOI: 10.1093/jamia/ocaa123] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/10/2020] [Revised: 05/09/2020] [Accepted: 05/20/2020] [Indexed: 01/06/2023]

Amos L, Anderson D, Brody S, Ripple A, Humphreys BL. UMLS users and uses: a current overview. J Am Med Inform Assoc 2020;27:ocaa084. [PMID: 32683453 PMCID: PMC7580803 DOI: 10.1093/jamia/ocaa084] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/08/2020] [Revised: 03/31/2020] [Accepted: 05/01/2020] [Indexed: 11/16/2022]

Grosjean J, Billey K, Charlet J, Darmoni SJ. Manual Evaluation of the Automatic Mapping of International Classification of Diseases (ICD)-11 (in French). Stud Health Technol Inform 2020;270:1335-1336. [PMID: 32570646 DOI: 10.3233/shti200429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Naderi H, Madani S, Kiani B, Etminani K. Similarity of medical concepts in question and answering of health communities. Health Informatics J 2019;26:1443-1454. [PMID: 31635510 DOI: 10.1177/1460458219881333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]

Li Y, Yao L, Mao C, Srivastava A, Jiang X, Luo Y. Early Prediction of Acute Kidney Injury in Critical Care Setting Using Clinical Notes. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2018;2018:683-686. [PMID: 33376624 PMCID: PMC7768909 DOI: 10.1109/bibm.2018.8621574] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/28/2023]

Varghese J, Sandmann S, Dugas M. Web-Based Information Infrastructure Increases the Interrater Reliability of Medical Coders: Quasi-Experimental Study. J Med Internet Res 2018;20:e274. [PMID: 30322834 PMCID: PMC6231825 DOI: 10.2196/jmir.9644] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2017] [Revised: 05/03/2018] [Accepted: 06/28/2018] [Indexed: 01/05/2023]

Abstract

Background

Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders.

Objective

The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system.

Methods

We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (K_alpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention.

Results

The intervention improved interrater reliability in structured quality assurance form items (from K_alpha=0.50, 95% CI 0.43-0.57 to K_alpha=0.62, 95% CI 0.55-0.69) and free-text eligibility criteria (from K_alpha=0.19, 95% CI 0.14-0.24 to K_alpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. Regardless of the intervention, precoordination and structured items were associated with significantly high interrater reliability, but the proportion of items that were precoordinated significantly increased after intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25).

Conclusions

The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.
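
The agreement statistic named in this abstract, Krippendorff alpha, can be illustrated with a short sketch. The version below assumes nominal (unordered) codes and the standard coincidence-matrix formulation, alpha = 1 - D_o / D_e; the abstract does not state which level of measurement was used, so that choice is an assumption, and the bootstrap confidence intervals reported in the study are not reproduced here. The function name and toy data are made up for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    units: list of lists; each inner list holds the codes that the coders
    assigned to one item. Items with fewer than 2 codes are ignored.
    """
    # Coincidence matrix: each ordered pair of codes within an item
    # contributes 1 / (m - 1), where m is the number of codes for that item.
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue
        for a, b in permutations(codes, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per code and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n == 0:
        raise ValueError("no item was coded by at least two coders")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is used")
    return 1 - observed / expected


# Toy data: 4 items, 2 coders each, hypothetical codes "a" and "b".
toy_units = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
print(krippendorff_alpha_nominal(toy_units))
```

A bootstrap interval could be layered on top by resampling items with replacement and recomputing alpha for each resample.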

Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clin Epidemiol 2018;10:961-970. [PMID: 30127646 PMCID: PMC6089100 DOI: 10.2147/clep.s170075] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/04/2022]

Chen D, Zhang R, Liu K, Hou L. Knowledge Discovery from Posts in Online Health Communities Using Unified Medical Language System. Int J Environ Res Public Health 2018;15:E1291. [PMID: 29921824 PMCID: PMC6025155 DOI: 10.3390/ijerph15061291] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/31/2018] [Revised: 06/15/2018] [Accepted: 06/16/2018] [Indexed: 12/03/2022]

Weng WH, Wagholikar KB, McCray AT, Szolovits P, Chueh HC. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach. BMC Med Inform Decis Mak 2017;17:155. [PMID: 29191207 PMCID: PMC5709846 DOI: 10.1186/s12911-017-0556-8] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 06/15/2017] [Accepted: 11/19/2017] [Indexed: 01/18/2023]

Abstract

BACKGROUND

The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note.

METHODS

We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets.

RESULTS

The medical subdomain classifier trained with a convolutional recurrent neural network and neural word embeddings yielded the best performance on the iDASH and MGH datasets, with area under the receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering clinical interpretability, the medical subdomain classifier trained with a linear support vector machine, using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation with term frequency-inverse document frequency (tf-idf) weighting, outperformed other shallow learning classifiers on the iDASH and MGH datasets, with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934, respectively. When classifiers trained on one dataset were applied to the other, they reached the F1 score threshold of 0.7 for half of the medical subdomains we studied.

CONCLUSION

Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance, yet shallow learning algorithms with the word and concept representation achieve comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
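
The hybrid feature representation mentioned in this abstract (bag-of-words plus UMLS concepts, with tf-idf weighting) can be sketched as follows. This is an illustration only, not the pipeline from the study: it uses one common tf-idf variant (relative term frequency times log(N / df)), the concept tokens prefixed with `CONCEPT:` are hypothetical placeholders for identifiers a concept extractor would produce, and no classifier is trained.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute tf-idf weights for a list of tokenized documents.

    documents: list of token lists. Word tokens and concept tokens are
    treated alike, which is what makes the representation "hybrid".
    Returns one {token: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        term_freq = Counter(tokens)
        vectors.append({
            token: (count / len(tokens)) * math.log(n_docs / doc_freq[token])
            for token, count in term_freq.items()
        })
    return vectors


# Toy notes: plain words plus hypothetical concept tokens.
toy_notes = [
    ["chest", "pain", "CONCEPT:chest_pain"],
    ["chest", "xray", "CONCEPT:radiograph"],
]

for vector in tfidf(toy_notes):
    print(vector)
```

Tokens that occur in every note, such as `chest` here, receive a weight of zero, so the vectors emphasize terms and concepts that distinguish one note from another. The resulting dictionaries could then be passed to any linear classifier.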

Hegselmann S, Gessner S, Neuhaus P, Henke J, Schmidt CO, Dugas M. Automatic Conversion of Metadata from the Study of Health in Pomerania to ODM. Stud Health Technol Inform 2017;236:88-96. [PMID: 28508783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Raje S, Bodenreider O. Interoperability of Disease Concepts in Clinical and Research Ontologies: Contrasting Coverage and Structure in the Disease Ontology and SNOMED CT. Stud Health Technol Inform 2017;245:925-929. [PMID: 29295235 PMCID: PMC5881393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Yu Z, Wallace BC, Johnson T, Cohen T. Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness. Stud Health Technol Inform 2017;245:657-661. [PMID: 29295178 PMCID: PMC6464117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Festag S, Spreckelsen C. Word Sense Disambiguation of Medical Terms via Recurrent Convolutional Neural Networks. Stud Health Technol Inform 2017;236:8-15. [PMID: 28508773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Lu CJ, Tormey D, McCreedy L, Browne AC. Enhanced LexSynonym Acquisition for Effective UMLS Concept Mapping. Stud Health Technol Inform 2017;245:501-505. [PMID: 29295145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Duque A, Martinez-Romo J, Araujo L. Can multilinguality improve Biomedical Word Sense Disambiguation? J Biomed Inform 2016;64:320-332. [PMID: 27815227 DOI: 10.1016/j.jbi.2016.10.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/26/2016] [Revised: 10/24/2016] [Accepted: 10/31/2016] [Indexed: 10/20/2022]

Mowery DL, South BR, Christensen L, Leng J, Peltonen LM, Salanterä S, Suominen H, Martinez D, Velupillai S, Elhadad N, Savova G, Pradhan S, Chapman WW. Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics 2016;7:43. [PMID: 27370271 PMCID: PMC4930590 DOI: 10.1186/s13326-016-0084-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/28/2014] [Accepted: 06/01/2016] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term.

METHODS

In this study, we evaluate 1) the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, 2) the performance of participants' systems for short forms with variable majority sense distributions, and 3) the accuracy of participating systems in normalizing concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms.

RESULTS

The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72%. A majority sense baseline approach achieved the second best performance. For short forms with two or more senses, the accuracy of participating systems ranged from 52 to 78% under low ambiguity (majority sense greater than 80%), from 23 to 57% under moderate ambiguity (majority sense between 50 and 80%), and from 2 to 45% under high ambiguity (majority sense less than 50%). With respect to the ShARe test set, 69% of short form annotations had concept unique identifiers in common with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75% accuracy.

CONCLUSION

Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
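The majority sense baseline and the three ambiguity bands described in this abstract can be illustrated with a minimal Python sketch. The toy annotations, the placeholder identifiers (such as CUI_PATIENT), and the function names are all hypothetical and are not taken from the ShARe/CLEF data or any participating system; only the band thresholds (greater than 80%, 50 to 80%, less than 50%) come from the abstract.

```python
from collections import Counter, defaultdict


def train_majority_sense(annotations):
    """Count how often each short form was normalized to each concept."""
    senses = defaultdict(Counter)
    for short_form, concept in annotations:
        senses[short_form][concept] += 1
    return senses


def majority_sense(senses, short_form):
    """Return the most frequent concept for a short form, or None if unseen."""
    counts = senses.get(short_form)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def ambiguity_level(senses, short_form):
    """Bucket a short form by the share of its majority sense."""
    counts = senses.get(short_form, Counter())
    if len(counts) < 2:
        return "single sense or unseen"
    share = max(counts.values()) / sum(counts.values())
    if share > 0.8:
        return "low"
    if share >= 0.5:
        return "moderate"
    return "high"


def accuracy(senses, gold):
    """Fraction of gold annotations that the baseline predicts correctly."""
    correct = sum(1 for sf, concept in gold if majority_sense(senses, sf) == concept)
    return correct / len(gold)


# Hypothetical training annotations: (short form, placeholder concept id).
training = (
    [("pt", "CUI_PATIENT")] * 9
    + [("pt", "CUI_PHYSICAL_THERAPY")] * 1
    + [("ra", "CUI_RHEUMATOID_ARTHRITIS")] * 3
    + [("ra", "CUI_RIGHT_ATRIUM")] * 2
    + [("ms", "CUI_MULTIPLE_SCLEROSIS")] * 2
    + [("ms", "CUI_MITRAL_STENOSIS")] * 2
    + [("ms", "CUI_MORPHINE_SULFATE")] * 1
)
senses = train_majority_sense(training)

# Hypothetical held-out annotations used to score the baseline.
test_set = [
    ("pt", "CUI_PATIENT"),
    ("pt", "CUI_PHYSICAL_THERAPY"),
    ("ra", "CUI_RHEUMATOID_ARTHRITIS"),
    ("ra", "CUI_RIGHT_ATRIUM"),
]

for short_form in ("pt", "ra", "ms"):
    print(short_form, majority_sense(senses, short_form), ambiguity_level(senses, short_form))
print("baseline accuracy:", accuracy(senses, test_set))
```

Because the baseline always returns the most frequent sense, its accuracy on a short form is bounded by the majority sense share, which is consistent with the lower accuracies reported for the high ambiguity band.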

Scuba W, Tharp M, Mowery D, Tseytlin E, Liu Y, Drews FA, Chapman WW. Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction. J Biomed Semantics 2016;7:42. [PMID: 27338146 PMCID: PMC4919842 DOI: 10.1186/s13326-016-0086-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/28/2015] [Accepted: 06/01/2016] [Indexed: 11/26/2022] Open

Abstract

BACKGROUND

Clinical Natural Language Processing (NLP) systems require a semantic schema comprised of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from free text. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who translates clinical concepts derived from the clinician's domain expertise into a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free text.

RESULTS

Knowledge Author is a web-based recommendation system that supports users in developing domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, provided by mapping concepts to the Unified Medical Language System Metathesaurus database, further support the domain content creation process. Two proof of concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. Of a dataset of 115 concepts, 87 (76%) could be created using Knowledge Author. The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and an NLP system, pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86%) and varied recall for modifiers (certainty: 91%, sidedness: 80%, neurovascular anatomy: 46%).

CONCLUSION

Knowledge Author can support clinical domain content development for information extraction by helping domain experts create semantic schemas.

Shivade C, Malewadkar P, Fosler-Lussier E, Lai AM. Comparison of UMLS terminologies to identify risk of heart disease using clinical notes. J Biomed Inform 2015;58 Suppl:S103-S110. [PMID: 26375493 DOI: 10.1016/j.jbi.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/11/2015] [Revised: 08/23/2015] [Accepted: 08/25/2015] [Indexed: 10/23/2022]

Hanauer DA, Saeed M, Zheng K, Mei Q, Shedden K, Aronson AR, Ramakrishnan N. Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis. J Am Med Inform Assoc 2014;21:925-37. [PMID: 24928177 PMCID: PMC4147617 DOI: 10.1136/amiajnl-2014-002767] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/01/2014] [Revised: 05/23/2014] [Accepted: 05/27/2014] [Indexed: 02/07/2023] Open

Guo X, Yu Q, Alm CO, Calvelli C, Pelz JB, Shi P, Haake AR. From spoken narratives to domain knowledge: mining linguistic data for medical image understanding. Artif Intell Med 2014;62:79-90. [PMID: 25174882 DOI: 10.1016/j.artmed.2014.08.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/02/2013] [Revised: 07/29/2014] [Accepted: 08/10/2014] [Indexed: 10/24/2022]

Abstract

OBJECTIVES

Extracting useful visual clues from medical images to allow accurate diagnoses requires physicians' domain knowledge acquired through years of systematic study and clinical training. This is especially true in the dermatology domain, a medical specialty that requires physicians to have image inspection experience. Automating or at least aiding such efforts requires understanding physicians' reasoning processes and their use of domain knowledge. Mining physicians' references to medical concepts in narratives during image-based diagnosis of a disease is an interesting research topic that can help reveal experts' reasoning processes. It can also be a useful resource to assist with the design of information technologies for image use and for image case-based medical education systems.

METHODS AND MATERIALS

We collected data for analyzing physicians' diagnostic reasoning processes by conducting an experiment that recorded their spoken descriptions during inspection of dermatology images. In this paper we focus on the benefit of physicians' spoken descriptions and provide a general workflow for mining medical domain knowledge based on linguistic data from these narratives. The challenge of a medical image case can influence the accuracy of the diagnosis as well as how physicians pursue the diagnostic process. Accordingly, we define two lexical metrics for physicians' narratives (lexical consensus score and top N relatedness score) and evaluate their usefulness by assessing the diagnostic challenge levels of corresponding medical images. We also report on clustering medical images based on anchor concepts obtained from physicians' medical term usage. These analyses are based on physicians' spoken narratives that have been preprocessed by incorporating the Unified Medical Language System for detecting medical concepts.

RESULTS

The image rankings based on lexical consensus score and on top 1 relatedness score are well correlated with those based on challenge levels (Spearman correlation > 0.5 and Kendall correlation > 0.4). Clustering results are largely improved based on our anchor concept method (accuracy > 70% and mutual information > 80%).

CONCLUSIONS

Physicians' spoken narratives are valuable for the purpose of mining the domain knowledge that physicians use in medical image inspections. We also show that the semantic metrics introduced in the paper can be successfully applied to medical image understanding and allow discussion of additional uses of these metrics.
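The two rank correlations reported in this abstract can be computed as in the following minimal Python sketch. The two rank lists are made-up toy data, not the study's image rankings, and the formulas used are the textbook versions that assume no tied ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall tau: (concordant - discordant pairs) / total pairs, no ties."""
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both rankings order it the same way.
            direction = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ranks of five images under two criteria,
# e.g. a lexical metric versus an assigned challenge level.
metric_ranks = [1, 2, 3, 4, 5]
challenge_ranks = [2, 1, 3, 5, 4]

print("Spearman:", spearman_rho(metric_ranks, challenge_ranks))
print("Kendall:", kendall_tau(metric_ranks, challenge_ranks))
```

Real rankings often contain ties, in which case a tie-corrected implementation would be needed instead of these simplified formulas.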

Mougin F, Grabar N. Auditing the multiply-related concepts within the UMLS. J Am Med Inform Assoc 2014;21:e185-93. [PMID: 24464853 DOI: 10.1136/amiajnl-2013-002227] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/03/2022] Open

Gobbel GT, Reeves R, Jayaramaraja S, Giuse D, Speroff T, Brown SH, Elkin PL, Matheny ME. Development and evaluation of RapTAT: a machine learning system for concept mapping of phrases from medical narratives. J Biomed Inform 2013;48:54-65. [PMID: 24316051 DOI: 10.1016/j.jbi.2013.11.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 01/23/2013] [Revised: 08/16/2013] [Accepted: 11/17/2013] [Indexed: 11/16/2022]

Abstract

Rapid, automated determination of the mapping of free text phrases to pre-defined concepts could assist in the annotation of clinical notes and increase the speed of natural language processing systems. The aim of this study was to design and evaluate a token-order-specific naïve Bayes-based machine learning system (RapTAT) to predict associations between phrases and concepts. Performance was assessed using a reference standard generated from 2860 VA discharge summaries containing 567,520 phrases that had been mapped to 12,056 distinct Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts by the MCVS natural language processing system. It was also assessed on the manually annotated 2010 i2b2 challenge data. Performance was established with regard to precision, recall, and F-measure for each of the concepts within the VA documents using bootstrapping. Within that corpus, concepts identified by MCVS were broadly distributed throughout SNOMED CT, and the token-order-specific language model achieved better performance based on precision, recall, and F-measure (0.95±0.15, 0.96±0.16, and 0.95±0.16, respectively; mean±SD) than the bag-of-words-based naïve Bayes model (0.64±0.45, 0.61±0.46, and 0.60±0.45, respectively) that has previously been used for concept mapping. Precision, recall, and F-measure on the i2b2 test set were 92.9%, 85.9%, and 89.2% respectively, using the token-order-specific model. RapTAT required just 7.2 ms to map all phrases within a single discharge summary, and mapping rate did not decrease as the number of processed documents increased. The high performance attained by the tool in terms of both accuracy and speed was encouraging, and the mapping rate should be sufficient to support near-real-time, interactive annotation of medical narratives. These results demonstrate the feasibility of rapidly and accurately mapping phrases to a wide range of medical concepts based on a token-order-specific naïve Bayes model and machine learning.
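The contrast between a bag-of-words and a token-order-specific naïve Bayes model can be illustrated with a small Python sketch. This is not RapTAT's implementation: prefixing each token with its position is only one simple, assumed way to make the features order-sensitive, and the class, the feature functions, the add-one smoothing, the training phrases, and the concept labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(phrase):
    """Order-insensitive features: just the lowercased tokens."""
    return phrase.lower().split()


def token_order_features(phrase):
    """Order-sensitive features: each token tagged with its position."""
    return [f"{i}:{tok}" for i, tok in enumerate(phrase.lower().split())]


class NaiveBayesMapper:
    """Multinomial naive Bayes with add-alpha smoothing over phrase features."""

    def __init__(self, featurize, alpha=1.0):
        self.featurize = featurize
        self.alpha = alpha
        self.concept_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, pairs):
        for phrase, concept in pairs:
            self.concept_counts[concept] += 1
            for feature in self.featurize(phrase):
                self.feature_counts[concept][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, phrase):
        total = sum(self.concept_counts.values())
        best_concept, best_score = None, -math.inf
        for concept, count in self.concept_counts.items():
            # Log prior plus smoothed log likelihood of every feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[concept].values()) + self.alpha * len(self.vocab)
            for feature in self.featurize(phrase):
                seen = self.feature_counts[concept][feature]
                score += math.log((seen + self.alpha) / denom)
            if score > best_score:
                best_concept, best_score = concept, score
        return best_concept


# Hypothetical phrase-to-concept pairs with identical tokens in a different order.
pairs = [
    ("lung cancer metastatic to liver", "LUNG_PRIMARY"),
    ("liver cancer metastatic to lung", "LIVER_PRIMARY"),
]
bag_model = NaiveBayesMapper(bag_of_words).fit(pairs)
order_model = NaiveBayesMapper(token_order_features).fit(pairs)

for phrase, _ in pairs:
    print(phrase, "| bag:", bag_model.predict(phrase), "| order:", order_model.predict(phrase))
```

In this toy setting the bag-of-words model sees the same tokens for both phrases and cannot separate them, while the position-tagged features let the model distinguish the two concepts.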

Affiliation(s)

Glenn T Gobbel: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
Ruth Reeves: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Shrimalini Jayaramaraja: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
Dario Giuse: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Theodore Speroff: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Steven H Brown: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Peter L Elkin: Department of Biomedical Informatics, University at Buffalo, SUNY, Buffalo, NY, USA.
Michael E Matheny: Geriatric Research, Education and Clinical Center (GRECC), Department of Veterans Affairs Medical Center, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Division of General Internal Medicine & Public Health, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA.

Wei WQ, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc 2013;20:954-61. [PMID: 23576672 PMCID: PMC3756263 DOI: 10.1136/amiajnl-2012-001431] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 10/21/2012] [Revised: 02/25/2013] [Accepted: 03/18/2013] [Indexed: 11/09/2022] Open

Burgun A, Bodenreider O. Aspects of the Taxonomic Relation in the Biomedical Domain. Form Ontol Inf Syst 2001;2001:222-233. [PMID: 25635263 PMCID: PMC4307028 DOI: 10.1145/505168.505190] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 05/15/2023]