1
Jing X. The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis. JMIR Med Inform 2021; 9:e20675. [PMID: 34236337] [PMCID: PMC8433943] [DOI: 10.2196/20675]
Abstract
BACKGROUND The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. OBJECTIVE Despite its longevity, there has been no comprehensive analysis of publications on the use of the UMLS. This review was therefore conducted to provide a comprehensive understanding of how the UMLS has been used in English-language peer-reviewed publications over the last 30 years. METHODS PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword, or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. RESULTS A total of 943 publications were included in the final analysis. Of these, 32 publications were assigned to 2 categories each; hence the total count across categories, before duplicates are removed, is 975.
After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). CONCLUSIONS The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
Affiliation(s)
- Xia Jing
- Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
2
Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020; 11:14. [PMID: 33198814] [PMCID: PMC7670625] [DOI: 10.1186/s13326-020-00231-z]
Abstract
BACKGROUND Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations. METHODS Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies' objectives were categorized by way of induction. These results were used to define recommendations. RESULTS In total, 2355 unique studies were identified, of which 256 reported on the development of NLP algorithms for mapping free text to ontology concepts and 77 described both development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of the 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation.
A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed. CONCLUSION We found many heterogeneous approaches to the reporting on the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
Affiliation(s)
- Martijn G. Kersloot, Florentien J. P. van Putten, Ameen Abu-Hanna, Ronald Cornet, Derk L. Arts
- Amsterdam UMC, University of Amsterdam, Department of Medical Informatics, Amsterdam Public Health Research Institute, Room J1B-109, PO Box 22700, 1100 DE Amsterdam, The Netherlands (all authors)
- Castor EDC, Amsterdam, The Netherlands (M. G. Kersloot, D. L. Arts)
3
Automatic Disease Annotation From Radiology Reports Using Artificial Intelligence Implemented by a Recurrent Neural Network. AJR Am J Roentgenol 2019; 212:734-740. [DOI: 10.2214/ajr.18.19869]
4
Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield. AJR Am J Roentgenol 2017; 208:750-753. [PMID: 28140627] [DOI: 10.2214/ajr.16.16128]
Abstract
OBJECTIVE The purpose of this study is to evaluate the performance of a natural language processing (NLP) system in classifying a database of free-text knee MRI reports at two separate academic radiology practices. MATERIALS AND METHODS An NLP system that uses terms and patterns in manually classified narrative knee MRI reports was constructed. The NLP system was trained and tested on expert-classified knee MRI reports from two major health care organizations. Radiology reports in the training set were modeled as vectors, and a support vector machine framework was used to train the classifier. A separate test set from each organization was used to evaluate the performance of the system. We evaluated the performance of the system both within and across organizations. Standard evaluation metrics, such as accuracy, precision, recall, and F1 score (i.e., the harmonic mean of precision and recall), and their respective 95% CIs were used to measure the efficacy of our classification system. RESULTS The accuracy for radiology reports belonging to the model's clinically significant concept classes, after training with data from the same institution, was good, yielding an F1 score greater than 90% (95% CI, 84.6-97.3%). Performance of the classifier on cross-institutional application without institution-specific training data yielded F1 scores of 77.6% (95% CI, 69.5-85.7%) and 90.2% (95% CI, 84.5-95.9%) at the two organizations studied. CONCLUSION The results show excellent accuracy by the NLP machine learning classifier in classifying free-text knee MRI reports, supporting the institution-independent reproducibility of knee MRI report classification. Furthermore, the machine learning classifier performed well on free-text knee MRI reports from another institution. These data support the feasibility of multiinstitutional classification of radiologic imaging text reports with a single machine learning classifier without requiring institution-specific training data.
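For readers unfamiliar with the metrics reported above, precision, recall, and F1 can be computed directly from gold and predicted labels. A minimal pure-Python sketch (the report labels below are invented for illustration, not taken from the study):

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall, and F1 for one class of interest.

    F1 is the harmonic mean of precision and recall.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: four report labels, "acl_tear" is the class of interest.
gold = ["acl_tear", "normal", "acl_tear", "meniscal_tear"]
pred = ["acl_tear", "acl_tear", "normal", "meniscal_tear"]
p, r, f = precision_recall_f1(gold, pred, "acl_tear")  # each is 0.5 here
```

Because F1 is a harmonic mean, it penalizes an imbalance between precision and recall more strongly than an arithmetic average would.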
5
Cai T, Giannopoulos AA, Yu S, Kelil T, Ripley B, Kumamaru KK, Rybicki FJ, Mitsouras D. Natural Language Processing Technologies in Radiology Research and Clinical Applications. Radiographics 2016; 36:176-91. [PMID: 26761536] [DOI: 10.1148/rg.2016150080]
Abstract
The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications.
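The "natural language to structured format" goal described in this abstract can be illustrated with a deliberately tiny rule-based sketch. The finding vocabulary and negation cues below are assumptions for illustration only; a real system would draw on a terminology such as RadLex or the UMLS and far more robust NLP:

```python
import re

# Hypothetical finding vocabulary and negation cues (illustrative only).
FINDINGS = ["pneumothorax", "effusion", "nodule"]
NEGATIONS = ["no ", "without ", "negative for "]

def structure_report(text):
    """Map free-text report sentences to {finding: 'present'|'absent'} fields."""
    result = {}
    for sentence in re.split(r"[.;]\s*", text.lower()):
        for finding in FINDINGS:
            if finding in sentence:
                negated = any(cue in sentence for cue in NEGATIONS)
                result[finding] = "absent" if negated else "present"
    return result

report = "Small right pleural effusion. No pneumothorax."
fields = structure_report(report)
# fields: {"effusion": "present", "pneumothorax": "absent"}
```

The structured output can then be queried for the presence or absence of a finding, which is exactly the kind of downstream manipulation the authors describe.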
Affiliation(s)
- Tianrun Cai, Andreas A Giannopoulos, Sheng Yu, Tatiana Kelil, Beth Ripley, Kanako K Kumamaru, Frank J Rybicki, Dimitrios Mitsouras
- From the Applied Imaging Science Laboratory, Department of Radiology, Brigham and Women's Hospital, 75 Francis St, Boston, MA 02115 (T.C., A.A.G., K.K.K., F.J.R., D.M.); Harvard T.H. Chan School of Public Health, Boston, Mass (S.Y.); and Department of Radiology, Brigham and Women's Hospital, Boston, Mass (T.K., B.R.)
6
Pons E, Braun LMM, Hunink MGM, Kors JA. Natural Language Processing in Radiology: A Systematic Review. Radiology 2016; 279:329-43. [PMID: 27089187] [DOI: 10.1148/radiol.16142770]
Abstract
Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed.
Affiliation(s)
- Ewoud Pons, Loes M M Braun, M G Myriam Hunink, Jan A Kors
- From the Departments of Radiology (E.P., L.M.M.B., M.G.M.H.) and Medical Informatics (J.A.K.), Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, the Netherlands
8
Yetisgen-Yildiz M, Gunn ML, Xia F, Payne TH. A text processing pipeline to extract recommendations from radiology reports. J Biomed Inform 2013; 46:354-62. [PMID: 23354284] [DOI: 10.1016/j.jbi.2012.12.005]
Abstract
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. The absence of an automated system to identify and track radiology recommendations is an important barrier to ensuring timely follow-up of patients, especially those with non-acute incidental findings on imaging examinations. In this paper, we present a text processing pipeline to automatically identify clinically important recommendation sentences in radiology reports. Our extraction pipeline is based on natural language processing (NLP) and supervised text classification methods. To develop and test the pipeline, we created a corpus of 800 radiology reports double annotated for recommendation sentences by a radiologist and an internist. We ran several experiments to measure the impact of different feature types and the data imbalance between positive and negative recommendation sentences. Our fully statistical approach achieved the best F-score of 0.758 in identifying the critical recommendation sentences in radiology reports.
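The study itself used supervised classification rather than hand-written rules, but the task can be sketched with a simple cue-based detector. Every pattern below is an illustrative assumption, not a feature set from the paper:

```python
import re

# Cue patterns suggestive of follow-up recommendations (illustrative only).
CUES = [
    r"\brecommend(?:ed|s)?\b",
    r"\bfollow[- ]up\b",
    r"\bshould be (?:obtained|considered|performed)\b",
    r"\bis (?:advised|suggested)\b",
]

def find_recommendations(report_text):
    """Return sentences that look like follow-up recommendations."""
    sentences = re.split(r"(?<=[.?!])\s+", report_text.strip())
    return [s for s in sentences
            if any(re.search(cue, s, re.IGNORECASE) for cue in CUES)]

report = ("There is a 6 mm nodule in the right upper lobe. "
          "Follow-up CT in 6 months is recommended.")
hits = find_recommendations(report)  # only the second sentence matches
```

A statistical classifier, as in the paper, replaces the hand-picked cue list with features learned from the annotated corpus, which is what lets it cope with phrasing the rule author never anticipated.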
Affiliation(s)
- Meliha Yetisgen-Yildiz
- Biomedical & Health Informatics, School of Medicine, University of Washington, Seattle, WA, United States.
9
Creation and storage of standards-based pre-scanning patient questionnaires in PACS as DICOM objects. J Digit Imaging 2012; 24:823-7. [PMID: 20976611] [DOI: 10.1007/s10278-010-9348-8]
Abstract
Radiology departments around the country have completed the first evolution to digital imaging by becoming filmless. The next step in this evolution is to become truly paperless. Both patient and non-patient paperwork has to be eliminated for this transition to occur. A paper-based set of patient pre-scanning questionnaires was replaced with web-based forms for use in an outpatient imaging center. We discuss the process by which questionnaire elements are converted into SNOMED-CT terminology concepts, stored for future use, and sent to PACS in Digital Imaging and Communications in Medicine (DICOM) format to be permanently stored with the relevant study in the DICOM image database.
10
O'Sullivan DM, Wilk SA, Michalowski WJ, Farion KJ. Automatic indexing and retrieval of encounter-specific evidence for point-of-care support. J Biomed Inform 2010; 43:623-31. [PMID: 20230908] [DOI: 10.1016/j.jbi.2010.03.003]
Abstract
Evidence-based medicine relies on repositories of empirical research evidence that can be used to support clinical decision making for improved patient care. However, retrieving evidence from such repositories at local sites presents many challenges. This paper describes a methodological framework for automatically indexing and retrieving empirical research evidence in the form of the systematic reviews and associated studies from The Cochrane Library, where retrieved documents are specific to a patient-physician encounter and thus can be used to support evidence-based decision making at the point of care. Such an encounter is defined by three pertinent groups of concepts - diagnosis, treatment, and patient, and the framework relies on these three groups to steer indexing and retrieval of reviews and associated studies. An evaluation of the indexing and retrieval components of the proposed framework was performed using documents relevant for the pediatric asthma domain. Precision and recall values for automatic indexing of systematic reviews and associated studies were 0.93 and 0.87, and 0.81 and 0.56, respectively. Moreover, precision and recall for the retrieval of relevant systematic reviews and associated studies were 0.89 and 0.81, and 0.92 and 0.89, respectively. With minor modifications, the proposed methodological framework can be customized for other evidence repositories.
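The encounter-steered retrieval idea, matching documents against the three concept groups (diagnosis, treatment, patient), can be sketched as scoring documents by how many groups they share with the encounter. The document identifiers and concepts below are invented, and the paper's framework derives its annotations automatically rather than from a hand-built index:

```python
# Toy concept index: each document is annotated with concepts per group.
documents = {
    "review_1": {"diagnosis": {"asthma"}, "treatment": {"salbutamol"},
                 "patient": {"pediatric"}},
    "review_2": {"diagnosis": {"asthma"}, "treatment": {"steroids"},
                 "patient": {"adult"}},
}

def retrieve(encounter):
    """Rank documents by the number of concept groups overlapping the encounter."""
    scores = {}
    for doc_id, index in documents.items():
        score = sum(1 for group, concepts in encounter.items()
                    if index.get(group, set()) & concepts)
        if score:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

encounter = {"diagnosis": {"asthma"}, "patient": {"pediatric"}}
ranked = retrieve(encounter)  # review_1 overlaps two groups, review_2 one
```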
Affiliation(s)
- Dympna M O'Sullivan
- School of Engineering and Applied Science, Aston University, Aston Triangle, Birmingham, B4 7ET, UK.
11
Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decision support? J Biomed Inform 2009; 42:760-72. [PMID: 19683066] [PMCID: PMC2757540] [DOI: 10.1016/j.jbi.2009.08.007]
Abstract
Computerized clinical decision support (CDS) aims to aid decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. Natural language processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research on clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in development of fundamental NLP methods and advances in NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed.
Affiliation(s)
- Dina Demner-Fushman
- U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
12
Zheng B. Computer-Aided Diagnosis in Mammography Using Content-based Image Retrieval Approaches: Current Status and Future Perspectives. Algorithms 2009; 2:828-849. [PMID: 20305801] [PMCID: PMC2841362] [DOI: 10.3390/a2020828]
Abstract
With the rapid advance of digital imaging technologies, content-based image retrieval (CBIR) has become one of the most active research areas in computer vision. In the last several years, developing computer-aided detection and/or diagnosis (CAD) schemes that use CBIR to search for clinically relevant and visually similar medical images (or regions) depicting suspicious lesions has also attracted research interest. CBIR-based CAD schemes have the potential to provide radiologists with a "visual aid" and to increase their confidence in accepting CAD-cued results in decision making. CAD performance and reliability depend on a number of factors, including the optimization of lesion segmentation, feature selection, reference database size, computational efficiency, and the relationship between the clinical relevance and visual similarity of the CAD results. By presenting and comparing a number of approaches commonly used in previous studies, this article identifies and discusses the optimal approaches for developing CBIR-based CAD schemes and assessing their performance. Although preliminary studies have suggested that using CBIR-based CAD schemes might improve radiologists' performance and/or increase their confidence in decision making, this technology is still at an early stage of development. Much research work is needed before CBIR-based CAD schemes can be accepted in clinical practice.
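At the core of any CBIR scheme is ranking reference lesions by feature-vector similarity to a query lesion. A minimal sketch using cosine similarity; the feature vectors and lesion names below are invented toy values, not data from the article:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query, reference_db, k=2):
    """Return the k reference lesions most similar to the query."""
    ranked = sorted(reference_db,
                    key=lambda rid: cosine(query, reference_db[rid]),
                    reverse=True)
    return ranked[:k]

# Toy lesion feature vectors (e.g., size, contrast, margin irregularity).
reference_db = {
    "mass_a": [0.9, 0.1, 0.8],
    "mass_b": [0.1, 0.9, 0.2],
    "mass_c": [0.8, 0.2, 0.9],
}
hits = most_similar([0.85, 0.15, 0.85], reference_db)
```

The article's point that clinical relevance and visual similarity can diverge is visible even here: a vector metric ranks purely by feature geometry, regardless of diagnosis.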
Affiliation(s)
- Bin Zheng
- Imaging Research Center, Department of Radiology, University of Pittsburgh, 3362 Fifth Avenue, Room 128, Pittsburgh, PA 15213, USA
13
Chase HS, Kaufman DR, Johnson SB, Mendonca EA. Voice capture of medical residents' clinical information needs during an inpatient rotation. J Am Med Inform Assoc 2009; 16:387-94. [PMID: 19261939] [PMCID: PMC2732238] [DOI: 10.1197/jamia.m2940]
Abstract
OBJECTIVE To identify some of the challenges that medical residents face in addressing their information needs in an inpatient setting, by examining how voice capture in natural language of clinical questions fits into workflow, and by characterizing the focus, format, and semantic content and complexity of their questions. DESIGN Internal medicine residents captured information needs on a digital recorder while on a hospital inpatient service and then participated in semi-structured interviews. MEASUREMENTS Interviews were analyzed to identify emergent themes. Recorded questions were analyzed for focus (diagnosis, treatment, or epidemiology) and format, either foreground (specific knowledge relating to an individual patient) or background (general knowledge about a condition). Semantic concepts and types were identified using MetaMap (UMLS - Unified Medical Language System) and manually. RESULTS Voice recording of questions appeared to unmask residents' latent information needs. Although residents were able to record questions during workflow, there was a delay from the time questions materialized to when they were recorded. Question focus was distributed among diagnosis (32%), treatment (40%), and epidemiology (28%), and the majority of questions were background (69%). Questions were semantically complex; foreground and background questions averaged 12.6 (SD 6.0) and 9.1 (SD 6.0) UMLS concepts, respectively. MetaMap failed to recognize concepts when residents used acronyms or abbreviations or omitted key terms. CONCLUSIONS We found that it is feasible for residents to capture their clinical questions in natural language during workflow and that recording questions may prompt awareness of previously unrecognized information needs. However, the semantic complexity of typical questions and mapping failures due to residents' use of acronyms and abbreviations present challenges to machine-based extraction of semantic content.
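The abbreviation-related mapping failures noted in this abstract suggest an expansion step before concept lookup. A toy sketch of that idea; the acronym table is an assumption, and while the two CUIs shown match the UMLS concepts for these phrases to the best of my knowledge, they are included only for illustration:

```python
# Hypothetical acronym table and concept dictionary (illustrative only).
ABBREVIATIONS = {"mi": "myocardial infarction",
                 "chf": "congestive heart failure"}
CONCEPTS = {"myocardial infarction": "C0027051",
            "congestive heart failure": "C0018802"}

def map_to_concepts(question):
    """Expand known acronyms, then look up concept phrases in the text."""
    words = question.lower().rstrip("?").split()
    expanded = " ".join(ABBREVIATIONS.get(w, w) for w in words)
    return [cui for phrase, cui in CONCEPTS.items() if phrase in expanded]

cuis = map_to_concepts("What is the best treatment for MI?")
```

Without the expansion step, "MI" would never match the dictionary phrase, which mirrors the MetaMap failures the authors observed.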
Affiliation(s)
- Herbert S Chase
- Herbert Chase, Department of Biomedical Informatics, CUMC, VC-5, 622 West 168 Street, New York, NY 10032, USA.
14
Kahn CE, Rubin DL. Automated semantic indexing of figure captions to improve radiology image retrieval. J Am Med Inform Assoc 2009; 16:380-6. [PMID: 19261938] [DOI: 10.1197/jamia.m2945]
Abstract
OBJECTIVE We explored automated concept-based indexing of unstructured figure captions to improve retrieval of images from radiology journals. DESIGN The MetaMap Transfer program (MMTx) was used to map the text of 84,846 figure captions from 9,004 peer-reviewed, English-language articles to concepts in three controlled vocabularies from the UMLS Metathesaurus, version 2006AA. Sampling procedures were used to estimate the standard information-retrieval metrics of precision and recall, and to evaluate the degree to which concept-based retrieval improved image retrieval. MEASUREMENTS Precision was estimated based on a sample of 250 concepts. Recall was estimated based on a sample of 40 concepts. The authors measured the impact of concept-based retrieval to improve upon keyword-based retrieval in a random sample of 10,000 search queries issued by users of a radiology image search engine. RESULTS Estimated precision was 0.897 (95% confidence interval, 0.857-0.937). Estimated recall was 0.930 (95% confidence interval, 0.838-1.000). In 5,535 of 10,000 search queries (55%), concept-based retrieval found results not identified by simple keyword matching; in 2,086 searches (21%), more than 75% of the results were found by concept-based search alone. CONCLUSION Concept-based indexing of radiology journal figure captions achieved very high precision and recall, and significantly improved image retrieval.
Affiliation(s)
- Charles E Kahn
- Division of Informatics, Department of Radiology, Medical College of Wisconsin, 9200 W. Wisconsin Ave., Milwaukee, WI 53226, USA.
15
Dang PA, Kalra MK, Blake MA, Schultz TJ, Stout M, Halpern EF, Dreyer KJ. Use of Radcube for extraction of finding trends in a large radiology practice. J Digit Imaging 2008; 22:629-40. [PMID: 18543033] [DOI: 10.1007/s10278-008-9128-x]
Abstract
The purpose of our study was to demonstrate the use of Natural Language Processing (Leximer) along with Online Analytic Processing (NLP-OLAP) for extraction of finding trends in a large radiology practice. Prior studies have validated the Natural Language Processing (NLP) program Leximer for classifying unstructured radiology reports based on the presence of positive radiology findings (F(POS)) and negative radiology findings (F(NEG)). The F(POS) included new relevant radiology findings and any change in status from prior imaging. Electronic radiology reports from 1995-2002 and data from analysis of these reports with NLP-Leximer were saved in a data warehouse and exported to a multidimensional structure called the Radcube. Various relational queries on the data in the Radcube were performed using the OLAP technique. Thus, NLP-OLAP was applied to determine trends of F(POS) in different radiology exams for different patient and examination attributes. Pivot tables were exported from the NLP-OLAP interface to Microsoft Excel for statistical analysis. The Radcube allowed rapid and comprehensive analysis of F(POS) and F(NEG) trends in a large radiology report database. Trends of F(POS) were extracted for different patient attributes such as age groups, gender, clinical indications, diseases with ICD codes, patient types (inpatient, ambulatory), imaging characteristics such as imaging modalities, referring physicians, radiology subspecialties, and body regions. Data analysis showed substantial differences between F(POS) rates for different imaging modalities, ranging from 23.1% (mammography, 49,163/212,906) to 85.8% (nuclear medicine, 93,852/109,374; p < 0.0001). In conclusion, NLP-OLAP can help in analyzing the yield of different radiology exams from a large radiology report database.
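The kind of slicing a multidimensional structure like the Radcube supports, positive-finding rates grouped along a chosen dimension, can be sketched with plain dictionaries. The rows and values below are invented, not figures from the study:

```python
from collections import defaultdict

# Toy classified-report rows (illustrative only).
reports = [
    {"modality": "mammography", "positive": False},
    {"modality": "mammography", "positive": True},
    {"modality": "nuclear medicine", "positive": True},
    {"modality": "nuclear medicine", "positive": True},
]

def positive_rate_by(dimension, rows):
    """Aggregate the positive-finding rate along one dimension."""
    totals, positives = defaultdict(int), defaultdict(int)
    for row in rows:
        key = row[dimension]
        totals[key] += 1
        positives[key] += row["positive"]
    return {key: positives[key] / totals[key] for key in totals}

rates = positive_rate_by("modality", reports)
# rates: {"mammography": 0.5, "nuclear medicine": 1.0}
```

An OLAP cube precomputes such aggregates across many dimensions at once (age group, modality, referring physician, and so on), which is what makes the interactive pivot-table queries described above fast.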
Affiliation(s)
- Pragya A Dang
- Department Of Radiology, Massachusetts General Hospital, 25 New Chardon St, Ste. 400E, Boston, MA 02114, USA
16
Natural Language Processing Using Online Analytic Processing for Assessing Recommendations in Radiology Reports. J Am Coll Radiol 2008; 5:197-204. [DOI: 10.1016/j.jacr.2007.09.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 07/16/2007] [Indexed: 11/21/2022]
17
Bertaud V, Said W, Garcelon N, Marin F, Duvauferrier R. The value of using verbs in Medline searches. Medical Informatics and the Internet in Medicine 2007; 32:117-22. [PMID: 17541861 DOI: 10.1080/14639230601140711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
Abstract
New findings are continuously being identified thanks to novel diagnostic procedures, notably in medical imaging, and it would be useful to retrieve these new findings from the literature. The aim of this work is to investigate whether using verbs in MEDLINE queries can improve the retrieval of findings. Verbs used in the field of findings were selected: 'to show' (an examination shows a finding) and 'to confirm' (a finding confirms a diagnosis). For each of these verbs, semantically close verbs were sought on the WordNet website. Then, the extent to which adding these verbs to a query about various radiological pathologies can improve the retrieval of findings from MEDLINE citations was studied. This method was tested on two sets of MEDLINE citations regarding the diagnostic imaging of musculo-skeletal disorders. Using appropriate verbs in MEDLINE queries improved precision from 53% to 61% and from 53% to 74% in the first and second test sets, respectively. A recall of 74% and 83% was reached in the two experiments. Using relevant verbs can be a rather simple way to improve the retrieval of findings related to diseases and diagnostic procedures from MEDLINE citations.
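The query-expansion step can be sketched as follows. The verb synonym lists and the topic string are invented for illustration; they are not the authors' actual WordNet selections.

```python
# Hypothetical verb expansion for a findings-oriented MEDLINE query.
# The study drew semantically close verbs from WordNet; these lists are
# illustrative stand-ins.
VERB_SYNONYMS = {
    "show": ["show", "demonstrate", "reveal"],
    "confirm": ["confirm", "corroborate", "support"],
}

def expand_query(topic, verbs):
    """Append an OR-group of verb variants to a topic query."""
    variants = sorted({v for verb in verbs for v in VERB_SYNONYMS[verb]})
    verb_clause = " OR ".join(variants)
    return f"({topic}) AND ({verb_clause})"

query = expand_query("rotator cuff tear MRI", ["show"])
```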
Affiliation(s)
- Valerie Bertaud
- EA 3888, School of Medicine, IFR 140, University of Rennes I, Rennes, France.
18
Huang Y, Lowe HJ. A novel hybrid approach to automated negation detection in clinical radiology reports. J Am Med Inform Assoc 2007; 14:304-11. [PMID: 17329723 PMCID: PMC2244882 DOI: 10.1197/jamia.m2284] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Negation is common in clinical documents and is an important source of poor precision in automated indexing systems. Previous research has shown that negated terms may be difficult to identify if the words implying negations (negation signals) are more than a few words away from them. We describe a novel hybrid approach, combining regular expression matching with grammatical parsing, to address the above limitation in automatically detecting negations in clinical radiology reports. DESIGN Negations are classified based upon the syntactical categories of negation signals, and negation patterns, using regular expression matching. Negated terms are then located in parse trees using corresponding negation grammar. MEASUREMENTS A classification of negations and their corresponding syntactical and lexical patterns were developed through manual inspection of 30 radiology reports and validated on a set of 470 radiology reports. Another 120 radiology reports were randomly selected as the test set on which a modified Delphi design was used by four physicians to construct the gold standard. RESULTS In the test set of 120 reports, there were a total of 2,976 noun phrases, of which 287 were correctly identified as negated (true positives), along with 23 undetected true negations (false negatives) and 4 mistaken negations (false positives). The hybrid approach identified negated phrases with sensitivity of 92.6% (95% CI 90.9-93.4%), positive predictive value of 98.6% (95% CI 96.9-99.4%), and specificity of 99.87% (95% CI 99.7-99.9%). CONCLUSION This novel hybrid approach can accurately locate negated concepts in clinical radiology reports not only when in close proximity to, but also at a distance from, negation signals.
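A regex-only toy version of the negation-signal step might look like this. The signal list and fixed token window are invented simplifications; the paper's hybrid method additionally locates negated terms in a grammatical parse tree, which lets it handle signals at a distance.

```python
import re

# Illustrative negation signals; real systems use much larger inventories
# classified by syntactic category.
NEGATION_SIGNALS = r"\b(no|without|absence of|negative for)\b"

def negated_phrases(sentence, window=5):
    """Return words within `window` tokens after a negation signal."""
    tokens = sentence.lower().replace(",", "").split()
    text = " ".join(tokens)
    results = []
    for match in re.finditer(NEGATION_SIGNALS, text):
        start = len(text[:match.start()].split())
        signal_len = len(match.group().split())
        results.extend(tokens[start + signal_len:start + signal_len + window])
    return results

scope = negated_phrases("No evidence of pneumothorax, heart size normal")
```

The fixed window is exactly the limitation the hybrid approach addresses: a parse tree scopes the negation to the right constituent instead of an arbitrary token count.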
Affiliation(s)
- Yang Huang
- Stanford Medical Informatics, Stanford, CA 94305-5479, USA.
19
Brown SH, Speroff T, Fielstein EM, Bauer BA, Wahner-Roedler DL, Greevy R, Elkin PL. eQuality: electronic quality assessment from narrative clinical reports. Mayo Clin Proc 2006; 81:1472-81. [PMID: 17120403 DOI: 10.4065/81.11.1472] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
OBJECTIVE To evaluate an electronic quality (eQuality) assessment tool for dictated disability examination records. METHODS We applied automated concept-based indexing techniques to automated quality screening of Department of Veterans Affairs spine disability examinations that had previously undergone gold standard quality review by human experts using established quality indicators. We developed automated quality screening rules and refined them iteratively on a training set of disability examination reports. We then applied the resulting rules to a novel test set of spine disability examination reports. The initial data set was composed of all electronically available examination reports (N=125,576) finalized by the Veterans Health Administration between July and September 2001. RESULTS Sensitivity was 91% for the training set and 87% for the test set (P=.02). Specificity was 74% for the training set and 71% for the test set (P=.44). Human performance was 4% to 6% higher than the eQuality tool in sensitivity (P<.001) and 13% to 16% higher in specificity (P<.001). In addition, the eQuality tool was equivalent or higher in sensitivity for 5 of 9 individual quality indicators. CONCLUSION The results demonstrate that a properly authored computer-based expert systems approach can perform quality measurement as well as human reviewers for many quality indicators. Although automation will likely always rely on expert guidance to be accurate and meaningful, eQuality is an important new method to assist clinicians in their efforts to practice safe and effective medicine.
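A minimal sketch of what concept-presence screening rules could look like. The indicator names and required concepts below are invented; the actual eQuality rules are concept-based (not raw string matching) and far richer.

```python
# Invented quality-indicator rules: each indicator requires that certain
# concepts appear somewhere in the examination report.
RULES = {
    "range_of_motion_documented": {"range of motion"},
    "neuro_exam_documented": {"reflexes", "sensation"},
}

def screen_report(text, rules):
    """Flag each indicator as met (True) or unmet (False) by concept presence."""
    lowered = text.lower()
    return {name: all(concept in lowered for concept in concepts)
            for name, concepts in rules.items()}

result = screen_report(
    "Lumbar range of motion limited. Reflexes 2+ throughout.", RULES)
```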
Affiliation(s)
- Steven H Brown
- Department of Veterans Affairs Compensation and Pension Examination Program, Nashville, Tenn, USA.
21
Sistrom C. The socioeconomic aspects of information technology for health care with emphasis on radiology. Acad Radiol 2005; 12:431-43. [PMID: 15831416 DOI: 10.1016/j.acra.2005.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/10/2004] [Accepted: 01/10/2005] [Indexed: 11/23/2022]
Abstract
RATIONALE AND OBJECTIVES Information technology is the key to cost-effective, error-free medical care in the United States, and the only problem is that there is not enough of it yet. During the past 15 years, billions of dollars have been spent on information technology for health care with very little benefit but significant adverse effects on patients, physicians, and nurses. The truth about health care information technology (HIT) probably lies somewhere between these two extreme statements, which represent the technophile and skeptical views, respectively. MATERIALS AND METHODS There is no doubt that computer and communication hardware has reached a state of sophistication and availability at which any and all necessary information can be generated, stored, and distributed to health care workers in support of their patient care tasks. The barriers to rapid and widespread development and diffusion of cost-effective and practically useful HIT are exclusively related to human factors. RESULTS This article explores some of the organizational, cultural, cognitive, and economic forces that interact to influence the success of HIT initiatives in health care organizations. A key point is that the intrinsically handcrafted nature of health care work, combined with high degrees of complexity and contingency, makes it impossible to "computerize" it with the ease and completeness achieved in other industries. The major thrust of the argument is that designers of information systems and health care informatics managers must meet the needs of patients and care providers. The software they create and implement should promote, support, and enhance the existing processes of health care rather than seeking to dictate how direct care providers should do their work.
CONCLUSIONS Instead of looking for "buy in" from physicians and nurses, the informatics community must return authority over the functional specification of patient care information systems to them, where it belonged in the first place. This same lesson about computer technology and organizational politics is also being learned in the business community, where executives are reclaiming responsibility for mission-critical informatics decisions.
Affiliation(s)
- Chris Sistrom
- Department of Radiology, P.O. Box 100374, University of Florida, Gainesville, FL 32602, USA.
22
Huang Y, Lowe HJ, Klein D, Cucina RJ. Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS Specialist Lexicon. J Am Med Inform Assoc 2005; 12:275-85. [PMID: 15684131 PMCID: PMC1090458 DOI: 10.1197/jamia.m1695] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS Specialist Lexicon to improve noun phrase identification within clinical radiology documents. DESIGN The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7) Clinical Document Architecture (CDA)-compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NPs and those made by one author for base (simple) NPs, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. RESULTS The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false positives by 31.1% and false negatives by 34.3%. CONCLUSION The sentence boundary detector performs excellently. After adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain.
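The lexicon-augmentation idea can be sketched roughly as follows. The lexicon entries, tag set, and greedy chunking rule are simplified inventions, not the paper's statistical parser: a domain lexicon (standing in for the UMLS Specialist Lexicon) retags medical words that a general-domain tagger mislabeled, after which a base-NP chunker succeeds.

```python
# Invented mini-lexicon standing in for the UMLS Specialist Lexicon.
MEDICAL_LEXICON = {"pneumothorax": "NN", "effusion": "NN"}

def retag(tagged):
    """Override general-domain POS tags for known medical words."""
    return [(w, MEDICAL_LEXICON.get(w, t)) for w, t in tagged]

def base_noun_phrases(tagged):
    """Greedily collect maximal runs of determiners/adjectives/nouns."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("DT", "JJ", "NN", "NNS"):
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

# A general-domain tagger has mislabeled "pneumothorax" as a verb.
tagged = [("a", "DT"), ("small", "JJ"), ("pneumothorax", "VB"),
          ("is", "VBZ"), ("seen", "VBN")]
nps = base_noun_phrases(retag(tagged))
```

Without the retagging step, the chunker would emit the truncated phrase "a small" instead of the full NP.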
Affiliation(s)
- Yang Huang
- Stanford Medical Informatics, MSOB X215, 251 Campus Drive, Stanford, CA 94305-5479, USA.
23
Dreyer KJ, Kalra MK, Maher MM, Hurier AM, Asfaw BA, Schultz T, Halpern EF, Thrall JH. Application of recently developed computer algorithm for automatic classification of unstructured radiology reports: validation study. Radiology 2004; 234:323-9. [PMID: 15591435 DOI: 10.1148/radiol.2341040049] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Indexed: 11/11/2022]
Abstract
PURPOSE To validate the accuracy of Lexicon Mediated Entropy Reduction (LEXIMER), a new information theory-based computer algorithm developed by the authors for independent analysis and classification of unstructured radiology reports based on the presence of clinically important findings (F(T), where (T) represents "true") and recommendations for subsequent action (R(T)). MATERIALS AND METHODS The study was approved by the Human Research Committee of the institutional review board. Consecutive de-identified radiology reports (n = 1059) comprising results of barium studies (n = 99), computed tomography (n = 107), mammography (n = 90), magnetic resonance imaging (n = 108), nuclear medicine (n = 99), positron emission tomography (n = 106), radiography (n = 212), ultrasonography (n = 131), and vascular procedures (n = 107) were independently analyzed by two radiologists and then with LEXIMER to categorize the reports into F(T) and F(T)0 (containing or not containing clinically important findings) categories and R(T) and R(T)0 (containing or not containing recommendations for subsequent action) categories. Accuracy, sensitivity, specificity, and positive and negative predictive values of LEXIMER for placing reports into F(T) and F(T)0 and R(T) and R(T)0 categories were assessed by using appropriate statistical tests. RESULTS There was strong interobserver concordance between the two radiologists for placing radiology reports into F(T) and R(T) categories (kappa = 0.9, P < .01). 
For the LEXIMER program, accuracy, sensitivity, specificity, and positive and negative predictive values, respectively, were 97.5% (95% confidence interval [CI]: 96.6%, 98.5%), 98.9% (95% CI: 97.9%, 99.6%), 94.9% (95% CI: 93.1%, 96.0%), 97.5% (95% CI: 96.6%, 98.0%), and 97.7% (95% CI: 95.8%, 98.8%) for placing radiology reports into F(T) and F(T)0 categories and 99.6% (95% CI: 99.2%, 99.9%), 98.2% (95% CI: 95.0%, 99.6%), 99.9% (95% CI: 99.4%, 99.99%), 99.4% (95% CI: 96.3%, 99.9%), and 99.7% (95% CI: 98.9%, 99.9%) for placing reports into R(T) and R(T)0 categories. CONCLUSION LEXIMER is an accurate automated engine for evaluating the percentage positivity of clinically important findings and rates of recommendation for subsequent action in unstructured radiology reports.
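The validation arithmetic behind accuracy, sensitivity, specificity, and predictive-value figures like those above reduces to a 2x2 confusion table. The counts below are purely illustrative, not the study's data.

```python
# Standard diagnostic metrics from a 2x2 confusion table.
def diagnostic_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,        # all correct / all cases
        "sensitivity": tp / (tp + fn),        # true-positive rate
        "specificity": tn / (tn + fp),        # true-negative rate
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
    }

# Illustrative counts only.
m = diagnostic_metrics(tp=90, fp=5, tn=100, fn=5)
```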
Affiliation(s)
- Keith J Dreyer
- Division of Computing and Information Services, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, 100 Charles River Plaza, Suite 471, Cambridge St, Boston, MA 02114, USA.
24
Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004; 11:392-402. [PMID: 15187068 PMCID: PMC516246 DOI: 10.1197/jamia.m1552] [Citation(s) in RCA: 301] [Impact Index Per Article: 15.1] [Received: 02/04/2004] [Accepted: 04/13/2004] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. METHODS An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching structured output generated by MedLEE, consisting of findings and modifiers, to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. RESULTS Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81); for coding terms that had corresponding UMLS codes, recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. CONCLUSION Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than the six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
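A toy version of mapping a phrase to the most specific matching code, loosely echoing the "most specific code" matching step described above. The three-entry concept table and CUIs are invented stand-ins for the UMLS Metathesaurus, and the longest-substring heuristic is an illustration, not MedLEE's actual matching logic.

```python
# Invented concept table mapping normalized terms to UMLS-style CUIs.
CONCEPTS = {
    "congestive heart failure": "C0018802",
    "heart failure": "C0018801",
    "pleural effusion": "C0032227",
}

def code_phrase(phrase):
    """Return the CUI of the longest (most specific) concept in the phrase."""
    words = phrase.lower().split()
    for size in range(len(words), 0, -1):       # try longest spans first
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in CONCEPTS:
                return CONCEPTS[candidate]
    return None

cui = code_phrase("chronic congestive heart failure")
```

Trying longer spans first is what makes the match "most specific": "congestive heart failure" wins over the broader "heart failure".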
Affiliation(s)
- Carol Friedman
- Department of Biomedical Informatics, Columbia University, 622 West 168 Street, VC-5, New York, NY 10032, USA.
25
Müller H, Michoux N, Bandon D, Geissbuhler A. A review of content-based image retrieval systems in medical applications: clinical benefits and future directions. Int J Med Inform 2004; 73:1-23. [PMID: 15036075 DOI: 10.1016/j.ijmedinf.2003.11.024] [Citation(s) in RCA: 357] [Impact Index Per Article: 17.9] [Received: 07/22/2003] [Accepted: 11/13/2003] [Indexed: 11/20/2022]
Abstract
Content-based visual information retrieval (CBVIR) or content-based image retrieval (CBIR) has been one of the most active research areas in the field of computer vision over the last 10 years. The availability of large and steadily growing amounts of visual and multimedia data, and the development of the Internet, underline the need to create thematic access methods that offer more than simple text-based queries or requests based on matching exact database fields. Many programs and tools have been developed to formulate and execute queries based on visual or audio content and to help browse large multimedia repositories. Still, no general breakthrough has been achieved with respect to large, varied databases containing documents of differing sorts and varying characteristics, and many questions about speed, semantic descriptors, and objective image interpretation remain unanswered. In the medical field, images, and especially digital images, are produced in ever-increasing quantities and used for diagnostics and therapy. The Radiology Department of the University Hospital of Geneva alone produced more than 12,000 images a day in 2002. Cardiology is currently the second-largest producer of digital images, especially with videos of cardiac catheterization (approximately 1,800 exams per year containing almost 2,000 images each); the total amount of cardiologic image data produced in the Geneva University Hospital was around 1 TB in 2002. Endoscopic videos can equally produce enormous amounts of data. With Digital Imaging and Communications in Medicine (DICOM), a standard for image communication has been set, and patient information can be stored with the actual image(s), although a few standardization problems remain.
In several articles, content-based access to medical images has been proposed to support clinical decision-making and ease the management of clinical data, and scenarios for integrating content-based access methods into picture archiving and communication systems (PACS) have been created. This article gives an overview of the available literature on content-based access to medical image data and on the technologies used in the field. Section 1 gives an introduction to generic content-based image retrieval and the technologies used. Section 2 explains the propositions for the use of image retrieval in medical practice and the various approaches; example systems and application areas are described. Section 3 describes the techniques used in the implemented systems, their datasets, and their evaluations. Section 4 identifies possible clinical benefits of image retrieval systems in clinical practice as well as in research and education, and defines new research directions that may prove useful. The article also offers explanations for some of the problems outlined: many propositions for systems come from the medical domain, while research prototypes are developed in computer science departments using medical datasets, and still very few systems seem to be used in clinical practice. It should be stated as well that the goal is not, in general, to replace text-based retrieval methods as they exist at the moment but to complement them with visual search tools.
Affiliation(s)
- Henning Müller
- Service of Medical Informatics, University Hospital of Geneva, Rue Micheli-du-Crest 24, 1211 Geneva 14, Switzerland.
26
Leroy G, Chen H, Martinez JD. A shallow parser based on closed-class words to capture relations in biomedical text. J Biomed Inform 2003; 36:145-58. [PMID: 14615225 DOI: 10.1016/s1532-0464(03)00039-x] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified, e.g., proteins and inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. Of these, 296 relations were correctly extracted, resulting in 90% precision and an average of 11 correct relations per abstract.
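A deliberately tiny sketch of relation capture keyed on closed-class words, assuming noun phrases have already been bracketed by a chunker. The single regex below stands in for the paper's cascaded finite state automata, and the negation handling covers only "does not"/"do not"; all inputs are invented.

```python
import re

# One NP - relation - NP pattern over pre-bracketed text; an optional
# closed-class negator ("does not"/"do not") marks the relation as negated.
PATTERN = re.compile(
    r"\[(?P<np1>[^\]]+)\] (?P<rel>(?:does not |do not )?\w+) \[(?P<np2>[^\]]+)\]"
)

def extract_relation(chunked):
    """Return (np1, relation, np2, negated) or None if no relation matches."""
    m = PATTERN.search(chunked)
    if not m:
        return None
    rel = m.group("rel")
    negated = rel.startswith(("does not", "do not"))
    return (m.group("np1"), rel, m.group("np2"), negated)

rel = extract_relation("[aspirin] does not inhibit [platelet aggregation]")
```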
Affiliation(s)
- Gondy Leroy
- Management Information Systems, The University of Arizona, McClelland Hall, Room 430, 1130 E. Helen St., Tucson, AZ 85721, USA.
27
Huang Y, Lowe HJ, Hersh WR. A pilot study of contextual UMLS indexing to improve the precision of concept-based representation in XML-structured clinical radiology reports. J Am Med Inform Assoc 2003; 10:580-7. [PMID: 12925544 PMCID: PMC264436 DOI: 10.1197/jamia.m1369] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Despite the advantages of structured data entry, much of the patient record is still stored as unstructured or semistructured narrative text, and the issue of representing clinical document content remains problematic. The authors' prior work using an automated UMLS document indexing system has been encouraging but has been affected by the generally low indexing precision of such systems. In an effort to improve precision, the authors developed a context-sensitive document indexing model to calculate the optimal subset of UMLS source vocabularies used to index each document section. This pilot study was performed to evaluate the utility of this indexing approach on a set of clinical radiology reports. DESIGN A set of clinical radiology reports that had been indexed manually using UMLS concept descriptors was indexed automatically by the SAPHIRE indexing engine. Using the data generated by this process, the authors developed a system that simulated indexing, at the document section level, of the same document set using many permutations of subsets of the UMLS constituent vocabularies. MEASUREMENTS The precision and recall scores generated by simulated indexing for each permutation of two or three UMLS constituent vocabularies were determined. RESULTS While there was considerable variation in precision and recall values across the different subtypes of radiology reports, the overall effect of this indexing strategy using the best combination of two or three UMLS constituent vocabularies was an improvement in precision without significant impact on recall. CONCLUSION In this pilot study, a contextual indexing strategy improved overall precision in a set of clinical radiology reports.
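The subset-selection simulation can be sketched as follows. The vocabulary names, concept sets, and gold standard are invented; the study scored every two- and three-vocabulary combination per document section rather than per corpus.

```python
from itertools import combinations

def evaluate(subset, indexed, gold):
    """Precision of the concepts contributed by a vocabulary subset."""
    proposed = set().union(*(indexed[v] for v in subset))
    if not proposed:
        return 0.0
    return len(proposed & gold) / len(proposed)

def best_subset(vocabularies, indexed, gold, size=2):
    """Pick the size-k vocabulary combination with the highest precision."""
    return max(combinations(sorted(vocabularies), size),
               key=lambda s: evaluate(s, indexed, gold))

# Invented per-vocabulary indexing output and gold-standard concepts.
indexed = {
    "SNOMED": {"fracture", "femur"},
    "MeSH": {"fracture", "xray"},
    "ICD": {"billing code"},
}
gold = {"fracture", "femur", "xray"}
choice = best_subset(indexed, indexed, gold)
```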
Affiliation(s)
- Yang Huang
- Stanford Medical Informatics, The Office of Information Resources and Technology, Stanford University School of Medicine, California 94305, USA.
28
Friedman C, Kra P, Rzhetsky A. Two biomedical sublanguages: a description based on the theories of Zellig Harris. J Biomed Inform 2002; 35:222-35. [PMID: 12755517 DOI: 10.1016/s1532-0464(03)00012-1] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Indexed: 10/27/2022]
Abstract
Natural language processing (NLP) systems have been developed to provide access to the tremendous body of data and knowledge that is available in the biomedical domain in the form of natural language text. These NLP systems are valuable because they can encode and amass the information in the text so that it can be used by other automated processes to improve patient care and our understanding of disease processes and treatments. Zellig Harris proposed a theory of sublanguage that laid the foundation for natural language processing in specialized domains. He hypothesized that the informational content and structure form a specialized language that can be delineated in the form of a sublanguage grammar. The grammar can then be used by a language processor to capture and encode the salient information and relations in text. In this paper, we briefly summarize his language and sublanguage theories. In addition, we summarize our prior research, which is associated with the sublanguage grammars we developed for two different biomedical domains. These grammars illustrate how Harris' theories provide a basis for the development of language processing systems in the biomedical domain. The two domains and their associated sublanguages discussed are: the clinical domain, where the text consists of patient reports, and the biomolecular domain, where the text consists of complete journal articles.
Affiliation(s)
- Carol Friedman
- Department of Medical Informatics, Columbia University, VC5, Vanderbilt Building, 622 West 168th Street, New York, NY 10032-3720, USA.