1
Demiray O, Gunes ED, Kulak E, Dogan E, Karaketir SG, Cifcili S, Akman M, Sakarya S. Classification of patients with chronic disease by activation level using machine learning methods. Health Care Manag Sci 2023;26:626-650. [PMID: 37824033] [DOI: 10.1007/s10729-023-09653-4]
Abstract
Patient Activation Measure (PAM) measures the activation level of patients with chronic conditions and correlates well with patient adherence behavior, health outcomes, and healthcare costs. PAM is increasingly used in practice to identify patients needing more support from the care team. We define PAM levels 1 and 2 as low PAM and investigate the performance of eight machine learning methods (Logistic Regression, Lasso Regression, Ridge Regression, Random Forest, Gradient Boosted Trees, Support Vector Machines, Decision Trees, Neural Networks) in classifying patients. Primary data collected from adult patients (n=431) with Diabetes Mellitus (DM) or Hypertension (HT) attending Family Health Centers in Istanbul, Turkey, are used to test the methods. [Formula: see text] of patients in the dataset have a low PAM level. Classification performance with several feature sets was analyzed to understand the relative importance of different types of information and to provide insights. The most important features are found to be whether the patient performs self-monitoring, smoking and exercise habits, education, and socio-economic status. The best performance was achieved by the Logistic Regression algorithm, with Area Under the Curve (AUC) = 0.72 on the best-performing feature set. Alternative feature sets with similar prediction performance are also presented. Prediction performance was inferior with an automated feature selection method, supporting the importance of using domain knowledge in machine learning.
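As an illustration of the kind of classifier comparison this study describes (eight methods compared by AUC), a minimal sketch on synthetic data — the feature counts and class balance here are invented stand-ins, not the study's data or code:

```python
# Sketch only: comparing two of the study's eight classifier families by AUC
# on a synthetic stand-in for the low/high-PAM task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 431 synthetic "patients", 10 features, imbalanced like a low-PAM subgroup
X, y = make_classification(n_samples=431, n_features=10, weights=[0.7],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

aucs = {}
for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("random_forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    # AUC is computed from the predicted probability of the positive class
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(aucs)
```

The same loop extends naturally to the remaining methods and to per-feature-set comparisons.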
Affiliation(s)
- Onur Demiray: Department of Computing, Imperial College London, London SW7 2AZ, UK
- Evrim D Gunes: College of Administrative Sciences and Economics, Koç University, Rumeli Feneri Yolu, Sariyer-Istanbul, Turkey
- Ercan Kulak: Ministry of Health, Caycuma District Health Directorate, Zonguldak, Turkey
- Emrah Dogan: Ministry of Health, Zonguldak Community Health Center, Zonguldak, Turkey
- Serap Cifcili: Department of Family Medicine, Marmara University School of Medicine, Istanbul, Turkey
- Mehmet Akman: Department of Family Medicine, Marmara University School of Medicine, Istanbul, Turkey
- Sibel Sakarya: Department of Public Health, Koç University School of Medicine, Rumeli Feneri Yolu, Sariyer-Istanbul, Turkey
2
Wu H, Wang M, Wu J, Francis F, Chang YH, Shavick A, Dong H, Poon MTC, Fitzpatrick N, Levine AP, Slater LT, Handy A, Karwath A, Gkoutos GV, Chelala C, Shah AD, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson RJB. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. NPJ Digit Med 2022;5:186. [PMID: 36544046] [PMCID: PMC9770568] [DOI: 10.1038/s41746-022-00730-6]
Abstract
Much of the knowledge and information needed to enable high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper presents a comprehensive review of clinical NLP in the UK over the past 15 years, aiming to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £41.97m in total) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show, not surprisingly, that clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period 2019-2022 was 80 times that of 2007-2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and to broaden application domains. There is also a need to improve links between academia and industry and to enable deployments in real-world settings so that clinical NLP can realise its great potential in care delivery. The major barriers include research and development access to hospital data, a lack of capable computational resources in the right places, the scarcity of labelled data, and barriers to sharing pretrained models.
Affiliation(s)
- Honghan Wu: Institute of Health Informatics, University College London, London, UK
- Minhong Wang: Institute of Health Informatics, University College London, London, UK
- Jinge Wu: Institute of Health Informatics, University College London, London, UK; Usher Institute, University of Edinburgh, Edinburgh, UK
- Farah Francis: Usher Institute, University of Edinburgh, Edinburgh, UK
- Yun-Hsuan Chang: Institute of Health Informatics, University College London, London, UK
- Alex Shavick: Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Hang Dong: Usher Institute, University of Edinburgh, Edinburgh, UK; Department of Computer Science, University of Oxford, Oxford, UK
- Adam P Levine: Research Department of Pathology, UCL Cancer Institute, University College London, London, UK
- Luke T Slater: Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Alex Handy: Institute of Health Informatics, University College London, London, UK; University College London Hospitals NHS Trust, London, UK
- Andreas Karwath: Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Georgios V Gkoutos: Institute of Cancer and Genomics, University of Birmingham, Birmingham, UK
- Claude Chelala: Centre for Tumour Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Anoop Dinesh Shah: Institute of Health Informatics, University College London, London, UK
- Robert Stewart: Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience (IoPPN), King's College London, London, UK; South London and Maudsley NHS Foundation Trust, London, UK
- Nigel Collier: Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge, UK
- Beatrice Alex: Edinburgh Futures Institute, University of Edinburgh, Edinburgh, UK
- Cathie Sudlow: Usher Institute, University of Edinburgh, Edinburgh, UK
- Angus Roberts: Department of Biostatistics & Health Informatics, King's College London, London, UK
- Richard J B Dobson: Institute of Health Informatics, University College London, London, UK; Department of Biostatistics & Health Informatics, King's College London, London, UK
3
Janjua ZH, Kerins D, O'Flynn B, Tedesco S. Knowledge-driven feature engineering to detect multiple symptoms using ambulatory blood pressure monitoring data. Comput Methods Programs Biomed 2022;217:106638. [PMID: 35220199] [DOI: 10.1016/j.cmpb.2022.106638]
Abstract
BACKGROUND Hypertension is a major health concern across the globe and needs to be properly diagnosed so that it can be treated and its consequences mitigated. In this context, ambulatory blood pressure monitoring is essential for a proper diagnosis of hypertension, which may not be possible otherwise due to the white-coat effect or masked hypertension. In this paper, the objective is to develop a model which incorporates expert knowledge in the feature engineering process so as to accurately predict multiple medical conditions. As a case study, we considered multiple symptoms related to hypertension and used an ambulatory blood pressure monitoring method to continuously acquire hypertension-relevant data from a patient. The goal is to train a model with a minimum set of the most effective knowledge-driven features which are useful to detect multiple symptoms simultaneously using multi-class classification techniques. METHOD Artificial intelligence-based blood pressure monitoring techniques introduce a new dimension to the diagnosis of hypertension by enabling a continuous (24-hour) analysis of systolic and diastolic blood pressure levels. In this work, we present a model that entails a knowledge-driven feature engineering method and implement an ambulatory blood pressure monitoring system to diagnose multiple cardiac parameters and associated conditions simultaneously; these include morning surge, circadian rhythm, and pulse pressure. The knowledge-driven features are extracted to improve the interpretability of the classification model, and machine learning techniques (Random Forest, Naive Bayes, and KNN) were applied in a multi-label classification setup using RAkEL to classify multiple conditions simultaneously. RESULTS The results obtained (F1 = 0.918) show that the Random Forest technique performed well for multi-label classification using knowledge-driven features. Our technique also reduced the complexity of the model by reducing the number of features required to train a machine learning model. CONCLUSION Considering these results, we conclude that knowledge-driven feature engineering enhances the learning process by reducing the number of features given as input to the machine learning algorithm. The proposed feature engineering method draws on expert knowledge to develop better diagnostic models which are free from misleading, noisy data-driven features. It is a white-box approach in which clinicians can understand the importance of a feature while looking at its value.
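The multi-label setup above (one classifier predicting several conditions at once) can be sketched as follows. Note the paper uses RAkEL, which lives in the scikit-multilearn package; this sketch uses plain binary relevance via a multi-output Random Forest as a simpler stand-in, and the features, thresholds, and labels are entirely synthetic:

```python
# Sketch only: a binary-relevance stand-in for the paper's RAkEL setup,
# predicting three blood-pressure-related labels simultaneously.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # engineered BP features (synthetic)
# Three illustrative labels -- morning surge, abnormal circadian rhythm,
# widened pulse pressure -- each a simple threshold on one synthetic feature
Y = np.column_stack([X[:, 0] > 0.5,
                     X[:, 1] > 0.0,
                     X[:, 2] > -0.5]).astype(int)

# RandomForestClassifier handles a 2-D label matrix natively
clf = RandomForestClassifier(random_state=0).fit(X[:150], Y[:150])
pred = clf.predict(X[150:])            # one row per sample, one column per label
micro_f1 = f1_score(Y[150:], pred, average="micro")
print(round(micro_f1, 3))
```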
4
Li X, Yuan W, Peng D, Mei Q, Wang Y. When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification. BMC Med Inform Decis Mak 2021;21:377. [PMID: 35382811] [PMCID: PMC8981604] [DOI: 10.1186/s12911-022-01829-2]
Abstract
Background Natural language processing (NLP) tasks in the health domain often deal with limited amounts of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at a scale that is unlikely to be obtainable in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP. Method We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the amount of training examples per disease from small to large, and measured the classification performance in macro-averaged F1 score. Results On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. But under extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features. Conclusion As long as the amount of training documents is not extremely small, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks like disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features should be considered.
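The learning-curve procedure described above — train on progressively larger subsets, score each on a fixed test set with macro-averaged F1 — can be sketched in miniature. This uses a linear baseline on synthetic numeric data rather than BERT on clinical text; sizes and data are illustrative only:

```python
# Sketch only: a minimal learning-curve analysis in the spirit of the paper --
# macro-averaged F1 as the number of training examples grows.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

curve = []
for n in [20, 100, 500, 1500]:          # increasing training-set sizes
    clf = LogisticRegression(max_iter=2000).fit(X_tr[:n], y_tr[:n])
    # macro-F1 weights every class equally, so rare classes count fully
    curve.append(f1_score(y_te, clf.predict(X_te), average="macro"))
print([round(v, 3) for v in curve])
```

Plotting `curve` against the training sizes gives the learning curve; the paper's comparison repeats this per model.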
5
Fong A, Scoulios N, Blumenthal HJ, Anderson RE. Using Machine Learning to Capture Quality Metrics from Natural Language: A Case Study of Diabetic Eye Exams. Methods Inf Med 2021;60:110-115. [PMID: 34598298] [DOI: 10.1055/s-0041-1736311]
Abstract
BACKGROUND AND OBJECTIVE The prevalence of value-based payment models has led to increased use of the electronic health record to capture quality measures, necessitating additional documentation requirements for providers. METHODS This case study uses text mining and natural language processing techniques to identify the timely completion of diabetic eye exams (DEEs) from 26,203 unique clinician notes for reporting as an electronic clinical quality measure (eCQM). Logistic regression and support vector machine (SVM) models, trained on unbalanced and balanced datasets using the synthetic minority over-sampling technique (SMOTE), were evaluated on precision, recall, sensitivity, and F1-score for classifying records positive for DEE. We then integrate a high-precision DEE model to evaluate free-text clinical narratives from our clinical EHR system. RESULTS Logistic regression and SVM models had comparable F1-scores and specificity; models trained and validated without oversampling favored precision over recall. SVM with and without oversampling achieved the best precision, 0.96, and recall, 0.85, respectively. These two SVM models were applied to the unannotated 31,585 text segments representing 24,823 unique records and 13,714 unique patients. The number of records classified as positive for DEE using the SVM models ranged from 667 to 8,935 (2.7-36% of 24,823, respectively). Unique patients classified as positive for DEE ranged from 3.5 to 41.8%, highlighting the potential utility of these models. DISCUSSION We believe the impact of oversampling on SVM model performance is caused by potential overfitting of the SVM-SMOTE model on the synthesized data and by the data synthesis process itself. However, the specificities of SVM with and without SMOTE were comparable, suggesting both models were confident in their negative predictions. By prioritizing the SVM model with higher precision over one with higher sensitivity (recall) for the categorization of DEEs, we can provide a highly reliable pool of results that can be documented through automation, reducing the burden of secondary review. Although the focus of this work was on completed DEEs, this method could be applied to complete other necessary documentation by extracting information from natural language in clinician notes. CONCLUSION By enabling the capture of data for eCQMs from documentation generated by usual clinical practice, this work represents a case study in how such techniques can be leveraged to drive quality without increasing clinician work.
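The precision-versus-recall trade-off on an imbalanced dataset, as discussed above, can be sketched as follows. SMOTE itself lives in the imbalanced-learn package; here `class_weight="balanced"` stands in as a simpler rebalancing knob, and the data are synthetic:

```python
# Sketch only: how rebalancing shifts an SVM between precision and recall
# on an imbalanced binary task (synthetic stand-in for the DEE data).
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.svm import SVC

# ~90% negative class, with some label noise
X, y = make_classification(n_samples=1000, weights=[0.9], flip_y=0.05,
                           random_state=0)
X_tr, y_tr, X_te, y_te = X[:700], y[:700], X[700:], y[700:]

scores = {}
for name, clf in [("svm_plain", SVC()),
                  ("svm_balanced", SVC(class_weight="balanced"))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    scores[name] = (precision_score(y_te, pred, zero_division=0),
                    recall_score(y_te, pred))
print(scores)  # (precision, recall) per model
```

Comparing the two tuples shows which configuration to deploy when, as in the paper, high precision is worth more than high recall.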
Affiliation(s)
- Allan Fong: National Center for Human Factors in Healthcare, MedStar Health, Washington, District of Columbia, United States
- Nicholas Scoulios: Department of Hospital Medicine, Internal Medicine, Stanford University School of Medicine, Stanford, California, United States
- H Joseph Blumenthal: National Center for Human Factors in Healthcare, MedStar Health, Washington, District of Columbia, United States
- Ryan E Anderson: Division of General Internal Medicine, Department of Medicine, MedStar Georgetown University Hospital, Washington, District of Columbia, United States; MedStar Institute for Quality and Safety, MedStar Health Research Institute, MedStar Health, Washington, District of Columbia, United States
6

7
Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med Inform Decis Mak 2019;19:71. [PMID: 30943960] [PMCID: PMC6448186] [DOI: 10.1186/s12911-019-0781-4]
Abstract
Background Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on feature engineering based on rules or knowledge sources, but only a limited number of studies have exploited the effective representation learning capability of deep learning methods. Methods In this study, we propose a new approach which combines rule-based features and knowledge-guided deep learning models for effective disease classification. Critical steps of our method include recognizing trigger phrases, predicting classes with very few examples using trigger phrases, and training a convolutional neural network (CNN) with word embeddings and Unified Medical Language System (UMLS) entity embeddings. Results We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results demonstrate that our method outperforms the state-of-the-art methods. Conclusion We showed that the CNN model is powerful for learning effective hidden features, and UMLS concept unique identifier (CUI) embeddings are helpful for building clinical text representations. This suggests that integrating domain knowledge into CNN models is promising.
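The trigger-phrase step described above — rule features that fire when a disease-indicative phrase appears in a note — can be sketched with a tiny lexicon. The phrases below are hypothetical illustrations, not the paper's actual i2b2 rule set:

```python
# Sketch only: binary rule features from trigger phrases (hypothetical lexicon).
TRIGGERS = {
    "obesity": ["obese", "morbid obesity", "bmi over"],
    "diabetes": ["type 2 diabetes", "diabetic", "insulin-dependent"],
}

def trigger_features(note: str) -> dict:
    """One binary feature per disease: does any trigger phrase occur?"""
    text = note.lower()
    return {disease: int(any(phrase in text for phrase in phrases))
            for disease, phrases in TRIGGERS.items()}

note = "Patient is morbidly obese with type 2 diabetes, on insulin."
print(trigger_features(note))
```

In the paper's pipeline such rule features both classify the few-example classes directly and supplement the CNN's learned representations.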
8
Cook MJ, Yao L, Wang X. Facilitating accurate health provider directories using natural language processing. BMC Med Inform Decis Mak 2019;19:80. [PMID: 30943977] [PMCID: PMC6448184] [DOI: 10.1186/s12911-019-0788-x]
Abstract
BACKGROUND Accurate information in provider directories is vital in health care, including for health information exchange, health benefits exchange, quality reporting, and the reimbursement and delivery of care. Maintaining provider directory data and keeping it up to date is challenging. The objective of this study is to determine the feasibility of using natural language processing (NLP) techniques to combine disparate resources and acquire accurate information on health providers. METHODS Publicly available state licensure lists in Connecticut were obtained along with National Plan and Provider Enumeration System (NPPES) public use files. The Connecticut licensure list contains textual information on each health professional who is licensed to practice within the state. An NLP-based system was developed based on healthcare provider taxonomy code, location, name, and address information to identify matching textual data within the state and federal records. Qualitative and quantitative evaluations were performed, and the recall and precision were calculated. RESULTS We identified nurse midwives, nurse practitioners, and dentists in the State of Connecticut. The recall and precision were 0.95 and 0.93, respectively. Using the system, we were able to accurately acquire 6849 of the 7177 records of health provider directory information. CONCLUSIONS The authors demonstrated that the NLP-based approach was effective at acquiring health provider information. Furthermore, the NLP-based system can be reapplied to update information as data change, further reducing processing burdens.
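The core record-linkage idea — matching a licensure record against NPPES-style records on name and address similarity — can be sketched with the standard library. The names, threshold, and scoring rule here are illustrative, not the paper's system:

```python
# Sketch only: linking a licensure record to candidate NPPES records by
# fuzzy name/address similarity (illustrative threshold of 0.85).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(record, candidates, threshold=0.85):
    """Return the candidate whose name AND address both match well, or None."""
    scored = [(min(similarity(record["name"], c["name"]),
                   similarity(record["address"], c["address"])), c)
              for c in candidates]
    score, cand = max(scored, key=lambda t: t[0])
    return cand if score >= threshold else None

lic = {"name": "Jane A. Smith", "address": "12 Main St, Hartford CT"}
nppes = [{"name": "Jane Smith", "address": "12 Main Street, Hartford CT"},
         {"name": "John Doe", "address": "9 Elm Ave, Storrs CT"}]
match = best_match(lic, nppes)
print(match["name"] if match else "no match")
```

Precision and recall, as reported in the study, then follow from counting correct, incorrect, and missed links against a gold standard.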
Affiliation(s)
- Matthew J. Cook: Center for Quantitative Medicine, University of Connecticut Health Center, Farmington, CT 06030, USA; Office of the Vice President for Research, University of Connecticut, Storrs, CT 06269, USA; Department of Community Medicine and Health Care, University of Connecticut Health Center, Farmington, CT 06030, USA
- Lixia Yao: Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
- Xiaoyan Wang: Center for Quantitative Medicine, University of Connecticut Health Center, Farmington, CT 06030, USA; Department of Community Medicine and Health Care, University of Connecticut Health Center, Farmington, CT 06030, USA; Department of Family Medicine, University of Connecticut Health Center, Farmington, CT 06030, USA
9
Elmessiry A, Cooper WO, Catron TF, Karrass J, Zhang Z, Singh MP. Triaging Patient Complaints: Monte Carlo Cross-Validation of Six Machine Learning Classifiers. JMIR Med Inform 2017;5:e19. [PMID: 28760726] [PMCID: PMC5556254] [DOI: 10.2196/medinform.7140]
Abstract
Background Unsolicited patient complaints can be a useful service recovery tool for health care organizations. Some patient complaints contain information that may necessitate further action on the part of the health care organization and/or the health care professional. Current approaches depend on the manual processing of patient complaints, which can be costly, slow, and challenging in terms of scalability. Objective The aim of this study was to evaluate automatic patient-complaint triage, which can potentially improve response time and provide much-needed scale, thereby enhancing opportunities to encourage physicians to self-regulate. Methods We compared several well-known machine learning classifiers in detecting whether a complaint was associated with a physician or his/her medical practice. We compared these classifiers on a real-life dataset containing 14,335 patient complaints associated with 768 physicians, extracted from patient complaints collected by the Patient Advocacy Reporting System developed at Vanderbilt University and associated institutions. We conducted a 10-split Monte Carlo cross-validation to validate our results. Results We achieved an accuracy of 82% and an F-score of 81% in correctly classifying patient complaints, with sensitivity and specificity of 0.76 and 0.87, respectively. Conclusions We demonstrate that natural language processing methods based on modeling patient complaint text can be effective in identifying those patient complaints requiring physician action.
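Monte Carlo cross-validation, as used above, means scoring a model over repeated random train/test splits rather than fixed folds. A minimal sketch with ten random splits and a bag-of-words classifier — the toy complaint strings and labels are invented:

```python
# Sketch only: 10-split Monte Carlo cross-validation of a text classifier.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import ShuffleSplit
from sklearn.naive_bayes import MultinomialNB

texts = ["doctor was rude and dismissive", "billing charge was wrong",
         "physician ignored my symptoms", "parking was hard to find",
         "surgeon refused to explain risks", "waiting room was too cold"] * 20
labels = np.array([1, 0, 1, 0, 1, 0] * 20)   # 1 = physician-related complaint

X = CountVectorizer().fit_transform(texts)
# ShuffleSplit draws a fresh random 70/30 split on every iteration
mc_cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
accs = [MultinomialNB().fit(X[tr], labels[tr]).score(X[te], labels[te])
        for tr, te in mc_cv.split(X)]
print(round(float(np.mean(accs)), 3))
```

Averaging over the random splits gives a variance estimate that fixed k-fold splits do not, which is the point of the Monte Carlo variant.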
Affiliation(s)
- Adel Elmessiry: North Carolina State University, Department of Computer Science, Raleigh, NC, United States
- William O Cooper: Vanderbilt University Medical Center, Nashville, TN, United States
- Thomas F Catron: Vanderbilt University Medical Center, Nashville, TN, United States
- Jan Karrass: Vanderbilt University Medical Center, Nashville, TN, United States
- Zhe Zhang: IBM, Research Triangle Park, NC, United States
- Munindar P Singh: North Carolina State University, Department of Computer Science, Raleigh, NC, United States
10
Clark TJ, Mieloszyk RJ, Bhargava P. What Do George Clooney and Sarah Jessica Parker Have in Common? Big-data. Curr Probl Diagn Radiol 2017;46:171-172. [DOI: 10.1067/j.cpradiol.2017.03.001]
11
EHR-based phenotyping: Bulk learning and evaluation. J Biomed Inform 2017;70:35-51. [PMID: 28410982] [DOI: 10.1016/j.jbi.2017.04.009]
Abstract
In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
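The surrogate-label idea at the heart of the paragraph above — train on plentiful but noisy labels (diagnostic codes), then evaluate against a small annotated sample — can be sketched on synthetic data. Everything here (noise rate, features, sample sizes) is an invented illustration:

```python
# Sketch only: surrogate-label training with sparse-annotation evaluation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
true = (X[:, 0] + X[:, 1] > 0).astype(int)      # unobserved true phenotype
# Surrogate: diagnostic codes that agree with the phenotype ~90% of the time
flip = rng.random(1000) < 0.10
surrogate = np.where(flip, 1 - true, true)

clf = LogisticRegression().fit(X, surrogate)    # no manual labels used to train
annotated = slice(0, 50)                        # small "gold" annotated sample
acc = clf.score(X[annotated], true[annotated])  # evaluate on gold labels only
print(round(acc, 3))
```

The paper's bulk-learning framework applies this pattern across many diseases at once within a shared abstract feature space.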
12
Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithms - state-of-the-art of extracting biomedical relations. Brief Bioinform 2017;18:160-178. [PMID: 26851224] [PMCID: PMC5221425] [DOI: 10.1093/bib/bbw001]
Abstract
Research on extracting biomedical relations has received growing attention recently, with numerous biological and clinical applications including those in pharmacogenomics, clinical trial screening and adverse drug reaction detection. The ability to accurately capture both semantic and syntactic structures in text expressing these relations becomes increasingly critical to enable deep understanding of scientific papers and clinical narratives. Shared task challenges have been organized by both bioinformatics and clinical informatics communities to assess and advance the state-of-the-art research. Significant progress has been made in algorithm development and resource construction. In particular, graph-based approaches bridge semantics and syntax, often achieving the best performance in shared tasks. However, a number of problems at the frontiers of biomedical relation extraction continue to pose interesting challenges and present opportunities for great improvement and fruitful research. In this article, we place biomedical relation extraction against the backdrop of its versatile applications, present a gentle introduction to its general pipeline and shared resources, review the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions.
Affiliation(s)
- Yuan Luo: Department of Preventive Medicine, Northwestern University, 11th Floor, Arthur Rubloff Building, 750 N. Lake Shore Drive, Chicago, IL, USA
- Özlem Uzuner: Department of Information Studies, State University of New York at Albany, New York, USA
- Peter Szolovits: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Massachusetts, USA
13
Kim YM, Delen D. Medical informatics research trend analysis: A text mining approach. Health Informatics J 2016;24:432-452. [PMID: 30376768] [DOI: 10.1177/1460458216678443]
Abstract
The objective of this research is to identify major subject areas of medical informatics and explore the time-variant changes therein. As such, it can inform the field about where medical informatics research has been and where it is heading. Furthermore, by identifying subject areas, this study delineates the development trends and the boundaries of medical informatics as an academic field. To conduct the study, we first identified 26,307 articles in PubMed archives which were published in the top medical informatics journals within the timeframe of 2002 to 2013. Then, employing a text-mining-based, semi-automated analytic approach, we clustered major research topics by analyzing the most frequently appearing subject terms extracted from the abstracts of these articles. The results indicated that some subject areas, such as biomedicine, are declining, while other research areas, such as health information technology (HIT), Internet-enabled research, and electronic medical/health records (EMR/EHR), are growing. The changes within the research subject areas can largely be attributed to the increasing capabilities and use of HIT. The Internet, for example, has changed the way medical research is conducted in the health care field. While discovering new medical knowledge through clinical and biological experiments remains important, the utilization of EMR/EHR has enabled researchers to discover novel medical insight buried deep inside massive data sets, and hence data analytics research has become a common complement in the medical field, rapidly growing in popularity.
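The clustering step described above — group articles by their most frequent subject terms — can be sketched in miniature with TF-IDF vectors and k-means. The six "abstracts" and the cluster count are invented stand-ins for the study's 26,307 articles:

```python
# Sketch only: a miniature of term-based topic clustering of article abstracts.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "electronic health record adoption in hospitals",
    "electronic medical record data quality",
    "internet survey of patient portals",
    "internet based telehealth research",
    "genomic biomedical signal analysis",
    "biomedical imaging signal processing",
]
# Weight terms by TF-IDF so common filler words do not dominate the clusters
X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(list(km.labels_))   # cluster assignment per abstract
```

Tracking how cluster sizes shift across publication years then yields the growth/decline trends the study reports.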
14
Kelahan LC, Fong A, Ratwani RM, Filice RW. Call Case Dashboard: Tracking R1 Exposure to High-Acuity Cases Using Natural Language Processing. J Am Coll Radiol 2016;13:988-991. [PMID: 27162046] [DOI: 10.1016/j.jacr.2016.03.012]
Affiliation(s)
- Linda C Kelahan
- Department of Radiology, MedStar Georgetown University Hospital, Washington, District of Columbia.
- Allan Fong
- National Center for Human Factors in Healthcare, MedStar Health, Washington, District of Columbia.
- Raj M Ratwani
- National Center for Human Factors in Healthcare, MedStar Health, Washington, District of Columbia; Georgetown University School of Medicine, Washington, District of Columbia.
- Ross W Filice
- Department of Radiology, MedStar Georgetown University Hospital, Washington, District of Columbia; Imaging Informatics, MedStar Medical Group Radiology, Washington, District of Columbia.
15
Hruby GW, Matsoukas K, Cimino JJ, Weng C. Facilitating biomedical researchers' interrogation of electronic health record data: Ideas from outside of biomedical informatics. J Biomed Inform 2016; 60:376-84. [PMID: 26972838 PMCID: PMC4837021 DOI: 10.1016/j.jbi.2016.03.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2014] [Revised: 03/03/2016] [Accepted: 03/04/2016] [Indexed: 12/19/2022]
Abstract
Electronic health records (EHR) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-the-art approaches and key methodological considerations for this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities: user, channel, and source, by elaborating on channels for query formulation in the context of facilitating end users' interrogation of EHR data. We show that current progress in biomedical informatics lies mainly in support for query execution and information modeling, primarily owing to the emphasis on infrastructure development for data integration and data access via self-service query tools, while neglecting the user support needed during iterative query formulation, which can be costly and error-prone. In contrast, the information science literature has offered elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit research on facilitating efficient data access in the life sciences.
Affiliation(s)
- Gregory W Hruby
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Konstantina Matsoukas
- Memorial Sloan Kettering Library, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- James J Cimino
- Informatics Institute, University of Alabama at Birmingham, AL, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
16
Hripcsak G, Albers DJ. Correlating electronic health record concepts with healthcare process events. J Am Med Inform Assoc 2013; 20:e311-8. [PMID: 23975625 PMCID: PMC3861922 DOI: 10.1136/amiajnl-2013-001922] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Objective To study the relation between electronic health record (EHR) variables and healthcare process events. Materials and methods Lagged linear correlation was calculated between five healthcare process events and 84 EHR variables (24 clinical laboratory values and 60 clinical concepts extracted from clinical notes) in a 24-year database. The EHR variables were clustered for each healthcare process event and interpreted. Results Laboratory tests tended to cluster together and note concepts tended to cluster together. Within each of those two classes, the variables clustered into clinically sensible groupings. The exact groupings varied from healthcare process event to event, with the largest differences occurring between inpatient events and outpatient events. Discussion Unlike previously reported pairwise associations between variables, which highlighted correlations across the laboratory–clinical note divide, incorporating healthcare process events appeared to be sensitive to the manner in which the variables were collected. Conclusion We believe that it may be possible to exploit this sensitivity to help knowledge engineers select variables and correct for biases.
Affiliation(s)
- George Hripcsak
- Biomedical Informatics, Columbia University, New York, New York, USA
17
Using the electronic medical record to identify community-acquired pneumonia: toward a replicable automated strategy. PLoS One 2013; 8:e70944. [PMID: 23967138 PMCID: PMC3742728 DOI: 10.1371/journal.pone.0070944] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2013] [Accepted: 06/24/2013] [Indexed: 01/19/2023] Open
Abstract
Background Timely information about disease severity can be central to the detection and management of outbreaks of acute respiratory infections (ARI), including influenza. We asked whether two resources, (1) free text and (2) structured data from an electronic medical record (EMR), could complement each other to identify patients with pneumonia, an ARI severity landmark. Methods A manual EMR review of 2747 outpatient ARI visits with associated chest imaging identified x-ray reports that could support the diagnosis of pneumonia (kappa score = 0.88 (95% CI 0.82-0.93)), along with attendant cases with Possible Pneumonia (adds either cough, sputum, fever/chills/night sweats, dyspnea, or pleuritic chest pain) or with Pneumonia-in-Plan (adds pneumonia stated as a likely diagnosis by the provider). The x-ray reports served as a reference to develop a text classifier using machine-learning software that did not require custom coding. To identify pneumonia cases, the classifier was combined with EMR-based structured data and with text analyses aimed at ARI symptoms in clinical notes. Results 370 reference cases with Possible Pneumonia and 250 with Pneumonia-in-Plan were identified. The x-ray report text classifier increased the positive predictive value of otherwise identical EMR-based case-detection algorithms by 20-70%, while retaining sensitivities of 58-75%. These performance gains were independent of the case definitions and of whether patients were admitted to the hospital or sent home. Text analyses seeking ARI symptoms in clinical notes did not add further value. Conclusion Specialized software development is not required for automated text analyses to help identify pneumonia patients. These results begin to map an efficient, replicable strategy through which EMR data can be used to stratify ARI severity.
18
Lucero RJ, Bakken S. Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework. Can J Nurs Res 2013; 45:98-112. [DOI: 10.1177/084456211304500109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
19
Abstract
The national adoption of electronic health records (EHR) promises to make an unprecedented amount of data available for clinical research, but the data are complex, inaccurate, and frequently missing, and the record reflects complex processes aside from the patient's physiological state. We believe that the path forward requires studying the EHR as an object of interest in itself, and that new models, learning from data, and collaboration will lead to efficient use of the valuable information currently locked in health records.
Affiliation(s)
- George Hripcsak
- Biomedical Informatics, Columbia University, New York, NY 10027.
20
Abstract
OBJECTIVE According to the American Diabetes Association, the implementation of the standards of care for diabetes has been suboptimal in most clinical settings. Diabetes is a disease that had a total estimated cost of $174 billion in 2007 for an estimated diabetes-affected population of 17.5 million in the United States. With the advent of electronic medical records (EMR), tools that analyze data residing in the EMR for healthcare surveillance can help reduce the burdens experienced today. This study was primarily designed to evaluate the efficacy of employing clinical natural language processing to analyze discharge summaries for evidence indicating the presence of diabetes, as well as to assess diabetes protocol compliance and high-risk factors. METHODS Three sets of algorithms were developed to analyze discharge summaries for: (1) identification of diabetes, (2) protocol compliance, and (3) identification of high-risk factors. The algorithms utilize a common natural language processing framework that extracts relevant discourse evidence from the medical text. Evidence utilized in one or more of the algorithms includes assertion of the disease and associated findings in medical text, as well as numerical clinical measurements and prescribed medications. RESULTS The diabetes classifier was successful at classifying reports for the presence and absence of diabetes. Evaluated against 444 discharge summaries, the classifier's performance included macro and micro F-scores of 0.9698 and 0.9865, respectively. Furthermore, the protocol compliance and high-risk factor classifiers showed promising results, with most F-measures exceeding 0.9. CONCLUSIONS The presented approach accurately identified diabetes in medical discharge summaries and showed promise with regard to assessment of protocol compliance and high-risk factors. Utilizing free-text analytic techniques on medical text can complement clinical and public health decision support by identifying cases and high-risk factors.
Affiliation(s)
- Ninad K Mishra
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Mail Stop E76, Atlanta, GA 30333, USA.
21
Lakhani P, Kim W, Langlotz CP. Automated detection of critical results in radiology reports. J Digit Imaging 2012; 25:30-6. [PMID: 22038514 DOI: 10.1007/s10278-011-9426-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022] Open
Abstract
The goal of this study was to develop and validate text-mining algorithms to automatically identify radiology reports containing critical results including tension or increasing/new large pneumothorax, acute pulmonary embolism, acute cholecystitis, acute appendicitis, ectopic pregnancy, scrotal torsion, unexplained free intraperitoneal air, new or increasing intracranial hemorrhage, and malpositioned tubes and lines. The algorithms were developed using rule-based approaches and designed to search for common words and phrases in radiology reports that indicate critical results. Certain text-mining features were utilized such as wildcards, stemming, negation detection, proximity matching, and expanded searches with applicable synonyms. To further improve accuracy, the algorithms utilized modality and exam-specific queries, searched under the "Impression" field of the radiology report, and excluded reports with a low level of diagnostic certainty. Algorithm accuracy was determined using precision, recall, and F-measure using human review as the reference standard. The overall accuracy (F-measure) of the algorithms ranged from 81% to 100%, with a mean precision and recall of 96% and 91%, respectively. These algorithms can be applied to radiology report databases for quality assurance and accreditation, integrated with existing dashboards for display and monitoring, and ported to other institutions for their own use.
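The abstract above combines keyword patterns, wildcard stemming, and proximity-based negation detection. A minimal illustrative sketch of that style of rule follows; the patterns, negation cues, and 30-character window are hypothetical stand-ins, not the study's actual lexicon or logic:

```python
import re

# Hypothetical phrase list; the study's full lexicon and its
# modality- and exam-specific queries are not reproduced here.
CRITICAL_PATTERNS = [
    r"tension pneumothorax",
    r"pulmonary embol\w*",        # wildcard stemming: embolus/embolism/emboli
    r"intracranial hemorrhage",
    r"free intraperitoneal air",
]
NEGATION_CUES = [r"\bno\b", r"\bwithout\b", r"\bnegative for\b", r"\bunlikely\b"]

def flag_critical(impression: str) -> bool:
    """Return True if the impression text asserts a critical finding."""
    text = impression.lower()
    for pattern in CRITICAL_PATTERNS:
        for match in re.finditer(pattern, text):
            # Simple proximity-based negation detection: look for a
            # negation cue in the characters just before the finding.
            window = text[max(0, match.start() - 30):match.start()]
            if any(re.search(cue, window) for cue in NEGATION_CUES):
                continue  # finding is negated; keep scanning
            return True
    return False
```

A real system would, as the abstract notes, also restrict matching to the "Impression" section and exclude reports expressing a low level of diagnostic certainty.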
Affiliation(s)
- Paras Lakhani
- Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, PA 19106, USA.
22
Alemi F, Torii M, Atherton MJ, Pattie DC, Cox KL. Bayesian Processing of Context-Dependent Text. Med Decis Making 2012; 32:E1-9. [DOI: 10.1177/0272989x12439753] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Objective. This article aims to examine whether words listed in reasons for appointments could effectively predict laboratory-verified influenza cases in syndromic surveillance systems. Methods. Data were collected from the Armed Forces Health Longitudinal Technological Application medical record system. We used 2 algorithms to combine the impact of words within reasons for appointments: Dependent (DBSt) and Independent (IBSt) Bayesian System. We used receiver operating characteristic curves to compare the accuracy of these 2 methods of processing reasons for appointments against current and previous lists of diagnoses used in the Department of Defense’s syndromic surveillance system. Results. We examined 13,096 cases, where the results of influenza tests were available. Each reason for an appointment had an average of 3.5 words (standard deviation = 2.2 words). There was no difference in performance of the 2 algorithms. The area under the curve for IBSt was 0.58 and for DBSt was 0.56. The difference was not statistically significant (McNemar statistic = 0.0054; P = 0.07). Conclusions. These data suggest that reasons for appointments can improve the accuracy of lists of diagnoses in predicting laboratory-verified influenza cases. This study recommends further exploration of the DBSt algorithm and reasons for appointments in predicting likely influenza cases.
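The Independent Bayesian System (IBSt) described above treats the words of a reason-for-appointment as conditionally independent pieces of evidence. A minimal sketch of that combination rule, with hypothetical likelihood ratios rather than values estimated from the study's data:

```python
# Hypothetical per-word likelihood ratios P(word | flu) / P(word | not flu);
# a real system would estimate these from labeled training encounters.
LIKELIHOOD_RATIOS = {"fever": 3.0, "cough": 2.0, "refill": 0.3, "ankle": 0.1}

def posterior_odds(reason_for_visit: str, prior_odds: float = 0.1) -> float:
    """Independent-Bayes combination: multiply the prior odds of influenza
    by the likelihood ratio of each word, treating words as independent."""
    odds = prior_odds
    for word in reason_for_visit.lower().split():
        odds *= LIKELIHOOD_RATIOS.get(word, 1.0)  # unknown words are neutral
    return odds
```

The dependent variant (DBSt) would instead condition each word's contribution on the words already seen, which this sketch does not attempt.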
Affiliation(s)
- Farrokh Alemi
- Department of Health Systems Administration, Georgetown University, Washington, DC
- Manabu Torii
- Imaging Science and Information Systems Center, Georgetown University, Washington, DC
- Martin J. Atherton
- SciMetrika LLC, Falls Church, VA
- David C. Pattie
- Planned Systems International Inc., Falls Church, VA
- Kenneth L. Cox
- Health Surveillance Center, Silver Spring, MD
23
Garla V, Lo Re V, Dorey-Stein Z, Kidwai F, Scotch M, Womack J, Justice A, Brandt C. The Yale cTAKES extensions for document classification: architecture and application. J Am Med Inform Assoc 2011; 18:614-20. [PMID: 21622934 PMCID: PMC3168305 DOI: 10.1136/amiajnl-2011-000093] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2010] [Accepted: 04/22/2011] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical NLP systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. METHODS The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule-based and machine-learning-based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation. RESULTS AND DISCUSSION The F1-score of the system for the retrieval of abdominal radiology reports was 96%; for the presence of liver masses, ascites, and varices it was 79%, 91%, and 95%, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
Affiliation(s)
- Vijay Garla
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut, USA.
24
Lakhani P, Langlotz CP. Documentation of nonroutine communications of critical or significant radiology results: a multiyear experience at a tertiary hospital. J Am Coll Radiol 2011; 7:782-90. [PMID: 20889108 DOI: 10.1016/j.jacr.2010.05.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2010] [Accepted: 05/21/2010] [Indexed: 11/18/2022]
Abstract
PURPOSE The aim of this study was to determine the frequency of radiology reports that contain nonroutine communications of results and to categorize the urgency of such communications. METHODS A rule-based text-query algorithm, with an accuracy of 98% for classifying reports containing documentation of communications, was applied to a database of 2.3 million radiology reports. The frequency of such communications by year, modality, and study type was then determined. Finally, 200 random reports selected by the algorithm were analyzed, and reports containing critical results were categorized according to ascending levels of urgency. RESULTS Documented communication of critical or noncritical results to health care providers was present in 5.09% of radiology reports (116,184 of 2,282,923). Among common modalities, documentation of communications was most frequent in CT (14.34% [57,537 of 402,060]), followed by ultrasound (9.55% [17,814 of 186,626]), MRI (5.50% [13,697 of 248,833]), and chest radiography (1.57% [19,840 of 1,262,925]). From 1997 to 2005, there was an increase in reports containing such communications (3.04% in 1997, 6.82% in 2005). More reports contained nonroutine communications in single-view chest radiography (1.29% [5,533 of 428,377]) than frontal/lateral chest radiography (0.80% [1,815 of 226,837]), diagnostic mammography (9.42% [3,662 of 38,877]) than screening mammography (0.47% [289 of 61,114]), and head CT (26.21% [20,963 of 79,985]) than abdominal CT (15.05% [19,871 of 132,034]) or chest CT (5.33% [3,017 of 56,613]). All of these results were statistically significant (P < .00001). Of 200 random radiology reports indicating nonroutine communications, 155 (78%) had critical and 45 (22%) had noncritical results. Regarding level of urgency, 94 of 155 reports (60.6%) with critical results were categorized as high urgency, 31 (20.0%) as low urgency, 26 (16.8%) as medium urgency, and 4 (2.6%) as discrepant. CONCLUSIONS From 1997 to 2005, there was a significant increase in documentation of nonroutine communications, which may be due to increasing compliance with ACR guidelines. Most reports with nonroutine communications contain critical findings.
Affiliation(s)
- Paras Lakhani
- Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
25
Duchrow T, Shtatland T, Guettler D, Pivovarov M, Kramer S, Weissleder R. Enhancing navigation in biomedical databases by community voting and database-driven text classification. BMC Bioinformatics 2009; 10:317. [PMID: 19799796 PMCID: PMC2768718 DOI: 10.1186/1471-2105-10-317] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2009] [Accepted: 10/03/2009] [Indexed: 11/29/2022] Open
Abstract
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries, and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing of long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at .
Affiliation(s)
- Timo Duchrow
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA, USA.
26
Farkas R, Szarvas G, Hegedus I, Almási A, Vincze V, Ormándi R, Busa-Fekete R. Semi-automated construction of decision rules to predict morbidities from clinical texts. J Am Med Inform Assoc 2009; 16:601-5. [PMID: 19390097 PMCID: PMC2705267 DOI: 10.1197/jamia.m3097] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2008] [Accepted: 04/07/2009] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE In this study the authors describe the system submitted by the team of the University of Szeged to the second i2b2 Challenge in Natural Language Processing for Clinical Data. The challenge focused on the development of automatic systems that analyzed clinical discharge summary texts and addressed the following question: "Who's obese and what co-morbidities do they (definitely/most likely) have?". Target diseases included obesity and its 15 most frequent comorbidities exhibited by patients, while the target labels corresponded to expert judgments based on textual evidence and intuition (separately). DESIGN The authors applied statistical methods to preselect the most common and confident terms and evaluated outlier documents by hand to discover infrequent spelling variants. The authors expected a system with dictionaries gathered semi-automatically to perform well with moderate development costs (the authors examined just a small proportion of the records manually). MEASUREMENTS Following the standard evaluation method of the second Workshop on Challenges in Natural Language Processing for Clinical Data, the authors used both macro- and micro-averaged F(beta=1) measures for evaluation. RESULTS The authors' submission achieved a micro-averaged F(beta=1) score of 97.29% for classification based on textual evidence (macro-averaged F(beta=1) = 76.22%) and 96.42% for intuitive judgments (macro-averaged F(beta=1) = 67.27%). CONCLUSIONS The results demonstrate the feasibility of the authors' approach and show that even very simple systems with shallow linguistic analysis can achieve remarkable accuracy scores for classifying clinical records on a limited set of concepts.
Affiliation(s)
- Richárd Farkas
- Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary.
27
Mishra NK, Cummo DM, Arnzen JJ, Bonander J. A rule-based approach for identifying obesity and its comorbidities in medical discharge summaries. J Am Med Inform Assoc 2009; 16:576-9. [PMID: 19390102 DOI: 10.1197/jamia.m3086] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE Evaluate the effectiveness of a simple rule-based approach in classifying medical discharge summaries according to indicators for obesity and 15 associated co-morbidities as part of the 2008 i2b2 Obesity Challenge. METHODS The authors applied a rule-based approach that looked for occurrences of morbidity-related keywords and identified the types of assertions in which those keywords occurred. The documents were then classified using a simple scoring algorithm based on a mapping of the assertion types to possible judgment categories. MEASUREMENTS Results for the challenge were evaluated based on macro F-measure. We report micro and macro F-measure results for all morbidities combined and for each morbidity separately. RESULTS Our rule-based approach achieved micro and macro F-measures of 0.97 and 0.77, respectively, ranking fifth out of the entries submitted by 28 teams participating in the classification task based on textual judgments and substantially outperforming the average for the challenge. CONCLUSIONS As shown by its ranking in the challenge results, this approach performed relatively well under conditions in which limited training data existed for some judgment categories. Further, the approach held up well in relation to more complex approaches applied to this classification task. The approach could be enhanced by the addition of expert rules to model more complex medical reasoning.
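The keyword-plus-assertion scoring idea described above can be sketched as follows; the keyword handling, negation cues, and score-to-judgment mapping are simplified, hypothetical stand-ins for the challenge system's actual rules:

```python
import re

def classify_morbidity(summary: str, keywords: list) -> str:
    """Map keyword occurrences and their assertion context to a judgment:
    'Y' (asserted present), 'N' (asserted absent), or 'U' (unmentioned)."""
    text = summary.lower()
    score = 0
    for kw in keywords:
        for match in re.finditer(re.escape(kw.lower()), text):
            # Look for a negation cue in the same clause before the keyword.
            preceding = text[max(0, match.start() - 25):match.start()]
            if re.search(r"\b(no|denies|negative for)\b[^.]*$", preceding):
                score -= 1   # negated assertion
            else:
                score += 1   # positive assertion
    if score > 0:
        return "Y"
    return "N" if score < 0 else "U"
```

The challenge system additionally distinguished "definitely" from "most likely" judgments, which this sketch collapses into a single positive class.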
Affiliation(s)
- Ninad K Mishra
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Mail Stop E76, Atlanta, GA, USA.
28
Dang PA, Kalra MK, Blake MA, Schultz TJ, Stout M, Halpern EF, Dreyer KJ. Use of Radcube for extraction of finding trends in a large radiology practice. J Digit Imaging 2008; 22:629-40. [PMID: 18543033 DOI: 10.1007/s10278-008-9128-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2008] [Revised: 03/19/2008] [Accepted: 04/24/2008] [Indexed: 10/24/2022] Open
Abstract
The purpose of our study was to demonstrate the use of Natural Language Processing (Leximer) along with Online Analytic Processing (together, NLP-OLAP) for extraction of finding trends in a large radiology practice. Prior studies have validated the Natural Language Processing (NLP) program Leximer for classifying unstructured radiology reports based on the presence of positive radiology findings (F(POS)) and negative radiology findings (F(NEG)). The F(POS) included new relevant radiology findings and any change in status from prior imaging. Electronic radiology reports from 1995-2002 and data from analysis of these reports with NLP-Leximer were saved in a data warehouse and exported to a multidimensional structure called the Radcube. Various relational queries on the data in the Radcube were performed using the OLAP technique. Thus, NLP-OLAP was applied to determine trends of F(POS) in different radiology exams for different patient and examination attributes. Pivot tables were exported from the NLP-OLAP interface to Microsoft Excel for statistical analysis. The Radcube allowed rapid and comprehensive analysis of F(POS) and F(NEG) trends in a large radiology report database. Trends of F(POS) were extracted for different patient attributes, such as age groups, gender, clinical indications, diseases with ICD codes, and patient types (inpatient, ambulatory), and for imaging characteristics, such as imaging modalities, referring physicians, radiology subspecialties, and body regions. Data analysis showed substantial differences between F(POS) rates for different imaging modalities, ranging from 23.1% (mammography, 49,163/212,906) to 85.8% (nuclear medicine, 93,852/109,374; p < 0.0001). In conclusion, NLP-OLAP can help in analysis of the yield of different radiology exams from a large radiology report database.
Affiliation(s)
- Pragya A Dang
- Department Of Radiology, Massachusetts General Hospital, 25 New Chardon St, Ste. 400E, Boston, MA 02114, USA
29
Bashyam V, Morioka C, El-Saden S, Bui AAT, Taira RK. Identifying relevant medical reports from an assorted report collection using the multinomial naïve Bayes classifier and the UMLS. INDIAN JOURNAL OF MEDICAL INFORMATICS 2007; 2:2. [PMID: 36284749 PMCID: PMC9592058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
A patient's electronic medical record contains a large number of medical reports and imaging studies. Identifying the relevant information in order to make a diagnosis can be a time-consuming process that can easily overwhelm the physician. Summarizing key clinical information for physicians evaluating brain tumor patients is an ongoing research project at our institution. Notably, identifying documents associated with brain tumor is an important step in collecting the data relevant for summarization. Current electronic medical record systems lack the meta-information that is useful in structuring heterogeneous medical information. Thus, reports relevant to a particular task cannot be easily retrieved from a structured database. This necessitates content analysis methods for identifying relevant reports. This paper reports a system designed to identify brain-tumor-related reports from an assorted collection of clinical reports. A large collection of clinical reports was obtained from our university hospital database. A domain expert manually annotated the documents, classifying them into 'related' and 'unrelated' categories. A multinomial naïve Bayes classifier was trained to use word-level and UMLS concept-level features from the reports to identify brain tumor related reports from the assorted collection. The system was trained on 90% and tested on 10% of the manually annotated corpus. A ten-fold cross-validation is reported. Performance of the system was best (f-score 94.7) when the system was trained using both word-level and UMLS concept-level features. Using UMLS concepts improved classifier accuracy.
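The classifier named in this abstract is a standard multinomial naïve Bayes over report text. A from-scratch sketch using only word-level features (the UMLS concept-level features that gave the best reported performance are omitted, and the documents and labels below are invented examples):

```python
import math
from collections import Counter

def train_mnb(docs):
    """docs: list of (text, label) pairs. Returns class priors, per-class
    word counts, and the vocabulary; add-one smoothing is applied at
    prediction time."""
    labels = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in labels}
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: n / len(docs) for c, n in labels.items()}
    return priors, word_counts, vocab

def predict(model, text):
    """Score each class by log prior plus Laplace-smoothed multinomial
    log likelihoods of the document's words; return the argmax class."""
    priors, word_counts, vocab = model
    scores = {}
    for c in priors:
        total = sum(word_counts[c].values())
        score = math.log(priors[c])
        for w in text.lower().split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```

The concept-level variant reported in the paper would simply extend each document's token list with UMLS concept identifiers before counting.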
Affiliation(s)
- Vijayaraghavan Bashyam, Department of Information Studies, University of California, Los Angeles, Los Angeles, CA 90024
- Craig Morioka, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA 90024
- Suzie El-Saden, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA 90024
- Alex AT Bui, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA 90024
- Ricky K Taira, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, CA 90024
30
Zhou L, Tao Y, Cimino JJ, Chen ES, Liu H, Lussier YA, Hripcsak G, Friedman C. Terminology model discovery using natural language processing and visualization techniques. J Biomed Inform 2006; 39:626-36. [DOI: 10.1016/j.jbi.2005.10.006]
31
Pakhomov SVS, Buntrock JD, Chute CG. Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. J Am Med Inform Assoc 2006; 13:516-25. [PMID: 16799125] [PMCID: PMC1561792] [DOI: 10.1197/jamia.m2077]
Abstract
OBJECTIVE Human classification of diagnoses is a labor-intensive process that consumes significant resources. Most medical practices use specially trained medical coders to categorize diagnoses for billing and research purposes. METHODS We have developed an automated coding system designed to assign codes to clinical diagnoses. The system uses the notion of certainty to recommend subsequent processing. Codes with the highest certainty are generated by matching the diagnostic text to frequent examples in a database of 22 million manually coded entries; these code assignments are not subject to subsequent manual review. Codes at a lower certainty level are assigned by matching to previously infrequently coded examples. The least certain codes are generated by a naïve Bayes classifier. The latter two types of codes are subsequently manually reviewed. MEASUREMENTS Standard information retrieval measurements of precision, recall, and f-measure were used; micro- and macro-averaged results were computed. RESULTS At least 48% of all EMR problem list entries at the Mayo Clinic can be automatically classified with macro-averaged 98.0% precision, 98.3% recall, and an f-score of 98.2%. An additional 34% of the entries are classified with macro-averaged 90.1% precision, 95.6% recall, and a 93.1% f-score. The remaining 18% of the entries are classified with macro-averaged 58.5%. CONCLUSION Over two thirds of all diagnoses are coded automatically with high accuracy. The system has been successfully implemented at the Mayo Clinic, reducing the staff engaged in manual coding from thirty-four coders to seven verifiers.
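The three-tier certainty routing can be sketched as a simple dispatch; the lookup-table structure, function names, and toy codes below are assumptions for illustration, since the abstract does not specify the implementation.

```python
def route_diagnosis(text, frequent, infrequent, nb_classify):
    """Assign a diagnosis code via three certainty tiers:
    1. match against frequently coded examples   -> accepted without review;
    2. match against infrequently coded examples -> flagged for review;
    3. naive Bayes fallback                      -> flagged for review.
    Returns (code, needs_review).
    """
    key = " ".join(text.lower().split())   # normalize whitespace and case
    if key in frequent:
        return frequent[key], False
    if key in infrequent:
        return infrequent[key], True
    return nb_classify(text), True

# Toy lookup tables and a stub standing in for the trained classifier.
frequent = {"type 2 diabetes mellitus": "E11.9"}
infrequent = {"essential hypertension, benign": "I10"}
stub_nb = lambda text: "R69"   # fallback code from the stub classifier

print(route_diagnosis("Type 2 Diabetes Mellitus", frequent, infrequent, stub_nb))
# -> ('E11.9', False)
```

The design point the abstract makes is that only the first tier bypasses human review; the boolean flag here is what a downstream work queue would key on.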
Affiliation(s)
- Serguei V S Pakhomov, Division of Biomedical Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
32
Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia. Comput Biol Med 2006; 37:296-304. [PMID: 16620802] [DOI: 10.1016/j.compbiomed.2006.02.001]
Abstract
Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers to some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia, in which the ICD-9-CM coding was found to be very poor, the discordance measure was statistically significantly correlated with classification correctness (correlation .45; 95% CI .15-.62).
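One way to operationalize the discordance idea is as disagreement with the cluster's majority answer; this is an illustrative formulation, since the abstract does not give the exact formula the authors used.

```python
from collections import Counter

def discordance(cluster_answers):
    """Fraction of cases in a cluster whose query answer differs from the
    cluster's majority answer. Higher values flag clusters whose members
    are more likely to be misclassified and worth manual review."""
    counts = Counter(cluster_answers)
    majority_size = counts.most_common(1)[0][1]
    return 1 - majority_size / len(cluster_answers)

# A cluster of similar cases where the query answers mostly agree.
answers = ["pneumonia", "pneumonia", "pneumonia", "no-pneumonia"]
print(discordance(answers))  # -> 0.25
```

A discordance of 0 means the cluster answers unanimously; values near 1 mean similar cases are getting very different answers, the signal the paper uses to target review.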
Affiliation(s)
- George Hripcsak, Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
33
McCowan I, Moore D, Fry MJ. Classification of cancer stage from free-text histology reports. Conf Proc IEEE Eng Med Biol Soc 2006; 2006:5153-5156. [PMID: 17945879] [DOI: 10.1109/iembs.2006.259563]
Abstract
This article investigates the classification of a patient's lung cancer stage based on analysis of their free-text medical reports. The system uses natural language processing to transform the report text, including identification of UMLS terms and detection of negated findings. The transformed report is then classified using statistical machine learning techniques. A support vector machine is trained for each stage category based on word occurrences in a corpus of histology reports for pathologically staged patients. New reports can be classified according to the most likely stage, allowing the collection of population stage data for analysis of outcomes. While the system could in principle be applied to stage different cancer types, the current work focuses on lung cancer due to data availability. The article presents initial experiments quantifying system performance for T and N staging on a corpus of histology reports from more than 700 lung cancer patients.
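The per-stage classification reduces to a one-vs-rest decision over linear scores. In the paper each stage's model is an SVM trained on word occurrences; in this sketch the weights are hand-set toy values, and any linear scorer plugs into the same decision rule.

```python
def predict_stage(word_counts, stage_models):
    """Pick the stage whose linear model scores the report highest.
    Each model is (weights, bias); features are word occurrence counts."""
    def score(model):
        weights, bias = model
        return sum(weights.get(w, 0.0) * n for w, n in word_counts.items()) + bias

    return max(stage_models, key=lambda stage: score(stage_models[stage]))

# Toy per-stage linear models (a trained SVM would supply these weights).
stage_models = {
    "T1": ({"small": 1.5, "2cm": 1.0}, 0.0),
    "T2": ({"large": 1.5, "5cm": 1.0, "pleura": 0.5}, -0.2),
}
report = {"large": 2, "pleura": 1}   # bag-of-words counts from one report
print(predict_stage(report, stage_models))  # -> T2
```

Training one binary scorer per stage and taking the argmax is the standard way to turn binary SVMs into a multi-category stage decision.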
Affiliation(s)
- Ian McCowan, CSIRO eHealth Research Centre, Brisbane, Australia.
34
Denny JC, Smithers JD, Armstrong B, Spickard A. "Where do we teach what?" Finding broad concepts in the medical school curriculum. J Gen Intern Med 2005; 20:943-6. [PMID: 16191143] [PMCID: PMC1490241] [DOI: 10.1111/j.1525-1497.2005.0203.x]
Abstract
BACKGROUND Often, medical educators and students do not know where important concepts are taught and learned in medical school. Manual efforts to identify and track concepts covered across the curriculum are inaccurate and resource intensive. OBJECTIVE To test the ability of a web-based application called KnowledgeMap (KM) to automatically locate where broad biomedical concepts are covered in lecture documents at the Vanderbilt School of Medicine. METHODS In 2003, the authors derived a gold standard set of curriculum documents by ranking 383 lecture documents as high, medium, or low relevance in their coverage of four broad biomedical concepts: genetics, women's health, dermatology, and radiology. We compared the gold standard rankings to KM, an automated tool that generates a variable number of subconcepts for each broad concept to calculate a relevance score for each document. Receiver operating characteristic (ROC) curves and areas under the curve were derived for each ranking using varying relevance score cutoffs. RESULTS ROC curve areas were acceptably high for each broad concept (range 0.74 to 0.98). At relevance scores that optimized sensitivity and specificity, 78% to 100% of highly relevant documents were identified. The best results were obtained with 63 to 1437 subconcepts for a given broad concept. Search times were fast. CONCLUSIONS The KM tool capably and automatically locates the detailed coverage of broad concepts across medical school documents in real time. Use of KM or similar tools may prove useful for other medical schools seeking to identify broad concepts in their curricula.
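A simplified stand-in for the relevance scoring: count the distinct subconcepts a document mentions, then apply a cutoff. KnowledgeMap's actual scoring is richer than this sketch, and the subconcept set below is a toy example; sweeping the cutoff against the expert rankings is what traces out the ROC curves the abstract reports.

```python
def relevance_score(doc_tokens, subconcepts):
    """Score a lecture document by how many of a broad concept's
    subconcepts it mentions (distinct matches, not raw frequency)."""
    present = set(doc_tokens)
    return sum(1 for sc in subconcepts if sc in present)

def rank_documents(docs, subconcepts, cutoff):
    """Return the documents whose relevance score meets the cutoff."""
    return sorted(name for name, tokens in docs.items()
                  if relevance_score(tokens, subconcepts) >= cutoff)

# Toy subconcepts for the broad concept "genetics" and two toy documents.
genetics = {"allele", "karyotype", "mutation", "pedigree"}
docs = {
    "lecture_genetics": ["allele", "mutation", "pedigree", "dominant"],
    "lecture_anatomy": ["femur", "mutation"],
}
print(rank_documents(docs, genetics, cutoff=2))  # -> ['lecture_genetics']
```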
Affiliation(s)
- Joshua C Denny, Department of Medicine, Vanderbilt School of Medicine, Nashville, TN 37232, USA
35
Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. J Biomed Inform 2005; 39:424-39. [PMID: 16169282] [DOI: 10.1016/j.jbi.2005.07.002]
Abstract
INTRODUCTION Time is an essential element in medical data and knowledge and is intrinsically connected with medical reasoning tasks. Many temporal reasoning mechanisms use constraint-based approaches. Our previous research demonstrated that electronic discharge summaries can be modeled as a simple temporal problem (STP). OBJECTIVE To categorize temporal expressions in clinical narrative text and to propose and evaluate a temporal constraint structure designed to model this temporal information and support the implementation of higher-level temporal reasoning. METHODS A corpus of 200 random discharge summaries spanning 18 years was used in a grounded approach to construct a representation structure. A subset of 100 discharge summaries was then used to tally the frequency of each identified time category and the percentage of temporal expressions modeled by the structure. Fifty random expressions were used to assess inter-coder agreement. RESULTS Six main categories of temporal expressions were identified. The constructed temporal constraint structure models the time over which an event occurs by constraining its starting and ending times. It includes a set of fields for the endpoint(s) of an event, anchor information, qualitative and metric temporal relations, and vagueness. In 100 discharge summaries, 1961 of 2022 (97%) identified temporal expressions were effectively modeled using the temporal constraint structure. Inter-coder evaluation of 50 expressions yielded exact match in 90%, partial match with trivial differences in 8%, partial match with large differences in 2%, and total mismatch in 0%. CONCLUSION The proposed temporal constraint structure is a sufficient and successful method for encoding the diversity of temporal information in discharge summaries. Placing data within the structure provides a foundational representation upon which further reasoning, including the addition of domain knowledge and other post-processing to implement an STP, can be accomplished.
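The constraint structure described can be sketched as a record type. The field names here are illustrative, chosen to match the categories the abstract lists (event endpoints, anchor information, qualitative and metric relations, vagueness) rather than the paper's exact schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalConstraint:
    """One event's temporal constraints: bounds on its start and end
    points, plus anchor, relation, and vagueness information."""
    event: str
    start: Optional[str] = None      # constraint on the starting time
    end: Optional[str] = None        # constraint on the ending time
    anchor: Optional[str] = None     # reference point, e.g. "admission"
    relation: Optional[str] = None   # qualitative ("before") or metric ("2 days before")
    vague: bool = False              # flags expressions like "several days"

# Encoding "Chest pain began two days before admission."
c = TemporalConstraint(event="chest pain onset",
                       anchor="admission",
                       relation="2 days before")
print(c.anchor, c.vague)  # -> admission False
```

Populating such records for every event in a narrative yields the constraint network that downstream STP reasoning operates on.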
Affiliation(s)
- Li Zhou, Department of Biomedical Informatics, Columbia University, 622 West 168th Street, VC5, New York, NY 10032, USA
36
Hazlehurst B, Frost HR, Sittig DF, Stevens VJ. MediClass: a system for detecting and classifying encounter-based clinical events in any electronic medical record. J Am Med Inform Assoc 2005; 12:517-29. [PMID: 15905485] [PMCID: PMC1205600] [DOI: 10.1197/jamia.m1771]
Abstract
MediClass is a knowledge-based system that processes both free-text and coded data to automatically detect clinical events in electronic medical records (EMRs). This technology aims to optimize both clinical practice and process control by automatically coding EMR contents regardless of data input method (e.g., dictation, structured templates, typed narrative). We report on the design goals, implemented functionality, generalizability, and current status of the system. MediClass could aid both clinical operations and health services research through enhancing care quality assessment, disease surveillance, and adverse event detection.
Affiliation(s)
- Brian Hazlehurst, Center for Health Research, 3800 N. Interstate Ave., Portland, OR 97227, USA.