1. Mottin L, Goldman JP, Jäggli C, Achermann R, Gobeill J, Knafou J, Ehrsam J, Wicky A, Gérard CL, Schwenk T, Charrier M, Tsantoulis P, Lovis C, Leichtle A, Kiessling MK, Michielin O, Pradervand S, Foufi V, Ruch P. Multilingual RECIST classification of radiology reports using supervised learning. Front Digit Health 2023;5:1195017. PMID: 37388252; PMCID: PMC10303934; DOI: 10.3389/fdgth.2023.1195017.
Abstract
Objectives: This study explores Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) categories on the basis of radiology reports. We also evaluate how the languages and institutional specificities of Swiss teaching hospitals affect classification quality in French and German. Methods: Seven machine learning methods were evaluated to establish a strong baseline. Robust models were then built, fine-tuned according to language (French or German), and compared with expert annotation. Results: The best strategies yield average F1-scores of 90% and 86% for the 2-class (Progressive/Non-progressive) and 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks, respectively. Conclusions: These results are competitive with manual labeling, as measured by the Matthews correlation coefficient and Cohen's kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize to new, unseen data, and we assess the impact of Pre-trained Language Models (PLMs) on classifier accuracy.
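The evaluation metrics this abstract reports, per-class F1 macro-averaged over the RECIST categories and Cohen's kappa, can be computed by hand as below. This is an illustrative sketch with invented labels, not the authors' pipeline.

```python
# Hypothetical illustration (not the authors' code): macro-averaged F1
# over the four RECIST categories, plus Cohen's kappa for agreement
# with expert annotation. Labels below are made up.
from collections import Counter

LABELS = ["PD", "SD", "PR", "CR"]  # Progressive, Stable, Partial, Complete

gold = ["PD", "SD", "PR", "PD", "CR", "SD"]  # expert annotations (invented)
pred = ["PD", "SD", "PR", "SD", "CR", "SD"]  # classifier output (invented)

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)

def cohen_kappa(gold, pred):
    """Observed agreement corrected for chance agreement."""
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    g_counts, p_counts = Counter(gold), Counter(pred)
    expected = sum(g_counts[l] * p_counts[l] for l in set(gold) | set(pred)) / n**2
    return (observed - expected) / (1 - expected)

print(f"macro F1 = {macro_f1(gold, pred, LABELS):.3f}")
print(f"kappa    = {cohen_kappa(gold, pred):.3f}")
```

Macro averaging gives each RECIST class equal weight regardless of prevalence, which matters here because Complete Response is typically much rarer than Stable Disease.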
Affiliation(s)
- Luc Mottin
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jean-Philippe Goldman
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Christoph Jäggli
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Rita Achermann
- Department of Radiology, Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Julien Gobeill
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Ehrsam
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexandre Wicky
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Camille L. Gérard
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Tanja Schwenk
- Department of Oncology, Kantonsspital Aarau, Aarau, Switzerland
- Mélinda Charrier
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Petros Tsantoulis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Alexander Leichtle
- Inselspital – Bern University Hospital and University of Bern, Bern, Switzerland
- Michael K. Kiessling
- Department of Medical Oncology and Hematology, University Hospital Zurich, Zurich, Switzerland
- Olivier Michielin
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Sylvain Pradervand
- Precision Oncology Center, Oncology Department, Centre Hospitalier Universitaire Vaudois – CHUV, Lausanne, Switzerland
- Vasiliki Foufi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
- Patrick Ruch
- HES-SO\HEG Genève, Information Sciences, Geneva, Switzerland
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva, Switzerland
2. Ramos J, Kockelkorn TTJP, Ramos I, Ramos R, Grutters J, Viergever MA, van Ginneken B, Campilho A. Content-Based Image Retrieval by Metric Learning From Radiology Reports: Application to Interstitial Lung Diseases. IEEE J Biomed Health Inform 2016;20:281-92. DOI: 10.1109/jbhi.2014.2375491.
3. Simpson MS, You D, Rahman MM, Antani SK, Thoma GR, Demner-Fushman D. Towards the creation of a visual ontology of biomedical imaging entities. AMIA Annu Symp Proc 2012;2012:866-875. PMID: 23304361; PMCID: PMC3540530.
Abstract
Image content is frequently the target of biomedical information extraction systems. However, the meaning of this content cannot be easily understood without some associated text. In order to improve the integration of textual and visual information, we are developing a visual ontology for biomedical image retrieval. Our visual ontology maps the appearance of image regions to concepts in an existing textual ontology, thereby inheriting relationships among the visual entities. Such a resource creates a bridge between the visual characteristics of important image regions and their semantic interpretation. We automatically populate our visual ontology by pairing image regions with their associated descriptions. To demonstrate the usefulness of this resource, we have developed a classification method that automatically labels image regions with appropriate concepts based solely on their appearance. Our results for thoracic imaging terms show that our methods are promising first steps towards the creation of a biomedical visual ontology.
Affiliation(s)
- Matthew S Simpson
- Lister Hill National Center for Biomedical Communications, U. S. National Library of Medicine, Bethesda, MD, USA
4. Kahn CE, Kalpathy-Cramer J, Lam CA, Eldredge CE. Accurate determination of imaging modality using an ensemble of text- and image-based classifiers. J Digit Imaging 2012;25:37-42. PMID: 21748413; DOI: 10.1007/s10278-011-9399-5.
Abstract
Imaging modality can aid retrieval of medical images for clinical practice, research, and education. We evaluated whether an ensemble classifier could outperform its constituent individual classifiers in determining the modality of figures from radiology journals. Seventeen automated classifiers analyzed 77,495 images from two radiology journals. Each classifier assigned one of eight imaging modalities (computed tomography, graphic, magnetic resonance imaging, nuclear medicine, positron emission tomography, photograph, ultrasound, or radiograph) to each image based on visual and/or textual information. Three physicians determined the modality of 5,000 randomly selected images as a reference standard. A "Simple Vote" ensemble classifier assigned each image to the modality that received the greatest number of individual classifiers' votes. A "Weighted Vote" classifier weighted each individual classifier's vote based on its performance over a training set; for each image, this classifier's output was the imaging modality that received the greatest weighted vote score. We measured precision, recall, and F score (the harmonic mean of precision and recall) for each classifier. Individual classifiers' F scores ranged from 0.184 to 0.892. The Simple Vote and Weighted Vote classifiers correctly assigned 4,565 images (F score, 0.913; 95% confidence interval, 0.905-0.921) and 4,672 images (F score, 0.934; 95% confidence interval, 0.927-0.941), respectively. The Weighted Vote classifier performed significantly better than all individual classifiers. An ensemble classifier correctly determined the imaging modality of 93% of figures in our sample. The imaging modality of figures published in radiology journals can thus be determined with high accuracy, which will improve systems for image retrieval.
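The "Weighted Vote" scheme this abstract describes reduces to a few lines. The sketch below uses invented classifier names, weights, and votes; it illustrates the idea, not the authors' implementation.

```python
# Sketch of a weighted-vote ensemble: each constituent classifier's
# vote for a modality is weighted by its score on a training set.
# Classifier names, weights, and votes below are invented.
def weighted_vote(votes, weights):
    """votes: {classifier_name: predicted_modality};
    weights: {classifier_name: training-set performance score}."""
    tally = {}
    for clf, modality in votes.items():
        tally[modality] = tally.get(modality, 0.0) + weights.get(clf, 0.0)
    return max(tally, key=tally.get)

weights = {"text_clf": 0.89, "visual_clf": 0.72, "caption_clf": 0.60}
votes = {"text_clf": "CT", "visual_clf": "MR", "caption_clf": "MR"}
print(weighted_vote(votes, weights))  # MR outweighs CT: 0.72 + 0.60 > 0.89
```

A "Simple Vote" ensemble is the special case where every weight is 1.0, which is why weighting by training-set performance can only refine, not degrade, a well-calibrated majority vote.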
Affiliation(s)
- Charles E Kahn
- Department of Radiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA.
5. Welter P, Deserno TM, Fischer B, Günther RW, Spreckelsen C. Towards case-based medical learning in radiological decision making using content-based image retrieval. BMC Med Inform Decis Mak 2011;11:68. PMID: 22032775; PMCID: PMC3217894; DOI: 10.1186/1472-6947-11-68.
Abstract
BACKGROUND Radiologists' training is based on intensive practice and can be improved with the use of diagnostic training systems. However, existing systems typically require laboriously prepared training cases and lack integration into the clinical environment with a proper learning scenario. Consequently, diagnostic training systems advancing decision-making skills are not well established in radiological education. METHODS We investigated didactic concepts and appraised methods appropriate to the radiology domain, as follows: (i) Adult learning theories stress the importance of work-related practice gained in a team of problem-solvers; (ii) Case-based reasoning (CBR) parallels the human problem-solving process; (iii) Content-based image retrieval (CBIR) can be useful for computer-aided diagnosis (CAD). To overcome the known drawbacks of existing learning systems, we developed the concept of image-based case retrieval for radiological education (IBCR-RE). The IBCR-RE diagnostic training is embedded into a didactic framework based on the Seven Jump approach, which is well established in problem-based learning (PBL). In order to provide a learning environment that is as similar as possible to radiological practice, we have analysed the radiological workflow and environment. RESULTS We mapped the IBCR-RE diagnostic training approach into the Image Retrieval in Medical Applications (IRMA) framework, resulting in the proposed concept of the IRMAdiag training application. IRMAdiag makes use of the modular structure of IRMA and comprises (i) the IRMA core, i.e., the IRMA CBIR engine; and (ii) the IRMAcon viewer. We propose embedding IRMAdiag into hospital information technology (IT) infrastructure using the standard protocols Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7). Furthermore, we present a case description and a scheme of planned evaluations to comprehensively assess the system. 
CONCLUSIONS The IBCR-RE paradigm incorporates a novel combination of essential aspects of diagnostic learning in radiology: (i) Provision of work-relevant experiences in a training environment integrated into the radiologist's working context; (ii) Up-to-date training cases that do not require cumbersome preparation because they are provided by routinely generated electronic medical records; (iii) Support of the way adults learn while remaining suitable for the patient- and problem-oriented nature of medicine. Future work will address unanswered questions to complete the implementation of the IRMAdiag trainer.
Affiliation(s)
- Petra Welter
- Department of Medical Informatics, RWTH Aachen University of Technology, Germany.
6. Biomedical imaging modality classification using combined visual features and textual terms. Int J Biomed Imaging 2011;2011:241396. PMID: 21912534; PMCID: PMC3170788; DOI: 10.1155/2011/241396.
Abstract
We describe an approach to automatic modality classification for the medical image retrieval task of the 2010 CLEF cross-language image retrieval campaign (ImageCLEF). This paper focuses on extracting features from medical images and fusing the different extracted visual features with a textual feature for modality classification. To extract visual features from the images, we used histogram descriptors of edge, gray, or color intensity and block-based variation as global features, and a SIFT histogram as a local feature. As the textual feature of the image representation, a binary histogram of predefined vocabulary words from image captions is used. We then combine the different features using normalized kernel functions for SVM classification. Furthermore, for some easily misclassified modality pairs, such as CT and MR or PET and NM, a local classifier is used to distinguish samples within the pair and improve performance. The proposed strategy is evaluated on the modality dataset provided by ImageCLEF 2010.
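The kernel-fusion step this abstract mentions can be sketched as follows. One common normalization is the cosine form K'(x, y) = K(x, y) / sqrt(K(x, x)·K(y, y)), which puts each feature channel's kernel on the same scale before summing; the linear kernel and the toy feature vectors are invented for illustration, and a real system would hand the combined Gram matrix to an SVM solver.

```python
# Sketch of normalized kernel fusion: each feature channel (e.g. edge
# histogram, SIFT histogram, caption-word histogram) yields its own
# kernel; each is cosine-normalized, then the normalized kernels are
# summed. Feature vectors below are invented toy values.
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalized(k, x, y):
    # K'(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y))
    return k(x, y) / math.sqrt(k(x, x) * k(y, y))

def combined_kernel(channels_x, channels_y):
    """channels_*: list of per-channel feature vectors for one image."""
    return sum(normalized(linear_kernel, cx, cy)
               for cx, cy in zip(channels_x, channels_y))

img_a = [[1.0, 0.0, 2.0], [0.2, 0.8]]  # e.g. edge histogram, word histogram
img_b = [[1.0, 1.0, 1.0], [0.3, 0.7]]
print(combined_kernel(img_a, img_b))
```

Because each normalized channel kernel is bounded by 1, an image's combined self-similarity equals the number of channels, so no single feature type can dominate the fused kernel by scale alone.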
7.

8. Demner-Fushman D, Antani S, Simpson M, Thoma GR. Annotation and retrieval of clinically relevant images. Int J Med Inform 2009;78:e59-67. PMID: 19546026; DOI: 10.1016/j.ijmedinf.2009.05.003.
Abstract
PURPOSE Medical images are a significant information source for clinical decision-making. Currently available information retrieval and decision support systems rely primarily on the text of scientific publications to find evidence in support of clinical information needs. The images and illustrations are available only within the full text of a scientific publication and do not directly contribute evidence to such systems. Our first goal is to explore whether image features facilitate finding relevant images that appear in publications. Our second goal is to find promising approaches for providing clinical evidence at the point of service, leveraging information contained in the text and images. METHODS We studied two approaches to finding illustrative evidence: a supervised machine-learning approach, in which images are classified as being relevant to an information need or not, and a pipeline information retrieval approach, in which images were retrieved using associated text and then re-ranked using content-based image retrieval (CBIR) techniques. RESULTS Our information retrieval approach did not benefit from combining textual and image information. However, given sufficient training data for the machine-learning approach, we achieved 56% average precision at 94% recall using textual features, and 27% average precision at 86% recall using image features. Combining these classifiers resulted in improvement up to 81% precision at 96% recall (74% recall at 85% precision, on average) for the requests with over 180 positive training examples. CONCLUSIONS Our supervised machine-learning methods that combine information from image and text are capable of achieving image annotation and retrieval accuracy acceptable for providing clinical evidence, given sufficient training data.
Affiliation(s)
- Dina Demner-Fushman
- Communications Engineering Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
9. Roberts A, Gaizauskas R, Hepple M, Demetriou G, Guo Y, Roberts I, Setzer A. Building a semantically annotated corpus of clinical texts. J Biomed Inform 2009;42:950-66. PMID: 19535011; DOI: 10.1016/j.jbi.2008.12.013.
Abstract
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
Affiliation(s)
- Angus Roberts
- Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK.
10. Kalpathy-Cramer J, Bedrick S, Hatt W, Hersh W. Multimodal Medical Image Retrieval: OHSU at ImageCLEF 2008. Lecture Notes in Computer Science, 2009. DOI: 10.1007/978-3-642-04447-2_96.
11. Névéol A, Deserno TM, Darmoni SJ, Güld MO, Aronson AR. Natural Language Processing Versus Content-Based Image Analysis for Medical Document Retrieval. J Am Soc Inf Sci Technol 2009;60:123-134. PMID: 19633735; DOI: 10.1002/asi.20955.
Abstract
One of the most significant recent advances in health information systems has been the shift from paper to electronic documents. While research on automatic text and image processing has taken separate paths, there is a growing need for joint efforts, particularly for electronic health records and biomedical literature databases. This work aims at comparing text-based versus image-based access to multimodal medical documents using state-of-the-art methods of processing text and image components. A collection of 180 medical documents containing an image accompanied by a short text describing it was divided into training and test sets. Content-based image analysis and natural language processing techniques are applied individually and combined for multimodal document analysis. The evaluation consists of an indexing task and a retrieval task based on the "gold standard" codes manually assigned to corpus documents. The performance of text-based and image-based access, as well as combined document features, is compared. Image analysis proves more adequate for both the indexing and retrieval of the images. In the indexing task, multimodal analysis outperforms both independent image and text analysis. This experiment shows that text describing images can be usefully analyzed in the framework of a hybrid text/image retrieval system.
Affiliation(s)
- Aurélie Névéol
- U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
12. Müller H, Kalpathy-Cramer J, Kahn CE, Hatt W, Bedrick S, Hersh W. Overview of the ImageCLEFmed 2008 Medical Image Retrieval Task. Lecture Notes in Computer Science, 2009. DOI: 10.1007/978-3-642-04447-2_63.
13. Hersh W. Ubiquitous but unfinished: grand challenges for information retrieval. Health Info Libr J 2008;25 Suppl 1:90-3. PMID: 19090855; DOI: 10.1111/j.1471-1842.2008.00815.x.
Affiliation(s)
- William Hersh
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA.
14.

Abstract
The multilingual search engine ARRS GoldMiner Global was created to facilitate broad international access to a richly indexed collection of more than 200,000 radiologic images. Images are indexed according to keywords and medical concepts that appear in the unstructured text of their English-language image captions. GoldMiner Global exploits the Unicode standard, which allows the accurate representation of characters and ideographs from virtually any language and which supports both left-to-right and right-to-left text directions. The user interface supports queries in Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, or Spanish. GoldMiner Global incorporates an interface to the United States National Library of Medicine that translates queries into English-language Medical Subject Headings (MeSH) terms. The translated MeSH terms are then used to search the image index and retrieve relevant images. Explanatory text, pull-down menu choices, and navigational guides are displayed in the selected language; search results are displayed in English. GoldMiner Global is freely available on the World Wide Web.
Affiliation(s)
- Charles E Kahn
- Department of Radiology, Medical College of Wisconsin, 9200 W Wisconsin Ave, Milwaukee, WI 53226, USA.
15. Kalpathy-Cramer J, Hersh W. Effectiveness of Global Features for Automatic Medical Image Classification and Retrieval: the experiences of OHSU at ImageCLEFmed. Pattern Recognit Lett 2008;29:2032-2038. PMID: 19884953; DOI: 10.1016/j.patrec.2008.05.013.
Abstract
In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross-Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1,000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task, including features based on histograms, gray-level correlation matrices, and the gist technique. A multitude of classifiers, including k-nearest neighbors, two-level neural networks, support vector machines, and maximum-likelihood classifiers, were evaluated. Our official error rate for the 1,000 test images was 26% in 2006 using the flat classification structure; the error count in 2007 was 67.8 using the hierarchical classification error computation based on the IRMA code. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The IRMA code did not help us in the classification task, as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of the image features we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data.
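The global-feature pipeline described above, reduce each image to a compact descriptor, then classify with k-nearest neighbors, can be sketched in a few lines. The histograms and class labels below are invented toy values, not IRMA data.

```python
# Toy sketch of global-feature classification: each image is reduced
# to a (here 3-bin) gray-level histogram and labeled by k-nearest
# neighbors. Histograms and labels below are invented.
def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def knn_predict(query, training, k=3):
    """training: list of (histogram, label) pairs."""
    neighbors = sorted(training, key=lambda t: l1_distance(query, t[0]))[:k]
    labels = [lab for _, lab in neighbors]
    return max(set(labels), key=labels.count)  # majority label among k nearest

training = [
    ([0.8, 0.1, 0.1], "chest_xray"),
    ([0.7, 0.2, 0.1], "chest_xray"),
    ([0.1, 0.2, 0.7], "hand_xray"),
    ([0.2, 0.2, 0.6], "hand_xray"),
]
print(knn_predict([0.75, 0.15, 0.1], training, k=3))
```

A two-stage variant, as the abstract reports, first predicts a coarse group of visually similar classes and then runs a second classifier within that group, which is how the authors reduced their error rate.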
Affiliation(s)
- Jayashree Kalpathy-Cramer
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR
16. The ImageCLEFmed medical image retrieval task test collection. J Digit Imaging 2008;22:648-55. PMID: 18769965; DOI: 10.1007/s10278-008-9154-8.
Abstract
A growing number of clinicians, educators, researchers, and others use digital images in their work and search for them via image retrieval systems. Yet, this area of information retrieval is much less understood and developed than searching for text-based content, such as biomedical literature and its derivations. The goal of the ImageCLEF medical image retrieval task (ImageCLEFmed) is to improve understanding and system capability in search for medical images. In this paper, we describe the development and use of a medical image test collection designed to facilitate research with image retrieval systems and their users. We also provide baseline results with the new collection and describe them in the context of past research with portions of the collection.
17. Medical Image Retrieval and Automatic Annotation: OHSU at ImageCLEF 2007. Lecture Notes in Computer Science, 2008. DOI: 10.1007/978-3-540-85760-0_79.
18. Uzuner O, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc 2007;15:14-24. PMID: 17947624; DOI: 10.1197/jamia.m2408.
Abstract
The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
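The abstract's observation that smoking status hinges on a few lexical cues ("smok", "tobac", "cigar") invites a minimal rule-based tagger. The patterns below are simplified inventions for illustration, not one of the participating i2b2 systems, and they collapse never/former smokers into one class for brevity.

```python
# Minimal rule-based smoking-status tagger built on the lexical cues
# noted in the abstract. Patterns are invented simplifications, not a
# system that ran in the i2b2 challenge.
import re

# A negation/cessation word followed (within the sentence) by a smoking cue.
NEGATION = re.compile(r"\b(denies|no|never|quit|former)\b[^.]*?(smok|tobac|cigar)")
MENTION = re.compile(r"(smok|tobac|cigar)")

def smoking_status(text):
    text = text.lower()
    if NEGATION.search(text):
        return "NON-SMOKER"  # collapses never/former smokers for brevity
    if MENTION.search(text):
        return "SMOKER"
    return "UNKNOWN"

print(smoking_status("Social history: denies tobacco use."))
print(smoking_status("Patient smokes one pack per day."))
```

Even this crude cue-plus-negation pattern mirrors why many challenge systems scored well: discharge summaries express smoking status through a small, highly regular vocabulary.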
Affiliation(s)
- Ozlem Uzuner
- University at Albany, SUNY, Draper 114A, 135 Western Avenue, Albany, NY 12222, USA.
19. Kahn CE. Effective metadata discovery for dynamic filtering of queries to a radiology image search engine. J Digit Imaging 2007;21:269-73. PMID: 17558534; PMCID: PMC3043832; DOI: 10.1007/s10278-007-9036-5.
Abstract
We sought to demonstrate the effectiveness of techniques to index radiology images using metadata discovered in their free-text figure captions. The ARRS GoldMiner image library incorporated 94,256 figures from 11,712 articles published in peer-reviewed online radiology journals. Algorithms were developed to discover metadata (age, sex, and imaging modality) from the figures' free-text captions. Age was recorded in years and classified as infant (less than 2 years), child (2 to 17 years), or adult (18+ years). Each figure was assigned to one of eight imaging modalities. A random sample of 1,000 images was examined to measure the accuracy of the metadata. The patient's age was identified in 58,994 cases (63%), and the patient's sex in 58,427 cases (62%). An imaging modality was assigned to 80,402 (85%) of the figures. Based on the 1,000 sampled cases, recall values for age, sex, and imaging modality were 97.2%, 99.7%, and 86.4%, respectively; precision values were 100%, 100%, and 97.2%, respectively. Automated techniques can thus accurately discover age, sex, and imaging modality metadata from captions of figures published in radiology journals, and the metadata can be used to dynamically filter queries for an image search engine.
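Caption-metadata discovery of this kind is typically pattern-based. The sketch below shows the idea with simplified, invented regular expressions, not the GoldMiner rules, using the infant/child/adult age classes defined in the abstract.

```python
# Illustration of caption-metadata discovery: regular expressions pull
# age, an age class, and sex from free-text figure captions. Patterns
# are invented simplifications, not the GoldMiner algorithms.
import re

AGE = re.compile(r"(\d{1,3})[- ]year[- ]old")
SEX = re.compile(r"\b(man|male|boy|woman|female|girl)\b", re.I)

def caption_metadata(caption):
    meta = {}
    m = AGE.search(caption)
    if m:
        age = int(m.group(1))
        meta["age"] = age
        # Age classes as defined in the abstract: <2 infant, 2-17 child, 18+ adult.
        meta["age_class"] = "infant" if age < 2 else "child" if age < 18 else "adult"
    s = SEX.search(caption)
    if s:
        meta["sex"] = "M" if s.group(1).lower() in {"man", "male", "boy"} else "F"
    return meta

print(caption_metadata("Figure 2: 54-year-old woman with chest CT showing a nodule."))
```

Returning an empty dictionary when no pattern fires is what makes such extractors high-precision at the cost of recall, matching the precision/recall asymmetry the abstract reports.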
Affiliation(s)
- Charles E Kahn
- Division of Informatics, Department of Radiology, Medical College of Wisconsin, 9200 W. Wisconsin Ave., Milwaukee, WI 53226, USA.
20. Overview of the ImageCLEFmed 2006 Medical Retrieval and Medical Annotation Tasks. Evaluation of Multilingual and Multi-modal Information Retrieval, 2007. DOI: 10.1007/978-3-540-74999-8_72.
21. Hersh W, Kalpathy-Cramer J, Jensen J. Medical Image Retrieval and Automated Annotation: OHSU at ImageCLEF 2006. Evaluation of Multilingual and Multi-modal Information Retrieval, 2007. DOI: 10.1007/978-3-540-74999-8_81.