1
Gorenstein L, Konen E, Green M, Klang E. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. J Am Coll Radiol 2024; 21:914-941. [PMID: 38302036 DOI: 10.1016/j.jacr.2024.01.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 01/13/2024] [Accepted: 01/26/2024] [Indexed: 02/03/2024]
Abstract
INTRODUCTION Bidirectional Encoder Representations from Transformers (BERT), introduced in 2018, has revolutionized natural language processing. Its bidirectional understanding of word context has enabled innovative applications, notably in radiology. This study aimed to assess BERT's influence and applications within the radiologic domain. METHODS Adhering to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a systematic review, searching PubMed for literature on BERT-based models and natural language processing in radiology from January 1, 2018, to February 12, 2023. The search encompassed keywords related to generative models, transformer architecture, and various imaging techniques. RESULTS Of 597 results, 30 met our inclusion criteria. The remaining were unrelated to radiology or did not use BERT-based models. The included studies were retrospective, with 14 published in 2022. The primary focus was on classification and information extraction from radiology reports, with x-rays as the prevalent imaging modality. Specific investigations included automatic CT protocol assignment and deep learning applications in chest x-ray interpretation. CONCLUSION This review underscores the primary application of BERT in radiology for report classification. It also reveals emerging BERT applications for protocol assignment and report generation. As BERT technology advances, we foresee further innovative applications. Its implementation in radiology holds potential for enhancing diagnostic precision, expediting report generation, and optimizing patient care.
Affiliation(s)
- Larisa Gorenstein
  - Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
- Eli Konen
  - Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Chair, Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel
- Michael Green
  - Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
  - Icahn School of Medicine at Mount Sinai, New York, New York; and Associate Professor of Radiology, Innovation Center, Sheba Medical Center, Affiliated with Tel Aviv University, Tel Aviv, Israel
2
Dada A, Ufer TL, Kim M, Hasin M, Spieker N, Forsting M, Nensa F, Egger J, Kleesiek J. Information extraction from weakly structured radiological reports with natural language queries. Eur Radiol 2024; 34:330-337. [PMID: 37505252 DOI: 10.1007/s00330-023-09977-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 05/08/2023] [Accepted: 05/27/2023] [Indexed: 07/29/2023]
Abstract
OBJECTIVES Provide physicians and researchers an efficient way to extract information from weakly structured radiology reports with natural language processing (NLP) machine learning models. METHODS We evaluate seven different German bidirectional encoder representations from transformers (BERT) models on a dataset of 857,783 unlabeled radiology reports and an annotated reading comprehension dataset in the format of SQuAD 2.0 based on 1223 additional reports. RESULTS Continued pre-training of a BERT model on the radiology dataset and a medical online encyclopedia resulted in the most accurate model with an F1-score of 83.97% and an exact match score of 71.63% for answerable questions and 96.01% accuracy in detecting unanswerable questions. Fine-tuning a non-medical model without further pre-training led to the lowest-performing model. The final model proved stable against variation in the formulations of questions and in dealing with questions on topics excluded from the training set. CONCLUSIONS General domain BERT models further pre-trained on radiological data achieve high accuracy in answering questions on radiology reports. We propose to integrate our approach into the workflow of medical practitioners and researchers to extract information from radiology reports. CLINICAL RELEVANCE STATEMENT By reducing the need for manual searches of radiology reports, radiologists' resources are freed up, which indirectly benefits patients. KEY POINTS • BERT models pre-trained on general domain datasets and radiology reports achieve high accuracy (83.97% F1-score) on question-answering for radiology reports. • The best performing model achieves an F1-score of 83.97% for answerable questions and 96.01% accuracy for questions without an answer. • Additional radiology-specific pretraining of all investigated BERT models improves their performance.
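The exact match and F1 scores reported in this abstract follow the SQuAD 2.0 evaluation format. As a rough illustration of how such scores are typically computed per question, here is a minimal sketch. It assumes a simplified normalization (lowercasing and whitespace tokenization only; the official SQuAD script also strips punctuation and English articles), and the function names `exact_match` and `token_f1` and the German example strings are hypothetical, not taken from the study.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Simplified normalization (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, truth: str) -> int:
    # 1 if the normalized token sequences are identical, else 0.
    return int(normalize(prediction) == normalize(truth))


def token_f1(prediction: str, truth: str) -> float:
    pred, gold = normalize(prediction), normalize(truth)
    # SQuAD 2.0 convention for unanswerable questions: if either side is
    # empty, the score is 1 only when both are empty.
    if not pred or not gold:
        return float(pred == gold)
    # Multiset intersection counts tokens shared by prediction and truth.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical predicted and annotated answer spans.
    print(exact_match("kein Pleuraerguss", "kein Pleuraerguss rechts"))
    print(token_f1("kein Pleuraerguss", "kein Pleuraerguss rechts"))
```

Dataset-level figures such as those quoted above would then be averages of these per-question scores over the answerable questions.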
Affiliation(s)
- Amin Dada
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany.
- Tim Leon Ufer
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
- Moon Kim
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
- Max Hasin
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
- Michael Forsting
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
  - Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Felix Nensa
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
  - Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany
- Jan Egger
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
  - Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Essen, Germany
- Jens Kleesiek
  - Institute of AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, 45131, Essen, Germany
  - Dr. Krüger MVZ GmbH, Bocholt, Germany
  - German Cancer Consortium (DKTK), Partner Site Essen, Essen, Germany
3
Kim M, Ong KTI, Choi S, Yeo J, Kim S, Han K, Park JE, Kim HS, Choi YS, Ahn SS, Kim J, Lee SK, Sohn B. Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports. Eur Radiol 2023; 33:8017-8025. [PMID: 37566271 DOI: 10.1007/s00330-023-10061-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 05/18/2023] [Accepted: 06/22/2023] [Indexed: 08/12/2023]
Abstract
OBJECTIVES To evaluate the performance of natural language processing (NLP) models to predict isocitrate dehydrogenase (IDH) mutation status in diffuse glioma using routine MR radiology reports. MATERIALS AND METHODS This retrospective, multi-center study included consecutive patients with diffuse glioma with known IDH mutation status from May 2009 to November 2021 whose initial MR radiology report was available prior to pathologic diagnosis. Five NLP models (long short-term memory [LSTM], bidirectional LSTM, bidirectional encoder representations from transformers [BERT], BERT graph convolutional network [GCN], BioBERT) were trained, and area under the receiver operating characteristic curve (AUC) was assessed to validate prediction of IDH mutation status in the internal and external validation sets. The performance of the best performing NLP model was compared with that of the human readers. RESULTS A total of 1427 patients (mean age ± standard deviation, 54 ± 15; 779 men, 54.6%) with 720 patients in the training set, 180 patients in the internal validation set, and 527 patients in the external validation set were included. In the external validation set, BERT GCN showed the highest performance (AUC 0.85, 95% CI 0.81-0.89) in predicting IDH mutation status, which was higher than LSTM (AUC 0.77, 95% CI 0.72-0.81; p = .003) and BioBERT (AUC 0.81, 95% CI 0.76-0.85; p = .03). This was higher than that of a neuroradiologist (AUC 0.80, 95% CI 0.76-0.84; p = .005) and a neurosurgeon (AUC 0.79, 95% CI 0.76-0.84; p = .04). CONCLUSION BERT GCN was externally validated to predict IDH mutation status in patients with diffuse glioma using routine MR radiology reports with superior or at least comparable performance to human reader. 
CLINICAL RELEVANCE STATEMENT Natural language processing may be used to extract relevant information from routine radiology reports to predict cancer genotype and provide prognostic information that may aid in guiding treatment strategy and enabling personalized medicine. KEY POINTS • A transformer-based natural language processing (NLP) model predicted isocitrate dehydrogenase mutation status in diffuse glioma with an AUC of 0.85 in the external validation set. • The best NLP models were superior or at least comparable to human readers in both internal and external validation sets. • Transformer-based models showed higher performance than conventional NLP models such as long short-term memory.
Affiliation(s)
- Minjae Kim
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Kai Tzu-Iunn Ong
  - Department of Artificial Intelligence, College of Computing, Yonsei University, Seoul, Korea
- Seonah Choi
  - Department of Neurosurgery, Brain Tumor Center, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Jinyoung Yeo
  - Department of Artificial Intelligence, College of Computing, Yonsei University, Seoul, Korea
- Sooyon Kim
  - Department of Statistics and Data Science, Yonsei University, Seoul, Korea
- Kyunghwa Han
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Ji Eun Park
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Ho Sung Kim
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Yoon Seong Choi
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Sung Soo Ahn
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Jinna Kim
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Seung-Koo Lee
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Beomseok Sohn
  - Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea.
  - Department of Radiology and Center for Imaging Sciences, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea.
4
Fink MA. From data to insights: how natural language processing and structured reporting advance data-driven radiology. Eur Radiol 2023; 33:7494-7495. [PMID: 37782342 PMCID: PMC10598143 DOI: 10.1007/s00330-023-10242-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 08/30/2023] [Accepted: 09/05/2023] [Indexed: 10/03/2023]
Affiliation(s)
- Matthias A Fink
  - Clinic for Diagnostic and Interventional Radiology, University Hospital Heidelberg, Heidelberg, Germany.
  - Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany.
5
Tejani AS. To BERT or not to BERT: advancing non-invasive prediction of tumor biomarkers using transformer-based natural language processing (NLP). Eur Radiol 2023; 33:8014-8016. [PMID: 37740083 DOI: 10.1007/s00330-023-10224-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 08/27/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Affiliation(s)
- Ali S Tejani
  - Department of Radiology, The University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, 75390, USA.
6
Waters MR, Aneja S, Hong JC. Unlocking the Power of ChatGPT, Artificial Intelligence, and Large Language Models: Practical Suggestions for Radiation Oncologists. Pract Radiat Oncol 2023; 13:e484-e490. [PMID: 37598727 DOI: 10.1016/j.prro.2023.06.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/21/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 08/22/2023]
Abstract
Recent advances in artificial intelligence (AI), such as generative AI and large language models (LLMs), have generated significant excitement about the potential of AI to revolutionize our lives, work, and interaction with technology. This article explores the practical applications of LLMs, particularly ChatGPT, in the field of radiation oncology. We offer a guide on how radiation oncologists can interact with LLMs like ChatGPT in their routine clinical and administrative tasks, highlighting potential use cases of the present and future. We also highlight limitations and ethical considerations, including the current state of LLMs in decision making, protection of sensitive data, and the important role of human review of AI-generated content.
Affiliation(s)
- Michael R Waters
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri
- Sanjay Aneja
  - Department of Radiation Oncology, Yale School of Medicine, New Haven, Connecticut
- Julian C Hong
  - Department of Radiation Oncology and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California.
7
Elbatarny L, Do RKG, Gangai N, Ahmed F, Chhabra S, Simpson AL. Applying Natural Language Processing to Single-Report Prediction of Metastatic Disease Response Using the OR-RADS Lexicon. Cancers (Basel) 2023; 15:4909. [PMID: 37894276 PMCID: PMC10605614 DOI: 10.3390/cancers15204909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 09/25/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023] Open
Abstract
Generating Real World Evidence (RWE) on disease responses from radiological reports is important for understanding cancer treatment effectiveness and developing personalized treatment. A lack of standardization in reporting among radiologists impacts the feasibility of large-scale interpretation of disease response. This study examines the utility of applying natural language processing (NLP) to the large-scale interpretation of disease responses using a standardized oncologic response lexicon (OR-RADS) to facilitate RWE collection. Radiologists annotated 3503 retrospectively collected clinical impressions from radiological reports across several cancer types with one of seven OR-RADS categories. A Bidirectional Encoder Representations from Transformers (BERT) model was trained on this dataset with an 80-20% train/test split to perform multiclass and single-class classification tasks using the OR-RADS. Radiologists also performed the classification to compare human and model performance. The model achieved accuracies from 95 to 99% across all classification tasks, performing better in single-class tasks compared to the multiclass task and producing minimal misclassifications, which pertained mostly to overpredicting the equivocal and mixed OR-RADS labels. Human accuracy ranged from 74 to 93% across all classification tasks, performing better on single-class tasks. This study demonstrates the feasibility of the BERT NLP model in predicting disease response in cancer patients, exceeding human performance, and encourages the use of the standardized OR-RADS lexicon to improve large-scale prediction accuracy.
Affiliation(s)
- Lydia Elbatarny
  - School of Computing, Queen’s University, Kingston, ON K7L 2N8, Canada;
- Richard K. G. Do
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; (N.G.); (F.A.); (S.C.)
- Natalie Gangai
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; (N.G.); (F.A.); (S.C.)
- Firas Ahmed
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; (N.G.); (F.A.); (S.C.)
- Shalini Chhabra
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; (N.G.); (F.A.); (S.C.)
- Amber L. Simpson
  - School of Computing, Queen’s University, Kingston, ON K7L 2N8, Canada;
  - Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, ON K7L 2V7, Canada
8
Belkouchi Y, Lederlin M, Ben Afia A, Fabre C, Ferretti G, De Margerie C, Berge P, Liberge R, Elbaz N, Blain M, Brillet PY, Chassagnon G, Cadour F, Caramella C, Hajjam ME, Boussouar S, Hadchiti J, Fablet X, Khalil A, Luciani A, Cotten A, Meder JF, Talbot H, Lassau N. Detection and quantification of pulmonary embolism with artificial intelligence: The SFR 2022 artificial intelligence data challenge. Diagn Interv Imaging 2023; 104:485-489. [PMID: 37321875 DOI: 10.1016/j.diii.2023.05.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/25/2023] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
PURPOSE In 2022, the French Society of Radiology together with the French Society of Thoracic Imaging and CentraleSupelec organized their 13th data challenge. The aim was to aid in the diagnosis of pulmonary embolism, by identifying the presence of pulmonary embolism and by estimating the ratio between right and left ventricular (RV/LV) diameters, and an arterial obstruction index (Qanadli's score) using artificial intelligence. MATERIALS AND METHODS The data challenge was composed of three tasks: the detection of pulmonary embolism, the RV/LV diameter ratio, and Qanadli's score. Sixteen centers all over France participated in the inclusion of the cases. A health data hosting certified web platform was established to facilitate the inclusion process of the anonymized CT examinations in compliance with general data protection regulation. CT pulmonary angiography images were collected. Each center provided the CT examinations with their annotations. A randomization process was established to pool the scans from different centers. Each team was required to have at least a radiologist, a data scientist, and an engineer. Data were provided in three batches to the teams, two for training and one for evaluation. The evaluation of the results was determined to rank the participants on the three tasks. RESULTS A total of 1268 CT examinations were collected from the 16 centers following the inclusion criteria. The dataset was split into three batches of 310, 580 and 378 CT examinations provided to the participants respectively on September 5, 2022, October 7, 2022 and October 9, 2022. Seventy percent of the data from each center were used for training, and 30% for the evaluation. Seven teams with a total of 48 participants including data scientists, researchers, radiologists and engineering students were registered for participation.
The metrics chosen for evaluation included areas under receiver operating characteristic curves, specificity and sensitivity for the classification task, and the coefficient of determination r² for the regression tasks. The winning team achieved an overall score of 0.784. CONCLUSION This multicenter study suggests that the use of artificial intelligence for the diagnosis of pulmonary embolism is possible on real data. Moreover, providing quantitative measures is mandatory for the interpretability of the results, and is of great aid to the radiologists especially in emergency settings.
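For readers unfamiliar with the regression metric named in this abstract, the coefficient of determination is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that standard definition; the abstract does not state which variant the challenge used or how the overall score of 0.784 was aggregated, and the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    # Standard definition (an assumption about the challenge's metric):
    # R^2 = 1 - SS_res / SS_tot.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical RV/LV diameter ratios: annotated vs. model-estimated.
    annotated = [0.9, 1.1, 1.4, 0.8]
    estimated = [1.0, 1.0, 1.3, 0.9]
    print(r_squared(annotated, estimated))
```

A value of 1 indicates perfect agreement, 0 matches a model that always predicts the mean of the annotations, and negative values indicate predictions worse than that baseline.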
Affiliation(s)
- Younes Belkouchi
  - OPIS, CentraleSupelec, Inria, Université Paris-Saclay, 91190 Gif-Sur-Yvette, France; Laboratoire d'Imagerie Biomédicale Multimodale Paris-Saclay, BIOMAPS, UMR 1281, Université Paris-Saclay, Inserm, CNRS, CEA, 94800 Villejuif, France.
- Amira Ben Afia
  - Department of Radiology, APHP Nord, Hôpital Bichat, 75018 Paris, France; Université Paris Cité, 75006 Paris, France
- Clement Fabre
  - Department of Radiology, Centre Hospitalier de Laval, 53000 Laval, France
- Gilbert Ferretti
  - Universite Grenobles Alpes, Service de Radiologie et Imagerie Médicale, CHU Grenoble-Alpes, 38000 Grenoble, France
- Constance De Margerie
  - Department of Radiology, Assistance Publique-Hôpitaux de Paris, Hôpital Saint-Louis, 75010 Paris, France; Université Paris Cité, 75006 Paris, France
- Pierre Berge
  - Department of Radiology, CHU Angers, 49000 Angers, France
- Renan Liberge
  - Department of Radiology, CHU Nantes, 44000 Nantes, France
- Nicolas Elbaz
  - Department of Radiology, Hôpital Européen Georges Pompidou, AP-HP, 75015 Paris, France
- Maxime Blain
  - Department of Radiology, Hopital Henri Mondor, AP-HP, 94000 Créteil, France
- Pierre-Yves Brillet
  - Department of Radiology, Hôpital Avicenne, Paris 13 University, 93000 Bobigny, France
- Guillaume Chassagnon
  - Department of Radiology, Hopital Cochin, APHP, 75014 Paris, France; Université Paris Cité, 75006 Paris, France
- Farah Cadour
  - APHM, Hôpital Universitaire Timone, CEMEREM, 13005 Marseille, France
- Caroline Caramella
  - Department of Radiology, Groupe hospitalier Paris Saint-Joseph, Île-de-France, 75015 Paris, France
- Mostafa El Hajjam
  - Department of Radiology, Ambroise Paré Hospital GH AP-HP Paris Saclay, UMR 1179 INSERM/UVSQ, Team 3, 92100 Boulogne-Billancourt, France
- Samia Boussouar
  - Sorbonne Université, APHP, Hôpital La Pitié-Salpêtrière, Unité d'Imagerie Cardiovasculaire et Thoracique (ICT), 75013 Paris, France
- Joya Hadchiti
  - Department of Imaging, Institut Gustave Roussy, 94800 Villejuif, France
- Xavier Fablet
  - Department of Radiology, CHU Rennes, 35000 Rennes, France
- Antoine Khalil
  - Department of Radiology, APHP Nord, Hôpital Bichat, 75018 Paris, France; Université Paris Cité, 75006 Paris, France
- Alain Luciani
  - Medical Imaging Department, AP-HP, Henri Mondor University Hospital, 94000 Créteil, France; INSERM, U955, Team 18, 94000 Créteil, France
- Anne Cotten
  - Department of Musculoskeletal Radiology, Univ. Lille, CHU Lille, MABlab ULR 4490, 59000 Lille, France
- Jean-Francois Meder
  - Department of Neuroimaging, Sainte-Anne Hospital, 75013 Paris, France; Université Paris Cité, 75006 Paris, France
- Hugues Talbot
  - OPIS, CentraleSupelec, Inria, Université Paris-Saclay, 91190 Gif-Sur-Yvette, France
- Nathalie Lassau
  - Laboratoire d'Imagerie Biomédicale Multimodale Paris-Saclay, BIOMAPS, UMR 1281, Université Paris-Saclay, Inserm, CNRS, CEA, 94800 Villejuif, France; Department of Imaging, Institut Gustave Roussy, 94800 Villejuif, France
9
Fink MA, Bischoff A, Fink CA, Moll M, Kroschke J, Dulz L, Heußel CP, Kauczor HU, Weber TF. Potential of ChatGPT and GPT-4 for Data Mining of Free-Text CT Reports on Lung Cancer. Radiology 2023; 308:e231362. [PMID: 37724963 DOI: 10.1148/radiol.231362] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 09/21/2023]
Abstract
Background The latest large language models (LLMs) solve unseen problems via user-defined text prompts without the need for retraining, offering potentially more efficient information extraction from free-text medical records than manual annotation. Purpose To compare the performance of the LLMs ChatGPT and GPT-4 in data mining and labeling oncologic phenotypes from free-text CT reports on lung cancer by using user-defined prompts. Materials and Methods This retrospective study included patients who underwent lung cancer follow-up CT between September 2021 and March 2023. A subset of 25 reports was reserved for prompt engineering to instruct the LLMs in extracting lesion diameters, labeling metastatic disease, and assessing oncologic progression. This output was fed into a rule-based natural language processing pipeline to match ground truth annotations from four radiologists and derive performance metrics. The oncologic reasoning of LLMs was rated on a five-point Likert scale for factual correctness and accuracy. The occurrence of confabulations was recorded. Statistical analyses included Wilcoxon signed rank and McNemar tests. Results On 424 CT reports from 424 patients (mean age, 65 years ± 11 [SD]; 265 male), GPT-4 outperformed ChatGPT in extracting lesion parameters (98.6% vs 84.0%, P < .001), resulting in 96% correctly mined reports (vs 67% for ChatGPT, P < .001). GPT-4 achieved higher accuracy in identification of metastatic disease (98.1% [95% CI: 97.7, 98.5] vs 90.3% [95% CI: 89.4, 91.0]) and higher performance in generating correct labels for oncologic progression (F1 score, 0.96 [95% CI: 0.94, 0.98] vs 0.91 [95% CI: 0.89, 0.94]) (both P < .001). In oncologic reasoning, GPT-4 had higher Likert scale scores for factual correctness (4.3 vs 3.9) and accuracy (4.4 vs 3.3), with a lower rate of confabulation (1.7% vs 13.7%) than ChatGPT (all P < .001). 
Conclusion When using user-defined prompts, GPT-4 outperformed ChatGPT in extracting oncologic phenotypes from free-text CT reports on lung cancer and demonstrated better oncologic reasoning with fewer confabulations. © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Hafezi-Nejad and Trivedi in this issue.
Collapse
Affiliation(s)
- Matthias A Fink
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Arved Bischoff
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Christoph A Fink
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Martin Moll
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Jonas Kroschke
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Luca Dulz
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Claus Peter Heußel
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Hans-Ulrich Kauczor
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
| | - Tim F Weber
- From the Clinic for Diagnostic and Interventional Radiology (M.A.F., A.B., M.M., J.K., L.D., C.P.H., H.U.K., T.F.W.) and Department of Radiation Oncology (C.A.F.), University Hospital Heidelberg, Im Neuenheimer Feld 420, 69120 Heidelberg, Germany; Translational Lung Research Center Heidelberg, Member of the German Center for Lung Research, Heidelberg, Germany (M.A.F., A.B., L.D., C.P.H., H.U.K., T.F.W.); and Department of Diagnostic and Interventional Radiology with Nuclear Medicine, Heidelberg Thoracic Clinic, University of Heidelberg, Heidelberg, Germany (C.P.H.)
|
10
|
Fink MA. [Large language models such as ChatGPT and GPT-4 for patient-centered care in radiology]. RADIOLOGIE (HEIDELBERG, GERMANY) 2023; 63:665-671. [PMID: 37615692 DOI: 10.1007/s00117-023-01187-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/14/2023] [Indexed: 08/25/2023]
Abstract
BACKGROUND With the introduction of ChatGPT in late November 2022, large language models based on artificial intelligence have gained worldwide recognition. These language models are trained on vast amounts of data, enabling them to process complex tasks in seconds and provide detailed, high-level text-based responses. OBJECTIVE To provide an overview of the most widely discussed large language models, ChatGPT and GPT‑4, with a focus on potential applications for patient-centered radiology. MATERIALS AND METHODS A PubMed search for both large language models was performed using the terms "ChatGPT" and "GPT-4", with subjective selection and completion in the form of a narrative review. RESULTS The generic nature of language models holds great promise for radiology: they can help both patients and referrers understand radiological findings, overcome language barriers, and improve the quality of informed consent discussions. This could represent a significant step towards patient-centered or person-centered radiology. CONCLUSION Large language models represent a promising tool for improving the communication of findings, interdisciplinary collaboration, and workflow in radiology. However, important privacy issues and the reliable applicability of these models in medicine remain to be addressed.
Affiliation(s)
- Matthias A Fink
- Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Heidelberg, Im Neuenheimer Feld 420, 69120, Heidelberg, Deutschland.
|
11
|
Grouin C, Grabar N. Year 2022 in Medical Natural Language Processing: Availability of Language Models as a Step in the Democratization of NLP in the Biomedical Area. Yearb Med Inform 2023; 32:244-252. [PMID: 38147866 PMCID: PMC10751107 DOI: 10.1055/s-0043-1768752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023] Open
Abstract
OBJECTIVES To analyse the content of publications within the medical Natural Language Processing (NLP) domain in 2022. METHODS Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS Three best papers were selected. We also propose an analysis of the content of the NLP publications in 2022, highlighting some of the topics. CONCLUSION The main trend in 2022 is certainly related to the availability of large language models, especially those based on Transformers, and to their use by non-NLP researchers. This leads to the democratization of NLP methods. We also observe renewed interest in languages other than English, the continuation of research on information extraction and prediction, the massive use of data from social media, and the consideration of the needs and interests of patients.
Affiliation(s)
- Cyril Grouin
- Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
| | - Natalia Grabar
- UMR8163 STL, CNRS, Université de Lille, Domaine du Pont-de-bois, 59653 Villeneuve-d'Ascq cedex, France
|
12
|
Kleesiek J, Wu Y, Stiglic G, Egger J, Bian J. An Opinion on ChatGPT in Health Care-Written by Humans Only. J Nucl Med 2023; 64:701-703. [PMID: 37055219 DOI: 10.2967/jnumed.123.265687] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 03/07/2023] [Revised: 03/14/2023] [Indexed: 04/15/2023] Open
Affiliation(s)
- Jens Kleesiek
- Institute for AI in Medicine, University Medicine Essen, Essen, Germany
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida
| | - Gregor Stiglic
- Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
| | - Jan Egger
- Institute for AI in Medicine, University Medicine Essen, Essen, Germany
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida
|
13
|
Jantscher M, Gunzer F, Kern R, Hassler E, Tschauner S, Reishofer G. Information extraction from German radiological reports for general clinical text and language understanding. Sci Rep 2023; 13:2353. [PMID: 36759679 PMCID: PMC9911592 DOI: 10.1038/s41598-023-29323-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 02/02/2023] [Indexed: 02/11/2023] Open
Abstract
Recent advances in deep learning and natural language processing (NLP) have opened many new opportunities for automatic text understanding and text processing in the medical field. This is of great benefit as many clinical downstream tasks rely on information from unstructured clinical documents. However, for low-resource languages like German, the use of modern text processing applications that require a large amount of training data proves to be difficult, as only few data sets are available mainly due to legal restrictions. In this study, we present an information extraction framework that was initially pre-trained on real-world computed tomographic (CT) reports of head examinations, followed by domain adaptive fine-tuning on reports from different imaging examinations. We show that in the pre-training phase, the semantic and contextual meaning of one clinical reporting domain can be captured and effectively transferred to foreign clinical imaging examinations. Moreover, we introduce an active learning approach with an intrinsic strategic sampling method to generate highly informative training data with low human annotation cost. We see that the model performance can be significantly improved by an appropriate selection of the data to be annotated, without the need to train the model on a specific downstream task. With a general annotation scheme that can be used not only in the radiology field but also in a broader clinical setting, we contribute to a more consistent labeling and annotation process that also facilitates the verification and evaluation of language models in the German clinical setting.
Affiliation(s)
| | - Felix Gunzer
- Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
| | | | - Eva Hassler
- Division of Neuroradiology, Vascular and Interventional Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
| | - Sebastian Tschauner
- Division of Pediatric Radiology, Department of Radiology, Medical University Graz, 8036, Graz, Austria
| | - Gernot Reishofer
- Department of Radiology, Medical University Graz, 8036, Graz, Austria; BioTechMed-Graz, 8010, Graz, Austria.
|