1
Bhayana R, Biswas S, Cook TS, Kim W, Kitamura FC, Gichoya J, Yi PH. From Bench to Bedside With Large Language Models: AJR Expert Panel Narrative Review. AJR Am J Roentgenol 2024:1-10. [PMID: 38598354] [DOI: 10.2214/ajr.24.30928]
Abstract
Large language models (LLMs) hold immense potential to revolutionize radiology. However, their integration into practice requires careful consideration. Artificial intelligence (AI) chatbots and general-purpose LLMs have potential pitfalls related to privacy, transparency, and accuracy, limiting their current clinical readiness. Thus, LLM-based tools must be optimized for radiology practice to overcome these limitations. Although research and validation for radiology applications remain in their infancy, commercial products incorporating LLMs are becoming available alongside promises of transforming practice. To help radiologists navigate this landscape, this AJR Expert Panel Narrative Review provides a multidimensional perspective on LLMs, encompassing considerations from bench (development and optimization) to bedside (use in practice). At present, LLMs are not autonomous entities that can replace expert decision-making, and radiologists remain responsible for the content of their reports. Patient-facing tools, particularly medical AI chatbots, require additional guardrails to ensure safety and prevent misuse. Still, if responsibly implemented, LLMs are well-positioned to transform efficiency and quality in radiology. Radiologists must be well-informed and proactively involved in guiding the implementation of LLMs in practice to mitigate risks and maximize benefits to patient care.
Affiliation(s)
- Rajesh Bhayana
- University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Joint Department of Medical Imaging, Toronto General Hospital, 200 Elizabeth St, Peter Munk Bldg, 1st Fl, Toronto, ON M5G 24C, Canada

- Som Biswas
- Department of Radiology, Le Bonheur Children's Hospital, University of Tennessee Health Science Center, Memphis, TN

- Tessa S Cook
- Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA

- Woojin Kim
- Department of Radiology, Palo Alto VA Medical Center, Palo Alto, CA

- Felipe C Kitamura
- Department of Diagnostic Imaging, Universidade Federal de São Paulo, São Paulo, Brazil
- Dasa, São Paulo, Brazil

- Judy Gichoya
- Department of Radiology, Emory University School of Medicine, Atlanta, GA

- Paul H Yi
- Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD
2
Ding JE, Thao PNM, Peng WC, Wang JZ, Chug CC, Hsieh MC, Tseng YC, Chen L, Luo D, Wu C, Wang CT, Hsu CH, Chen YT, Chen PF, Liu F, Hung FM. Large language multimodal models for new-onset type 2 diabetes prediction using five-year cohort electronic health records. Sci Rep 2024; 14:20774. [PMID: 39237580] [PMCID: PMC11377777] [DOI: 10.1038/s41598-024-71020-2]
Abstract
Type 2 diabetes mellitus (T2DM) is a prevalent health challenge faced by countries worldwide. In this study, we propose a novel framework of large language multimodal models (LLMMs) that incorporates multimodal data from clinical notes and laboratory results for diabetes risk prediction. We collected five years of electronic health records (EHRs), dating from 2017 to 2021, from a Taiwan hospital database. This dataset included 1,420,596 clinical notes, 387,392 laboratory results, and more than 1505 laboratory test items. Our method combined a text embedding encoder and a multi-head attention layer to learn laboratory values, and utilized a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observed that integrating clinical notes with predictions based on textual laboratory values significantly enhanced the predictive capability of the unimodal model in the early detection of T2DM. Moreover, we achieved an area under the receiver operating characteristic curve (AUC) greater than 0.70 for new-onset T2DM prediction, demonstrating the effectiveness of leveraging textual laboratory data for training and inference in LLMs and improving the accuracy of new-onset diabetes prediction.
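Illustrative aside (not from the study): the AUC reported above can be read as a rank statistic, namely the probability that a randomly chosen patient who develops T2DM receives a higher predicted risk than a randomly chosen patient who does not. A minimal sketch with invented labels and scores:

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of positive/negative pairs in which the positive
    case is ranked higher (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores (1 = developed new-onset T2DM).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.75, 0.40, 0.65, 0.30, 0.20, 0.55, 0.10]
print(round(auc_score(labels, scores), 3))  # → 0.867
```

For cohort-scale data an O(n log n) rank-sort implementation or a library routine would be used instead; the quadratic pairwise form above is just the definition made executable.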
Affiliation(s)
- Jun-En Ding
- School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, USA

- Phan Nguyen Minh Thao
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Wen-Chih Peng
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Jian-Zhe Wang
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Chun-Cheng Chug
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Min-Chen Hsieh
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Yun-Chien Tseng
- Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan

- Ling Chen
- Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University, Taipei City, Taiwan

- Dongsheng Luo
- School of Computing and Information Science, Florida International University, Miami, USA

- Chenwei Wu
- Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA

- Chi-Te Wang
- Center of Artificial Intelligence, Far Eastern Memorial Hospital, New Taipei City, Taiwan

- Chih-Ho Hsu
- Department of Surgery, Far Eastern Memorial Hospital, New Taipei City, Taiwan

- Yi-Tui Chen
- Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, Taipei City, Taiwan

- Pei-Fu Chen
- Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan

- Feng Liu
- School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, USA

- Fang-Ming Hung
- Surgical Trauma Intensive Care Unit, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, Taipei City, Taiwan
3
Huemann Z, Tie X, Hu J, Bradshaw TJ. ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax. Journal of Imaging Informatics in Medicine 2024; 37:1652-1663. [PMID: 38485899] [PMCID: PMC11300752] [DOI: 10.1007/s10278-024-01051-8]
Abstract
Radiology narrative reports often describe characteristics of a patient's disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net extracts language features from physician-generated free-form radiology reports using a pre-trained language model. We then introduced cross-attention between the language features and the intermediate embeddings of an encoder-decoder convolutional neural network to enable language guidance for image analysis. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716±0.016, which was similar to the degree of inter-reader variability (0.712±0.044) computed on a subset of the data. It outperformed vision-only models (Swin UNETR: 0.670±0.015, ResNet50 U-Net: 0.677±0.015, GLoRIA: 0.686±0.014, and nnUNet 0.694±0.016) and a competing vision-language model (LAVT: 0.706±0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net's segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
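Illustrative aside (not from the study): the Dice score used to compare ConTEXTual Net against vision-only baselines measures the overlap between a predicted and a reference segmentation mask. A minimal sketch over invented binary masks (flattened to lists for brevity):

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks
    (flattened 0/1 sequences): 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Convention: two empty masks count as perfect agreement.
    return 2 * inter / total if total else 1.0

pred  = [0, 1, 1, 1, 0, 0, 1, 0]  # hypothetical model output
truth = [0, 1, 1, 0, 0, 0, 1, 1]  # hypothetical physician annotation
print(round(dice(pred, truth), 3))  # → 0.75
```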
Affiliation(s)
- Zachary Huemann
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA

- Xin Tie
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA

- Junjie Hu
- Departments of Biostatistics and Computer Science, University of Wisconsin-Madison, Madison, WI, 53705, USA

- Tyler J Bradshaw
- Department of Radiology, University of Wisconsin-Madison, Madison, WI, 53705, USA
4
Kim S, Kim SS, Kim E, Cecchini M, Park MS, Choi JA, Kim SH, Hwang HK, Kang CM, Choi HJ, Shin SJ, Kang J, Lee CK. Deep-Transfer-Learning-Based Natural Language Processing of Serial Free-Text Computed Tomography Reports for Predicting Survival of Patients With Pancreatic Cancer. JCO Clin Cancer Inform 2024; 8:e2400021. [PMID: 39151114] [DOI: 10.1200/cci.24.00021]
Abstract
PURPOSE To explore the predictive potential of serial computed tomography (CT) radiology reports for pancreatic cancer survival using natural language processing (NLP). METHODS Deep-transfer-learning-based NLP models were retrospectively trained and tested with serial, free-text CT reports and survival information of consecutive patients diagnosed with pancreatic cancer in a Korean tertiary hospital. Randomly selected patients with pancreatic cancer and their serial CT reports from an independent tertiary hospital in the United States were included in the external testing data set. The concordance index (c-index) between predicted and actual survival and the area under the receiver operating characteristic curve (AUROC) for predicting 1-year survival were calculated. RESULTS Between January 2004 and June 2021, 2,677 patients with 12,255 CT reports and 670 patients with 3,058 CT reports were allocated to the training and internal testing data sets, respectively. A ClinicalBERT (Bidirectional Encoder Representations from Transformers) model trained on only the first CT report of each patient showed a c-index of 0.653 and an AUROC of 0.722 in predicting the overall survival of patients with pancreatic cancer. ClinicalBERT trained on up to 15 consecutive reports from the initial report showed an improved c-index of 0.811 and an AUROC of 0.911. On the external testing set of 273 patients with 1,947 CT reports, the AUROC was 0.888, indicating the generalizability of our model. Further analyses showed our model's contextual interpretation beyond specific phrases. CONCLUSION A deep-transfer-learning-based NLP model of serial CT reports can predict the survival of patients with pancreatic cancer. Clinical decisions can be supported by the developed model, with survival information extracted solely from serial radiology reports.
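Illustrative aside (not from the study): the concordance index quoted above measures how often the model ranks patients' risks consistently with their observed survival. A minimal Harrell-style sketch with invented data; censored patients (event flag 0) contribute to a pair only as the longer-surviving member:

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's c-index: among comparable pairs (the patient with the
    shorter follow-up had an observed event), the fraction in which the
    model assigned that patient the higher risk (ties count 0.5)."""
    num = den = 0.0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i          # order so that i has the shorter time
        if times[i] == times[j] or not events[i]:
            continue             # pair not comparable
        den += 1
        if risks[i] > risks[j]:
            num += 1
        elif risks[i] == risks[j]:
            num += 0.5
    return num / den

# Hypothetical data: follow-up in months, event flag (1 = death), model risk.
times  = [5, 12, 20, 30, 8]
events = [1, 1, 0, 0, 1]
risks  = [0.9, 0.6, 0.2, 0.3, 0.5]
print(round(concordance_index(times, events, risks), 2))  # → 0.89
```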
Affiliation(s)
- Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Korea

- Seung-Seob Kim
- Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea

- Eejung Kim
- Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA

- Michael Cecchini
- Department of Internal Medicine (Medical Oncology), Yale University School of Medicine, New Haven, CT

- Mi-Suk Park
- Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea

- Ji A Choi
- Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea

- Sung Hyun Kim
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Department of Surgery, Yonsei University College of Medicine, Seoul, Korea

- Ho Kyoung Hwang
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Department of Surgery, Yonsei University College of Medicine, Seoul, Korea

- Chang Moo Kang
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Department of Surgery, Yonsei University College of Medicine, Seoul, Korea

- Hye Jin Choi
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea

- Sang Joon Shin
- Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea

- Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Korea
- AIGEN Sciences Inc, Seoul, Korea

- Choong-Kun Lee
- Pancreaticobiliary Cancer Clinic, Yonsei Cancer Center, Severance Hospital, Seoul, Korea
- Song-dang Institute for Cancer Research, Yonsei University College of Medicine, Seoul, Korea
- Division of Medical Oncology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
5
Kanzawa J, Yasaka K, Fujita N, Fujiwara S, Abe O. Automated classification of brain MRI reports using fine-tuned large language models. Neuroradiology 2024. [PMID: 38995393] [DOI: 10.1007/s00234-024-03427-7]
Abstract
PURPOSE This study aimed to investigate the efficacy of fine-tuned large language models (LLMs) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. METHODS This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model that demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists were involved in classifying the reports in the test dataset into the three groups. The model's performance on the test dataset was compared to that of the two radiologists. RESULTS The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for group 1/2/3 was 1.000/0.864/0.978. The model's specificity for group 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in terms of accuracy, sensitivity, and specificity between the LLM and the human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve for discriminating groups 2 and 3 from group 1 was 0.994 (95% CI: 0.982-1.000), and for discriminating group 3 from groups 1 and 2 it was 0.992 (95% CI: 0.982-1.000). CONCLUSION The fine-tuned LLM demonstrated performance comparable to radiologists in classifying brain MRI reports, while requiring substantially less time.
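Illustrative aside (not from the study): the per-group sensitivity and specificity figures above are one-vs-rest statistics over a three-class confusion matrix. A minimal sketch with invented labels (1 = nontumor, 2 = posttreatment, 3 = pretreatment):

```python
def sens_spec(y_true, y_pred, cls):
    """One-vs-rest sensitivity and specificity for class `cls`."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = sum(t != cls and p != cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical report-level labels for a tiny test set.
y_true = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 3, 3, 3, 3, 2]
for cls in (1, 2, 3):
    print(cls, sens_spec(y_true, y_pred, cls))
```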
Affiliation(s)
- Jun Kanzawa
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan

- Koichiro Yasaka
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan

- Nana Fujita
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan

- Shin Fujiwara
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan

- Osamu Abe
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
6
Nakai H, Suman G, Adamo DA, Navin PJ, Bookwalter CA, LeGout JD, Chen FK, Wellnitz CV, Silva AC, Thomas JV, Kawashima A, Fan JW, Froemming AT, Lomas DJ, Humphreys MR, Dora C, Korfiatis P, Takahashi N. Natural language processing pipeline to extract prostate cancer-related information from clinical notes. Eur Radiol 2024. [PMID: 38842692] [DOI: 10.1007/s00330-024-10812-6]
Abstract
OBJECTIVES To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. MATERIALS AND METHODS This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compilation of multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). Patient-level classification performance was evaluated on a patient-level test set created by radiologists through review of the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists. RESULTS Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity than radiologists for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03). CONCLUSION The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patients' clinical notes.
CLINICAL RELEVANCE STATEMENT The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI. KEY POINTS When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
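Illustrative aside (not from the study): the two stages described above, keyword-based sentence extraction followed by compilation of sentence-level outputs into a patient-level decision, can be sketched as below. The keyword list, sentence regex, threshold, and the any-sentence-above-threshold rule are invented stand-ins; the paper itself aggregates with tree-based models:

```python
import re

KEYWORDS = {"biopsy", "gleason", "prostatectomy"}  # hypothetical keyword list

def extract_sentences(note):
    """Pull out any sentence containing a pre-defined keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", note)
    return [s for s in sentences if any(k in s.lower() for k in KEYWORDS)]

def patient_level(sentence_probs, threshold=0.5):
    """Toy stand-in for the tree-based compiler: flag the patient
    if any sentence-level probability clears the threshold."""
    return max(sentence_probs, default=0.0) >= threshold

note = ("Patient presents for follow-up. Prior biopsy showed Gleason 3+4. "
        "No family history reported.")
hits = extract_sentences(note)
print(len(hits), patient_level([0.2, 0.91]))  # → 1 True
```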
Affiliation(s)
- Garima Suman
- Department of Radiology, Mayo Clinic, Rochester, MN, USA

- Daniel A Adamo
- Department of Radiology, Mayo Clinic, Rochester, MN, USA

- Frank K Chen
- Department of Radiology, Mayo Clinic, Jacksonville, FL, USA

- Alvin C Silva
- Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA

- John V Thomas
- Department of Radiology, Mayo Clinic, Rochester, MN, USA

- Jungwei W Fan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA

- Derek J Lomas
- Department of Urology, Mayo Clinic, Rochester, MN, USA

- Chandler Dora
- Department of Urology, Mayo Clinic, Jacksonville, FL, USA
7
Yao J, Alabousi A, Mironov O. Evaluation of a BERT Natural Language Processing Model for Automating CT and MRI Triage and Protocol Selection. Can Assoc Radiol J 2024:8465371241255895. [PMID: 38832645] [DOI: 10.1177/08465371241255895]
Abstract
Purpose: To evaluate the accuracy of a Bidirectional Encoder Representations from Transformers (BERT) Natural Language Processing (NLP) model for automating triage and protocol selection of cross-sectional image requisitions. Methods: A retrospective study was completed using 222,392 CT and MRI studies from a single Canadian university hospital database (January 2018-September 2022). Three hundred unique protocols (116 CT and 184 MRI) were included. A BERT model was trained, validated, and tested using an 80%-10%-10% stratified split. Naive Bayes (NB) and Support Vector Machine (SVM) machine learning models were used as comparators. Models were assessed using F1 score, precision, recall, and area under the receiver operating characteristic curve (AUROC). The BERT model was also assessed for multi-class protocol suggestion and for subgroups based on referral location, modality, and imaging section. Results: BERT was superior to SVM for protocol selection (F1 score: BERT-0.901 vs SVM-0.881). However, it was not significantly different from SVM for triage prediction (F1 score: BERT-0.844 vs SVM-0.845). Both outperformed NB for protocol and triage prediction. BERT had superior performance on minority classes compared to SVM and NB. For multi-class prediction, BERT accuracy was up to 0.991 for top-5 protocol suggestion and 0.981 for top-2 triage suggestion. Emergency department patients had the highest F1 scores for both protocol (0.957) and triage (0.986), compared to inpatients and outpatients. Conclusion: The BERT NLP model demonstrated strong performance in automating the triage and protocol selection of radiology studies, showing potential to enhance radiologist workflows. These findings suggest the feasibility of using advanced NLP models to streamline radiology operations.
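Illustrative aside (not from the study): the top-5 and top-2 accuracies above count a case as correct when the true protocol appears anywhere in the model's k highest-ranked suggestions. A minimal sketch; the protocol codes below are invented:

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Fraction of cases whose true label appears in the model's
    top-k ranked suggestions."""
    hits = sum(t in preds[:k] for t, preds in zip(true_labels, ranked_predictions))
    return hits / len(true_labels)

# Hypothetical protocol codes, ranked by model confidence per requisition.
true_labels = ["ct_head", "mri_brain", "ct_abd"]
ranked = [
    ["ct_head", "ct_cspine", "mri_brain"],
    ["mri_spine", "mri_brain", "ct_head"],
    ["ct_chest", "ct_pelvis", "mri_abd"],
]
print(round(top_k_accuracy(true_labels, ranked, k=1), 2),
      round(top_k_accuracy(true_labels, ranked, k=2), 2))  # → 0.33 0.67
```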
Affiliation(s)
- Jason Yao
- Department of Radiology, McMaster University, Hamilton, ON, Canada

- Abdullah Alabousi
- Department of Radiology, McMaster University, Hamilton, ON, Canada
- St Joseph's Healthcare Hamilton, Hamilton, ON, Canada

- Oleg Mironov
- Department of Radiology, McMaster University, Hamilton, ON, Canada
- St Joseph's Healthcare Hamilton, Hamilton, ON, Canada
8
Gorenstein L, Konen E, Green M, Klang E. Bidirectional Encoder Representations from Transformers in Radiology: A Systematic Review of Natural Language Processing Applications. J Am Coll Radiol 2024; 21:914-941. [PMID: 38302036] [DOI: 10.1016/j.jacr.2024.01.012]
Abstract
INTRODUCTION Bidirectional Encoder Representations from Transformers (BERT), introduced in 2018, has revolutionized natural language processing. Its bidirectional understanding of word context has enabled innovative applications, notably in radiology. This study aimed to assess BERT's influence and applications within the radiologic domain. METHODS Adhering to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a systematic review, searching PubMed for literature on BERT-based models and natural language processing in radiology from January 1, 2018, to February 12, 2023. The search encompassed keywords related to generative models, transformer architecture, and various imaging techniques. RESULTS Of 597 results, 30 met our inclusion criteria; the remainder were unrelated to radiology or did not use BERT-based models. The included studies were retrospective, with 14 published in 2022. The primary focus was on classification and information extraction from radiology reports, with x-rays as the prevalent imaging modality. Specific investigations included automatic CT protocol assignment and deep learning applications in chest x-ray interpretation. CONCLUSION This review underscores the primary application of BERT in radiology for report classification. It also reveals emerging BERT applications for protocol assignment and report generation. As BERT technology advances, we foresee further innovative applications. Its implementation in radiology holds potential for enhancing diagnostic precision, expediting report generation, and optimizing patient care.
Affiliation(s)
- Larisa Gorenstein
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

- Eli Konen
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; Chair, Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel

- Michael Green
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat-Gan, Israel; Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

- Eyal Klang
- Icahn School of Medicine at Mount Sinai, New York, New York; and Associate Professor of Radiology, Innovation Center, Sheba Medical Center, Affiliated with Tel Aviv University, Tel Aviv, Israel
9
Hasani AM, Singh S, Zahergivar A, Ryan B, Nethala D, Bravomontenegro G, Mendhiratta N, Ball M, Farhadi F, Malayeri A. Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol 2024; 34:3566-3574. [PMID: 37938381] [DOI: 10.1007/s00330-023-10384-x]
Abstract
OBJECTIVE Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4-generated radiology reports. METHODS A comparative study design was employed: a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports. RESULTS The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775. CONCLUSION The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.
CLINICAL RELEVANCE STATEMENT The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice. KEY POINTS • Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports. • Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports. • Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
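Illustrative aside (not from the study): the cosine similarity reported above (0.85 on average) compares term vectors built from the two report texts. A minimal bag-of-words sketch over invented report snippets; real evaluations typically use TF-IDF weights or embedding vectors rather than raw counts:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

r1 = "no acute intracranial hemorrhage or mass effect"
r2 = "no intracranial hemorrhage mass effect or acute infarct"
print(round(cosine_similarity(r1, r2), 2))  # → 0.94
```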
Affiliation(s)
- Amir M Hasani
- Laboratory of Translation Research, National Heart Blood Lung Institute, NIH, Bethesda, MD, USA

- Shiva Singh
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA

- Aryan Zahergivar
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA

- Beth Ryan
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA

- Daniel Nethala
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA

- Neil Mendhiratta
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA

- Mark Ball
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA

- Faraz Farhadi
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA

- Ashkan Malayeri
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
10
Tay SB, Low GH, Wong GJE, Tey HJ, Leong FL, Li C, Chua MLK, Tan DSW, Thng CH, Tan IBH, Tan RSYC. Use of Natural Language Processing to Infer Sites of Metastatic Disease From Radiology Reports at Scale. JCO Clin Cancer Inform 2024; 8:e2300122. [PMID: 38788166] [PMCID: PMC11371090] [DOI: 10.1200/cci.23.00122]
Abstract
PURPOSE To evaluate natural language processing (NLP) methods to infer metastatic sites from radiology reports. METHODS A set of 4,522 computed tomography (CT) reports of 550 patients with 14 types of cancer was used to fine-tune four clinical large language models (LLMs) for multilabel classification of metastatic sites. We also developed an NLP information extraction (IE) system (on the basis of named entity recognition, assertion status detection, and relation extraction) for comparison. Model performances were measured by F1 scores on a test set and three external validation sets. The best model was used to facilitate analysis of metastatic frequencies in a cohort study of 6,555 patients with 53,838 CT reports. RESULTS The RadBERT, BioBERT, GatorTron-base, and GatorTron-medium LLMs achieved F1 scores of 0.84, 0.87, 0.89, and 0.91, respectively, on the test set. The IE system performed best, achieving an F1 score of 0.93. F1 scores of the IE system by individual cancer type ranged from 0.89 to 0.96. The IE system attained F1 scores of 0.89, 0.83, and 0.81 on external validation sets comprising additional cancer types, positron emission tomography-CT scans, and magnetic resonance imaging scans, respectively. In our cohort study, we found that for colorectal cancer, liver-only metastases were more frequent in de novo stage IV than in recurrent patients (29.7% v 12.2%; P < .001). Conversely, lung-only metastases were more frequent in recurrent than in de novo stage IV patients (17.2% v 7.3%; P < .001). CONCLUSION We developed an IE system that accurately infers metastatic sites in multiple primary cancers from radiology reports. It has explainable methods and performs better than some clinical LLMs. The inferred metastatic phenotypes could enhance cancer research databases and clinical trial matching, and identify potential patients for oligometastatic interventions.
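Illustrative aside (not from the study): multilabel F1 for metastatic-site inference is commonly micro-averaged by pooling true-positive, false-positive, and false-negative counts across all reports. A minimal sketch with invented site labels:

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multilabel predictions: pool TP/FP/FN
    counts across all cases, then compute 2TP / (2TP + FP + FN)."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    # Convention: no labels anywhere counts as perfect agreement.
    return 2 * tp / denom if denom else 1.0

# Hypothetical metastatic-site labels per CT report.
truth = [{"liver", "lung"}, {"bone"}, set()]
preds = [{"liver"}, {"bone", "brain"}, set()]
print(round(micro_f1(truth, preds), 2))  # → 0.67
```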
Collapse
Affiliation(s)
- See Boon Tay
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- NUS Yong Loo Lin School of Medicine, Singapore, Singapore
| | - Guat Hwa Low
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
| | | | - Han Jieh Tey
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
| | - Fun Loon Leong
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
| | - Constance Li
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
| | - Melvin Lee Kiang Chua
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
- Singapore Duke-NUS Medical School, Singapore, Singapore
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore, Singapore
| | - Daniel Shao Weng Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Singapore Duke-NUS Medical School, Singapore, Singapore
- Division of Clinical Trials and Epidemiological Sciences, National Cancer Centre Singapore, Singapore, Singapore
| | - Choon Hua Thng
- Singapore Duke-NUS Medical School, Singapore, Singapore
- Division of Oncologic Imaging, National Cancer Centre Singapore, Singapore, Singapore
| | - Iain Bee Huat Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
- Singapore Duke-NUS Medical School, Singapore, Singapore
| | - Ryan Shea Ying Cong Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore, Singapore
- Singapore Duke-NUS Medical School, Singapore, Singapore
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
| |
Collapse
|
11
|
Lyu D, Wang X, Chen Y, Wang F. Language model and its interpretability in biomedicine: A scoping review. iScience 2024; 27:109334. [PMID: 38495823 PMCID: PMC10940999 DOI: 10.1016/j.isci.2024.109334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/19/2024] Open
Abstract
With advancements in large language models, artificial intelligence (AI) is undergoing a paradigm shift where AI models can be repurposed with minimal effort across various downstream tasks. This provides great promise in learning generally useful representations from biomedical corpora, at scale, which would empower AI solutions in healthcare and biomedical research. Nonetheless, our understanding of how they work, when they fail, and what they are capable of remains underexplored due to their emergent properties. Consequently, there is a need to comprehensively examine the use of language models in biomedicine. This review aims to summarize existing studies of language models in biomedicine and identify topics ripe for future research, along with the technical and analytical challenges with respect to interpretability. We expect this review to help researchers and practitioners better understand the landscape of language models in biomedicine and what methods are available to enhance the interpretability of their models.
Collapse
Affiliation(s)
- Daoming Lyu
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Xingbo Wang
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| | - Yong Chen
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Fei Wang
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
| |
Collapse
|
12
|
Oeding JF, Yang L, Sanchez-Sotelo J, Camp CL, Karlsson J, Samuelsson K, Pearle AD, Ranawat AS, Kelly BT, Pareek A. A practical guide to the development and deployment of deep learning models for the orthopaedic surgeon: Part III, focus on registry creation, diagnosis, and data privacy. Knee Surg Sports Traumatol Arthrosc 2024; 32:518-528. [PMID: 38426614 DOI: 10.1002/ksa.12085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 03/02/2024]
Abstract
Deep learning is a subset of artificial intelligence (AI) with enormous potential to transform orthopaedic surgery. As has already become evident with the deployment of Large Language Models (LLMs) like ChatGPT (OpenAI Inc.), deep learning can rapidly enter clinical and surgical practices. As such, it is imperative that orthopaedic surgeons acquire a deeper understanding of the technical terminology, capabilities and limitations associated with deep learning models. The focus of this series thus far has been providing surgeons with an overview of the steps needed to implement a deep learning-based pipeline, emphasizing some of the important technical details for surgeons to understand as they encounter, evaluate or lead deep learning projects. However, this series would be remiss without providing practical examples of how deep learning models have begun to be deployed and highlighting the areas where the authors feel deep learning may have the most profound potential. While computer vision applications of deep learning were the focus of Parts I and II, due to the enormous impact that natural language processing (NLP) has had in recent months, NLP-based deep learning models are also discussed in this final part of the series. In this review, three applications that the authors believe can be impacted the most by deep learning but with which many surgeons may not be familiar are discussed: (1) registry construction, (2) diagnostic AI and (3) data privacy. Deep learning-based registry construction will be essential for the development of more impactful clinical applications, with diagnostic AI being one of those applications likely to augment clinical decision-making in the near future. As the applications of deep learning continue to grow, the protection of patient information will become increasingly essential; as such, applications of deep learning to enhance data privacy are likely to become more important than ever before. Level of Evidence: Level IV.
Collapse
Affiliation(s)
- Jacob F Oeding
- School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, Minnesota, USA
- Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Linjun Yang
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
| | | | - Christopher L Camp
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
| | - Jón Karlsson
- Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
| | - Kristian Samuelsson
- Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
| | - Andrew D Pearle
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Anil S Ranawat
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Bryan T Kelly
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Ayoosh Pareek
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| |
Collapse
|
13
|
Martín-Noguerol T, López-Úbeda P, Pons-Escoda A, Luna A. Natural language processing deep learning models for the differential between high-grade gliomas and metastasis: what if the key is how we report them? Eur Radiol 2024; 34:2113-2120. [PMID: 37665389 DOI: 10.1007/s00330-023-10202-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVES The differential between high-grade glioma (HGG) and metastasis remains challenging in common radiological practice. We compare different natural language processing (NLP)-based deep learning models to assist radiologists based on data contained in radiology reports. METHODS This retrospective study included 185 MRI reports between 2010 and 2022 from two different institutions. A total of 117 reports were used for training and 21 were reserved for the validation set, while the rest were used as a test set. We compared the performance of different deep learning models for HGG and metastasis classification. Specifically, a Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), a hybrid version of BiLSTM and CNN, and a radiology-specific Bidirectional Encoder Representations from Transformers (RadBERT) model were used. RESULTS For the classification of MRI reports, the CNN provided the best results among all models tested, showing a macro-averaged precision of 87.32%, a sensitivity of 87.45%, and an F1 score of 87.23%. In addition, our NLP algorithm detected keywords such as tumor, temporal, and lobe to positively classify a radiological report into the HGG or metastasis group. CONCLUSIONS A deep learning model based on a CNN enables radiologists to discriminate between HGG and metastasis based on MRI reports with high precision. This approach should be considered an additional tool in diagnosing these central nervous system lesions. CLINICAL RELEVANCE STATEMENT The use of our NLP model enables radiologists to differentiate between patients with high-grade glioma and metastasis based on their MRI reports and can be used as an additional tool to the conventional image-based approach for this challenging task. KEY POINTS • The differential between high-grade glioma and metastasis is still challenging in common radiological practice.
• Natural language processing (NLP)-based deep learning models can assist radiologists based on data contained in radiology reports. • We have developed and tested a natural language processing model for discriminating between high-grade glioma and metastasis based on MRI reports that shows high precision for this task.
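As a much simpler stand-in for the paper's CNN, a bag-of-words keyword scorer illustrates the text-classification setup the study evaluates: a report's words vote for one class or the other, echoing the keyword signals (tumor, temporal, lobe) the authors observed. The toy reports and keywords below are invented, and the real model learns features rather than counting words.

```python
# Toy bag-of-words classifier for the HGG-vs-metastasis report task.
# This is NOT the paper's CNN; it only illustrates the input/output shape.
from collections import Counter

def train_keyword_scores(reports, labels):
    """Count word frequencies per class from labeled report text."""
    counts = {"HGG": Counter(), "metastasis": Counter()}
    for text, label in zip(reports, labels):
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Each word votes by its class-frequency difference."""
    score = sum(counts["HGG"][w] - counts["metastasis"][w]
                for w in text.lower().split())
    return "HGG" if score >= 0 else "metastasis"

reports = ["single temporal lobe mass with necrosis",
           "multiple lesions at grey-white junction"]
labels = ["HGG", "metastasis"]
counts = train_keyword_scores(reports, labels)
print(classify("temporal mass with necrosis", counts))
```

A CNN over word embeddings, as used in the study, replaces these raw counts with learned local n-gram filters, which is why it can generalize beyond exact keyword matches.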
Collapse
Affiliation(s)
| | | | - Albert Pons-Escoda
- Radiology Department, Hospital Universitari de Bellvitge, Barcelona, Spain
| | - Antonio Luna
- Radiology Department, MRI Unit, HT Medica, Carmelo Torres 2, 23007, Jaén, Spain
| |
Collapse
|
14
|
Chien A, Tang H, Jagessar B, Chang KW, Peng N, Nael K, Salamon N. AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical. AJNR Am J Neuroradiol 2024; 45:244-248. [PMID: 38238092 DOI: 10.3174/ajnr.a8102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 11/09/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND AND PURPOSE The review of clinical reports is an essential part of monitoring disease progression. Synthesizing multiple imaging reports is also important for clinical decisions. It is critical to aggregate information quickly and accurately. Machine learning natural language processing (NLP) models hold promise to address an unmet need for report summarization. MATERIALS AND METHODS We evaluated NLP methods to summarize longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summaries using longitudinal imaging notes collected at our institute and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth model, the online GPT3davinci NLP large language model (LLM), was added for the summarization of PubMed case reports. We assessed summary quality through comparison with expert summaries using quantitative metrics and quality reviews by experts. RESULTS In clinical summarization, BARTcnn had the best performance (BERTScore = 0.8371), followed by LongT5booksum and LEDlegal. In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by BARTcnn and then LEDbooksum (BERTScore = 0.894, 0.872, and 0.867, respectively). CONCLUSIONS AI NLP summarization models demonstrated great potential in summarizing longitudinal aneurysm reports, though none yet reached the level of quality required for clinical usage. We found that the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site. Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow.
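The candidate-versus-reference comparison behind metrics such as BERTScore can be illustrated with a plain token-overlap F1. BERTScore itself matches contextual embeddings rather than surface tokens, so this stdlib proxy is only a sketch of the evaluation setup; the example summaries are invented.

```python
# Token-overlap F1 between a model summary and an expert reference.
# BERTScore (used in the study) matches contextual embeddings instead;
# this proxy only illustrates the candidate-vs-reference comparison.
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())      # precision over candidate tokens
    r = overlap / sum(ref.values())       # recall over reference tokens
    return 2 * p * r / (p + r)

expert = "stable 4 mm left ICA aneurysm no interval growth"
model = "left ICA aneurysm stable at 4 mm"
print(round(token_f1(model, expert), 2))
```

Embedding-based metrics exist precisely because token overlap penalizes valid paraphrases ("no interval growth" vs. "unchanged"), which matters when comparing abstractive summarizers like BARTcnn and GPT3davinci.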
Collapse
Affiliation(s)
- Aichi Chien
- From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
| | - Hubert Tang
| | - Bhavita Jagessar
| | - Kai-Wei Chang
- Department of Computer Science (K.C., N.P.), University of California, Los Angeles, Los Angeles, California
| | - Nanyun Peng
| | - Kambiz Nael
| | - Noriko Salamon
| |
Collapse
|
15
|
Chae A, Yao MS, Sagreiya H, Goldberg AD, Chatterjee N, MacLean MT, Duda J, Elahi A, Borthakur A, Ritchie MD, Rader D, Kahn CE, Witschey WR, Gee JC. Strategies for Implementing Machine Learning Algorithms in the Clinical Practice of Radiology. Radiology 2024; 310:e223170. [PMID: 38259208 PMCID: PMC10831483 DOI: 10.1148/radiol.223170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2022] [Revised: 08/24/2023] [Accepted: 08/29/2023] [Indexed: 01/24/2024]
Abstract
Despite recent advancements in machine learning (ML) applications in health care, there have been few benefits and improvements to clinical medicine in the hospital setting. To facilitate clinical adaptation of methods in ML, this review proposes a standardized framework for the step-by-step implementation of artificial intelligence into the clinical practice of radiology that focuses on three key components: problem identification, stakeholder alignment, and pipeline integration. A review of the recent literature and empirical evidence in radiologic imaging applications justifies this approach and offers a discussion on structuring implementation efforts to help other hospital practices leverage ML to improve patient care. Clinical trial registration no. 04242667 © RSNA, 2024 Supplemental material is available for this article.
Collapse
Affiliation(s)
| | | | - Hersh Sagreiya
- From the Departments of Bioengineering (M.S.Y.), Radiology (H.S.,
N.C., M.T.M., J.D., A.B., C.E.K., W.R.W., J.C.G.), Genetics (M.D.R.), and
Medicine (D.R.), Perelman School of Medicine (A.C., M.S.Y., H.S., A.B., C.E.K.,
W.R.W., J.C.G.), University of Pennsylvania, 3400 Civic Center Blvd,
Philadelphia, PA 19104; Department of Radiology, Loyola University Medical
Center, Maywood, Ill (A.D.G.); Department of Information Services, University of
Pennsylvania, Philadelphia, Pa (A.E.); and Leonard Davis Institute of Health
Economics, University of Pennsylvania, Philadelphia, Pa (A.B.)
| | - Ari D. Goldberg
| | - Neil Chatterjee
| | - Matthew T. MacLean
| | - Jeffrey Duda
| | - Ameena Elahi
| | - Arijitt Borthakur
| | - Marylyn D. Ritchie
| | - Daniel Rader
| | - Charles E. Kahn
| | | | | |
Collapse
|
16
|
Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology 2024; 310:e232756. [PMID: 38226883 DOI: 10.1148/radiol.232756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2024]
Abstract
Although chatbots have existed for decades, the emergence of transformer-based large language models (LLMs) has captivated the world through the most recent wave of artificial intelligence chatbots, including ChatGPT. Transformers are a type of neural network architecture that enables better contextual understanding of language and efficient training on massive amounts of unlabeled data, such as unstructured text from the internet. As LLMs have increased in size, their improved performance and emergent abilities have revolutionized natural language processing. Since language is integral to human thought, applications based on LLMs have transformative potential in many industries. In fact, LLM-based chatbots have demonstrated human-level performance on many professional benchmarks, including in radiology. LLMs offer numerous clinical and research applications in radiology, several of which have been explored in the literature with encouraging results. Multimodal LLMs can simultaneously interpret text and images to generate reports, closely mimicking current diagnostic pathways in radiology. Thus, from requisition to report, LLMs have the opportunity to positively impact nearly every step of the radiology journey. Yet, these impressive models are not without limitations. This article reviews the limitations of LLMs and mitigation strategies, as well as potential uses of LLMs, including multimodal models. Also reviewed are existing LLM-based applications that can enhance efficiency in supervised settings.
Collapse
Affiliation(s)
- Rajesh Bhayana
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital, and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Bldg, 1st Fl, Toronto, ON, Canada M5G 24C
| |
Collapse
|
17
|
Bell LC, Shimron E. Sharing Data Is Essential for the Future of AI in Medical Imaging. Radiol Artif Intell 2024; 6:e230337. [PMID: 38231036 PMCID: PMC10831510 DOI: 10.1148/ryai.230337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Revised: 11/16/2023] [Accepted: 11/20/2023] [Indexed: 01/18/2024]
Abstract
If we want artificial intelligence to succeed in radiology, we must share data and learn how to share data.
Collapse
Affiliation(s)
- Laura C. Bell
- From the Clinical Imaging Group, Genentech, 1 DNA Way, South San
Francisco, CA 94080 (L.C.B.); and Department of Electrical and Computer
Engineering and Department of Biomedical Engineering, Technion-Israel Institute
of Technology, Haifa, Israel (E.S.)
| | - Efrat Shimron
| |
Collapse
|
18
|
dos Santos DP, Kotter E, Mildenberger P, Martí-Bonmatí L. ESR paper on structured reporting in radiology-update 2023. Insights Imaging 2023; 14:199. [PMID: 37995019 PMCID: PMC10667169 DOI: 10.1186/s13244-023-01560-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 10/03/2023] [Indexed: 11/24/2023] Open
Abstract
Structured reporting in radiology continues to hold substantial potential to improve the quality of service provided to patients and referring physicians. Despite many physicians' preference for structured reports and various efforts by radiological societies and some vendors, structured reporting has still not been widely adopted in clinical routine. While in many countries national radiological societies have launched initiatives to further promote structured reporting, cross-institutional applications of report templates and incentives for usage of structured reporting are lacking. Various legislative measures have been taken in the USA and the European Union to promote interoperable data formats such as Fast Healthcare Interoperability Resources (FHIR) in the context of the EU Health Data Space (EHDS), which will certainly be relevant for the future of structured reporting. Lastly, recent advances in artificial intelligence and large language models may provide innovative and efficient approaches to integrate structured reporting more seamlessly into the radiologists' workflow. The ESR will remain committed to advancing structured reporting as a key component towards more value-based radiology. Practical solutions for structured reporting need to be provided by vendors. Policy makers should incentivize the usage of structured radiological reporting, especially in cross-institutional settings. Critical relevance statement: The benefits of structured reporting in radiology have been widely discussed and agreed upon over the past years; however, implementation in clinical routine is lacking. Policy makers should incentivize the usage of structured radiological reporting, especially in cross-institutional settings. Key points: 1. Various national societies have established initiatives for structured reporting in radiology. 2. Almost no monetary or structural incentives exist that favor structured reporting. 3. A consensus on technical standards for structured reporting is still missing. 4. The application of large language models may help structure radiological reports. 5. Policy makers should incentivize the usage of structured radiological reporting.
Collapse
|
19
|
Lu X, Chang EY, Du J, Yan A, McAuley J, Gentili A, Hsu CN. Robust Multi-View Fracture Detection in the Presence of Other Abnormalities Using HAMIL-Net. Mil Med 2023; 188:590-597. [PMID: 37948284 DOI: 10.1093/milmed/usad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Revised: 03/31/2023] [Accepted: 06/26/2023] [Indexed: 11/12/2023] Open
Abstract
INTRODUCTION Foot and ankle fractures are the most common military health problem. Automated diagnosis can save time and personnel. It is crucial that fracture detection not only distinguishes fractures from normal healthy cases but is also robust to the presence of other orthopedic pathologies. Artificial intelligence (AI) deep learning has been shown to be promising. Previously, we developed HAMIL-Net to automatically detect orthopedic injuries in upper extremity injuries. In this research, we investigated the performance of HAMIL-Net for detecting foot and ankle fractures in the presence of other abnormalities. MATERIALS AND METHODS HAMIL-Net is a novel deep neural network consisting of a hierarchical attention layer followed by a multiple-instance learning layer. This design allows it to handle imaging studies with multiple views. We used 148K musculoskeletal imaging studies of 51K Veterans at VA San Diego over the past 20 years to create datasets for this research. We annotated each study with a semi-automated pipeline that leveraged radiology reports written by board-certified radiologists, extracted findings with a natural language processing tool, and manually validated the annotations. RESULTS HAMIL-Net can be trained with study-level, multiple-view examples and detects foot and ankle fractures with a 0.87 area under the receiver operating characteristic curve, but performance dropped when tested on cases including other abnormalities. By integrating a fracture-specialized model with one that detects a broad range of abnormalities, HAMIL-Net's accuracy in detecting any abnormality improved from 0.53 to 0.77 and its F-score from 0.46 to 0.86. We also report HAMIL-Net's performance for different study types, including for young (age 18-35) patients. CONCLUSIONS Automated fracture detection is promising, but to deliver its full benefit in clinical use, the presence of other abnormalities must be considered.
Our results with HAMIL-Net showed that considering other abnormalities improved fracture detection and allowed for incidental findings of other musculoskeletal abnormalities pertinent to or superimposed on fractures.
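The multiple-instance aggregation step that lets HAMIL-Net make one study-level call from several radiographic views can be sketched as softmax-attention pooling over per-view scores. The scores and attention logits below are hand-picked for illustration; the real model learns both end to end from images.

```python
# Sketch of attention-based multiple-instance pooling over per-view
# scores, the kind of aggregation a MIL layer performs. All numbers
# are invented for illustration.
import math

def attention_pool(view_scores, attn_logits):
    """Combine per-view fracture scores into one study-level score."""
    exps = [math.exp(a) for a in attn_logits]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax attention
    return sum(w * s for w, s in zip(weights, view_scores))

# Three views of one study: only the second view shows the fracture,
# and the attention logits favor that informative view.
scores = [0.1, 0.9, 0.2]   # per-view fracture probabilities
logits = [0.0, 2.0, 0.0]   # learned attention, here hand-set
print(round(attention_pool(scores, logits), 3))
```

The attention weights are what make such models partially inspectable: a high weight indicates which view drove the study-level decision.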
Affiliation(s)
- Xing Lu
- University of California, San Diego, La Jolla, CA 92093, USA
- Eric Y Chang
- University of California, San Diego, La Jolla, CA 92093, USA
- VA San Diego Healthcare System, San Diego, CA 92161, USA
- Jiang Du
- University of California, San Diego, La Jolla, CA 92093, USA
- An Yan
- University of California, San Diego, La Jolla, CA 92093, USA
- Julian McAuley
- University of California, San Diego, La Jolla, CA 92093, USA
- Amilcare Gentili
- University of California, San Diego, La Jolla, CA 92093, USA
- VA San Diego Healthcare System, San Diego, CA 92161, USA
- Chun-Nan Hsu
- University of California, San Diego, La Jolla, CA 92093, USA
- VA San Diego Healthcare System, San Diego, CA 92161, USA
- VA National Artificial Intelligence Institute, Washington, DC 20422, USA
20
Kim M, Ong KTI, Choi S, Yeo J, Kim S, Han K, Park JE, Kim HS, Choi YS, Ahn SS, Kim J, Lee SK, Sohn B. Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports. Eur Radiol 2023; 33:8017-8025. [PMID: 37566271 DOI: 10.1007/s00330-023-10061-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2023] [Revised: 05/18/2023] [Accepted: 06/22/2023] [Indexed: 08/12/2023]
Abstract
OBJECTIVES To evaluate the performance of natural language processing (NLP) models to predict isocitrate dehydrogenase (IDH) mutation status in diffuse glioma using routine MR radiology reports. MATERIALS AND METHODS This retrospective, multi-center study included consecutive patients with diffuse glioma with known IDH mutation status from May 2009 to November 2021 whose initial MR radiology report was available prior to pathologic diagnosis. Five NLP models (long short-term memory [LSTM], bidirectional LSTM, bidirectional encoder representations from transformers [BERT], BERT graph convolutional network [GCN], BioBERT) were trained, and area under the receiver operating characteristic curve (AUC) was assessed to validate prediction of IDH mutation status in the internal and external validation sets. The performance of the best-performing NLP model was compared with that of human readers. RESULTS A total of 1427 patients (mean age ± standard deviation, 54 ± 15 years; 779 men, 54.6%) were included: 720 patients in the training set, 180 in the internal validation set, and 527 in the external validation set. In the external validation set, BERT GCN showed the highest performance (AUC 0.85, 95% CI 0.81-0.89) in predicting IDH mutation status, higher than LSTM (AUC 0.77, 95% CI 0.72-0.81; p = .003) and BioBERT (AUC 0.81, 95% CI 0.76-0.85; p = .03). It was also higher than that of a neuroradiologist (AUC 0.80, 95% CI 0.76-0.84; p = .005) and a neurosurgeon (AUC 0.79, 95% CI 0.76-0.84; p = .04). CONCLUSION BERT GCN was externally validated to predict IDH mutation status in patients with diffuse glioma using routine MR radiology reports, with performance superior or at least comparable to that of human readers.
CLINICAL RELEVANCE STATEMENT Natural language processing may be used to extract relevant information from routine radiology reports to predict cancer genotype and provide prognostic information that may aid in guiding treatment strategy and enabling personalized medicine. KEY POINTS • A transformer-based natural language processing (NLP) model predicted isocitrate dehydrogenase mutation status in diffuse glioma with an AUC of 0.85 in the external validation set. • The best NLP models were superior or at least comparable to human readers in both internal and external validation sets. • Transformer-based models showed higher performance than conventional NLP models such as long short-term memory.
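The AUC values compared throughout this abstract measure the probability that a model ranks a randomly chosen IDH-mutant case above a randomly chosen wild-type case. A minimal, self-contained illustration of the metric (toy labels and scores, not study data):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of positives from negatives yields 1.0;
# chance-level ranking yields about 0.5.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

This pairwise-ranking view is also why AUC is insensitive to class imbalance, a useful property when mutation prevalence differs between cohorts.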
Affiliation(s)
- Minjae Kim
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Kai Tzu-Iunn Ong
- Department of Artificial Intelligence, College of Computing, Yonsei University, Seoul, Korea
- Seonah Choi
- Department of Neurosurgery, Brain Tumor Center, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Jinyoung Yeo
- Department of Artificial Intelligence, College of Computing, Yonsei University, Seoul, Korea
- Sooyon Kim
- Department of Statistics and Data Science, Yonsei University, Seoul, Korea
- Kyunghwa Han
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Ji Eun Park
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Ho Sung Kim
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Yoon Seong Choi
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Sung Soo Ahn
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Jinna Kim
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Seung-Koo Lee
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Beomseok Sohn
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Department of Radiology and Center for Imaging Sciences, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
21
Tejani AS. To BERT or not to BERT: advancing non-invasive prediction of tumor biomarkers using transformer-based natural language processing (NLP). Eur Radiol 2023; 33:8014-8016. [PMID: 37740083 DOI: 10.1007/s00330-023-10224-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Revised: 08/27/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023]
Affiliation(s)
- Ali S Tejani
- Department of Radiology, The University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, 75390, USA
22
Tan RSYC, Lin Q, Low GH, Lin R, Goh TC, Chang CCE, Lee FF, Chan WY, Tan WC, Tey HJ, Leong FL, Tan HQ, Nei WL, Chay WY, Tai DWM, Lai GGY, Cheng LTE, Wong FY, Chua MCH, Chua MLK, Tan DSW, Thng CH, Tan IBH, Ng HT. Inferring cancer disease response from radiology reports using large language models with data augmentation and prompting. J Am Med Inform Assoc 2023; 30:1657-1664. [PMID: 37451682 PMCID: PMC10531105 DOI: 10.1093/jamia/ocad133] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2023] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/18/2023] Open
Abstract
OBJECTIVE To assess large language models on their ability to accurately infer cancer disease response from free-text radiology reports. MATERIALS AND METHODS We assembled 10 602 computed tomography reports from cancer patients seen at a single institution. All reports were classified into: no evidence of disease, partial response, stable disease, or progressive disease. We applied transformer models, a bidirectional long short-term memory model, a convolutional neural network model, and conventional machine learning methods to this task. Data augmentation using sentence permutation with consistency loss, as well as prompt-based fine-tuning, was used on the best-performing models. Models were validated on a hold-out test set and an external validation set based on Response Evaluation Criteria in Solid Tumors (RECIST) classifications. RESULTS The best-performing model was the GatorTron transformer, which achieved an accuracy of 0.8916 on the test set and 0.8919 on the RECIST validation set. Data augmentation further improved the accuracy to 0.8976. Prompt-based fine-tuning did not further improve accuracy but was able to reduce the number of training reports to 500 while still achieving good performance. DISCUSSION These models could be used by researchers to derive progression-free survival in large datasets. They may also serve as a decision support tool by providing clinicians with an automated second opinion of disease response. CONCLUSIONS Large clinical language models demonstrate potential to infer cancer disease response from radiology reports at scale. Data augmentation techniques are useful to further improve performance. Prompt-based fine-tuning can significantly reduce the size of the training dataset.
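The sentence-permutation augmentation described in this abstract can be sketched as follows. This is a simplified illustration with a made-up report; it omits the consistency-loss component used during training.

```python
import random

def permute_sentences(report, n_augment, seed=0):
    """Create augmented copies of a report by shuffling its sentences.
    A report's disease-response label should not depend on sentence
    order, so each shuffled copy keeps the original label."""
    rng = random.Random(seed)  # fixed seed for reproducible augmentation
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    out = []
    for _ in range(n_augment):
        shuffled = sentences[:]
        rng.shuffle(shuffled)
        out.append(". ".join(shuffled) + ".")
    return out

# Hypothetical report text, not from the study's dataset.
report = "Liver lesion is stable. No new lesions. Mild ascites."
for aug in permute_sentences(report, 2):
    print(aug)
```

Real reports need a proper sentence splitter (abbreviations like "Dr." break naive period-splitting), but the order-invariance idea is the same.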
Affiliation(s)
- Ryan Shea Ying Cong Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Qian Lin
- Department of Computer Science, National University of Singapore, Singapore
- Guat Hwa Low
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Ruixi Lin
- Department of Computer Science, National University of Singapore, Singapore
- Tzer Chew Goh
- Institute of Systems Science, National University of Singapore, Singapore
- Fung Fung Lee
- Institute of Systems Science, National University of Singapore, Singapore
- Wei Yin Chan
- Institute of Systems Science, National University of Singapore, Singapore
- Wei Chong Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Han Jieh Tey
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Fun Loon Leong
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Hong Qi Tan
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Wen Long Nei
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Wen Yee Chay
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- David Wai Meng Tai
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Gillianne Geet Yi Lai
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Lionel Tim-Ee Cheng
- Duke-NUS Medical School, Singapore
- Department of Diagnostic Radiology, Singapore General Hospital, Singapore
- Fuh Yong Wong
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Melvin Lee Kiang Chua
- Duke-NUS Medical School, Singapore
- Division of Radiation Oncology, National Cancer Centre Singapore, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore
- Daniel Shao Weng Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Division of Clinical Trials and Epidemiological Sciences, National Cancer Centre Singapore, Singapore
- Choon Hua Thng
- Duke-NUS Medical School, Singapore
- Division of Oncologic Imaging, National Cancer Centre Singapore, Singapore
- Iain Bee Huat Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Data and Computational Science Core, National Cancer Centre Singapore, Singapore
- Hwee Tou Ng
- Department of Computer Science, National University of Singapore, Singapore
23
Barrington NM, Gupta N, Musmar B, Doyle D, Panico N, Godbole N, Reardon T, D’Amico RS. A Bibliometric Analysis of the Rise of ChatGPT in Medical Research. Med Sci (Basel) 2023; 11:61. [PMID: 37755165 PMCID: PMC10535733 DOI: 10.3390/medsci11030061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 09/04/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023] Open
Abstract
The rapid emergence of publicly accessible artificial intelligence platforms such as large language models (LLMs) has led to an equally rapid increase in articles exploring their potential benefits and risks. We performed a bibliometric analysis of ChatGPT literature in medicine and science to better understand publication trends and knowledge gaps. Following title, abstract, and keyword searches of PubMed, Embase, Scopus, and Web of Science databases for ChatGPT articles published in the medical field, articles were screened for inclusion and exclusion criteria. Data were extracted from included articles, with citation counts obtained from PubMed and journal metrics obtained from Clarivate Journal Citation Reports. After screening, 267 articles were included in the study, most of which were editorials or correspondence with an average of 7.5 ± 18.4 citations per publication. Published articles on ChatGPT were authored largely in the United States, India, and China. The topics discussed included use and accuracy of ChatGPT in research, medical education, and patient counseling. Among non-surgical specialties, radiology published the most ChatGPT-related articles, while plastic surgery published the most articles among surgical specialties. The average citation number among the top 20 most-cited articles was 60.1 ± 35.3. Among journals with the most ChatGPT-related publications, there were on average 10 ± 3.7 publications. Our results suggest that managing the inevitable ethical and safety issues that arise with the implementation of LLMs will require further research exploring the capabilities and accuracy of ChatGPT, to generate policies guiding the adoption of artificial intelligence in medicine and science.
Affiliation(s)
- Nikki M. Barrington
- Chicago Medical School, Rosalind Franklin University, North Chicago, IL 60064, USA
- Nithin Gupta
- School of Osteopathic Medicine, Campbell University, Lillington, NC 27546, USA
- Basel Musmar
- Faculty of Medicine and Health Sciences, An-Najah National University, Nablus P.O. Box 7, West Bank, Palestine
- David Doyle
- Central Michigan College of Medicine, Mount Pleasant, MI 48858, USA
- Nicholas Panico
- Lake Erie College of Osteopathic Medicine, Erie, PA 16509, USA
- Nikhil Godbole
- School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Taylor Reardon
- Department of Neurology, Henry Ford Hospital, Detroit, MI 48202, USA
- Randy S. D’Amico
- Department of Neurosurgery, Lenox Hill Hospital, New York, NY 10075, USA
24
Bernstein IA, Zhang Y(V), Govil D, Majid I, Chang RT, Sun Y, Shue A, Chou JC, Schehlein E, Christopher KL, Groth SL, Ludwig C, Wang SY. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw Open 2023; 6:e2330320. [PMID: 37606922 PMCID: PMC10445188 DOI: 10.1001/jamanetworkopen.2023.30320] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 07/13/2023] [Indexed: 08/23/2023] Open
Abstract
Importance Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients. Objective To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice. Design, Setting, and Participants This cross-sectional study used deidentified data from an online medical forum, in which patient questions received responses written by American Academy of Ophthalmology (AAO)-affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists were asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed January 2023 and analysis was performed between March and May 2023. Main Outcomes and Measures Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm. Results A total of 200 pairs of user questions and answers by AAO-affiliated ophthalmologists were evaluated. The mean (SD) accuracy for distinguishing between AI and human responses was 61.3% (9.7%). Of 800 evaluations of chatbot-written answers, 168 answers (21.0%) were marked as human-written, while 517 of 800 human-written answers (64.6%) were marked as AI-written. Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI (prevalence ratio [PR], 1.72; 95% CI, 1.52-1.93). 
The likelihood of chatbot answers containing incorrect or inappropriate material was comparable with that of human answers (PR, 0.92; 95% CI, 0.77-1.10), and chatbot answers did not differ from human answers in likelihood of harm (PR, 0.84; 95% CI, 0.67-1.07) or extent of harm (PR, 0.99; 95% CI, 0.80-1.22). Conclusions and Relevance In this cross-sectional study of human-written and AI-generated responses to 200 eye care questions from an online advice forum, a chatbot appeared capable of responding to long user-written eye health posts and largely generated appropriate responses that did not differ significantly from ophthalmologist-written responses in terms of incorrect information, likelihood of harm, extent of harm, or deviation from ophthalmologist community standards. Additional research is needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation, to evaluate clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimizes harm.
Affiliation(s)
- Isaac A. Bernstein
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Youchen (Victor) Zhang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Devendra Govil
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Iyad Majid
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Robert T. Chang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Yang Sun
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Ann Shue
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Jonathan C. Chou
- Department of Ophthalmology, Kaiser Permanente San Francisco, San Francisco, California
- Sylvia L. Groth
- Department of Ophthalmology and Visual Sciences, Vanderbilt Eye Institute, Nashville, Tennessee
- Cassie Ludwig
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
- Sophia Y. Wang
- Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, California
25
Oh JH, Tannenbaum A, Deasy JO. Improved prediction of drug-induced liver injury literature using natural language processing and machine learning methods. Front Genet 2023; 14:1161047. [PMID: 37529777 PMCID: PMC10390074 DOI: 10.3389/fgene.2023.1161047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Accepted: 06/29/2023] [Indexed: 08/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can potentially lead to life-threatening liver failure. Previously published work in the scientific literature on DILI has provided valuable insights for the understanding of hepatotoxicity as well as drug development. However, the manual search of scientific literature in PubMed is laborious and time-consuming. Natural language processing (NLP) techniques along with artificial intelligence/machine learning approaches may allow for automatic processing in identifying DILI-related literature, but useful methods are yet to be demonstrated. To address this issue, we have developed an integrated NLP/machine learning classification model to identify DILI-related literature using only paper titles and abstracts. For prediction modeling, we used 14,203 publications provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, employing word vectorization techniques in NLP in conjunction with machine learning methods. Classification modeling was performed using 2/3 of the data for training and the remainder for test in internal validation. The best performance was achieved using a linear support vector machine (SVM) model on the combined vectors derived from term frequency-inverse document frequency (TF-IDF) and Word2Vec, resulting in an accuracy of 95.0% and an F1-score of 95.0%. The final SVM model constructed from all 14,203 publications was tested on independent datasets, resulting in accuracies of 92.5%, 96.3%, and 98.3%, and F1-scores of 93.5%, 86.1%, and 75.6% for three test sets (T1-T3). Furthermore, the SVM model was tested on four external validation sets (V1-V4), resulting in accuracies of 92.0%, 96.2%, 98.3%, and 93.1%, and F1-scores of 92.4%, 82.9%, 75.0%, and 93.3%.
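The TF-IDF features underlying the abstract's best-performing SVM can be illustrated in miniature. This sketch uses naive whitespace tokenization and invented toy documents; the study itself combined such vectors with Word2Vec embeddings and trained a linear SVM on the result.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a small corpus of titles/abstracts.
    Terms that appear in many documents (high document frequency) are
    down-weighted; terms concentrated in few documents score higher."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vectors

# Toy corpus: "liver" appears in two documents, so it is weighted
# lower than "drug", which appears in only one.
docs = ["drug induced liver injury", "liver biopsy findings", "renal cyst"]
vecs = tfidf_vectors(docs)
```

A production pipeline would add stop-word removal, sublinear TF scaling, and IDF smoothing, but the core weighting shown here is what makes discriminative DILI vocabulary stand out from generic biomedical terms.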
Affiliation(s)
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Allen Tannenbaum
- Department of Computer Science, Stony Brook University, Stony Brook, NY, United States
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
- Joseph O. Deasy
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
26
Karabacak M, Margetis K. Embracing Large Language Models for Medical Applications: Opportunities and Challenges. Cureus 2023; 15:e39305. [PMID: 37378099 PMCID: PMC10292051 DOI: 10.7759/cureus.39305] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/21/2023] [Indexed: 06/29/2023] Open
Abstract
Large language models (LLMs) have the potential to revolutionize the field of medicine by, among other applications, improving diagnostic accuracy and supporting clinical decision-making. However, the successful integration of LLMs in medicine requires addressing challenges and considerations specific to the medical domain. This viewpoint article provides a comprehensive overview of key aspects for the successful implementation of LLMs in medicine, including transfer learning, domain-specific fine-tuning, domain adaptation, reinforcement learning with expert input, dynamic training, interdisciplinary collaboration, education and training, evaluation metrics, clinical validation, ethical considerations, data privacy, and regulatory frameworks. By adopting a multifaceted approach and fostering interdisciplinary collaboration, LLMs can be developed, validated, and integrated into medical practice responsibly, effectively, and ethically, addressing the needs of various medical disciplines and diverse patient populations. Ultimately, this approach will ensure that LLMs enhance patient care and improve overall health outcomes for all.
Affiliation(s)
- Mert Karabacak
- Neurological Surgery, Mount Sinai Health System, New York, USA
27
Chng SY, Tern PJW, Kan MRX, Cheng LTE. Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods. HEALTH CARE SCIENCE 2023; 2:120-128. [PMID: 38938764 PMCID: PMC11080679 DOI: 10.1002/hcs2.40] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Revised: 01/31/2023] [Accepted: 02/23/2023] [Indexed: 06/29/2024]
Abstract
Automated labelling of radiology reports using natural language processing allows for the labelling of ground truth for large datasets of radiological studies that are required for training of computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules-based text-matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules-based labellers perform a brute-force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Multilabel classification approaches are required in labelling radiology reports, and conventional models can achieve good performance if they have large enough training sets. Deep learning models make use of connected neural networks, often a long short-term memory network, and are similarly able to achieve good performance if trained on a large dataset. BERT is a transformer-based model that utilizes attention. Pretrained BERT models only require fine-tuning with small datasets. In particular, domain-specific BERT models can achieve superior performance compared with the other methods for automated labelling.
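The review's point about negation handling in rules-based labellers can be made concrete with a toy matcher. The keyword set, negation list, and window size below are illustrative assumptions, far simpler than the curated lexicons real labellers use.

```python
# Toy negation list; production labellers use much larger lexicons
# and regular-expression patterns.
NEGATIONS = {"no", "without", "negative"}

def label_report(report, keywords, window=3):
    """Toy rules-based labeller: flag a report positive when a keyword
    appears and is not preceded by a negation word within `window`
    tokens. Without the negation check, 'No acute fracture' would be
    wrongly labelled positive, which is exactly the failure mode the
    review warns about."""
    tokens = report.lower().replace(".", " ").replace(",", " ").split()
    for i, tok in enumerate(tokens):
        if tok in keywords:
            preceding = tokens[max(0, i - window):i]
            if not any(t in NEGATIONS for t in preceding):
                return 1
    return 0

print(label_report("There is a displaced fracture.", {"fracture"}))    # 1
print(label_report("No acute fracture or dislocation.", {"fracture"}))  # 0
```

The fixed token window is a crude stand-in for the scoped negation detection that mature rule-based systems implement.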
Affiliation(s)
- Seo Yi Chng
- Department of Paediatrics, National University of Singapore, Singapore
- Paul J. W. Tern
- Department of Cardiology, National Heart Centre, Singapore
- Lionel T. E. Cheng
- Department of Diagnostic Radiology, Singapore General Hospital, Singapore
28
Improved Fine-Tuning of In-Domain Transformer Model for Inferring COVID-19 Presence in Multi-Institutional Radiology Reports. J Digit Imaging 2023; 36:164-177. [PMID: 36323915 PMCID: PMC9629758 DOI: 10.1007/s10278-022-00714-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 09/05/2022] [Accepted: 10/03/2022] [Indexed: 11/06/2022] Open
Abstract
Building a document-level classifier for COVID-19 on radiology reports could help assist providers in their daily clinical routine, as well as create large numbers of labels for computer vision models. We have developed such a classifier by fine-tuning a BERT-like model initialized from RadBERT, a model continuously pre-trained on radiology reports that can be used for all radiology-related tasks. RadBERT outperforms all biomedical pre-trainings on this COVID-19 task (P<0.01) and helps our fine-tuned model achieve a macro-averaged F1-score of 88.9 when evaluated on both X-ray and CT reports. To build this model, we relied on a multi-institutional dataset re-sampled and enriched with concurrent lung diseases, helping the model resist distribution shifts. In addition, we explored a variety of fine-tuning and hyperparameter optimization techniques that accelerate fine-tuning convergence, stabilize performance, and improve accuracy, especially when data or computational resources are limited. Finally, we provide a set of visualization tools and explainability methods to better understand the performance of the model and support its practical use in the clinical setting. Our approach offers a ready-to-use COVID-19 classifier and can be applied similarly to other radiology report classification tasks.
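The macro-averaged F1-score this abstract reports weights every class equally regardless of prevalence, which matters when positive COVID-19 reports are rare. A minimal reference implementation of the metric (toy labels, not the study's data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with
    equal weight, so a rare class counts as much as a common one."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); zero when the class is never
        # correctly predicted.
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

# Perfect predictions give a macro-F1 of 1.0.
print(macro_f1([0, 1, 1, 0], [0, 1, 1, 0]))  # 1.0
```

By contrast, micro-averaged F1 (pooling all decisions) would be dominated by the majority class, masking poor performance on the minority class a screening classifier cares about most.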
29
Wiggins WF, Tejani AS. On the Opportunities and Risks of Foundation Models for Natural Language Processing in Radiology. Radiol Artif Intell 2022; 4:e220119. [PMID: 35923379 PMCID: PMC9344208 DOI: 10.1148/ryai.220119] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2022] [Revised: 06/23/2022] [Accepted: 06/27/2022] [Indexed: 06/15/2023]
Affiliation(s)
- Walter F. Wiggins
- From the Department of Radiology, Duke University Health System, 2301 Erwin Rd, Durham, NC 27710 (W.F.W.); Duke Center for Artificial Intelligence in Radiology, Duke University School of Medicine, Durham, NC (W.F.W.); and Department of Radiology, University of Texas Southwestern Medical Center, Dallas, Tex (A.S.T.)
- Ali S. Tejani
- From the Department of Radiology, Duke University Health System, 2301 Erwin Rd, Durham, NC 27710 (W.F.W.); Duke Center for Artificial Intelligence in Radiology, Duke University School of Medicine, Durham, NC (W.F.W.); and Department of Radiology, University of Texas Southwestern Medical Center, Dallas, Tex (A.S.T.)