1
Park SH, Han K, Lee JG. Conceptual review of outcome metrics and measures used in clinical evaluation of artificial intelligence in radiology. LA RADIOLOGIA MEDICA 2024; 129:1644-1655. [PMID: 39225919 DOI: 10.1007/s11547-024-01886-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
Artificial intelligence (AI) has numerous applications in radiology. Clinical research studies to evaluate the AI models are also diverse. Consequently, diverse outcome metrics and measures are employed in the clinical evaluation of AI, presenting a challenge for clinical radiologists. This review aims to provide conceptually intuitive explanations of the outcome metrics and measures that are most frequently used in clinical research, specifically tailored for clinicians. While we briefly discuss performance metrics for AI models in binary classification, detection, or segmentation tasks, our primary focus is on less frequently addressed topics in published literature. These include metrics and measures for evaluating multiclass classification; those for evaluating generative AI models, such as models used in image generation or modification and large language models; and outcome measures beyond performance metrics, including patient-centered outcome measures. Our explanations aim to guide clinicians in the appropriate use of these metrics and measures.
Affiliation(s)
- Seong Ho Park
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea.
- Kyunghwa Han
- Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, South Korea
- June-Goo Lee
- Biomedical Engineering Research Center, Asan Institute for Life Sciences, University of Ulsan College of Medicine, Seoul, South Korea
2
Prescott MR, Yeager S, Ham L, Rivera Saldana CD, Serrano V, Narez J, Paltin D, Delgado J, Moore DJ, Montoya J. Comparing the Efficacy and Efficiency of Human and Generative AI: Qualitative Thematic Analyses. JMIR AI 2024; 3:e54482. [PMID: 39094113 PMCID: PMC11329846 DOI: 10.2196/54482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 03/25/2024] [Accepted: 06/06/2024] [Indexed: 08/04/2024]
Abstract
BACKGROUND Qualitative methods are incredibly beneficial to the dissemination and implementation of new digital health interventions; however, these methods can be time intensive and slow down dissemination when timely knowledge from the data sources is needed in ever-changing health systems. Recent advancements in generative artificial intelligence (GenAI) and their underlying large language models (LLMs) may provide a promising opportunity to expedite the qualitative analysis of textual data, but their efficacy and reliability remain unknown.
OBJECTIVE The primary objectives of our study were to evaluate the consistency in themes, reliability of coding, and time needed for inductive and deductive thematic analyses between GenAI (ie, ChatGPT and Bard) and human coders.
METHODS The qualitative data for this study consisted of 40 brief SMS text message reminder prompts used in a digital health intervention for promoting antiretroviral medication adherence among people with HIV who use methamphetamine. Inductive and deductive thematic analyses of these SMS text messages were conducted by 2 independent teams of human coders. An independent human analyst conducted analyses following both approaches using ChatGPT and Bard. The consistency in themes (or the extent to which the themes were the same) and reliability (or agreement in coding of themes) between methods were compared.
RESULTS The themes generated by GenAI (both ChatGPT and Bard) were consistent with 71% (5/7) of the themes identified by human analysts following inductive thematic analysis. The consistency in themes was lower between humans and GenAI following a deductive thematic analysis procedure (ChatGPT: 6/12, 50%; Bard: 7/12, 58%). The percentage agreement (or intercoder reliability) for these congruent themes between human coders and GenAI ranged from fair to moderate (ChatGPT, inductive: 31/66, 47%; ChatGPT, deductive: 22/59, 37%; Bard, inductive: 20/54, 37%; Bard, deductive: 21/58, 36%). In general, ChatGPT and Bard performed similarly to each other across both types of qualitative analyses in terms of consistency of themes (inductive: 6/6, 100%; deductive: 5/6, 83%) and reliability of coding (inductive: 23/62, 37%; deductive: 22/47, 47%). On average, GenAI required significantly less overall time than human coders when conducting qualitative analysis (20, SD 3.5 min vs 567, SD 106.5 min).
CONCLUSIONS The promising consistency in the themes generated by human coders and GenAI suggests that these technologies hold promise in reducing the resource intensiveness of qualitative thematic analysis; however, the relatively lower reliability in coding between them suggests that hybrid approaches are necessary. Human coders appeared to be better than GenAI at identifying nuanced and interpretative themes. Future studies should consider how these powerful technologies can be best used in collaboration with human coders to improve the efficiency of qualitative research in hybrid approaches while also mitigating potential ethical risks that they may pose.
Affiliation(s)
- Maximo R Prescott
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, United States
- Samantha Yeager
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- Lillian Ham
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, United States
- Carlos D Rivera Saldana
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- Department of Medicine, University of California, San Diego, San Diego, CA, United States
- Vanessa Serrano
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, United States
- Joey Narez
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- Dafna Paltin
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, United States
- Jorge Delgado
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- David J Moore
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States
- Jessica Montoya
- HIV Neurobehavioral Research Program, University of California, San Diego, San Diego, CA, United States
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States
3
Kim K, Cho K, Jang R, Kyung S, Lee S, Ham S, Choi E, Hong GS, Kim N. Updated Primer on Generative Artificial Intelligence and Large Language Models in Medical Imaging for Medical Professionals. Korean J Radiol 2024; 25:224-242. [PMID: 38413108 PMCID: PMC10912493 DOI: 10.3348/kjr.2023.0818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 11/27/2023] [Accepted: 12/28/2023] [Indexed: 02/29/2024] Open
Abstract
The emergence of Chat Generative Pre-trained Transformer (ChatGPT), a chatbot developed by OpenAI, has garnered interest in the application of generative artificial intelligence (AI) models in the medical field. This review summarizes different generative AI models and their potential applications in the field of medicine and explores the evolving landscape of Generative Adversarial Networks and diffusion models since the introduction of generative AI models. These models have made valuable contributions to the field of radiology. This review also explores the significance of synthetic data in addressing privacy concerns and augmenting data diversity and quality within the medical domain, emphasizes the role of inversion in the investigation of generative models, and outlines an approach to replicate this process. We provide an overview of Large Language Models, focusing on prominent representatives such as GPTs and bidirectional encoder representations from transformers (BERT), and discuss recent initiatives involving language-vision models in radiology, including the innovative large language and vision assistant for biomedicine (LLaVA-Med), to illustrate their practical application. This comprehensive review offers insights into the wide-ranging applications of generative AI models in clinical research and emphasizes their transformative potential.
Affiliation(s)
- Kiduk Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
- Kyungjin Cho
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sunggu Kyung
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Soyoung Lee
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sungwon Ham
- Healthcare Readiness Institute for Unified Korea, Korea University Ansan Hospital, Korea University College of Medicine, Ansan, Republic of Korea
- Edward Choi
- Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Gil-Sun Hong
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
- Namkug Kim
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
4
Park SH. Use of Generative Artificial Intelligence, Including Large Language Models Such as ChatGPT, in Scientific Publications: Policies of KJR and Prominent Authorities. Korean J Radiol 2023; 24:715-718. [PMID: 37500572 PMCID: PMC10400373 DOI: 10.3348/kjr.2023.0643] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/10/2023] [Accepted: 07/10/2023] [Indexed: 07/29/2023] Open
Affiliation(s)
- Seong Ho Park
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
5
Jang M, Bae HJ, Kim M, Park SY, Son AY, Choi SJ, Choe J, Choi HY, Hwang HJ, Noh HN, Seo JB, Lee SM, Kim N. Image Turing test and its applications on synthetic chest radiographs by using the progressive growing generative adversarial network. Sci Rep 2023; 13:2356. [PMID: 36759636 PMCID: PMC9911730 DOI: 10.1038/s41598-023-28175-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/24/2021] [Accepted: 01/13/2023] [Indexed: 02/11/2023] Open
Abstract
The generative adversarial network (GAN) is a promising deep learning method for generating images. We evaluated the generation of highly realistic and high-resolution chest radiographs (CXRs) using progressive growing GAN (PGGAN). We trained two PGGAN models using normal and abnormal CXRs, solely relying on normal CXRs to demonstrate the quality of synthetic CXRs that were 1000 × 1000 pixels in size. Image Turing tests were evaluated by six radiologists in a binary fashion using two independent validation sets to judge the authenticity of each CXR, with a mean accuracy of 67.42% and 69.92% for the first and second trials, respectively. Inter-reader agreements were poor for the first (κ = 0.10) and second (κ = 0.14) Turing tests. Additionally, a convolutional neural network (CNN) was used to classify normal or abnormal CXR using only real images and/or synthetic images mixed datasets. The accuracy of the CNN model trained using a mixed dataset of synthetic and real data was 93.3%, compared to 91.0% for the model built using only the real data. PGGAN was able to generate CXRs that were identical to real CXRs, and this showed promise to overcome imbalances between classes in CNN training.
Affiliation(s)
- Miso Jang
- Department of Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea
- Department of Biomedical Engineering, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Minjee Kim
- Promedius Inc., Seoul, Republic of Korea
- Seo Young Park
- Department of Statistics and Data Science, Korea National Open University, Seoul, Republic of Korea
- A-Yeon Son
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Se Jin Choi
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Jooae Choe
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Hye Young Choi
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Hye Jeon Hwang
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Han Na Noh
- Department of Health Screening and Promotion Center, Asan Medical Center, Seoul, Republic of Korea
- Joon Beom Seo
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea
- Sang Min Lee
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea.
- Namkug Kim
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine and Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea.
- Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea.
6
Park SH. Looking Back at 2022 and ahead to 2023 for the Korean Journal of Radiology. Korean J Radiol 2023; 24:15-18. [PMID: 36606615 PMCID: PMC9830144 DOI: 10.3348/kjr.2022.0963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/03/2022] [Accepted: 12/03/2022] [Indexed: 01/03/2023] Open
Affiliation(s)
- Seong Ho Park
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea.