1
Grenier PA, Brun AL, Mellot F. [The contribution of artificial intelligence (AI) to the processing of thoracic imaging]. Rev Mal Respir 2024;41:110-126. PMID: 38129269. DOI: 10.1016/j.rmr.2023.12.001.
Abstract
The contribution of artificial intelligence (AI) to medical imaging is currently the object of widespread experimentation. The development of deep learning (DL) methods, particularly convolutional neural networks (CNNs), has led to performance gains often superior to those achieved by conventional machine learning methods. Radiomics is an approach aimed at extracting, from images expressing a disease, quantitative data not accessible to the human eye; these data then feed machine learning models that produce diagnostic or prognostic probabilities. The many applications of AI methods in thoracic imaging are currently under evaluation. Chest radiography is a practically ideal field for the development of DL algorithms able to automatically interpret X-rays; current algorithms can detect up to 14 different abnormalities present either in isolation or in combination. Chest CT is another area offering numerous AI applications. Various algorithms have been specifically trained and validated for the detection and characterization of pulmonary nodules and pulmonary embolism, as well as for segmentation and quantitative analysis of the extent of diffuse lung diseases (emphysema, infectious pneumonia, interstitial lung disease). In addition, the analysis of medical images can be combined with clinical, biological, and functional data (multi-omics analysis), the objective being to construct predictive approaches to disease prognosis and response to treatment.
Affiliation(s)
- P A Grenier
  - Délégation à la recherche clinique et l'innovation, hôpital Foch, Suresnes, France
- A L Brun
  - Service de radiologie, hôpital Foch, Suresnes, France
- F Mellot
  - Service de radiologie, hôpital Foch, Suresnes, France
2
Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon 2023;9:e23050. PMID: 38144348. PMCID: PMC10746423. DOI: 10.1016/j.heliyon.2023.e23050.
Abstract
Since its release, ChatGPT has taken the world by storm with its utilization in various fields of life. This review's main goal was to offer a thorough and fact-based evaluation of ChatGPT's potential as a tool for medical and dental research, which could direct subsequent research and influence clinical practices. METHODS Different online databases were searched for relevant articles in accordance with the study objectives. A team of reviewers was assembled to devise a proper methodological framework for article inclusion and meta-analysis. RESULTS Eleven descriptive studies were considered for this review; they evaluated the accuracy of ChatGPT in answering medical queries related to different domains such as systematic reviews, cancer, liver diseases, diagnostic imaging, education, and COVID-19 vaccination. The studies reported accuracy ranging from 18.3% to 100% across various datasets and specialties. The meta-analysis showed an odds ratio (OR) of 2.25 and a relative risk (RR) of 1.47 with a 95% confidence interval (CI), indicating that ChatGPT provided correct responses significantly more often than not across the queries posed. However, significant heterogeneity was present among the studies, suggesting considerable variability in effect sizes across the included studies. CONCLUSION The observations indicate that ChatGPT can provide appropriate answers to questions in the medical and dental fields, but researchers and clinicians should cautiously assess its responses because they might not always be dependable. Overall, the importance of this study rests in shedding light on ChatGPT's accuracy in the medical and dental fields and emphasizing the need for additional investigation to enhance its performance.
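As a toy illustration of the pooled effect measures reported above (not taken from the review; the 2x2 counts below are invented purely to show how an odds ratio and relative risk are derived from correct/incorrect response counts):

```python
# Odds ratio (OR) and relative risk (RR) from a single hypothetical 2x2 table.
def odds_ratio(a, b, c, d):
    # a/b: correct/incorrect in group 1; c/d: correct/incorrect in group 2
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

a, b, c, d = 80, 20, 60, 40  # invented counts, not the review's data
print(round(odds_ratio(a, b, c, d), 2))     # 2.67
print(round(relative_risk(a, b, c, d), 2))  # 1.33
```

A meta-analysis pools such per-study estimates (typically on the log scale, weighted by variance), which is where the reported heterogeneity across studies enters.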
Affiliation(s)
- Hiroj Bagde
  - Department of Periodontology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
- Ashwini Dhopte
  - Department of Oral Medicine and Radiology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
- Mohammad Khursheed Alam
  - Preventive Dentistry Department, College of Dentistry, Jouf University, Sakaka, 72345, Saudi Arabia
  - Department of Dental Research Cell, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Chennai, India
  - Department of Public Health, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
- Rehana Basri
  - Department of Internal Medicine, College of Medicine, Jouf University, Sakaka, 72345, Saudi Arabia
3
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer K, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. J Med Internet Res 2023;25:e48659. PMID: 37606976. PMCID: PMC10481210. DOI: 10.2196/48659.
Abstract
BACKGROUND Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. OBJECTIVE This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. METHODS We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks. RESULTS ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis, with an accuracy of 76.9% (95% CI 67.8%-86.1%), and the lowest performance in generating an initial differential diagnosis, with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%; P<.001) and clinical management (β=-7.4%; P=.02) question types. CONCLUSIONS ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.
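The accuracy-with-CI reporting used throughout this abstract can be sketched as a proportion with a normal-approximation 95% confidence interval. This is an editor's illustration, not the authors' code, and the counts below are hypothetical:

```python
import math

# Accuracy as a proportion of correct responses, with a normal-approximation
# (Wald) 95% CI. Counts are invented and do not reproduce the study's data.
def accuracy_ci(correct, total, z=1.96):
    p = correct / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, p - half, p + half

p, lo, hi = accuracy_ci(258, 360)  # hypothetical: 258 correct of 360 questions
print(f"{p:.1%} ({lo:.1%}-{hi:.1%})")  # 71.7% (67.0%-76.3%)
```

For small samples or extreme proportions, an exact or Wilson interval would be preferable to this simple approximation.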
Affiliation(s)
- Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Marc D Succi
  - Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Adam Landman
  - Harvard Medical School, Boston, MA, United States
  - Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
- Keith Dreyer
  - Harvard Medical School, Boston, MA, United States
  - Data Science Office, Mass General Brigham, Boston, MA, United States
- Marc D Succi (additionally)
  - Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States
4
Stanzione A, Cuocolo R, Bombace C, Pesce I, Mainolfi CG, De Giorgi M, Delli Paoli G, La Selva P, Petrone J, Camera L, Klain M, Del Vecchio S, Cuocolo A, Maurea S. Prediction of 2-[18F]FDG PET-CT SUVmax for Adrenal Mass Characterization: A CT Radiomics Feasibility Study. Cancers (Basel) 2023;15:3439. PMID: 37444549. DOI: 10.3390/cancers15133439.
Abstract
BACKGROUND Indeterminate adrenal masses (AM) pose a diagnostic challenge, and 2-[18F]FDG PET-CT serves as a problem-solving tool. The aim of this study was to investigate whether CT radiomics features could be used to predict the 2-[18F]FDG SUVmax of AM. METHODS Patients with AM on 2-[18F]FDG PET-CT scans were grouped based on iodine contrast injection as CT contrast-enhanced (CE) or CT unenhanced (NCE). Two-dimensional segmentations of AM were manually obtained by multiple operators on CT images. Image resampling and discretization (bin number = 16) were performed, and 919 features were calculated using PyRadiomics. After scaling, unstable, redundant, and low-variance features were discarded. Using linear regression and the Uniform Manifold Approximation and Projection (UMAP) technique, a CT radiomics synthetic value (RadSV) was obtained. The correlation between CT RadSV and 2-[18F]FDG SUVmax was assessed with the Pearson test. RESULTS A total of 725 patients underwent PET-CT from April 2020 to April 2021. In 150 (21%) patients, a total of 179 AM (29 bilateral) were detected. Group CE consisted of 84 patients with 108 AM (size = 18.1 ± 4.9 mm) and Group NCE of 66 patients with 71 AM (size = 18.5 ± 3.8 mm). In both groups, 39 features were selected. No statistically significant correlation between CT RadSV and 2-[18F]FDG SUVmax was found (Group CE, r = 0.18, p = 0.058; Group NCE, r = 0.13, p = 0.27). CONCLUSIONS It might not be feasible to predict the 2-[18F]FDG SUVmax of AM using CT RadSV; the role of 2-[18F]FDG PET-CT as a problem-solving tool for indeterminate AM remains fundamental.
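The final analysis step described above (Pearson correlation between a per-lesion RadSV and SUVmax) can be sketched as follows. This is not the authors' code: the PyRadiomics feature extraction and UMAP reduction are omitted, and all values are invented:

```python
import math

# Pearson correlation between a hypothetical per-lesion radiomics synthetic
# value (RadSV) and SUVmax. Values are invented for illustration.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rad_sv = [0.12, 0.35, 0.28, 0.51, 0.44]  # hypothetical RadSV per lesion
suv_max = [2.1, 3.0, 2.4, 2.2, 3.5]      # hypothetical SUVmax per lesion
print(round(pearson_r(rad_sv, suv_max), 3))  # 0.413
```

In practice, `scipy.stats.pearsonr` would also return the p-value used to judge significance, as in the study's r and p pairs.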
Affiliation(s)
- Renato Cuocolo
  - Department of Medicine, Surgery and Dentistry, University of Salerno, 84084 Baronissi, Italy
- All other authors
  - Department of Advanced Biomedical Sciences, University of Naples "Federico II", 80131 Naples, Italy
5
Yoon MS, Kwon G, Oh J, Ryu J, Lim J, Kang BK, Lee J, Han DK. Effect of Contrast Level and Image Format on a Deep Learning Algorithm for the Detection of Pneumothorax with Chest Radiography. J Digit Imaging 2023;36:1237-1247. PMID: 36698035. PMCID: PMC10287877. DOI: 10.1007/s10278-022-00772-y.
Abstract
Given the black-box nature of deep learning models, it is uncertain how changes in contrast level and image format affect performance. We aimed to investigate the effect of contrast level and image format on the effectiveness of deep learning for diagnosing pneumothorax on chest radiographs. We collected 3316 images (1016 pneumothorax and 2300 normal images); all images were set to the standard contrast level (100%) and stored in the Digital Imaging and Communications in Medicine (DICOM) and Joint Photographic Experts Group (JPEG) formats. Data were randomly separated into training (80%) and test (20%) sets, and the contrast of images in the test set was changed to 5 levels (50%, 75%, 100%, 125%, and 150%). We trained a ResNet-50 model to detect pneumothorax using 100%-level images and tested it with the 5-level images in the two formats. When comparing overall performance between contrast levels within each format, the area under the receiver operating characteristic curve (AUC) differed significantly (all p < 0.001) except between 125% and 150% in JPEG format (p = 0.382). When comparing the two formats at the same contrast levels, AUC differed significantly (all p < 0.001) except at 50% and 100% (p = 0.079 and p = 0.082, respectively). The contrast level and format of medical images can influence the performance of a deep learning model. Training with images at various contrast levels and in various formats, together with further image processing, is required to improve and maintain performance.
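The test-set contrast manipulation described above can be sketched as intensity rescaling about the mean. This is an assumed implementation for illustration only (the paper does not specify its exact transform), and the pixel values are toy data; the ResNet-50 training itself is omitted:

```python
# Hypothetical contrast adjustment: scale pixel intensities about their mean
# to produce 50-150% contrast versions of a test image; clip to 8-bit range.
def adjust_contrast(pixels, level):
    """Scale contrast to `level` (1.0 = original)."""
    mean = sum(pixels) / len(pixels)
    return [min(255, max(0, round(mean + (p - mean) * level))) for p in pixels]

row = [30, 120, 200]  # toy pixel intensities
for level in (0.5, 0.75, 1.0, 1.25, 1.5):
    print(level, adjust_contrast(row, level))
```

Each contrast level would then be fed to the trained model, and per-level AUCs compared, as in the study's evaluation.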
Affiliation(s)
- Myeong Seong Yoon
  - Department of Emergency Medicine, College of Medicine, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - Machine Learning Research Center for Medical Data, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - Department of Radiological Science, Eulji University, 553 Sanseong-daero, Seongnam-si, Gyeonggi Do, 13135, Republic of Korea
- Gitaek Kwon
  - Department of Computer Science, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - VUNO, Inc, 479 Gangnam-daero, Seocho-gu, Seoul, 06541, Republic of Korea
- Jaehoon Oh
  - Department of Emergency Medicine, College of Medicine, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - Machine Learning Research Center for Medical Data, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
- Jongbin Ryu
  - Department of Software and Computer Engineering, Ajou University, 206 World cup-ro, Suwon-si, Gyeonggi Do, 16499, Republic of Korea
- Jongwoo Lim
  - Department of Computer Science, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - Machine Learning Research Center for Medical Data, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
- Bo-Kyeong Kang
  - Machine Learning Research Center for Medical Data, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
  - Department of Radiology, College of Medicine, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
- Juncheol Lee
  - Department of Emergency Medicine, College of Medicine, Hanyang University, 222 Wangsimni-Ro, Seongdong-Gu, Seoul, 04763, Republic of Korea
- Dong-Kyoon Han
  - Department of Radiological Science, Eulji University, 553 Sanseong-daero, Seongnam-si, Gyeonggi Do, 13135, Republic of Korea
6
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer KJ, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow. medRxiv [Preprint] 2023:2023.02.21.23285886. PMID: 36865204. PMCID: PMC9980239. DOI: 10.1101/2023.02.21.23285886.
Abstract
IMPORTANCE Large language model (LLM) artificial intelligence (AI) chatbots direct the power of large training datasets towards successive, related tasks, as opposed to single-ask tasks, for which AI already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as virtual physicians, has not yet been evaluated. OBJECTIVE To evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. DESIGN We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. SETTING ChatGPT, a publicly available LLM. PARTICIPANTS Clinical vignettes featured hypothetical patients with a variety of age and gender identities, and a range of Emergency Severity Indices (ESIs) based on initial clinical presentation. EXPOSURES MSD Clinical Manual vignettes. MAIN OUTCOMES AND MEASURES We measured the proportion of correct responses to the questions posed within the clinical vignettes tested. RESULTS ChatGPT achieved 71.7% (95% CI, 69.3% to 74.1%) accuracy overall across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis, with an accuracy of 76.9% (95% CI, 67.8% to 86.1%), and the lowest performance in generating an initial differential diagnosis, with an accuracy of 60.3% (95% CI, 54.2% to 66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%, p<0.001) and clinical management (β=-7.4%, p=0.02) type questions. CONCLUSIONS AND RELEVANCE ChatGPT achieves impressive accuracy in clinical decision making, with particular strengths emerging as it has more clinical information at its disposal.
7
Developing medical imaging AI for emerging infectious diseases. Nat Commun 2022;13:7060. PMID: 36400764. PMCID: PMC9672573. DOI: 10.1038/s41467-022-34234-4.
Abstract
Very few COVID-19 machine learning (ML) models have proven fit for deployment in real-world settings. In this Comment, Huang et al. discuss the main steps required to develop clinically useful models in the context of an emerging infectious disease.