1. Chen Z, Chambara N, Wu C, Lo X, Liu SYW, Gunda ST, Han X, Qu J, Chen F, Ying MTC. Assessing the feasibility of ChatGPT-4o and Claude 3-Opus in thyroid nodule classification based on ultrasound images. Endocrine 2025; 87:1041-1049. [PMID: 39394537 PMCID: PMC11845565 DOI: 10.1007/s12020-024-04066-x] [Received: 08/26/2024] [Accepted: 10/02/2024] [Indexed: 10/13/2024]
Abstract
PURPOSE: Large language models (LLMs) are pivotal in artificial intelligence, demonstrating advanced capabilities in natural language understanding and multimodal interactions, with significant potential in medical applications. This study explores the feasibility and efficacy of LLMs, specifically ChatGPT-4o and Claude 3-Opus, in classifying thyroid nodules using ultrasound images.
METHODS: This study included 112 patients with a total of 116 thyroid nodules, comprising 75 benign and 41 malignant cases. Ultrasound images of these nodules were analyzed using ChatGPT-4o and Claude 3-Opus to diagnose the benign or malignant nature of the nodules. An independent evaluation by a junior radiologist was also conducted. Diagnostic performance was assessed using Cohen's Kappa and receiver operating characteristic (ROC) curve analysis, referencing pathological diagnoses.
RESULTS: ChatGPT-4o demonstrated poor agreement with pathological results (Kappa = 0.116), while Claude 3-Opus showed even lower agreement (Kappa = 0.034). The junior radiologist exhibited moderate agreement (Kappa = 0.450). ChatGPT-4o achieved an area under the ROC curve (AUC) of 57.0% (95% CI: 48.6-65.5%), slightly outperforming Claude 3-Opus (AUC of 52.0%, 95% CI: 43.2-60.9%). In contrast, the junior radiologist achieved a significantly higher AUC of 72.4% (95% CI: 63.7-81.1%). The unnecessary biopsy rates were 41.4% for ChatGPT-4o, 43.1% for Claude 3-Opus, and 12.1% for the junior radiologist.
CONCLUSION: While LLMs such as ChatGPT-4o and Claude 3-Opus show promise for future applications in medical imaging, their current use in clinical diagnostics should be approached cautiously due to their limited accuracy.
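To make the agreement statistics in this abstract concrete, the following Python sketch computes Cohen's Kappa, the AUC of a single benign/malignant call, and an unnecessary biopsy rate from a 2x2 table. The per-reader counts are not reported in the abstract: they are back-calculated here so that they reproduce the published Kappa, AUC, and biopsy-rate figures, and they assume that the unnecessary biopsy rate is the number of false positives divided by all 116 nodules. The names `Table2x2` and `INFERRED_TABLES` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Reader calls versus pathology for one reader."""
    tp: int  # malignant nodules called malignant
    fn: int  # malignant nodules called benign
    fp: int  # benign nodules called malignant
    tn: int  # benign nodules called benign

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def cohen_kappa(t: Table2x2) -> float:
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    observed = (t.tp + t.tn) / t.n
    # Chance agreement from the row and column marginals.
    expected = ((t.tp + t.fp) * (t.tp + t.fn)
                + (t.fn + t.tn) * (t.fp + t.tn)) / t.n ** 2
    return (observed - expected) / (1 - expected)


def single_point_auc(t: Table2x2) -> float:
    """AUC of one binary decision: the mean of sensitivity and specificity."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return (sensitivity + specificity) / 2


def unnecessary_biopsy_rate(t: Table2x2) -> float:
    """ASSUMPTION: false positives divided by all nodules."""
    return t.fp / t.n


# ASSUMPTION: counts inferred from the reported summary statistics
# (41 malignant and 75 benign nodules); not taken from the paper.
INFERRED_TABLES = {
    "ChatGPT-4o": Table2x2(tp=32, fn=9, fp=48, tn=27),
    "Claude 3-Opus": Table2x2(tp=29, fn=12, fp=50, tn=25),
    "Junior radiologist": Table2x2(tp=26, fn=15, fp=14, tn=61),
}

for reader, table in INFERRED_TABLES.items():
    print(f"{reader}: kappa={cohen_kappa(table):.3f}, "
          f"AUC={single_point_auc(table):.1%}, "
          f"unnecessary biopsies={unnecessary_biopsy_rate(table):.1%}")
```

Under these assumptions the sketch reproduces the reported values for all three readers, which suggests the AUCs in the abstract reflect a single benign/malignant decision rather than a graded score.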
Affiliation(s)
- Ziman Chen
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Chaoqun Wu
  - Department of Ultrasound, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Xina Lo
  - Department of Surgery, North District Hospital, Sheung Shui, New Territories, Hong Kong, China
- Shirley Yuk Wah Liu
  - Department of Surgery, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong, China
- Simon Takadiyi Gunda
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Xinyang Han
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Jingguo Qu
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Fei Chen
  - Department of Ultrasound, The Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Michael Tin Cheung Ying
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
2. Du L, Liu H, Cai M, Pan J, Zha H, Nie C, Lin M, Li C, Zong M, Zhang B. Ultrasound S-detect system can improve diagnostic performance of less experienced radiologists in differentiating breast masses: a retrospective dual-centre study. Br J Radiol 2025; 98:404-411. [PMID: 39535865 DOI: 10.1093/bjr/tqae233] [Received: 07/17/2024] [Revised: 09/13/2024] [Accepted: 11/09/2024] [Indexed: 11/16/2024]
Abstract
OBJECTIVE: To compare the performance of radiologists assisted by an S-detect system with that of radiologists alone or the S-detect system alone in diagnosing breast masses on US images in a dual-centre setting.
METHODS: Investigators at 2 medical centres retrospectively identified US images of 296 breast masses (150 benign, 146 malignant). Six radiologists from the 2 centres independently analysed the US images and classified each mass into categories 2-5. The radiologists then re-reviewed the images with the use of the S-detect system. The diagnostic value of radiologists alone, S-detect alone, and radiologists + S-detect was analysed and compared.
RESULTS: With the S-detect system, radiologists significantly decreased their average false negative rate (FNR) for diagnosing breast masses (-10.7%; P < .001) and increased the area under the receiver operating characteristic curve (AUC) from 0.743 to 0.788 (P < .001). Seventy-seven out of 888 US images from 6 radiologists in this study were changed positively (from false positive to true negative or from false negative to true positive) with the S-detect, whereas 39 out of 888 US images were altered negatively.
CONCLUSION: Radiologists performed better in diagnosing malignant breast masses on US images with an S-detect system than without.
ADVANCES IN KNOWLEDGE: The study reported an improvement in sensitivity and AUC, particularly for low to intermediate-level radiologists; involved cases and radiologists from 2 different centres; and compared the diagnostic value of using the S-detect system for masses of different sizes.
Affiliation(s)
- Liwen Du
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Hongli Liu
  - Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Mengjun Cai
  - Department of Ultrasound, The Affiliated Drum Tower Hospital, Medical School of Nanjing University, Nanjing 210008, China
- Jiazhen Pan
  - Department of Ultrasound, Jiangsu Cancer Hospital, Nanjing 210009, China
- Hailing Zha
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Chenlei Nie
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Minjia Lin
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Cuiying Li
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Min Zong
  - Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Bo Zhang
  - Department of Ultrasound, Shanghai East Clinical Medical College, Nanjing Medical University, Nanjing 211166, China
3. Chen L, Zhang M, Luo Y. Ultrasound radiomics and genomics improve the diagnosis of cytologically indeterminate thyroid nodules. Front Endocrinol (Lausanne) 2025; 16:1529948. [PMID: 40093750 PMCID: PMC11906326 DOI: 10.3389/fendo.2025.1529948] [Received: 11/18/2024] [Accepted: 02/12/2025] [Indexed: 03/19/2025]
Abstract
Background: Increasing numbers of cytologically indeterminate thyroid nodules (ITNs) present challenges for preoperative diagnosis, often leading to unnecessary diagnostic surgical procedures for nodules that prove benign. Research in ultrasound radiomics and genomic testing leverages high-throughput data and image or sequence algorithms to establish assisted models or testing panels for ITN diagnosis. Many radiomics models now demonstrate diagnostic accuracy above 80% and sensitivity over 90%, surpassing the performance of less experienced radiologists and, in some cases, matching the accuracy of experienced radiologists. Molecular testing panels have helped clinicians achieve accurate diagnoses of ITNs, preventing unnecessary diagnostic surgical procedures in 42%-61% of patients with benign nodules.
Objective: In this review, we examined studies on ultrasound radiomics and genomic molecular testing for cytological ITNs conducted over the past 5 years, aiming to provide insights for researchers focused on improving ITN diagnosis.
Conclusion: Radiomics models and molecular testing have enhanced diagnostic accuracy before surgery and reduced unnecessary diagnostic surgical procedures for ITN patients.
Affiliation(s)
- Mingbo Zhang
  - Department of Ultrasound, The First Medical Center of Chinese People’s Liberation Army (PLA) of China General Hospital, Beijing, China
- Yukun Luo
  - Department of Ultrasound, The First Medical Center of Chinese People’s Liberation Army (PLA) of China General Hospital, Beijing, China
4. Lin Y, Cheng Y, Zhang Y, Ren X, Li J, Shi H, Li Y, Luo Y, Wang H. The value of Korean, American, and Chinese ultrasound risk stratification systems combined with BRAF(V600E) mutation for detecting papillary thyroid carcinoma in cytologically indeterminate thyroid nodules. Endocrine 2024; 84:549-559. [PMID: 37940765 DOI: 10.1007/s12020-023-03586-2] [Received: 09/13/2023] [Accepted: 10/24/2023] [Indexed: 11/10/2023]
Abstract
PURPOSE: To investigate the value of Korean, American, and Chinese ultrasound risk stratification systems combined with BRAF(V600E) mutation in the detection of papillary thyroid carcinoma (PTC) within cytologically indeterminate thyroid nodules (CITNs).
METHODS: A single-center retrospective study encompassed 511 CITNs selected from 509 patients between January 2020 and July 2023. Each nodule underwent surgical treatment and was classified according to three distinct systems. Receiver operating characteristic (ROC) curves were plotted using histopathological diagnosis as the reference standard, and diagnostic performance was compared.
RESULTS: The three ultrasound stratification systems showed an elevated malignant risk with increasing grades (all P for trend2 < 0.001). The cut-off values for the Korean, American, and Chinese systems were 5, 5, and 4c, and their respective areas under the curve (AUCs) were 0.735, 0.778, and 0.783. Adding BRAF(V600E) mutation status significantly enhanced the diagnostic efficacy of the Korean (0.773 vs 0.735, P < 0.001), American (0.809 vs 0.778, P < 0.001), and Chinese (0.815 vs 0.783, P < 0.001) stratification systems in distinguishing CITNs without compromising specificity. Whether the three stratification systems were applied individually or combined with BRAF(V600E) mutation, the AUCs of the American and Chinese systems were similar (all P > 0.05), and both were higher than the AUC of the Korean system (all P < 0.05). The American system exhibited higher specificity than the Chinese and Korean systems (all P < 0.001), whereas the Chinese system demonstrated higher sensitivity and accuracy than the American and Korean systems (all P < 0.001).
CONCLUSION: The Korean, American, and Chinese stratification systems show potential in the differential diagnosis of CITNs. BRAF(V600E) mutation can significantly improve the detection rate of malignant nodules within CITNs, particularly PTC. Notably, the American and Chinese systems demonstrate superior overall diagnostic performance among these systems.
Affiliation(s)
- Yu Lin
  - Department of Pathology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
  - Medical School of Chinese PLA, Beijing, China
- Yiming Cheng
  - Medical School of Chinese PLA, Beijing, China
  - Department of Ultrasound, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yan Zhang
  - Department of Ultrasound, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Xiuyun Ren
  - Department of Ultrasound, Hainan Hospital, Chinese PLA General Hospital, Sanya, China
- Jie Li
  - Department of Pathology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Huaiyin Shi
  - Department of Pathology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yuxin Li
  - Department of Pathology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yukun Luo
  - Department of Ultrasound, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Hongwei Wang
  - Department of Pathology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
5. Cong P, Wang XM, Zhang YF. Comparison of artificial intelligence, elastic imaging, and the thyroid imaging reporting and data system in the differential diagnosis of suspicious nodules. Quant Imaging Med Surg 2024; 14:711-721. [PMID: 38223033 PMCID: PMC10784040 DOI: 10.21037/qims-23-788] [Received: 06/01/2023] [Accepted: 11/16/2023] [Indexed: 01/16/2024]
Abstract
Background: Ultrasound is widely used for detecting thyroid nodules in clinical practice. This retrospective study aimed to assess the diagnostic efficacy of the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), S-Detect, and elastography of the carotid artery for suspicious thyroid nodules and to determine the complementary value of artificial intelligence and elastography.
Methods: Between January 2021 and November 2021, 101 consecutive patients with 138 thyroid nodules were enrolled in The First Hospital of China Medical University. All nodules were evaluated using ACR-TIRADS categories (TR), S-Detect, and elastography, and then the diagnostic performance of the different methods and the combined assessment were compared. The inclusion criteria were the following: (I) TR3, TR4, and TR5 nodules, which were defined as "suspicious nodules"; (II) patients who had surgical or cytopathological results after ultrasound examination; and (III) voluntary enrollment in this study. Meanwhile, the exclusion criteria were the following: (I) TR1 and TR2 nodules, (II) patients who had undergone fine-needle aspiration before ultrasound examination, and (III) inconclusive cytologic findings.
Results: A total of 71 patients (12 men and 59 women) with 94 suspicious thyroid nodules (42 benign nodules and 52 malignant nodules) were finally included in this study. S-Detect had a significantly better sensitivity than did ACR-TIRADS [S-Detect: 98.1%, 95% confidence interval (CI): 89.7-100.0%; ACR-TIRADS: 84.6%, 95% CI: 71.9-93.1%; P=0.036], but its specificity was much lower (S-Detect: 19.0%, 95% CI: 8.6-34.1%; ACR-TIRADS: 40.5%, 95% CI: 25.6-56.7%; P=0.032). The accuracy was not significantly different between S-Detect (62.8%; 95% CI: 52.2-72.5%) and ACR-TIRADS (64.9%; 95% CI: 54.4-74.5%) (P=0.761). The elasticity contrast index (ECI) was not definitively useful in identifying suspicious thyroid nodules (P=0.592). Compared with the use of ACR-TIRADS and S-Detect alone, the specificity (45.2%; 95% CI: 29.8-61.3%), positive predictive value (65.2%; 95% CI: 52.4-76.5%), accuracy (66.0%; 95% CI: 55.5-75.4%), and the area under the receiver operating characteristic curve (0.640; 95% CI: 0.534-0.736) of their combination were higher but not significantly so.
Conclusions: At present, S-Detect cannot replace manual diagnosis, and the value of elastography of the carotid artery in diagnosing suspected thyroid nodules remains unclear.
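The percentages reported for this study can be tied back to whole-nodule counts. The Python sketch below is a hedged illustration: the true-positive and true-negative counts are not given in the abstract and are back-calculated here from the reported sensitivity, specificity, positive predictive value, and accuracy, given 52 malignant and 42 benign nodules. The function name `diagnostic_metrics` and the table `INFERRED_COUNTS` are illustrative only.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard 2x2 diagnostic accuracy measures, returned as fractions."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),   # benign nodules correctly cleared
        "ppv": tp / (tp + fp),           # flagged nodules that are malignant
        "accuracy": (tp + tn) / total,   # all correct calls
    }


# ASSUMPTION: counts inferred from the reported percentages
# (52 malignant, 42 benign nodules); not taken from the paper itself.
INFERRED_COUNTS = {
    "S-Detect": dict(tp=51, fn=1, fp=34, tn=8),
    "ACR-TIRADS": dict(tp=44, fn=8, fp=25, tn=17),
    "Combined": dict(tp=43, fn=9, fp=23, tn=19),
}

for method, counts in INFERRED_COUNTS.items():
    metrics = diagnostic_metrics(**counts)
    summary = ", ".join(f"{name}={value:.1%}" for name, value in metrics.items())
    print(f"{method}: {summary}")
```

Read this way, S-Detect misses only one malignant nodule but clears just 8 of the 42 benign ones, which is why its accuracy ends up close to that of ACR-TIRADS despite the much higher sensitivity.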
Affiliation(s)
- Peng Cong
  - Department of Ultrasound, The First Hospital of China Medical University, Shenyang, China
- Xue-Mei Wang
  - Department of Ultrasound, The First Hospital of China Medical University, Shenyang, China
6. Yang L, Li C, Chen Z, He S, Wang Z, Liu J. Diagnostic efficiency among Eu-/C-/ACR-TIRADS and S-Detect for thyroid nodules: a systematic review and network meta-analysis. Front Endocrinol (Lausanne) 2023; 14:1227339. [PMID: 37720531 PMCID: PMC10501732 DOI: 10.3389/fendo.2023.1227339] [Received: 05/23/2023] [Accepted: 08/16/2023] [Indexed: 09/19/2023]
Abstract
Background: The performance in evaluating thyroid nodules on ultrasound varies across different risk stratification systems, leading to inconsistency and uncertainty regarding diagnostic sensitivity, specificity, and accuracy.
Objective: To compare the diagnostic performance for detecting thyroid cancer among distinct ultrasound risk stratification systems proposed in the last five years.
Evidence acquisition: A systematic search of the PubMed, EMBASE, and Web of Science databases was conducted to find relevant research up to December 8, 2022, that reported the diagnostic performance of any one of the following ultrasound risk stratification systems: European Thyroid Imaging Reporting and Data System (Eu-TIRADS), American College of Radiology TIRADS (ACR TIRADS), Chinese version of TIRADS (C-TIRADS), and a computer-aided diagnosis system based on deep learning (S-Detect). Using histopathology and cytology as the gold standard, a single meta-analysis was performed to obtain the optimal cut-off value for each system, and a network meta-analysis was then conducted on the best risk stratification category in each system.
Evidence synthesis: This network meta-analysis included 88 studies with a total of 59,304 nodules. The most accurate risk category thresholds were TR5 for Eu-TIRADS, TR5 for ACR TIRADS, TR4b and above for C-TIRADS, and possible malignancy for S-Detect. At the best thresholds, sensitivity of these systems ranged from 68% to 82% and specificity ranged from 71% to 81%. Sensitivity was highest for C-TIRADS TR4b and specificity was highest for ACR TIRADS TR5. However, sensitivity for ACR TIRADS TR5 was the lowest. C-TIRADS ranked first for both diagnostic odds ratio (DOR) and area under the curve (AUC).
Conclusion: Among the four ultrasound risk stratification options, this systematic review provides preliminary evidence that C-TIRADS has favorable diagnostic performance for thyroid nodules.
Systematic review registration: https://www.crd.york.ac.uk/prospero, CRD42022382818.
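The diagnostic odds ratio mentioned in this abstract combines sensitivity and specificity into a single number. As a minimal sketch, the Python function below applies the standard definition (positive likelihood ratio divided by negative likelihood ratio). The example values of 68% sensitivity and 81% specificity for ACR TIRADS TR5 are an inference from the abstract, which states that this threshold had the lowest sensitivity and the highest specificity of the reported ranges. The pooled DOR in the meta-analysis itself is estimated by a statistical model and may differ from this plug-in value.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = LR+ / LR-, i.e. (TP / FN) / (FP / TN) expressed with rates."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr / negative_lr


# ASSUMPTION: range endpoints from the abstract assigned to ACR TIRADS TR5.
acr_tr5_dor = diagnostic_odds_ratio(sensitivity=0.68, specificity=0.81)
print(f"Illustrative DOR for ACR TIRADS TR5: {acr_tr5_dor:.1f}")
```

A DOR of 1 means the test does not discriminate at all, so values well above 1 indicate better separation of malignant from benign nodules.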
Affiliation(s)
- Longtao Yang
  - Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
- Cong Li
  - Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
- Zhe Chen
  - Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University, Changsha, China
- Shaqi He
  - Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
- Zhiyuan Wang
  - Department of Ultrasound, The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Jun Liu
  - Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
  - Clinical Research Center for Medical Imaging in Hunan Province, Changsha, China
  - Department of Radiology Quality Control Center in Hunan Province, Changsha, China