1
Wakonig KM, Barisch S, Kozarzewski L, Dommerich S, Lerchbaumer MH. Comparing ChatGPT 4.0's Performance in Interpreting Thyroid Nodule Ultrasound Reports Using ACR-TI-RADS 2017: Analysis Across Different Levels of Ultrasound User Experience. Diagnostics (Basel) 2025; 15:635. [PMID: 40075883 PMCID: PMC11899695 DOI: 10.3390/diagnostics15050635] [Received: 02/12/2025] [Revised: 02/24/2025] [Accepted: 02/26/2025] [Indexed: 03/14/2025]
Abstract
Background/Objectives: This study evaluates ChatGPT 4.0's ability to interpret thyroid ultrasound (US) reports using ACR-TI-RADS 2017 criteria, comparing its performance with that of US users with different levels of experience. Methods: A team of medical experts, an inexperienced US user, and ChatGPT 4.0 analyzed 100 fictitious thyroid US reports. ChatGPT's performance was assessed for accuracy, consistency, and diagnostic recommendations, including fine-needle aspirations (FNA) and follow-ups. Results: ChatGPT demonstrated substantial agreement with experts in assessing echogenic foci, but inconsistencies in other criteria, such as composition and margins, were evident in both of its analyses. Interrater reliability between ChatGPT and experts ranged from moderate to almost perfect, reflecting AI's potential but also its limitations in achieving expert-level interpretations. The inexperienced US user outperformed ChatGPT, with nearly perfect agreement with the experts, highlighting the critical role of traditional medical training in applying standardized risk stratification tools such as TI-RADS. Conclusions: ChatGPT showed high specificity in recommending FNAs but lower sensitivity and specificity for follow-ups than the inexperienced US user, a medical student. These findings emphasize ChatGPT's potential as a supportive diagnostic tool rather than a replacement for human expertise. Enhancing AI algorithms and training could improve ChatGPT's clinical utility, enabling better support for clinicians in managing thyroid nodules and improving patient care. This study highlights both the promise and current limitations of AI in medical diagnostics, advocating for its refinement and integration into clinical workflows. However, it emphasizes that traditional clinical training must not be compromised, as it is essential for identifying and correcting AI-driven errors.
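The abstract reports interrater reliability using the bands "moderate", "substantial" and "almost perfect", which match the wording of the Landis and Koch scale commonly used to read Cohen's kappa, and it reports sensitivity and specificity for FNA and follow-up recommendations. The abstract does not state which agreement statistic was used, so the Python sketch below assumes unweighted Cohen's kappa. The function names and the TI-RADS labels and FNA flags are hypothetical toy data chosen for illustration, not the study's results.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch(kappa):
    """Map a kappa value to its Landis and Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of binary recommendations vs. a reference."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference needs both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical TI-RADS levels for ten nodules (expert panel vs. model).
expert = ["TR1", "TR2", "TR3", "TR3", "TR4", "TR4", "TR5", "TR2", "TR3", "TR4"]
model = ["TR1", "TR2", "TR3", "TR4", "TR4", "TR4", "TR5", "TR2", "TR2", "TR4"]
kappa = cohens_kappa(expert, model)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")  # kappa = 0.74 (substantial)

# Hypothetical FNA recommendations (True = FNA recommended).
expert_fna = [True, True, False, False, True, False, False, False]
model_fna = [True, False, False, False, True, False, False, False]
sens, spec = sensitivity_specificity(model_fna, expert_fna)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.67, 1.00
```

For ordinal scales such as TI-RADS levels, a weighted kappa that penalizes near-misses less than distant disagreements is a common alternative; the unweighted form is used here only because it is the simplest to show.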
Affiliation(s)
- Katharina Margherita Wakonig
- Department of Otorhinolaryngology, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Campus Virchow Klinikum and Campus Charité Mitte, Charitéplatz 1, 10117 Berlin, Germany
- Simon Barisch
- Department of Otorhinolaryngology, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Campus Virchow Klinikum and Campus Charité Mitte, Charitéplatz 1, 10117 Berlin, Germany
- Leonard Kozarzewski
- Department of Endocrinology, Diabetes and Metabolism, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Steffen Dommerich
- Department of Otorhinolaryngology, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Campus Virchow Klinikum and Campus Charité Mitte, Charitéplatz 1, 10117 Berlin, Germany
- Markus Herbert Lerchbaumer
- Department of Radiology, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
2
Hillebrand G, Gartmeier M, Weiss N, Engelmann L, Stenzl A, Johnson F, Hofauer B. [Virtual DEGUM-certified course in the head and neck region-a useful complement to conventional course formats?]. HNO 2024; 72:154-160. [PMID: 38353674 PMCID: PMC10879222 DOI: 10.1007/s00106-023-01413-8] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
BACKGROUND Training in clinical ultrasound has become highly relevant for working as an otorhinolaryngologist. While there is a high demand for standardized and certified training courses, until recently it was not possible to attend web-based, exclusively virtual head and neck ultrasound courses certified by the Deutsche Gesellschaft für Ultraschall in der Medizin (DEGUM; German Society for Ultrasound in Medicine). OBJECTIVE The aim of this study was to provide a qualitative and semi-quantitative analysis of the first purely virtual DEGUM-certified head and neck ultrasound courses. MATERIALS AND METHODS In 2021, three purely web-based DEGUM-certified head and neck ultrasound courses were carried out and then analyzed qualitatively using questionnaires that included an examination. RESULTS The purely virtual implementation of head and neck ultrasound courses proved to be a viable alternative to the conventional course format, with a high level of acceptance among the participants. The lack of hands-on practice for participants remains a relevant criticism. CONCLUSION Web-based and remote ultrasound training is likely to play a more prominent role and should be considered as an alternative depending on existing conditions. Nevertheless, acquiring practical sonographic skills remains a major hurdle when courses are purely digital.
Affiliation(s)
- Gabriel Hillebrand
- Department of Otorhinolaryngology, Klinikum rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Martin Gartmeier
- TUM Medical Education Center, Chair of Medical Education, Medical Teaching Development and Educational Research, School of Medicine, Klinikum rechts der Isar, Nigerstraße 3, 81675 Munich, Germany
- Nora Weiss
- Department of Otorhinolaryngology, Klinikum rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Luca Engelmann
- Department of Otorhinolaryngology, Klinikum rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Anna Stenzl
- Department of Otorhinolaryngology, Medical University of Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria
- Felix Johnson
- Department of Otorhinolaryngology, Medical University of Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria
- Benedikt Hofauer
- Department of Otorhinolaryngology, Medical University of Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria
3
Dondi F, Gatta R, Treglia G, Piccardo A, Albano D, Camoni L, Gatta E, Cavadini M, Cappelli C, Bertagna F. Application of radiomics and machine learning to thyroid diseases in nuclear medicine: a systematic review. Rev Endocr Metab Disord 2024; 25:175-186. [PMID: 37434097 PMCID: PMC10808150 DOI: 10.1007/s11154-023-09822-4] [Accepted: 06/30/2023] [Indexed: 07/13/2023]
Abstract
BACKGROUND In recent years, growing evidence on the role of radiomics and machine learning (ML) applied to different nuclear medicine imaging modalities for the assessment of thyroid diseases has begun to emerge. The aim of this systematic review was therefore to analyze the diagnostic performance of these technologies in this setting. METHODS A wide literature search of the PubMed/MEDLINE, Scopus and Web of Science databases was performed to find relevant published articles about the role of radiomics or ML on nuclear medicine imaging for the evaluation of different thyroid diseases. RESULTS Seventeen studies were included in the systematic review. Radiomics and ML were applied for the assessment of thyroid incidentalomas at 18F-FDG PET, the evaluation of cytologically indeterminate thyroid nodules, the assessment of thyroid cancer, and the classification of thyroid diseases using nuclear medicine techniques. CONCLUSION Although some intrinsic limitations of radiomics and ML may have affected the results of this review, these technologies seem to have a promising role in the assessment of thyroid diseases. Validation of these preliminary findings in multicentric studies is needed to translate radiomics and ML approaches into the clinical setting.
Affiliation(s)
- Francesco Dondi
- Nuclear Medicine, ASST Spedali Civili di Brescia, P.le Spedali Civili, 1, Brescia, 25123, Italy
- Roberto Gatta
- Department of Clinical and Experimental Sciences, Università degli Studi di Brescia, Brescia, Italy
- Giorgio Treglia
- Clinic of Nuclear Medicine, Imaging Institute of Southern Switzerland, Ente Ospedaliero Cantonale, Bellinzona, Switzerland
- Department of Nuclear Medicine and Molecular Imaging, Lausanne University Hospital, University of Lausanne, Lausanne, Switzerland
- Faculty of Biomedical Sciences, Università della Svizzera italiana, Lugano, Switzerland
- Domenico Albano
- Nuclear Medicine, ASST Spedali Civili di Brescia and Università degli Studi di Brescia, Brescia, Italy
- Luca Camoni
- Nuclear Medicine, ASST Spedali Civili di Brescia, P.le Spedali Civili, 1, Brescia, 25123, Italy
- Elisa Gatta
- Unit of Endocrinology and Metabolism, ASST Spedali Civili di Brescia and Università degli Studi di Brescia, Brescia, Italy
- Maria Cavadini
- Unit of Endocrinology and Metabolism, ASST Spedali Civili di Brescia and Università degli Studi di Brescia, Brescia, Italy
- Carlo Cappelli
- Unit of Endocrinology and Metabolism, ASST Spedali Civili di Brescia and Università degli Studi di Brescia, Brescia, Italy
- Francesco Bertagna
- Nuclear Medicine, ASST Spedali Civili di Brescia, P.le Spedali Civili, 1, Brescia, 25123, Italy
- Nuclear Medicine, ASST Spedali Civili di Brescia and Università degli Studi di Brescia, Brescia, Italy
4
Yazgi D, Richa C, Salenave S, Kamenicky P, Bourouina A, Clavier L, Dupeux M, Papon JF, Young J, Chanson P, Maione L. Differentiating pathologic parathyroid glands from thyroid nodules on neck ultrasound: the PARATH-US cross-sectional study. Lancet Reg Health Eur 2023; 35:100751. [PMID: 37915399 PMCID: PMC10616552 DOI: 10.1016/j.lanepe.2023.100751] [Received: 07/10/2023] [Revised: 09/21/2023] [Accepted: 09/27/2023] [Indexed: 11/03/2023]
Abstract
Background Neck ultrasound (US) is a widely used, accessible, operator-dependent technique that helps characterize thyroid nodules and pathologic parathyroid glands (PPGs). However, thyroid nodules may sometimes be confused with PPGs. The PARATH-US study aims to identify US characteristics that differentiate PPGs from thyroid nodules, as no study to date has directly compared the US features of these two common neoplasms. Methods PARATH-US is a single-center study conducted at a tertiary referral center, including consecutive lesions from patients undergoing neck US examination from 2016 to 2022. Findings 176 PPGs (158 patients: serum calcium levels 2.91 [IQR 2.74-3.05] mmol/L, PTH levels 173 [112-296] ng/L) were compared with 232 size- and volume-matched thyroid nodules (204 age- and sex-matched patients). The morphologic patterns, echoic content and vascular status all differed between PPGs and thyroid neoplasms (p < 0.01 for all comparisons). The combined parameters maximally discriminated PPGs from thyroid nodules (OR, 7.6; 95% CI: 3.4, 17.1, p < 0.0001). When risk stratification systems developed for thyroid malignancies were applied, 58-63% of PPGs were classified as high-risk lesions. Parathyroid adenomas had larger sizes and volumes than hyperplasias (p = 0.013 and p = 0.029, respectively). Serum calcium and PTH levels were significantly correlated with PPG size and volume (p < 0.0001 for all comparisons). Interpretation We demonstrate distinct US characteristics of PPGs that help differentiate them from thyroid nodules. When mistaken for thyroid nodules, PPGs bear high-risk US features. When dealing with high-risk cervical lesions detected on US, a PPG should be suspected, and an assessment of calcium levels is recommended to avoid unnecessary invasive procedures. Funding CYTO-TRAIN, C2022DOSRH053, funded by the French Regional Health Agency.
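The reported odds ratio, its 95% confidence interval and its p-value can be cross-checked against each other. The Python sketch below assumes a Wald-type interval that is symmetric on the log-odds scale; the abstract does not say how the interval was computed, and the estimate may come from a multivariable model. Under that assumption the standard error of log(OR) is recovered from the interval width, and the resulting z statistic gives an approximate two-sided p-value. The function name `wald_check` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_check(odds_ratio, ci_low, ci_high):
    """Back-calculate SE, z and two-sided p from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- Z_975 * SE on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    # Under the same assumption the point estimate is the CI's geometric midpoint.
    implied_or = math.sqrt(ci_low * ci_high)
    return se, z, p_two_sided, implied_or


# Figures as reported in the abstract: OR 7.6, 95% CI 3.4 to 17.1.
se, z, p, implied = wald_check(7.6, 3.4, 17.1)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}, implied OR = {implied:.2f}")
```

Under this assumption the implied midpoint (about 7.62) sits close to the reported 7.6, and the implied p-value is well below 0.0001, so the three reported figures are mutually consistent.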
Affiliation(s)
- Dolly Yazgi
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Carine Richa
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Sylvie Salenave
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Peter Kamenicky
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Amel Bourouina
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Margot Dupeux
- Université Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Anatomical and Cytological Pathology, Le Kremlin-Bicêtre, France
- Jean-François Papon
- Université Paris-Saclay, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Otorhinolaryngology and Cervico-Maxillofacial Surgery, Le Kremlin-Bicêtre, France
- Jacques Young
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Philippe Chanson
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
- Luigi Maione
- Université Paris-Saclay, Inserm, Physiologie et Physiopathologie Endocriniennes, Assistance Publique-Hôpitaux de Paris, Hôpital Bicêtre, Department of Endocrinology and Reproductive Diseases, Le Kremlin-Bicêtre, France
5
Konca C, Elhan AH. Unveiling the Accuracy of Ultrasonographic Assessment of Thyroid Volume: A Comparative Analysis of Ultrasonographic Measurements and Specimen Volumes. J Clin Med 2023; 12:6619. [PMID: 37892758 PMCID: PMC10607290 DOI: 10.3390/jcm12206619] [Received: 09/12/2023] [Revised: 10/15/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023]
Abstract
In endocrine surgery, a precise ultrasonographic measurement of thyroid volume is crucial. However, there is limited comparative research between ultrasonographic and specimen volumes, which has left this issue open to debate. This study aims to assess the accuracy of recommended formulas for ultrasonographic thyroid volume measurement by comparing them to specimen volumes and analyzing the influencing variables. From the data of 120 eligible patients, different formulas, including ultrasonographic thyroid volume (US-TV) based on the ellipsoid formula, lower correction factor thyroid volume (LCF-TV), and calculated ultrasonographic (derived formula) thyroid volume (CU-TV), were used to estimate the thyroid volume based on measurements taken prior to surgery. These measurements were compared with the intraoperative specimen volume (IO-TV) derived using Archimedes' principle. According to our findings, the mean values for US-TV and LCF-TV were significantly lower, whereas CU-TV was higher than IO-TV. Deviations were more significant in patients who had surgery for benign indications or compressive symptoms and in those with suppressed thyroid-stimulating hormone levels. Although the ellipsoid formula tends to underestimate the actual thyroid volume, it remains the most accurate method for measuring ultrasonographic thyroid volume. The deviation is greater for larger volumes.
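The ellipsoid formula referred to in the abstract estimates each lobe as length × width × depth × π/6 (π/6 ≈ 0.524), and total gland volume is commonly taken as the sum of both lobes; dimensions in centimetres give millilitres. The Python sketch below also computes the percentage deviation from a reference volume such as a specimen volume. The `factor` parameter is exposed only so that alternative correction factors can be substituted; the abstract does not state the factors behind LCF-TV or CU-TV. The function names, the lobe dimensions and the specimen volume are hypothetical.

```python
import math

ELLIPSOID_FACTOR = math.pi / 6  # ~0.524, the standard ellipsoid correction


def lobe_volume_ml(length_cm, width_cm, depth_cm, factor=ELLIPSOID_FACTOR):
    """Ellipsoid estimate of one lobe; cm^3 equals mL."""
    return length_cm * width_cm * depth_cm * factor


def thyroid_volume_ml(right_lobe, left_lobe, factor=ELLIPSOID_FACTOR):
    """Sum of both lobe estimates (isthmus ignored, as is common practice)."""
    return (lobe_volume_ml(*right_lobe, factor=factor)
            + lobe_volume_ml(*left_lobe, factor=factor))


def percent_deviation(estimate_ml, reference_ml):
    """Signed deviation of an estimate from a reference volume, in percent."""
    return (estimate_ml - reference_ml) / reference_ml * 100.0


# Hypothetical preoperative dimensions (length, width, depth) in cm.
right = (5.0, 2.0, 2.0)
left = (4.5, 1.8, 1.6)
us_tv = thyroid_volume_ml(right, left)
specimen_ml = 19.0  # hypothetical water-displacement volume of the specimen
print(f"US-TV = {us_tv:.1f} mL, deviation = {percent_deviation(us_tv, specimen_ml):+.1f}%")
```

A negative deviation indicates that the ultrasonographic estimate falls below the reference volume, which is the direction the abstract reports for the ellipsoid formula.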
Affiliation(s)
- Can Konca
- Department of General Surgery, Ankara University School of Medicine, 06230 Ankara, Turkey
- Atilla Halil Elhan
- Department of Biostatistics, Ankara University School of Medicine, 06230 Ankara, Turkey