1. Shiraishi M, Sowa Y, Tomita K, Terao Y, Satake T, Muto M, Morita Y, Higai S, Toyohara Y, Kurokawa Y, Sunaga A, Okazaki M. Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction. Aesthetic Plast Surg 2025; 49:1947-1953. [PMID: 39592492] [DOI: 10.1007/s00266-024-04515-y]
Abstract
BACKGROUND: Artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4) and Grok-1 (Grok), have shown potential in several medical fields but have not been examined in plastic and aesthetic surgery. The aim of this study was to evaluate the responses of these AI chatbots to clinical questions (CQs) on the guidelines for implant-based breast reconstruction (IBBR) published by the Japan Society of Plastic and Reconstructive Surgery (JSPRS) in 2021.
METHODS: CQs in the JSPRS guidelines were used as question sources. Responses from two AI chatbots, GPT-4 and Grok, were evaluated for accuracy, informativeness, and readability by five Japanese board-certified breast reconstruction specialists and five Japanese clinical fellows in plastic surgery.
RESULTS: GPT-4 significantly outperformed Grok in accuracy (p < 0.001), informativeness (p < 0.001), and readability (p < 0.001) when evaluated by plastic surgery fellows. Compared with the original guidelines, Grok scored significantly lower in all three areas (all p < 0.001). Plastic surgery fellows rated the accuracy of GPT-4 significantly higher than breast reconstruction specialists did (p = 0.012), whereas the two groups' scores for Grok did not differ significantly.
CONCLUSIONS: The study suggests that GPT-4 has the potential to assist in interpreting and applying clinical guidelines for IBBR, but there remains a risk that AI chatbots can misinform. Further studies are needed to understand the broader role of current and future AI chatbots in breast reconstruction surgery.
LEVEL OF EVIDENCE: IV. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Yoshihiro Sowa
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Koichi Tomita
  - Department of Plastic and Reconstructive Surgery, Kindai University, Osaka, Japan
- Yasunobu Terao
  - Department of Plastic and Reconstructive Surgery, Tokyo Metropolitan Cancer and Infectious Diseases Center, Komagome Hospital, Tokyo, Japan
- Toshihiko Satake
  - Department of Plastic, Reconstructive and Aesthetic Surgery, Toyama University Hospital, Toyama, Japan
- Mayu Muto
  - Department of Plastic, Reconstructive and Aesthetic Surgery, Toyama University Hospital, Toyama, Japan
  - Lala Breast Reconstruction Clinic Yokohama, Yokohama, Japan
  - Department of Plastic Surgery, Yokohama City University Medical Center, Yokohama, Japan
- Yuhei Morita
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
  - Japanese Red Cross Koga Hospital, Koga, Japan
- Shino Higai
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Yoshihiro Toyohara
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Yasue Kurokawa
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Ataru Sunaga
  - Department of Plastic Surgery, Jichi Medical University, Yakushiji, Shimotsuke, Tochigi, Japan
- Mutsumi Okazaki
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
2. Ali R, Cui H. Leveraging ChatGPT for Enhanced Aesthetic Evaluations in Minimally Invasive Facial Procedures. Aesthetic Plast Surg 2025; 49:950-961. [PMID: 39578313] [DOI: 10.1007/s00266-024-04524-x]
Abstract
BACKGROUND: In recent years, the application of AI technologies such as ChatGPT has gained traction in plastic surgery. AI models can analyze pre- and post-treatment images to offer insights into the effectiveness of cosmetic procedures. This enables rapid, objective evaluations that complement traditional assessment methods and provide a more comprehensive understanding of treatment outcomes.
OBJECTIVE: The study aimed to comprehensively assess the effectiveness of a custom ChatGPT model, "Face Rating and Review AI," in evaluating facial features in minimally invasive aesthetic procedures, particularly before and after Botox treatments.
METHODS: An analysis of the Web of Science (WoS) database identified 79 articles on ChatGPT in plastic surgery published between 2023 and 2024 from various countries. A Kaggle dataset of 23 patients, including pre- and post-Botox images, was used. The custom ChatGPT model assessed facial features based on objective parameters such as the golden ratio, symmetry, proportion, side angles, skin condition, and overall harmony, as well as subjective parameters such as personality, temperament, and social attraction.
RESULTS: The WoS search found 79 articles on ChatGPT in plastic surgery from 27 countries, with most publications originating from the USA, Australia, and Italy. The objective and subjective parameters were analyzed using a paired t-test, and all facial features showed low p-values (<0.05). Higher mean scores for features such as the golden ratio (mean = 5.86, SD = 0.69), skin condition (mean = 3.78, SD = 0.73), and personality (mean = 5.0, SD = 0.79) indicate positive shifts after treatment.
CONCLUSION: The custom ChatGPT model "Face Rating and Review AI" is a valuable tool for assessing facial features in Botox treatments. It effectively evaluates objective and subjective attributes, aiding clinical decision-making. However, ethical considerations highlight the need for diverse datasets in future research to improve accuracy and inclusivity.
LEVEL OF EVIDENCE: V. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
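The paired t-test mentioned in this abstract compares each subject's score on the same feature before and after treatment. A minimal pure-Python sketch of the statistic (the rater scores below are hypothetical, not taken from the study; the p-value step, which requires the t distribution, is left to a t-table):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for before/after scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)      # standard error of the mean difference
    return mean(diffs) / se, n - 1   # t statistic, degrees of freedom

# Hypothetical rater scores for one facial feature, before and after treatment
pre = [4.0, 3.5, 4.2, 3.8, 4.1]
post = [5.5, 5.0, 5.8, 5.2, 5.6]
t, df = paired_t(pre, post)
print(round(t, 1), df)  # compare |t| against the critical value for df at alpha = 0.05
```

In practice a library routine (e.g. a paired t-test in a statistics package) would also return the p-value directly; the sketch only shows what the test measures.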
Affiliation(s)
- Rizwan Ali
  - Department of Plastic and Cosmetic Surgery, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200092, China
  - Institute of Aesthetic Plastic Surgery and Medicine, School of Medicine, Tongji University, Shanghai, 200092, China
- Haiyan Cui
  - Department of Plastic and Cosmetic Surgery, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200092, China
  - Institute of Aesthetic Plastic Surgery and Medicine, School of Medicine, Tongji University, Shanghai, 200092, China
3. Park KW, Diop M, Willens SH, Pepper JP. Artificial Intelligence in Facial Plastics and Reconstructive Surgery. Otolaryngol Clin North Am 2024; 57:843-852. [PMID: 38971626] [DOI: 10.1016/j.otc.2024.05.002]
Abstract
Artificial intelligence (AI), particularly computer vision and large language models, will impact facial plastic and reconstructive surgery (FPRS) by enhancing diagnostic accuracy, refining surgical planning, and improving postoperative evaluations. These advancements can address the subjective limitations of aesthetic surgery by providing objective tools for patient evaluation. Despite these advancements, AI in FPRS has yet to be fully integrated into the clinical setting and faces numerous challenges, including algorithmic bias, ethical considerations, and the need for validation. This article discusses current and emerging AI technologies in FPRS for the clinical setting, providing a glimpse of their future potential.
Affiliation(s)
- Ki Wan Park
  - Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, 801 Welch Road, Palo Alto, CA 94305, USA
- Mohamed Diop
  - Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, 801 Welch Road, Palo Alto, CA 94305, USA
- Sierra Hewett Willens
  - Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, 801 Welch Road, Palo Alto, CA 94305, USA
- Jon-Paul Pepper
  - Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, 801 Welch Road, Palo Alto, CA 94305, USA
4. Shiraishi M, Tomioka Y, Miyakuni A, Ishii S, Hori A, Park H, Ohba J, Okazaki M. Performance of ChatGPT in Answering Clinical Questions on the Practical Guideline of Blepharoptosis. Aesthetic Plast Surg 2024; 48:2389-2398. [PMID: 38684536] [DOI: 10.1007/s00266-024-04005-1]
Abstract
BACKGROUND: ChatGPT is a free artificial intelligence (AI) language model developed and released by OpenAI in late 2022. This study aimed to evaluate how accurately ChatGPT answers clinical questions (CQs) on the Guideline for the Management of Blepharoptosis published by the American Society of Plastic Surgeons (ASPS) in 2022.
METHODS: CQs in the guideline were used as question sources in both English and Japanese. For each question, ChatGPT provided answers to the CQs together with evidence quality, recommendation strength, reference matches, and word counts. The performance of ChatGPT on each component was compared between English and Japanese queries.
RESULTS: A total of 11 questions were included in the final analysis, and ChatGPT answered 61.3% of them correctly. ChatGPT answered CQs more accurately in English than in Japanese (76.4% versus 46.4%; p = 0.004) and produced longer answers (123 versus 35.9 words; p = 0.004). No statistically significant differences were noted for evidence quality, recommendation strength, or reference match. A total of 697 references were proposed, but only 216 (31.0%) of them existed.
CONCLUSIONS: ChatGPT demonstrates potential as an adjunctive tool in the management of blepharoptosis. However, it is crucial to recognize that the existing AI model has distinct limitations, and its primary role should be to complement the expertise of medical professionals.
LEVEL OF EVIDENCE: V. Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
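The hallucinated-reference figure reported here (216 of 697 proposed references actually existed) reduces to a single proportion, which can be reproduced directly:

```python
# Counts taken from the abstract: references ChatGPT proposed vs. those that exist
proposed, existing = 697, 216
rate = existing / proposed
print(f"{rate:.1%}")  # 31.0%
```

The complement, roughly 69% fabricated citations, is the practical caveat the study raises about using chatbot output without verification.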
Affiliation(s)
- Makoto Shiraishi
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Saaya Ishii
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Asei Hori
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Hwayoung Park
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Ohba
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
5. Shiraishi M, Tanigawa K, Tomioka Y, Miyakuni A, Moriwaki Y, Yang R, Oba J, Okazaki M. Blepharoptosis Consultation with Artificial Intelligence: Aesthetic Surgery Advice and Counseling from Chat Generative Pre-Trained Transformer (ChatGPT). Aesthetic Plast Surg 2024; 48:2057-2063. [PMID: 38589561] [DOI: 10.1007/s00266-024-04002-4]
Abstract
BACKGROUND: Chat Generative Pre-Trained Transformer (ChatGPT) is a publicly available artificial intelligence (AI) language model that leverages deep learning to generate text that mimics human conversation. In this study, the performance of ChatGPT was assessed by having it offer insightful and precise answers to a series of fictional questions, emulating a preliminary consultation on blepharoplasty.
METHODS: ChatGPT was posed questions derived from a blepharoplasty checklist provided by the American Society of Plastic Surgeons. Board-certified plastic surgeons and non-medical staff members evaluated the responses for accuracy, informativeness, and accessibility.
RESULTS: Nine questions were used in this study. For informativeness, the average score given by board-certified plastic surgeons was significantly lower than that given by non-medical staff members (2.89 ± 0.72 vs 4.41 ± 0.71; p = 0.042). No statistically significant differences were observed in accuracy (p = 0.56) or accessibility (p = 0.11).
CONCLUSIONS: Our results emphasize the effectiveness of ChatGPT in simulating doctor-patient conversations about blepharoplasty. Non-medical individuals found its responses more informative than the surgeons did. Although limited in terms of specialized guidance, ChatGPT offers foundational surgical information. Further exploration is warranted to elucidate the broader role of AI in aesthetic surgical consultations.
LEVEL OF EVIDENCE: V. Observational study under respected authorities. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
Affiliation(s)
- Makoto Shiraishi
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Koji Tanigawa
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yoko Tomioka
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Ami Miyakuni
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Yuta Moriwaki
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Rui Yang
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Jun Oba
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Mutsumi Okazaki
  - Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
6. Liu HY, Alessandri-Bonetti M, Arellano JA, Egro FM. Can ChatGPT be the Plastic Surgeon's New Digital Assistant? A Bibliometric Analysis and Scoping Review of ChatGPT in Plastic Surgery Literature. Aesthetic Plast Surg 2024; 48:1644-1652. [PMID: 37853081] [DOI: 10.1007/s00266-023-03709-0]
Abstract
BACKGROUND: ChatGPT, an artificial intelligence (AI) chatbot that uses natural language processing (NLP) to interact in a humanlike manner, has made significant contributions to various healthcare fields, including plastic surgery. However, its widespread use has raised ethical and security concerns. This study examines the presence of ChatGPT in the plastic surgery literature.
METHODS: A bibliometric analysis and scoping review of the ChatGPT plastic surgery literature were performed. PubMed was queried using the search term "ChatGPT" to identify all biomedical literature on ChatGPT; only studies related to plastic, reconstructive, or aesthetic surgery topics were eligible for inclusion.
RESULTS: The analysis included 30 of 724 articles retrieved from PubMed, focusing on publications from December 2022 to July 2023. Four key areas of research emerged: applications in research and the creation of original work, clinical application, surgical education, and ethics/commentary on previous studies. The versatility of ChatGPT in research, its potential in surgical education, and its role in enhancing patient education were explored. Ethical concerns regarding patient privacy, plagiarism, and the accuracy of information obtained from ChatGPT-generated sources were also highlighted.
CONCLUSION: While ethical concerns persist, the study underscores the potential of ChatGPT in plastic surgery research and practice, emphasizing the need for careful utilization and collaboration to optimize its benefits while minimizing risks.
LEVEL OF EVIDENCE: V. This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
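A PubMed query like the one described can be reproduced programmatically through NCBI's E-utilities. The sketch below only builds the `esearch` request URL for the term "ChatGPT" with the review's December 2022 to July 2023 publication-date window; actually fetching and screening the records (the paper's method) is left out, and the date bounds are inferred from the abstract:

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, mindate, maxdate, retmax=100):
    """Build an E-utilities esearch URL for a date-bounded PubMed query."""
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": term,          # query string, e.g. "ChatGPT"
        "datetype": "pdat",    # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,      # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{BASE}?{urlencode(params)}"

url = pubmed_search_url("ChatGPT", "2022/12", "2023/07")
print(url)
```

Requesting this URL returns a JSON list of PubMed IDs, which would then be screened for plastic, reconstructive, or aesthetic surgery relevance as the study describes.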
Affiliation(s)
- Hilary Y Liu
  - Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, Suite G103, Pittsburgh, PA 15219, USA
- Mario Alessandri-Bonetti
  - Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, Suite G103, Pittsburgh, PA 15219, USA
- José Antonio Arellano
  - Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, Suite G103, Pittsburgh, PA 15219, USA
- Francesco M Egro
  - Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, Suite G103, Pittsburgh, PA 15219, USA
7. Puladi B, Gsaxner C, Kleesiek J, Hölzle F, Röhrig R, Egger J. The impact and opportunities of large language models like ChatGPT in oral and maxillofacial surgery: a narrative review. Int J Oral Maxillofac Surg 2024; 53:78-88. [PMID: 37798200] [DOI: 10.1016/j.ijom.2023.09.005]
Abstract
Since its release at the end of 2022, the social response to ChatGPT, a large language model (LLM), has been enormous, as it has transformed the way we communicate with computers. This review describes the technical background of LLMs and surveys the current literature on LLMs in the field of oral and maxillofacial surgery (OMS). The PubMed, Scopus, and Web of Science databases were searched for LLMs and OMS. Adjacent surgical disciplines were included to cover the entire literature, and records from Google Scholar and medRxiv were added. Of the 57 records identified, 37 were included; 31 (84%) related to GPT-3.5, four (11%) to GPT-4, and two (5%) to both. Current research on LLMs is mainly limited to research and scientific writing, patient information and communication, and medical education. Classic OMS diseases are underrepresented, and the current literature on LLMs in OMS has a limited evidence level. There is a need to investigate the use of LLMs scientifically and systematically in the core areas of OMS. Although LLMs are likely to add value outside the operating room, their use raises ethical and medical regulatory issues that must first be addressed.
Affiliation(s)
- B Puladi
  - Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Aachen, Germany
  - Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany
- C Gsaxner
  - Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Aachen, Germany
  - Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany
  - Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  - Department of Oral and Maxillofacial Surgery, Medical University of Graz, Graz, Austria
- J Kleesiek
  - Institute for AI in Medicine (IKIM), University Hospital Essen (AöR), Essen, Germany
- F Hölzle
  - Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Aachen, Germany
- R Röhrig
  - Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany
- J Egger
  - Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  - Institute for AI in Medicine (IKIM), University Hospital Essen (AöR), Essen, Germany
8. Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821] [PMCID: PMC10719375] [DOI: 10.1038/s41598-023-49483-6]
Abstract
The emergence of large language models (LLMs) with remarkable performance, such as ChatGPT and GPT-4, has led to unprecedented uptake in the population. One of their most promising and most studied applications is education, owing to their ability to understand and generate human-like text, which creates a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study was twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning these LLMs followed in answering those questions. A dataset of 145 rheumatology-related questions, RheumaMIR, extracted from the exams held between 2010 and 2023, was created for that purpose, used as a prompt for the LLMs, and publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale, and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question, and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy, 66.43%, and clinical reasoning, median (Q1-Q3), 4.5 (2.33-4.67). However, GPT-4 performed better, with an accuracy of 93.71% and a median clinical reasoning value of 4.67 (4.5-4.83). These findings suggest that LLMs may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
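Summaries like "median (Q1-Q3), 4.67 (4.5-4.83)" condense a set of Likert ratings into a median and interquartile range. A minimal sketch with the standard library, using hypothetical reviewer scores (not the study's data):

```python
from statistics import median, quantiles

def likert_summary(scores):
    """Median and interquartile range (Q1-Q3) for a list of Likert ratings."""
    # "inclusive" interpolates between observed values, like common spreadsheet quartiles
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3

# Hypothetical 5-point Likert ratings from six reviewers for one answer
scores = [4, 5, 3, 4, 5, 4]
m, q1, q3 = likert_summary(scores)
print(f"median (Q1-Q3): {m} ({q1}-{q3})")
```

Reporting the median with Q1-Q3 rather than a mean is the usual choice for ordinal Likert data, which is presumably why the study presents its reasoning scores this way.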
Affiliation(s)
- Alfredo Madrid-García
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Zulema Rosales-Rosado
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Dalifer Freites-Nuñez
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Inés Pérez-Sancristóbal
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Esperanza Pato-Cour
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Luis Cabeza-Osorio
  - Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
  - Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
- Lydia Abasolo-Alcázar
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Leticia León-Mateos
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Benjamín Fernández-Gutiérrez
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
  - Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
- Luis Rodríguez-Rodríguez
  - Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain