1
Kron P, Farid S, Ali S, Lodge P. Artificial Intelligence: A Help or Hindrance to Scientific Writing? Ann Surg 2024;280:713-718. [PMID: 39087343] [DOI: 10.1097/sla.0000000000006464]
Abstract
We assessed Chat Generative Pretrained Transformer (ChatGPT), a type of artificial intelligence software designed to simulate conversations with human users, in an experiment designed to test its relevance to scientific writing. ChatGPT could become a promising and powerful tool for tasks such as automated draft generation, which may make academic writing faster and easier. However, the use of this tool in scientific writing raises ethical concerns, and there have therefore been calls for it to be regulated. It may be difficult to recognize whether an abstract or paper was written by a chatbot or a human being, because chatbots use advanced techniques, such as natural language processing and machine learning, to generate text that closely resembles human writing. Identifying the author is a complex task that requires thorough critical reading. The aim of this paper is therefore to explore the pros and cons of the use of chatbots in scientific writing.
Affiliation(s)
- Philipp Kron
  - HPB and Transplant Unit, St. James's University Hospital, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom
  - Department for General and Transplantation Surgery, University Hospital Tuebingen, Tuebingen, Germany
- Shahid Farid
  - HPB and Transplant Unit, St. James's University Hospital, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom
- Sharib Ali
  - Faculty of Engineering and Physical Sciences, School of Computing, University of Leeds, Leeds, United Kingdom
- Peter Lodge
  - HPB and Transplant Unit, St. James's University Hospital, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom
2
Knoedler L, Alfertshofer M, Knoedler S, Hoch CC, Funk PF, Cotofana S, Maheta B, Frank K, Brébant V, Prantl L, Lamby P. Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis. JMIR Med Educ 2024;10:e51148. [PMID: 38180782] [PMCID: PMC10799278] [DOI: 10.2196/51148]
Abstract
BACKGROUND The United States Medical Licensing Examination (USMLE) has been central to medical education since 1992, testing a medical student's knowledge and skills in steps matched to their level of training. Artificial intelligence (AI) tools, including chatbots such as ChatGPT, are emerging technologies with potential applications in medicine. However, comprehensive studies analyzing ChatGPT's performance on USMLE Step 3 at large scale, and comparing different versions of ChatGPT, are limited. OBJECTIVE This paper aimed to analyze ChatGPT's performance on USMLE Step 3 practice test questions to better elucidate the strengths and weaknesses of AI use in medical education and to derive evidence-based strategies to counteract AI cheating. METHODS A total of 2069 USMLE Step 3 practice questions were extracted from the AMBOSS study platform. After excluding 229 image-based questions, the remaining 1840 text-based questions were categorized and entered into ChatGPT 3.5, while a subset of 229 questions was entered into ChatGPT 4. Responses were recorded, and the accuracy of ChatGPT's answers, as well as its performance across test question categories and difficulty levels, was compared between the two versions. RESULTS Overall, ChatGPT 4 significantly outperformed ChatGPT 3.5, achieving an accuracy of 84.7% (194/229) versus 56.9% (1047/1840). A weak but noteworthy correlation was observed between question length and the performance of ChatGPT 3.5 (ρ=-0.069; P=.003), which was absent in ChatGPT 4 (P=.87). Additionally, question difficulty, as categorized by AMBOSS hammer ratings, showed a statistically significant negative correlation with performance for both versions (ρ=-0.289 for ChatGPT 3.5; ρ=-0.344 for ChatGPT 4). ChatGPT 4 surpassed ChatGPT 3.5 at all levels of question difficulty except the 2 highest tiers (4 and 5 hammers), where the difference did not reach statistical significance. CONCLUSIONS In this study, ChatGPT 4 demonstrated remarkable proficiency on USMLE Step 3, with an accuracy of 84.7% (194/229), outshining ChatGPT 3.5 at 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally well, it struggled with questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for developing examination strategies that are resilient to AI and underline the promising role of AI in medical education and diagnostics.
Affiliation(s)
- Leonard Knoedler
  - Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Michael Alfertshofer
  - Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians University Munich, Munich, Germany
- Samuel Knoedler
  - Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
  - Division of Plastic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Cosima C Hoch
  - Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich, Munich, Germany
- Paul F Funk
  - Department of Otolaryngology, Head and Neck Surgery, University Hospital Jena, Friedrich Schiller University Jena, Jena, Germany
- Sebastian Cotofana
  - Department of Dermatology, Erasmus Hospital, Rotterdam, Netherlands
  - Centre for Cutaneous Research, Blizard Institute, Queen Mary University of London, London, United Kingdom
- Bhagvat Maheta
  - College of Medicine, California Northstate University, Elk Grove, CA, United States
- Konstantin Frank
- Vanessa Brébant
  - Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Lukas Prantl
  - Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Philipp Lamby
  - Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
3
Kayaalp ME, Ollivier M, Winkler PW, Dahmen J, Musahl V, Hirschmann MT, Karlsson J. Embrace responsible ChatGPT usage to overcome language barriers in academic writing. Knee Surg Sports Traumatol Arthrosc 2024;32:5-9. [PMID: 38226673] [DOI: 10.1002/ksa.12014]
Affiliation(s)
- M Enes Kayaalp
  - Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Department for Orthopaedics and Traumatology, Istanbul Kartal Research and Training Hospital, Istanbul, Turkiye
- Matthieu Ollivier
  - CNRS, Institute of Movement Sciences (ISM), Aix Marseille University, Marseille, France
- Philipp W Winkler
  - Department for Orthopaedics and Traumatology, Kepler University Hospital GmbH, Linz, Austria
- Jari Dahmen
  - Department of Orthopaedic Surgery and Sports Medicine, Amsterdam Movement Sciences, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
  - Academic Center for Evidence Based Sports Medicine (ACES), Amsterdam, The Netherlands
  - Amsterdam Collaboration for Health and Safety in Sports (ACHSS), International Olympic Committee (IOC) Research Center Amsterdam UMC, Amsterdam, The Netherlands
- Volker Musahl
  - Department of Orthopaedic Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael T Hirschmann
  - Department of Orthopedic Surgery and Traumatology, Head Knee Surgery and DKF Head of Research, Kantonsspital Baselland, Bruderholz, Bottmingen, Switzerland
  - University of Basel, Basel, Switzerland
- Jon Karlsson
  - Department for Orthopaedics, Sahlgrenska University Hospital, Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
4
Daungsupawong H, Wiwanitkit V. Comment on Published Article "Surgeon or Bot? The Risks of Using Artificial Intelligence in Surgical Journal Publications". Ann Surg Open 2023;4:e357. [PMID: 38144494] [PMCID: PMC10735155] [DOI: 10.1097/as9.0000000000000357]