1
Najafali D, Reiche E, Araya S, Orellana M, Liu FC, Camacho JM, Patel SA, Broyles JM, Dorafshar AH, Morrison SD, Knoedler L, Fox PM. Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination. Plast Reconstr Surg Glob Open 2025; 13:e6645. [PMID: 40212094] [PMCID: PMC11984779] [DOI: 10.1097/gox.0000000000006645]
Abstract
Background ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to a first-year integrated resident. The updated GPT-4 may have improved performance given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education. Methods Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the performance of the chatbot to national metrics from plastic surgery residents. Results GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. The accuracy was as follows for the prompting schemes: 54% for open-ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) among all strategies was Section 4: Breast and Cosmetic. GPT-4's highest scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall. Conclusions GPT-4 outperformed its predecessor but only scored in the 15th percentile compared with postgraduate year-6 residents. More refinement is needed to achieve performance metrics equivalent to an attending plastic surgeon and become a valuable tool for surgical education.
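For readers who want to reproduce this style of benchmark, the three prompting structures described above map naturally onto chat-completion API calls. The sketch below is a minimal illustration rather than the authors' code; the model name, prompt wording, and question text are all assumptions, and it requires an OpenAI API key.

```python
# Minimal sketch of the three prompting schemes (open-ended, MC, MC with
# explanation); illustrative only -- not the study's actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str, model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

stem = "A patient presents with ..."        # hypothetical question stem
options = "A) ...  B) ...  C) ...  D) ..."  # hypothetical answer options

open_ended = ask(stem)  # no answer choices shown
multiple_choice = ask(f"{stem}\n{options}\nAnswer with a single letter.")
mc_with_explanation = ask(
    f"{stem}\n{options}\nAnswer with a single letter and explain your reasoning."
)
```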
Affiliation(s)
- Daniel Najafali
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL
- Erik Reiche
- Division of Plastic and Reconstructive Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Sthefano Araya
- Division of Plastic and Reconstructive Surgery, Fox Chase Cancer Center, Temple University, Philadelphia, PA
- Manuel Orellana
- Department of Surgery, Harbor-UCLA Medical Center, Torrance, CA
- Farrah C. Liu
- Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA
- Justin M. Camacho
- Division of Plastic and Reconstructive Surgery, Fox Chase Cancer Center, Temple University, Philadelphia, PA
- Sameer A. Patel
- Division of Plastic and Reconstructive Surgery, Fox Chase Cancer Center, Temple University, Philadelphia, PA
- Justin M. Broyles
- Division of Plastic and Reconstructive Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Amir H. Dorafshar
- Division of Plastic and Reconstructive Surgery, Rush University Medical Center, Chicago, IL
- Shane D. Morrison
- Division of Plastic and Reconstructive Surgery, University of Washington at Harborview Medical Center, Seattle, WA
- Department of Urology, University of Washington Medical Center, Seattle, WA
- Leonard Knoedler
- Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Regensburg, Germany
- Paige M. Fox
- Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA
2
Guo S, Li R, Li G, Chen W, Huang J, He L, Ma Y, Wang L, Zheng H, Tian C, Zhao Y, Pan X, Wan H, Liu D, Li Z, Lei J. Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients. J Clin Endocrinol Metab 2025; 110:e841-e850. [PMID: 38597169] [DOI: 10.1210/clinem/dgae235]
Abstract
CONTEXT For some common thyroid-related conditions with high prevalence and long follow-up times, ChatGPT can be used to respond to common thyroid-related questions. OBJECTIVE In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions. METHODS First, we obtained 28 thyroid-related questions from the Huayitong app, which, together with 2 interfering questions, formed a final set of 30 questions. Then, these questions were responded to by ChatGPT (on July 19, 2023), a junior specialist, and a senior specialist (on July 20, 2023) separately. Finally, 26 patients and 11 thyroid surgeons evaluated those responses on 4 dimensions: accuracy, comprehensiveness, compassion, and satisfaction. RESULTS Among the 30 questions and responses, ChatGPT's speed of response was faster than that of the junior specialist (8.69 [7.53-9.48] vs 4.33 [4.05-4.60]; P < .001) and the senior specialist (8.69 [7.53-9.48] vs 4.22 [3.36-4.76]; P < .001). The word count of ChatGPT's responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs 74.50 [51.75-84.75]; P < .001) and senior specialist (341.50 [301.00-384.25] vs 104.00 [63.75-177.75]; P < .001). ChatGPT received higher scores than the junior specialist and senior specialist in terms of accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions. CONCLUSION ChatGPT performed better than a junior specialist and senior specialist in answering common thyroid-related questions, but further research is needed to validate the logical ability of ChatGPT on complex thyroid questions.
Affiliation(s)
- Siyin Guo
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ruicen Li
- Health Management Center, General Practice Medical Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Genpeng Li
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Wenjie Chen
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Jing Huang
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Linye He
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yu Ma
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Liying Wang
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Hongping Zheng
- Department of Thyroid Surgery, General Surgery Ward 7, The First Hospital of Lanzhou University, Lanzhou, Gansu 730000, China
- Chunxiang Tian
- Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, Sichuan 610031, China
- Yatong Zhao
- Thyroid Surgery, Zhengzhou Central Hospital Affiliated of Zhengzhou University, Zhengzhou, Henan 450007, China
- Xinmin Pan
- Department of Thyroid Surgery, General Surgery III, Gansu Provincial Hospital, Lanzhou, Gansu 730000, China
- Hongxing Wan
- Department of Oncology, Sanya People's Hospital, Sanya, Hainan 572000, China
- Dasheng Liu
- Department of Vascular Thyroid Surgery, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510120, China
- Zhihui Li
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Jianyong Lei
- Division of Thyroid Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
3
Hlavinka WJ, Sontam TR, Gupta A, Croen BJ, Abdullah MS, Humbyrd CJ. Are large language models a useful resource to address common patient concerns on hallux valgus? A readability analysis. Foot Ankle Surg 2025; 31:15-19. [PMID: 39117535] [DOI: 10.1016/j.fas.2024.08.002]
Abstract
BACKGROUND This study evaluates the accuracy and readability of Google, ChatGPT-3.5, and 4.0 (two versions of an artificial intelligence model) responses to common questions regarding bunion surgery. METHODS A Google search of "bunionectomy" was performed, and the first ten questions under "People Also Ask" were recorded. ChatGPT-3.5 and 4.0 were asked these ten questions individually, and their answers were analyzed using the Flesch-Kincaid Reading Ease and Gunning-Fog Level algorithms. RESULTS When compared to Google, ChatGPT-3.5 and 4.0 had a larger word count with 315 ± 39 words (p < .0001) and 294 ± 39 words (p < .0001), respectively. A significant difference was found between ChatGPT-3.5 and 4.0 compared to Google using Flesch-Kincaid Reading Ease (p < .0001). CONCLUSIONS Our findings demonstrate that ChatGPT provided significantly lengthier responses than Google and there was a significant difference in reading ease. Both platforms exceeded the seventh to eighth-grade reading level of the U.S. population. LEVEL OF EVIDENCE N/A.
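Both metrics used in this study are closed-form scores: Flesch-Kincaid Reading Ease = 206.835 - 1.015(words/sentence) - 84.6(syllables/word), and Gunning Fog = 0.4[(words/sentence) + 100(complex words/words)]. The sketch below is a minimal re-implementation with a naive vowel-group syllable counter, not the analysis tooling the authors used.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels; validated
    # readability tools use dictionaries or better phonetic rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    fog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
    return {"flesch_reading_ease": round(fre, 1), "gunning_fog": round(fog, 1)}

print(readability("Bunion surgery realigns the big toe joint. Recovery time varies."))
```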
Affiliation(s)
- William J Hlavinka
- Texas A&M School of Medicine, Baylor University Medical Center, Department of Medical Education, 3500 Gaston Avenue, 6-Roberts, Dallas, TX 75246, USA
- Tarun R Sontam
- Texas A&M School of Medicine, Baylor University Medical Center, Department of Medical Education, 3500 Gaston Avenue, 6-Roberts, Dallas, TX 75246, USA
- Anuj Gupta
- Texas A&M School of Medicine, Baylor University Medical Center, Department of Medical Education, 3500 Gaston Avenue, 6-Roberts, Dallas, TX 75246, USA
- Brett J Croen
- Department of Orthopedic Surgery, University of Pennsylvania Health System, 51 N 39th St, Philadelphia, PA 19104, USA
- Mohammed S Abdullah
- Department of Orthopedic Surgery, University of Pennsylvania Health System, 51 N 39th St, Philadelphia, PA 19104, USA
- Casey J Humbyrd
- Department of Orthopedic Surgery, University of Pennsylvania Health System, 51 N 39th St, Philadelphia, PA 19104, USA
4
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Boscolo-Rizzo P, Califano G, Cammaroto G, Chiesa-Estomba CM, Committeri U, Crimi S, Curran NR, di Bello F, di Stadio A, Frosolini A, Gabriele G, Gengler IM, Lonardi F, Maglitto F, Mayo-Yáñez M, Petrocelli M, Pucci R, Saibene AM, Saponaro G, Tel A, Trabalzini F, Trecca EMC, Vellone V, Salzano G, De Riu G. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms. Eur Arch Otorhinolaryngol 2024; 281:6123-6131. [PMID: 38703195] [PMCID: PMC11512889] [DOI: 10.1007/s00405-024-08710-0]
Abstract
BACKGROUND The widespread diffusion of Artificial Intelligence (AI) platforms is revolutionizing how health-related information is disseminated, thereby highlighting the need for tools to evaluate the quality of such information. This study aimed to propose and validate the Quality Assessment of Medical Artificial Intelligence (QAMAI), a tool specifically designed to assess the quality of health information provided by AI platforms. METHODS The QAMAI tool has been developed by a panel of experts following guidelines for the development of new questionnaires. A total of 30 responses from ChatGPT4, addressing patient queries, theoretical questions, and clinical head and neck surgery scenarios were assessed by 27 reviewers from 25 academic centers worldwide. Construct validity, internal consistency, inter-rater and test-retest reliability were assessed to validate the tool. RESULTS The validation was conducted on the basis of 792 assessments for the 30 responses given by ChatGPT4. The results of the exploratory factor analysis revealed a unidimensional structure of the QAMAI with a single factor comprising all the items that explained 51.1% of the variance with factor loadings ranging from 0.449 to 0.856. Overall internal consistency was high (Cronbach's alpha = 0.837). The Intraclass Correlation Coefficient was 0.983 (95% CI 0.973-0.991; F (29,542) = 68.3; p < 0.001), indicating excellent reliability. Test-retest reliability analysis revealed a moderate-to-strong correlation with a Pearson's coefficient of 0.876 (95% CI 0.859-0.891; p < 0.001). CONCLUSIONS The QAMAI tool demonstrated significant reliability and validity in assessing the quality of health information provided by AI platforms. Such a tool might become particularly useful for physicians as patients increasingly seek medical information on AI platforms.
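For context, the internal-consistency statistic reported above, Cronbach's alpha, is alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores) for k items. A minimal sketch with toy data follows; the item count and scores are illustrative assumptions, not QAMAI data.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: shape (n_responses, k_items), one column per scale item."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
toy = rng.integers(1, 6, size=(30, 6)).astype(float)  # 30 ratings, 6 items
print(round(cronbach_alpha(toy), 3))
```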
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Viale San Pietro 43/B, 07100, Sassari, Italy
- PhD School of Biomedical Science, Biomedical Sciences Department, University of Sassari, Sassari, Italy
- Jerome R Lechien
- Department of Laryngology and Bronchoesophagology, EpiCURA Hospital, Mons School of Medicine, UMONS, Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
- Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Fabiana Allevi
- Maxillofacial Surgery Department, ASST Santi Paolo e Carlo, University of Milan, Milan, Italy
- Giovanni Audino
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giada Anna Beltramini
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
- Michela Bergonzani
- Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
- Paolo Boscolo-Rizzo
- Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
- Gianluigi Califano
- Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giovanni Cammaroto
- ENT Department, Morgagni Pierantoni Hospital, AUSL Romagna, Forlì, Italy
- Carlos M Chiesa-Estomba
- Department of Otorhinolaryngology-Head and Neck Surgery, Hospital Universitario Donostia, San Sebastian, Spain
- Umberto Committeri
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Salvatore Crimi
- Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
- Nicholas R Curran
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA
- Francesco di Bello
- Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Arianna di Stadio
- Otolaryngology Unit, GF Ingrassia Department, University of Catania, Catania, Italy
- Andrea Frosolini
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
- Guido Gabriele
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
- Isabelle M Gengler
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA
- Fabio Lonardi
- Department of Maxillofacial Surgery, University of Verona, Verona, Italy
- Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
- Miguel Mayo-Yáñez
- Otorhinolaryngology, Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), A Coruña, Galicia, Spain
- Marzia Petrocelli
- Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
- Resi Pucci
- Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
- Alberto Maria Saibene
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, University of Milan, Milan, Italy
- Gianmarco Saponaro
- Maxillo-Facial Surgery Unit, IRCCS "A. Gemelli" Foundation, Catholic University of the Sacred Heart, Rome, Italy
- Alessandro Tel
- Clinic of Maxillofacial Surgery, Department of Head and Neck Surgery and Neuroscience, University Hospital of Udine, Udine, Italy
- Franco Trabalzini
- Department of Otorhinolaryngology, Head and Neck Surgery, Meyer Children's Hospital, Florence, Italy
- Eleonora M C Trecca
- Department of Otorhinolaryngology and Maxillofacial Surgery, IRCCS Hospital Casa Sollievo Della Sofferenza, San Giovanni Rotondo, Foggia, Italy
- Department of Otorhinolaryngology, University Hospital of Foggia, Foggia, Italy
- Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Viale San Pietro 43/B, 07100, Sassari, Italy
5
Zhang N, Sun Z, Xie Y, Wu H, Li C. The latest version ChatGPT powered by GPT-4o: what will it bring to the medical field? Int J Surg 2024; 110:6018-6019. [PMID: 38857508] [PMCID: PMC11392067] [DOI: 10.1097/js9.0000000000001754]
Affiliation(s)
- Nan Zhang
- Department of Gynecological Tumor Ward, The Third Affiliated Hospital of Zhengzhou University
- Zaijie Sun
- Department of Orthopaedics, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang
- Yuchen Xie
- Xiangya Medical College, Central South University, Changsha, Hunan
- Haiyang Wu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou
- Department of Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin
- Cheng Li
- Department of Spine Surgery, Wangjing Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Center for Musculoskeletal Surgery (CMSC), Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt University of Berlin, Berlin Institute of Health, Berlin, Germany
6
Song Z, Xu Y, He Y, Wang Y. A commentary on 'Application and challenges of ChatGPT in interventional surgery'. Int J Surg 2024; 110:5961-5962. [PMID: 38814338] [PMCID: PMC11392129] [DOI: 10.1097/js9.0000000000001757]
Affiliation(s)
- Zhiwei Song
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yiya Xu
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yingchao He
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yinzhou Wang
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Fujian Key Laboratory of Medical Analysis, Fujian Academy of Medical Sciences, Fuzhou, Fujian, People's Republic of China
7
Wu H, Li W, Chen X, Li C. Not just disclosure of generative artificial intelligence like ChatGPT in scientific writing: peer-review process also needs. Int J Surg 2024; 110:5845-5846. [PMID: 38729102] [PMCID: PMC11392203] [DOI: 10.1097/js9.0000000000001619]
Affiliation(s)
- Haiyang Wu
- Department of Orthopaedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou
- Department of Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin
- Wanqing Li
- Department of Operating Room, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang
- Xiaofeng Chen
- Department of Orthopaedic Surgery, Yangxin People’s Hospital, Yangxin, Hubei
- Cheng Li
- Department of Orthopaedic Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing, People’s Republic of China
- Center for Musculoskeletal Surgery (CMSC), Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt University of Berlin, Berlin Institute of Health, Berlin, Germany
8
Hassona Y, Alqaisi D, Al-Haddad A, Georgakopoulou EA, Malamos D, Alrashdan MS, Sawair F. How good is ChatGPT at answering patients' questions related to early detection of oral (mouth) cancer? Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 138:269-278. [PMID: 38714483] [DOI: 10.1016/j.oooo.2024.04.010]
Abstract
OBJECTIVES To examine the quality, reliability, readability, and usefulness of ChatGPT in promoting oral cancer early detection. STUDY DESIGN A total of 108 patient-oriented questions about oral cancer early detection were compiled from expert panels, professional societies, and web-based tools. Questions were categorized into 4 topic domains, and ChatGPT 3.5 was asked each question independently. ChatGPT answers were evaluated regarding quality, readability, actionability, and usefulness; two experienced reviewers independently assessed each response. RESULTS Questions related to clinical appearance constituted 36.1% (n = 39) of the total questions. ChatGPT provided "very useful" responses to the majority of questions (75%; n = 81). The mean Global Quality Score was 4.24 ± 1.3 of 5. The mean reliability score was 23.17 ± 9.87 of 25. The mean understandability score was 76.6% ± 25.9% of 100, while the mean actionability score was 47.3% ± 18.9% of 100. The mean FKS reading ease score was 38.4% ± 29.9%, while the mean SMOG index readability score was 11.65 ± 8.4. No misleading information was identified among ChatGPT responses. CONCLUSION ChatGPT is an attractive and potentially useful resource for informing patients about early detection of oral cancer. Nevertheless, concerns do exist about readability and actionability of the offered information.
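As a point of reference, the SMOG index reported above is a closed-form readability estimate computed from sentence and polysyllable counts; the standard published formula (general-purpose constants, not values from this study) is:

```latex
% SMOG grade estimate for a text sample
\mathrm{SMOG} = 3.1291 + 1.0430\,\sqrt{30 \times \frac{n_{\text{polysyllabic words}}}{n_{\text{sentences}}}}
```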
Affiliation(s)
- Yazan Hassona
- Faculty of Dentistry, Centre for Oral Diseases Studies (CODS), Al-Ahliyya Amman University, Jordan; School of Dentistry, The University of Jordan, Jordan
- Dua'a Alqaisi
- School of Dentistry, The University of Jordan, Jordan
- Eleni A Georgakopoulou
- Molecular Carcinogenesis Group, Department of Histology and Embryology, Medical School, National and Kapodistrian University of Athens, Greece
- Dimitris Malamos
- Oral Medicine Clinic of the National Organization for the Provision of Health, Athens, Greece
- Mohammad S Alrashdan
- Department of Oral and Craniofacial Health Sciences, College of Dental Medicine, University of Sharjah, Sharjah, United Arab Emirates
- Faleh Sawair
- School of Dentistry, The University of Jordan, Jordan
9
Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024; 17:263-277. [PMID: 39372551] [PMCID: PMC11450117] [DOI: 10.1007/s12195-024-00820-3]
Abstract
OBJECTIVES This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. METHODS This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. RESULTS LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. CONCLUSION Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Affiliation(s)
- Ziqing Su
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Guozhang Tang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032 Anhui P.R. China
- Rui Huang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Yang Qiao
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Zheng Zhang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Xingliang Dai
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000 P.R. China
10
Hsieh CH, Hsieh HY, Lin HP. Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination. Heliyon 2024; 10:e34851. [PMID: 39149010] [PMCID: PMC11324965] [DOI: 10.1016/j.heliyon.2024.e34851]
Abstract
Background Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination. Methods The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias. Results Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct answer rate compared to 41% for ChatGPT-3.5. ChatGPT-4 passed five out of eight yearly exams, whereas ChatGPT-3.5 failed all. On single-choice questions, ChatGPT-4 scored 66% correct, compared to 48% for ChatGPT-3.5. On multiple-choice, ChatGPT-4 achieved a 43% correct rate, nearly double the 23% of ChatGPT-3.5. Conclusion As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.
Affiliation(s)
- Ching-Hua Hsieh
- Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan
- Hsiao-Yun Hsieh
- Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan
- Hui-Ping Lin
- Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan
11
Jo E, Song S, Kim JH, Lim S, Kim JH, Cha JJ, Kim YM, Joo HJ. Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts. JMIR Med Educ 2024; 10:e51282. [PMID: 38989848] [PMCID: PMC11250047] [DOI: 10.2196/51282]
Abstract
Background Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet, rigorous investigations comparing their performance to human experts remain sparse. Objective This study aims to compare the medical accuracy of GPT-4 with human experts in providing medical advice using real-world user-generated queries, with a specific focus on cardiology. It also sought to analyze the performance of GPT-4 and human experts in specific question categories, including drug or medication information and preliminary diagnoses. Methods We collected 251 pairs of cardiology-specific questions from general users and answers from human experts via an internet portal. GPT-4 was tasked with generating responses to the same questions. Three independent cardiologists (SL, JHK, and JJC) evaluated the answers provided by both human experts and GPT-4. Using a computer interface, each evaluator compared the pairs and determined which answer was superior, and they quantitatively measured the clarity and complexity of the questions as well as the accuracy and appropriateness of the responses, applying a 3-tiered grading scale (low, medium, and high). Furthermore, a linguistic analysis was conducted to compare the length and vocabulary diversity of the responses using word count and type-token ratio. Results GPT-4 and human experts displayed comparable efficacy in medical accuracy ("GPT-4 is better" at 132/251, 52.6% vs "Human expert is better" at 119/251, 47.4%). In accuracy level categorization, humans had more high-accuracy responses than GPT-4 (50/237, 21.1% vs 30/238, 12.6%) but also a greater proportion of low-accuracy responses (11/237, 4.6% vs 1/238, 0.4%; P=.001). GPT-4 responses were generally longer and used a less diverse vocabulary than those of human experts, potentially enhancing their comprehensibility for general users (sentence count: mean 10.9, SD 4.2 vs mean 5.9, SD 3.7; P<.001; type-token ratio: mean 0.69, SD 0.07 vs mean 0.79, SD 0.09; P<.001). Nevertheless, human experts outperformed GPT-4 in specific question categories, notably those related to drug or medication information and preliminary diagnoses. These findings highlight the limitations of GPT-4 in providing advice based on clinical experience. Conclusions GPT-4 has shown promising potential in automated medical consultation, with comparable medical accuracy to human experts. However, challenges remain particularly in the realm of nuanced clinical judgment. Future improvements in LLMs may require the integration of specific clinical reasoning pathways and regulatory oversight for safe use. Further research is needed to understand the full potential of LLMs across various medical specialties and conditions.
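The type-token ratio used in the linguistic analysis is simply the number of unique words divided by the total number of words; because it falls as texts grow longer, it partly explains the lower vocabulary diversity of GPT-4's longer answers. A minimal sketch with invented example sentences, not study data:

```python
import re

def type_token_ratio(text: str) -> float:
    # Unique tokens / total tokens; note this metric is length-sensitive.
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

expert = "Take the medication twice daily with food."
gpt4 = ("You may take the medication twice daily with food; taking it with "
        "food can reduce stomach upset, and taking it consistently helps.")
print(round(type_token_ratio(expert), 2), round(type_token_ratio(gpt4), 2))
```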
Affiliation(s)
- Eunbeen Jo
- Department of Medical Informatics, Korea University College of Medicine, Seoul, Republic of Korea
- Sanghoun Song
- Department of Linguistics, Korea University, Seoul, Republic of Korea
- Jong-Ho Kim
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, Republic of Korea
- Subin Lim
- Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea
- Ju Hyeon Kim
- Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea
- Jung-Joon Cha
- Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea
- Young-Min Kim
- School of Interdisciplinary Industrial Studies, Hanyang University, Seoul, Republic of Korea
- Hyung Joon Joo
- Department of Medical Informatics, Korea University College of Medicine, Seoul, Republic of Korea
- Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea
- Department of Cardiology, Cardiovascular Center, Korea University College of Medicine, Seoul, Republic of Korea
12
Edalati S, Vasan V, Cheng CP, Patel Z, Govindaraj S, Iloreta AM. Can GPT-4 revolutionize otolaryngology? Navigating opportunities and ethical considerations. Am J Otolaryngol 2024; 45:104303. [PMID: 38678799] [DOI: 10.1016/j.amjoto.2024.104303]
Abstract
Otolaryngologists can enhance workflow efficiency, provide better patient care, and advance medical research and education by integrating artificial intelligence (AI) into their practices. GPT-4 technology is a revolutionary and contemporary example of AI that may apply to otolaryngology. The knowledge of otolaryngologists should be supplemented, not replaced, when using GPT-4 to make critical medical decisions and provide individualized patient care. In our thorough examination, we explore the potential uses of the groundbreaking GPT-4 technology in the field of otolaryngology, covering aspects such as potential outcomes and technical boundaries. Additionally, we delve into the intricate and intellectually challenging dilemmas that emerge when incorporating GPT-4 into otolaryngology, considering the ethical considerations inherent in its implementation. Our stance is that GPT-4 has the potential to be very helpful. Its capabilities, which include aid in clinical decision-making, patient care, and administrative job automation, present exciting possibilities for enhancing patient outcomes, boosting the efficiency of healthcare delivery, and enhancing patient experiences. Even though there are still certain obstacles and limitations, the progress made so far shows that GPT-4 can be a valuable tool for modern medicine. GPT-4 may play a more significant role in clinical practice as technology develops, helping medical professionals deliver high-quality care tailored to every patient's unique needs.
Affiliation(s)
- Shaun Edalati
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Vikram Vasan
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Christopher P Cheng
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Zara Patel
- Department of Otolaryngology-Head & Neck Surgery, Stanford University School of Medicine, Stanford, CA, USA
- Satish Govindaraj
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alfred Marc Iloreta
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
13
Law S, Oldfield B, Yang W. ChatGPT/GPT-4 (large language models): Opportunities and challenges of perspective in bariatric healthcare professionals. Obes Rev 2024; 25:e13746. [PMID: 38613164] [DOI: 10.1111/obr.13746]
Abstract
ChatGPT/GPT-4 is a conversational large language model (LLM) based on artificial intelligence (AI). The potential application of LLMs as virtual assistants for bariatric healthcare professionals in education and practice may be promising if relevant and valid issues are actively examined and addressed. In general medical terms, it is possible that AI models like ChatGPT/GPT-4 will be deeply integrated into medical scenarios, improving medical efficiency and quality, and allowing doctors more time to communicate with patients and implement personalized health management. Chatbots based on AI have great potential in bariatric healthcare and may play an important role in predicting and intervening in weight loss and obesity-related complications. However, given its potential limitations, we should carefully consider the medical, legal, ethical, data security, privacy, and liability issues arising from medical errors caused by ChatGPT/GPT-4. This concern also extends to ChatGPT/GPT-4's ability to justify wrong decisions, and there is an urgent need for appropriate guidelines and regulations to ensure the safe and responsible use of ChatGPT/GPT-4.
Affiliation(s)
- Saikam Law
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Medicine, Jinan University, Guangzhou, China
- Brian Oldfield
- Department of Physiology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
- Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, China
14
Luo S, Canavese F, Aroojis A, Andreacchio A, Anticevic D, Bouchard M, Castaneda P, De Rosa V, Fiogbe MA, Frick SL, Hui JH, Johari AN, Loro A, Lyu X, Matsushita M, Omeroglu H, Roye DP, Shah MM, Yong B, Li L. Are Generative Pretrained Transformer 4 Responses to Developmental Dysplasia of the Hip Clinical Scenarios Universal? An International Review. J Pediatr Orthop 2024; 44:e504-e511. [PMID: 38597198] [DOI: 10.1097/bpo.0000000000002682]
Abstract
OBJECTIVE There is increasing interest in applying artificial intelligence chatbots like generative pretrained transformer 4 (GPT-4) in the medical field. This study aimed to explore the universality of GPT-4 responses to simulated clinical scenarios of developmental dysplasia of the hip (DDH) across diverse global settings. METHODS Seventeen international experts with more than 15 years of experience in pediatric orthopaedics were selected for the evaluation panel. Eight simulated DDH clinical scenarios were created, covering 4 key areas: (1) initial evaluation and diagnosis, (2) initial examination and treatment, (3) nursing care and follow-up, and (4) prognosis and rehabilitation planning. Each scenario was completed independently in a new GPT-4 session. Interrater reliability was assessed using Fleiss kappa, and the quality, relevance, and applicability of GPT-4 responses were analyzed using median scores and interquartile ranges. Following scoring, experts met in Zoom sessions to generate Regional Consensus Assessment Scores, which were intended to represent a consistent regional assessment of the use of GPT-4 in pediatric orthopaedic care. RESULTS GPT-4's responses to the 8 clinical DDH scenarios received performance scores ranging from 44.3% to 98.9% of the 88-point maximum. The Fleiss kappa statistic of 0.113 (P = 0.001) indicated low agreement among experts in their ratings. When assessing the responses' quality, relevance, and applicability, the median scores were 3, with interquartile ranges of 3 to 4, 3 to 4, and 2 to 3, respectively. Significant differences were noted in the prognosis and rehabilitation domain scores (P < 0.05 for all). Regional consensus scores were 75 for Africa, 74 for Asia, 73 for India, 80 for Europe, and 65 for North America, with the Kruskal-Wallis test highlighting significant disparities between these regions (P = 0.034). CONCLUSIONS This study demonstrates the promise of GPT-4 in pediatric orthopaedic care, particularly in supporting preliminary DDH assessments and guiding treatment strategies for specialist care. However, effective integration of GPT-4 into clinical practice will require adaptation to specific regional health care contexts, highlighting the importance of a nuanced approach to health technology adaptation. LEVEL OF EVIDENCE Level IV.
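Fleiss kappa, the interrater statistic reported above, is computed from a subjects-by-raters matrix of categorical ratings. The sketch below uses statsmodels with random stand-in data sized like the study design (8 scenarios, 17 raters); it is not the study's ratings.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(1)
ratings = rng.integers(1, 6, size=(8, 17))  # rows = scenarios, cols = raters

table, _ = aggregate_raters(ratings)  # subjects x categories count table
print(round(fleiss_kappa(table), 3))  # near 0 for random raters; 1 = perfect
```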
Affiliation(s)
- Shaoting Luo
- Department of Pediatric Orthopaedics, Shengjing Hospital of China Medical University, Shenyang, Liaoning
- Federico Canavese
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- Alaric Aroojis
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- Antonio Andreacchio
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- Darko Anticevic
- Pediatric Orthopedics Clinic of Pediatric Surgery and Orthopedics, Pediatric Institute of Southern Switzerland (IPSI), Via Athos Gallino, Bellinzona, Switzerland
- Pablo Castaneda
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- Vincenzo De Rosa
- Pediatric Orthopedics Clinic of Pediatric Surgery and Orthopedics, Pediatric Institute of Southern Switzerland (IPSI), Via Athos Gallino, Bellinzona, Switzerland
- Steven L Frick
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- James H Hui
- Department of Orthopaedic Surgery, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Ashok N Johari
- Pediatric Orthopedics Clinic of Pediatric Surgery and Orthopedics, Pediatric Institute of Southern Switzerland (IPSI), Via Athos Gallino, Bellinzona, Switzerland
- Antonio Loro
- Ufuk University Faculty of Medicine, Ankara, Turkey
- Xuemin Lyu
- Department of Orthopaedic Surgery, School of Medicine, Stanford University, Palo Alto, CA
- Masaki Matsushita
- Department of Orthopaedic Surgery, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- David P Roye
- Department of Orthopaedic Surgery, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Bicheng Yong
- Department of Pediatric Orthopaedics, Beit CURE Children's Hospital of Malawi, Chichiri Blantyre, Malawi
- Lianyong Li
- Department of Pediatric Orthopaedics, Shengjing Hospital of China Medical University, Shenyang, Liaoning
15
Gomez-Cabello CA, Borna S, Pressman SM, Haider SA, Forte AJ. Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini. Medicina (Kaunas) 2024; 60:957. [PMID: 38929573] [PMCID: PMC11205293] [DOI: 10.3390/medicina60060957]
Abstract
Background and Objectives: Large language models (LLMs) are emerging as valuable tools in plastic surgery, potentially reducing surgeons' cognitive loads and improving patients' outcomes. This study aimed to assess and compare the current state of the two most common and readily available LLMs, Open AI's ChatGPT-4 and Google's Gemini Pro (1.0 Pro), in providing intraoperative decision support in plastic and reconstructive surgery procedures. Materials and Methods: We presented each LLM with 32 independent intraoperative scenarios spanning 5 procedures. We utilized a 5-point and a 3-point Likert scale for medical accuracy and relevance, respectively. We determined the readability of the responses using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) score. Additionally, we measured the models' response time. We compared the performance using the Mann-Whitney U test and Student's t-test. Results: ChatGPT-4 significantly outperformed Gemini in providing accurate (3.59 ± 0.84 vs. 3.13 ± 0.83, p-value = 0.022) and relevant (2.28 ± 0.77 vs. 1.88 ± 0.83, p-value = 0.032) responses. Alternatively, Gemini provided more concise and readable responses, with an average FKGL (12.80 ± 1.56) significantly lower than ChatGPT-4's (15.00 ± 1.89) (p < 0.0001). However, there was no difference in the FRE scores (p = 0.174). Moreover, Gemini's average response time was significantly faster (8.15 ± 1.42 s) than ChatGPT-4's (13.70 ± 2.87 s) (p < 0.0001). Conclusions: Although ChatGPT-4 provided more accurate and relevant responses, both models demonstrated potential as intraoperative tools. Nevertheless, their performance inconsistency across the different procedures underscores the need for further training and optimization to ensure their reliability as intraoperative decision-support tools.
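The Mann-Whitney U test used for the Likert-scale comparisons is available in SciPy; a minimal sketch with hypothetical ratings (not the study's data):

```python
from scipy.stats import mannwhitneyu

# Hypothetical 5-point accuracy ratings across intraoperative scenarios.
chatgpt4 = [4, 3, 5, 4, 3, 4, 2, 4, 5, 3, 4, 4]
gemini   = [3, 3, 4, 2, 3, 3, 2, 4, 3, 3, 3, 4]

stat, p = mannwhitneyu(chatgpt4, gemini, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
```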
Affiliation(s)
- Cesar A. Gomez-Cabello
- Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA
- Sophia M. Pressman
- Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA
- Antonio J. Forte
- Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
16
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Bolzoni A, Committeri U, Crimi S, Gabriele G, Lonardi F, Maglitto F, Petrocelli M, Pucci R, Saponaro G, Tel A, Vellone V, Chiesa-Estomba CM, Boscolo-Rizzo P, Salzano G, De Riu G. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngol Head Neck Surg 2024; 170:1492-1503. [PMID: 37595113] [DOI: 10.1002/ohn.489]
Abstract
OBJECTIVE To investigate the accuracy of Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery. STUDY DESIGN Observational and evaluative study. SETTING Eighteen surgeons from 14 Italian head and neck surgery units. METHODS A total of 144 clinical questions encompassing different subspecialities of head and neck surgery and 15 comprehensive clinical scenarios were developed. Questions and scenarios were inputted into ChatGPT4, and the resulting answers were evaluated by the researchers using accuracy (range 1-6), completeness (range 1-3), and reference quality Likert scales. RESULTS The overall median score of open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answer as entirely or nearly entirely correct in 87.2% of cases and as comprehensive and covering all aspects of the question in 73% of cases. The artificial intelligence (AI) model achieved a correct response in 84.7% of the closed-ended questions (11 wrong answers). As for the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases. The proposed diagnostic or therapeutic procedure was judged to be complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and sources were nonexistent in 46.4% of the cases. CONCLUSION The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being considered a reliable support for the decision-making process of specialists in head and neck surgery.
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Biomedical Sciences Department, PhD School of Biomedical Science, University of Sassari, Sassari, Italy
- Jerome R Lechien
- Department of Anatomy and Experimental Oncology, Mons School of Medicine, UMONS, Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
- Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Fabiana Allevi
- Maxillofacial Surgery Department, ASST Santi Paolo e Carlo, University of Milan, Milan, Italy
- Giovanni Audino
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giada Anna Beltramini
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
- Michela Bergonzani
- Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
- Alessandro Bolzoni
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Umberto Committeri
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Salvatore Crimi
- Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
- Guido Gabriele
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
- Fabio Lonardi
- Department of Maxillofacial Surgery, University of Verona, Verona, Italy
- Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
- Marzia Petrocelli
- Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
- Resi Pucci
- Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
- Gianmarco Saponaro
- Maxillo-Facial Surgery Unit, IRCCS "A. Gemelli" Foundation, Catholic University of the Sacred Heart, Rome, Italy
- Alessandro Tel
- Department of Head and Neck Surgery and Neuroscience, Clinic of Maxillofacial Surgery, University Hospital of Udine, Udine, Italy
- Paolo Boscolo-Rizzo
- Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
- Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
17
Le KDR, Tay SBP, Choy KT, Verjans J, Sasanelli N, Kong JCH. Applications of natural language processing tools in the surgical journey. Front Surg 2024; 11:1403540. [PMID: 38826809] [PMCID: PMC11140056] [DOI: 10.3389/fsurg.2024.1403540]
Abstract
Background Natural language processing tools are becoming increasingly adopted in multiple industries worldwide. They have shown promising results, but their use in the field of surgery remains under-recognised. Most trials of these tools have been conducted in small settings, and their promising results must be confirmed before large-scale adoption in surgery can be considered. This study aims to review the current research and insights into the potential for implementation of natural language processing tools in surgery. Methods A narrative review was conducted following a computer-assisted literature search of the Medline, EMBASE and Google Scholar databases. Papers related to natural language processing tools and considerations for their use in surgery were included. Results Current applications of natural language processing tools within surgery are limited. From the literature, there is evidence of potential improvement in surgical capability and service delivery, such as through the use of these technologies to streamline processes including surgical triaging, data collection and auditing, surgical communication and documentation. Additionally, there is potential to extend these capabilities to surgical academia to improve processes in surgical research and allow innovation in the development of educational resources. Despite these outcomes, the evidence supporting these findings is challenged by small sample sizes with limited applicability to broader settings. Conclusion With the increasing adoption of natural language processing technology, such as in popular forms like ChatGPT, there has been increasing research into the use of these tools within surgery to improve surgical workflow and efficiency. This review highlights multifaceted applications of natural language processing within surgery, albeit with clear limitations due to the infancy of the infrastructure available to leverage these technologies. There remains room for more rigorous research into the broader capability of natural language processing technology within the field of surgery, and a need for cross-sectoral collaboration to understand the ways in which these algorithms can best be integrated.
Affiliation(s)
- Khang Duy Ricky Le
- Department of General Surgical Specialties, The Royal Melbourne Hospital, Melbourne, VIC, Australia
- Department of Surgical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Geelong Clinical School, Deakin University, Geelong, VIC, Australia
- Department of Medical Education, The University of Melbourne, Melbourne, VIC, Australia
- Samuel Boon Ping Tay
- Department of Anaesthesia and Pain Medicine, Eastern Health, Box Hill, VIC, Australia
- Kay Tai Choy
- Department of Surgery, Austin Health, Melbourne, VIC, Australia
- Johan Verjans
- Australian Institute for Machine Learning (AIML), University of Adelaide, Adelaide, SA, Australia
- Lifelong Health Theme (Platform AI), South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Nicola Sasanelli
- Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, SA, Australia
- Department of Operations (Strategic and International Partnerships), SmartSAT Cooperative Research Centre, Adelaide, SA, Australia
- Agora High Tech, Adelaide, SA, Australia
- Joseph C. H. Kong
- Department of Surgical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Monash University Department of Surgery, Alfred Hospital, Melbourne, VIC, Australia
- Department of Colorectal Surgery, Alfred Hospital, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia

18
Shieh A, Tran B, He G, Kumar M, Freed JA, Majety P. Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports. Sci Rep 2024; 14:9330. PMID: 38654011; PMCID: PMC11039662; DOI: 10.1038/s41598-024-58760-x.
Abstract
While there are data assessing the test performance of artificial intelligence (AI) chatbots, including the Generative Pre-trained Transformer 4.0 (GPT 4) chatbot (ChatGPT 4.0), data on its diagnostic accuracy in clinical cases are scarce. We assessed the large language model (LLM) ChatGPT 4.0 on its ability to answer questions from the United States Medical Licensing Exam (USMLE) Step 2, as well as its ability to generate a differential diagnosis from corresponding clinical vignettes in published case reports. A total of 109 Step 2 Clinical Knowledge (CK) practice questions were given to both ChatGPT 3.5 and ChatGPT 4.0, with a prompt to pick the correct answer. Compared with its previous version, ChatGPT 3.5, accuracy improved with ChatGPT 4.0, from 47.7% to 87.2% (p = 0.035). Using the topics tested in the Step 2 CK questions, we additionally identified 63 corresponding published case report vignettes and asked ChatGPT 4.0 to produce its top three differential diagnoses. ChatGPT 4.0 included the correct diagnosis in its three-item shortlist in 47 of the 63 case reports (74.6%). To gauge ChatGPT 4.0's confidence in its diagnoses, we asked it to rank its top three differentials from most to least likely. Of the 47 correct diagnoses, 33 were listed first (70.2%), 11 second (23.4%), and 3 third (6.4%). Our study shows continued iterative improvement in ChatGPT's ability to answer standardized USMLE questions accurately and provides insight into its clinical diagnostic accuracy.
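As a quick check on the arithmetic reported above, the short Python script below recomputes the headline figures from the counts stated in the abstract (33 first-ranked, 11 second, 3 third, out of 63 vignettes); the variable names are ours and purely illustrative.

# Recompute the top-3 differential-diagnosis figures from the counts
# reported in the abstract (33 ranked first, 11 second, 3 third, of 63 cases).
rank_counts = {1: 33, 2: 11, 3: 3}
n_cases = 63

n_correct = sum(rank_counts.values())   # 47 cases with the diagnosis in the top 3
top3_accuracy = n_correct / n_cases     # 47/63 = 74.6%
rank_shares = {r: c / n_correct for r, c in rank_counts.items()}  # 70.2%, 23.4%, 6.4%

print(f"shortlist (top-3) accuracy: {top3_accuracy:.1%}")
for rank, share in rank_shares.items():
    print(f"correct diagnosis ranked #{rank}: {share:.1%}")

Running this reproduces the 74.6%, 70.2%, 23.4%, and 6.4% figures exactly, confirming the abstract's internal consistency.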
Affiliation(s)
- Allen Shieh
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Brandon Tran
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Gene He
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Mudit Kumar
- Division of Child and Adolescent Psychiatry, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Jason A Freed
- Division of Hematology and Hematologic Malignancies, Department of Internal Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Priyanka Majety
- Division of Endocrinology, Diabetes and Metabolism, Department of Internal Medicine, Virginia Commonwealth University, Richmond, VA, USA

19
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. PMID: 37562022; DOI: 10.1093/asj/sjad260.
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of currently demonstrated and proposed clinical applications. METHODS A systematic review was performed to identify medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
20
Yu QX, Feng DC, Wu RC, Li DX. Auxiliary use of ChatGPT in surgical diagnosis and treatment - correspondence. Int J Surg 2024; 110:617-618. PMID: 38315798; PMCID: PMC10793754; DOI: 10.1097/js9.0000000000000818.
Affiliation(s)
- Qing-xin Yu
- Department of Pathology, Ningbo Clinical Pathology Diagnosis Center, Ningbo City, Zhejiang Province
- De-chao Feng
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- Rui-cheng Wu
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China
- Deng-xiong Li
- Department of Urology, Institute of Urology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People’s Republic of China

21
He SK, Tu T, Deng BW, Bai YJ. Surgery in the era of ChatGPT: A bibliometric analysis based on web of science. Asian J Surg 2024; 47:784-785. PMID: 37879994; DOI: 10.1016/j.asjsur.2023.10.034.
Affiliation(s)
- Si-Ke He
- Department of Urology, West China Hospital, Sichuan University, Chengdu, China
- Teng Tu
- West China School of Medicine, Sichuan University, Chengdu, China
- Yun-Jin Bai
- Department of Urology, West China Hospital, Sichuan University, Chengdu, China

22
Wang D, He Y, Ma Y, Wu H, Ni G. The Era of Artificial Intelligence: Talking About the Potential Application Value of ChatGPT/GPT-4 in Foot and Ankle Surgery. J Foot Ankle Surg 2024; 63:1-3. PMID: 37516342; DOI: 10.1053/j.jfas.2023.07.002.
Affiliation(s)
- Dongxue Wang
- School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing, China
- Yongbin He
- School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing, China
- Yixuan Ma
- College of Education, Beijing Sport University, Beijing, China
- Haiyang Wu
- Graduate School of Tianjin Medical University, Tianjin, China; Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC
- Guoxin Ni
- Department of Rehabilitation Medicine, The First Affiliated Hospital of Xiamen University, Xiamen, China

23
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. PMID: 38038796; PMCID: PMC10692045; DOI: 10.1186/s40634-023-00700-1.
Abstract
ChatGPT has rapidly gained popularity since its release in November 2022. Large language models (LLMs) and ChatGPT have now been applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology, and researchers are exploring their potential for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. The use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein. However, next-generation, domain-specific LLMs are expected to be more capable and to improve patients' quality of life.
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
- Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India

24
Luo S, Deng L, Chen Y, Zhou W, Canavese F, Li L. Revolutionizing pediatric orthopedics: GPT-4, a groundbreaking innovation or just a fleeting trend? Int J Surg 2023; 109:3694-3697. PMID: 37737896; PMCID: PMC10651230; DOI: 10.1097/js9.0000000000000610.
Affiliation(s)
- Shaoting Luo
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang
- Linfang Deng
- Department of Nursing, Jinzhou Medical University, Jinzhou, Liaoning, People’s Republic of China
- Yufan Chen
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang
- Weizheng Zhou
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang
- Federico Canavese
- Department of Pediatric Orthopedic Surgery, Lille University Centre, Jeanne de Flandre Hospital, Lille, France
- Lianyong Li
- Department of Pediatric Orthopedics, Shengjing Hospital of China Medical University, Shenyang

25
Yu H. A Cogitation on the ChatGPT Craze from the Perspective of Psychological Algorithm Aversion and Appreciation. Psychol Res Behav Manag 2023; 16:3837-3844. PMID: 37724135; PMCID: PMC10505389; DOI: 10.2147/prbm.s430936.
Abstract
In recent times, ChatGPT has garnered significant interest from the public, sparking a range of reactions that encompass both aversion and appreciation. This paper delves into the paradoxical attitudes of individuals towards ChatGPT, highlighting the simultaneous existence of algorithmic aversion and appreciation. A comprehensive analysis is conducted from the vantage points of psychology and algorithmic decision-making, exploring the underlying causes of these conflicting attitudes from three dimensions: self-performance, task types, and individual factors. Subsequently, strategies to reconcile these opposing psychological stances are proposed, delineated into two categories: flexible coping and inflexible coping. In light of the ongoing advancements in artificial intelligence, this paper posits recommendations for the attitudes and actions that individuals ought to adopt in the face of artificial intelligence. Regardless of whether one exhibits algorithm aversion or appreciation, the paper underscores that coexisting with algorithms is an inescapable reality in the age of artificial intelligence, necessitating the preservation of human advantages.
Affiliation(s)
- Hao Yu
- Faculty of Education, Shaanxi Normal University, Xi’an, Shaanxi, People’s Republic of China
26
Ahmed SK, Hussein S, Essa RA. The role of ChatGPT in cardiothoracic surgery. Indian J Thorac Cardiovasc Surg 2023; 39:562-563. PMID: 37609604; PMCID: PMC10441939; DOI: 10.1007/s12055-023-01568-7.
Affiliation(s)
- Sirwan Khalid Ahmed
- Ministry of Health, General Health Directorate of Raparin, Rania, Sulaymaniyah 46012 Iraq
- Safin Hussein
- Department of Biology, College of Science, University of Raparin, Rania, Sulaymaniyah, 46012 Iraq

27
Cheng K, Wu H, Li C. ChatGPT/GPT-4: enabling a new era of surgical oncology. Int J Surg 2023; 109:2549-2550. PMID: 37195797; PMCID: PMC10442081; DOI: 10.1097/js9.0000000000000451.
Affiliation(s)
- Kunming Cheng
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan
- Haiyang Wu
- Department of Graduate School
- Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, North Carolina, USA
- Cheng Li
- Department of Orthopaedic Surgery, Beijing Jishuitan Hospital, Fourth Clinical College of Peking University, Beijing, People’s Republic of China

28
He Y, Wu H, Chen Y, Wang D, Tang W, Moody MA, Ni G, Gu S. Can ChatGPT/GPT-4 assist surgeons in confronting patients with Mpox and handling future epidemics? Int J Surg 2023; 109:2544-2548. PMID: 37161504; PMCID: PMC10442131; DOI: 10.1097/js9.0000000000000453.
Affiliation(s)
- Yongbin He
- School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing
- Department of Orthopedics, The Fifth Affiliated Hospital of Zunyi Medical University, Zhuhai
- Haiyang Wu
- Department of Spine Surgery, Tianjin Huanhu Hospital, Graduate School of Tianjin Medical University, Tianjin
- Duke Molecular Physiology Institute
- Yan Chen
- School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing
- Dewei Wang
- Department of Orthopedics, The Fifth Affiliated Hospital of Zunyi Medical University, Zhuhai
- Weiming Tang
- University of North Carolina Project-China, Guangzhou
- Department of Medicine, University of North Carolina Institute for Global Health and Infectious Diseases, Chapel Hill, NC
- M. Anthony Moody
- Division of Infectious Diseases, Department of Pediatrics, Duke University School of Medicine
- Duke Human Vaccine Institute, Duke University Medical Center, Durham
- Guoxin Ni
- Department of Rehabilitation Medicine, The First Affiliated Hospital of Xiamen University, Xiamen, China
- Shuqin Gu
- Duke Human Vaccine Institute, Duke University Medical Center, Durham

29
Roman A, Al-Sharif L, Al Gharyani M. The Expanding Role of ChatGPT (Chat-Generative Pre-Trained Transformer) in Neurosurgery: A Systematic Review of Literature and Conceptual Framework. Cureus 2023; 15:e43502. PMID: 37719492; PMCID: PMC10500385; DOI: 10.7759/cureus.43502.
Abstract
The objective of this study is to explore the use of ChatGPT (Chat-Generative Pre-Trained Transformer) in neurosurgery and its potential impact on the field. Through a systematic review of the current literature, the authors discuss how this emerging artificial intelligence (AI) technology may prove a useful tool in the future, weighing its potential benefits against its limitations. A comprehensive, systematic literature search of the PubMed, Google Scholar, and Embase databases was conducted on the use of ChatGPT and its applications in healthcare and in different neurosurgery topics, and the advantages, limitations, and potential impact of using ChatGPT in neurosurgery were analyzed. ChatGPT has demonstrated promising results in various applications, such as natural language processing, language translation, and text summarization. In neurosurgery, ChatGPT could assist in areas such as surgical planning, image recognition, medical diagnosis, patient care, and scientific production. A total of 128 articles were retrieved from the databases, of which 22 were included for thorough analysis. The studies reviewed demonstrate the potential of AI and deep learning (DL), through language models such as ChatGPT, to improve the accuracy and efficiency of neurosurgical procedures, as well as diagnosis, treatment, and patient outcomes across various medical specialties, including neurosurgery. There are, however, limitations to its use, including the need for large datasets and the potential for errors in the output, which most authors agree will require human verification before final application. In line with the reviewed findings and expert opinions, our search demonstrates the potential that ChatGPT holds for the present and the future. Further research and development are required to fully understand its capabilities and limitations. AI technology can serve as a useful tool to augment human intelligence; however, it is essential to use it in a responsible and ethical manner.
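For readers who want to reproduce the kind of database query that underpins a review like this one, below is a minimal sketch of the PubMed arm of such a search using Biopython's Entrez interface. The search string, contact email, and retmax cap are illustrative assumptions, not the authors' registered protocol.

# A hedged sketch of a PubMed search like the one described above.
# Query terms and settings are illustrative assumptions.
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI requires a contact address

handle = Entrez.esearch(
    db="pubmed",
    term='("ChatGPT" OR "large language model") AND "neurosurgery"',
    retmax=50,  # cap the number of returned PMIDs
)
record = Entrez.read(handle)
handle.close()

print(record["Count"], "matching records")
print(record["IdList"][:10])  # first PMIDs to screen against eligibility criteria

The returned PMID list would then be screened against inclusion criteria, mirroring the 128-retrieved/22-included funnel reported in the abstract.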
Affiliation(s)
- Alex Roman
- Neurological Surgery, Cleveland Clinic Abu Dhabi, Abu Dhabi, ARE
- Lubna Al-Sharif
- Physiology, Pharmacology and Toxicology, An-Najah National University, Nablus, PSE

30
Cheng K, Guo Q, He Y, Lu Y, Xie R, Li C, Wu H. Artificial Intelligence in Sports Medicine: Could GPT-4 Make Human Doctors Obsolete? Ann Biomed Eng 2023. PMID: 37097528; DOI: 10.1007/s10439-023-03213-1.
Abstract
Sports medicine, an essential branch of orthopedics, focuses on preserving, restoring, improving, and rebuilding the function of the human motor system. As a thriving interdisciplinary field, sports medicine attracts the interest not only of the orthopedic community but also of artificial intelligence (AI) researchers. In this study, our team summarizes the potential applications of GPT-4 in sports medicine, including diagnostic imaging, exercise prescription, medical supervision, surgical treatment, sports nutrition, and scientific research. In our opinion, GPT-4 cannot make sports physicians obsolete. Instead, it could become an indispensable scientific assistant for sports physicians in the future.
Affiliation(s)
- Kunming Cheng
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Qiang Guo
- Department of Orthopedics, Baodi Clinical College of Tianjin Medical University, Tianjin, China
- Yongbin He
- School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing, China
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yanqiu Lu
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Ruijie Xie
- Department of Microsurgery, The Affiliated Nanhua Hospital, Hengyang Medical School, University of South China, Hengyang, China
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany
- Cheng Li
- Department of Orthopaedic Surgery, Beijing Jishuitan Hospital, Fourth Clinical College of Peking University, Beijing, China
- Center for Musculoskeletal Surgery (CMSC), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt University of Berlin, and Berlin Institute of Health, Berlin, Germany
- Haiyang Wu
- Department of Graduate School, Tianjin Medical University, Tianjin, China
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA