1. Akgun MY, Savasci M, Gunerbuyuk C, Gunara SO, Oktenoglu T, Ozer AF, Ates O. Battle of the authors: Comparing neurosurgery articles written by humans and AI. J Clin Neurosci 2025;135:111152. [PMID: 40010170] [DOI: 10.1016/j.jocn.2025.111152]
Abstract
BACKGROUND The advancement of artificial intelligence (AI) has led to its application in various fields, including medical literature. This study compares the quality of neurosurgery articles written by human authors and those generated by ChatGPT, an advanced AI model. The objective was to determine if AI-generated articles meet the standards of human-written academic papers. METHODS A total of 10 neurosurgery articles, 5 written by humans and 5 by ChatGPT, were evaluated by a panel of blinded experts. The assessment parameters included overall impression, readability, criteria satisfaction, and degree of detail. Additionally, readability scores were calculated using the Lix score and the Flesch-Kincaid grade level. Preference and identification tests were also conducted to determine if experts could distinguish between the two types of articles. RESULTS The study found no significant differences in the overall quality parameters between human-written and ChatGPT-generated articles. Readability scores were higher for ChatGPT articles (Lix score: 35 vs. 26, Flesch-Kincaid grade level: 10 vs. 8). Experts correctly identified the authorship of the articles 61% of the time, with preferences almost evenly split (47% preferred ChatGPT, 44% preferred human, and 9% had no preference). The most statistically significant result was the higher readability scores of ChatGPT-generated articles, indicating that AI can produce more readable content than human authors. CONCLUSION ChatGPT is capable of generating neurosurgery articles that are comparable in quality to those written by humans. The higher readability scores of AI-generated articles suggest that ChatGPT can enhance the accessibility of scientific literature. This study supports the potential integration of AI in academic writing, offering a valuable tool for researchers and medical professionals.
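Both readability measures cited in this abstract are simple closed-form formulas, so the Lix and Flesch-Kincaid comparison is easy to reproduce. A minimal Python sketch follows; the formulas are the standard published ones, but the tokenization and syllable heuristics are crude simplifying assumptions, not the study's exact method:

```python
import re

def lix(text: str) -> float:
    """Lix = words/sentences + 100 * (share of words longer than 6 letters)."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100 * long_words / len(words)

def syllables(word: str) -> int:
    """Crude vowel-group count; real tools use pronunciation dictionaries."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    total_syllables = sum(syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * total_syllables / len(words) - 15.59

sample = "The operation was successful. The patient recovered quickly."
print(round(lix(sample), 1), round(flesch_kincaid_grade(sample), 1))
```

On both scales a higher score corresponds to more demanding text (a higher grade level), which is worth keeping in mind when interpreting the comparison above.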
Affiliation(s)
- Mehmet Yigit Akgun
- Department of Neurosurgery, Koc University Hospital, Istanbul, Turkey; Spine Center, Koc University Hospital, Istanbul, Turkey
- Melihcan Savasci
- Department of Neurosurgery, Bakirkoy Prof. Dr. Mazhar Osman Research and Education Hospital, Istanbul, Turkey
- Sezer Onur Gunara
- Department of Neurosurgery, Koc University Hospital, Istanbul, Turkey; Spine Center, Koc University Hospital, Istanbul, Turkey
- Tunc Oktenoglu
- Department of Neurosurgery, Koc University Hospital, Istanbul, Turkey; Spine Center, Koc University Hospital, Istanbul, Turkey
- Ali Fahir Ozer
- Department of Neurosurgery, Koc University Hospital, Istanbul, Turkey; Spine Center, Koc University Hospital, Istanbul, Turkey
- Ozkan Ates
- Department of Neurosurgery, Koc University Hospital, Istanbul, Turkey; Spine Center, Koc University Hospital, Istanbul, Turkey
2. Zhu J, Jiang Y, Chen D, Lu Y, Huang Y, Lin Y, Fan P. High identification and positive-negative discrimination but limited detailed grading accuracy of ChatGPT-4o in knee osteoarthritis radiographs. Knee Surg Sports Traumatol Arthrosc 2025;33:1911-1919. [PMID: 40053915] [DOI: 10.1002/ksa.12639]
Abstract
PURPOSE To explore the potential of ChatGPT-4o in analysing radiographic images of knee osteoarthritis (OA) and to assess its grading accuracy, feature identification and reliability, thereby helping surgeons to improve diagnostic accuracy and efficiency. METHODS A total of 117 anteroposterior knee radiographs from patients (23.1% men, 76.9% women, mean age 69.7 ± 7.99 years) were analysed. Two senior orthopaedic surgeons and ChatGPT-4o independently graded images with the Kellgren-Lawrence (K-L), Ahlbäck and International Knee Documentation Committee (IKDC) systems. A consensus reference standard was established by a third radiologist. ChatGPT-4o's performance metrics (accuracy, precision, recall and F1 score) were calculated, and its reliability was assessed via two evaluations separated by a 2-week interval, with intraclass correlation coefficients (ICCs) determined. RESULTS ChatGPT-4o achieved a 100% identification rate for knee radiographs and demonstrated strong binary classification performance (precision: 0.95, recall: 0.83, F1 score: 0.88). However, its detailed grading accuracy (35%) was substantially lower than that of surgeons (89.6%). Severe underestimation of OA severity occurred in 49.3% of the cases. Interrater reliability for surgeons was excellent (ICC: 0.78-0.91), whereas ChatGPT-4o showed poor initial consistency (ICC: 0.16-0.28), improving marginally in the second evaluation (ICC: 0.22-0.39). CONCLUSION ChatGPT-4o has the potential to rapidly identify knee OA on radiographs and classify it as positive or negative. However, its detailed grading accuracy remains suboptimal, with a notable tendency to underestimate severe cases. This limits its current clinical utility for precise staging. Future research should focus on optimising its grading performance and improving accuracy to enhance diagnostic reliability. LEVEL OF EVIDENCE Level III, retrospective comparative study.
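For reference, the precision, recall, and F1 figures above follow the standard confusion-matrix definitions, and the reported F1 can be sanity-checked directly. A minimal sketch; the counts below are illustrative reconstructions, not the study's actual tabulation:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only: precision 0.95 and recall 0.83 yield F1 ~0.88-0.89
# depending on rounding, consistent with the abstract's reported 0.88.
p, r, f = precision_recall_f1(tp=83, fp=4, fn=17)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```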
Affiliation(s)
- Jiesheng Zhu
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
- Yilun Jiang
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
- Daosen Chen
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
- Yi Lu
- Department of Radiology, The Second Affiliated Hospital and Yuying Children's Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yijiang Huang
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
- Yimu Lin
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
- Pei Fan
- Department of Orthopedics, The Second Affiliated Hospital of Wenzhou Medical University, Yuying Children's Hospital, Wenzhou, China
3. Sweed T, Mabrouk A, Dawson M. Transforming orthopaedics with AI: Insights from a custom ChatGPT on ESSKA osteotomy consensus. Knee Surg Sports Traumatol Arthrosc 2025;33:1557-1559. [PMID: 40079374] [DOI: 10.1002/ksa.12653]
Affiliation(s)
- Tamer Sweed
- Department of Trauma & Orthopaedics, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Ahmed Mabrouk
- Basingstoke and North Hampshire Hospital, Basingstoke, UK
4. Saglam S, Uludag V, Karaduman ZO, Arıcan M, Yücel MO, Dalaslan RE. Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study. BMC Med Inform Decis Mak 2025;25:163. [PMID: 40229819] [PMCID: PMC11998439] [DOI: 10.1186/s12911-025-02996-8]
Abstract
BACKGROUND The integration of artificial intelligence (AI) in healthcare has rapidly expanded, particularly in clinical decision-making. Large language models (LLMs) such as GPT-4 and GPT-3.5 have shown potential in various medical applications, including diagnostics and treatment planning. However, their efficacy in specialized fields like sports surgery and physiotherapy remains underexplored. This study aims to compare the performance of GPT-4 and GPT-3.5 in clinical decision-making within these domains using a structured assessment approach. METHODS This cross-sectional study included 56 professionals specializing in sports surgery and physiotherapy. Participants evaluated 10 standardized clinical scenarios generated by GPT-4 and GPT-3.5 using a 5-point Likert scale. The scenarios encompassed common musculoskeletal conditions, and assessments focused on diagnostic accuracy, treatment appropriateness, surgical technique detailing, and rehabilitation plan suitability. Data were collected anonymously via Google Forms. Statistical analysis included paired t-tests for direct model comparisons, one-way ANOVA to assess performance across multiple criteria, and Cronbach's alpha to evaluate inter-rater reliability. RESULTS GPT-4 significantly outperformed GPT-3.5 across all evaluated criteria. Paired t-test results (t(55) = 10.45, p < 0.001) demonstrated that GPT-4 provided more accurate diagnoses, superior treatment plans, and more detailed surgical recommendations. ANOVA results confirmed the higher suitability of GPT-4 in treatment planning (F(1, 55) = 35.22, p < 0.001) and rehabilitation protocols (F(1, 55) = 32.10, p < 0.001). Cronbach's alpha values indicated higher internal consistency for GPT-4 (α = 0.478) compared to GPT-3.5 (α = 0.234), reflecting more reliable performance. CONCLUSIONS GPT-4 demonstrates superior performance compared to GPT-3.5 in clinical decision-making for sports surgery and physiotherapy. These findings suggest that advanced AI models can aid in diagnostic accuracy, treatment planning, and rehabilitation strategies. However, AI should function as a decision-support tool rather than a substitute for expert clinical judgment. Future studies should explore the integration of AI into real-world clinical workflows, validate findings using larger datasets, and compare additional AI models beyond the GPT series.
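The two headline statistics in this abstract, the paired t-test and Cronbach's alpha, are both straightforward to compute. A minimal sketch with synthetic Likert ratings (the arrays are random placeholders, not the study's data, so the printed values will not match the reported ones):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic 5-point Likert ratings from 56 raters, one value per model.
gpt4_scores = rng.integers(3, 6, size=56).astype(float)
gpt35_scores = rng.integers(2, 5, size=56).astype(float)

# Paired t-test: the same 56 raters scored both models.
t, p = stats.ttest_rel(gpt4_scores, gpt35_scores)
print(f"t(55) = {t:.2f}, p = {p:.4g}")

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: raters x items. alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# 56 raters x 10 scenarios; independent random ratings give alpha near 0.
ratings = rng.integers(1, 6, size=(56, 10)).astype(float)
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```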
Affiliation(s)
- Sönmez Saglam
- Department of Orthopaedics and Traumatology, Faculty of Medicine, Duzce University, Duzce, Türkiye
- Veysel Uludag
- Department of Physiotherapy and Rehabilitation, Faculty of Health Sciences, Duzce University, Duzce, Türkiye
- Zekeriya Okan Karaduman
- Department of Orthopaedics and Traumatology, Faculty of Medicine, Duzce University, Duzce, Türkiye
- Mehmet Arıcan
- Department of Orthopaedics and Traumatology, Faculty of Medicine, Duzce University, Duzce, Türkiye
- Mücahid Osman Yücel
- Department of Orthopaedics and Traumatology, Faculty of Medicine, Duzce University, Duzce, Türkiye
- Raşit Emin Dalaslan
- Department of Orthopaedics and Traumatology, Faculty of Medicine, Duzce University, Duzce, Türkiye
5. Costa LPDEB, Castro DHPDE, Cordeiro RP, Albino RB. Evaluation of the performance of ChatGPT/artificial intelligence in the multiple-choice test to obtain the title of specialist in Orthopedics and Traumatology. Acta Ortop Bras 2025;33:e280947. [PMID: 40206447] [PMCID: PMC11978311] [DOI: 10.1590/1413-785220243201e280947]
Abstract
Introduction ChatGPT, an advanced artificial intelligence model specialized in natural language processing, shows remarkable abilities, achieving high scores in certification exams in various specialties. This study aims to evaluate ChatGPT's performance in the multiple-choice test applied to obtain specialist certification in Orthopedics and Traumatology. Methods We used ChatGPT 4.0 to answer 100 questions from the first phase of the 2022 Título de Especialista em Ortopedia e Traumatologia (TEOT) exam (Specialist in Orthopedics and Traumatology Test). We excluded non-text-based questions. Each question was entered individually into ChatGPT, with a new session initiated for each question. Performance was evaluated with respect to word count and the questions' taxonomic classification. Results Of the 95 questions analyzed, ChatGPT answered 61.05% correctly and 38.95% incorrectly. There was no statistically significant difference with respect to word count, and ChatGPT's performance did not vary according to taxonomic level. Conclusion ChatGPT demonstrated broad knowledge in Orthopedics, with acceptable performance in the TEOT exam. The results suggest that ChatGPT is a potential educational and clinical resource in Orthopedics, but it requires further development and human supervision for effective application. Level of evidence IV, Case series.
Affiliation(s)
- Lucas Plens de Britto Costa
- Universidade Estadual Paulista, Grupo de Medicina e Cirurgia do Pe e Tornozelo, Department of Surgery and Orthopedics, Botucatu, SP, Brazil
- Universidade Federal de São Paulo, Escola Paulista de Medicina, Departamento de Ortopedia e Traumatologia, São Paulo, SP, Brazil
- Danilo Henrique Pizzo de Castro
- Universidade Estadual Paulista, Grupo de Medicina e Cirurgia do Pe e Tornozelo, Department of Surgery and Orthopedics, Botucatu, SP, Brazil
- Renato Pinheiro Cordeiro
- Universidade Estadual Paulista, Grupo de Medicina e Cirurgia do Pe e Tornozelo, Department of Surgery and Orthopedics, Botucatu, SP, Brazil
- Rômulo Ballarin Albino
- Universidade Estadual Paulista, Grupo de Medicina e Cirurgia do Pe e Tornozelo, Department of Surgery and Orthopedics, Botucatu, SP, Brazil
6. DeFoor MT, Sheean AJ. Editorial Commentary: Experts in Shoulder Surgery Do Not Consistently Detect Artificial Intelligence-Generated Scientific Abstracts. Arthroscopy 2025;41:925-926. [PMID: 39243996] [DOI: 10.1016/j.arthro.2024.08.038]
Abstract
There has been exponential growth in the number of artificial intelligence (AI)- and machine learning (ML)-related publications in recent years. For example, in the field of shoulder and elbow surgery, there was a 6-fold increase in the number of publications between 2018 and 2021. AI shows the potential to improve diagnostic precision, generate precise surgical templates, direct personalized treatment plans, and reduce administrative costs. However, although AI and ML technology has the ability to positively impact biomedical research, it should be closely monitored and used with extreme caution in the realm of research and scientific writing. Current large language models raise concerns regarding the veracity of AI-generated content, copyright and ownership infringement, fabricated references, lack of in-text citations, plagiarism, and questions of authorship. Recent research has shown that even the most experienced surgeons are unable to consistently detect AI-generated scientific writing. Of note, AI detection software is more adept in this role. AI should be used with caution in the development and production of scholarly work.
7. Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, Klifto CS, Levy JC, Namdari S, Sanchez-Sotelo J, Menendez ME. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy 2025;41:916-924.e2. [PMID: 38992513] [DOI: 10.1016/j.arthro.2024.06.045]
Abstract
PURPOSE To evaluate the extent to which experienced reviewers can accurately discern between artificial intelligence (AI)-generated and original research abstracts published in the field of shoulder and elbow surgery and compare this with the performance of an AI detection tool. METHODS Twenty-five shoulder- and elbow-related articles published in high-impact journals in 2023 were randomly selected. ChatGPT was prompted with only the abstract title to create an AI-generated version of each abstract. The resulting 50 abstracts were randomly distributed to and evaluated by 8 blinded peer reviewers with at least 5 years of experience. Reviewers were tasked with distinguishing between original and AI-generated text. A Likert scale assessed reviewer confidence for each interpretation, and the primary reason guiding assessment of generated text was collected. AI output detector (0%-100%) and plagiarism (0%-100%) scores were evaluated using GPTZero. RESULTS Reviewers correctly identified 62% of AI-generated abstracts and misclassified 38% of original abstracts as being AI generated. GPTZero reported a significantly higher probability of AI output among generated abstracts (median, 56%; interquartile range [IQR], 51%-77%) compared with original abstracts (median, 10%; IQR, 4%-37%; P < .01). Generated abstracts scored significantly lower on the plagiarism detector (median, 7%; IQR, 5%-14%) relative to original abstracts (median, 82%; IQR, 72%-92%; P < .01). Correct identification of AI-generated abstracts was predominantly attributed to the presence of unrealistic data/values. The primary reason for misidentifying original abstracts as AI was attributed to writing style. CONCLUSIONS Experienced reviewers faced difficulties in distinguishing between human and AI-generated research content within shoulder and elbow surgery. The presence of unrealistic data facilitated correct identification of AI abstracts, whereas misidentification of original abstracts was often ascribed to writing style. CLINICAL RELEVANCE With rapidly increasing AI advancements, it is paramount that ethical standards of scientific reporting are upheld. It is therefore helpful to understand the ability of reviewers to identify AI-generated content.
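The abstract compares median (IQR) detector scores at P < .01 but does not name the statistical test; for data summarized by medians and IQRs, a nonparametric comparison such as the Mann-Whitney U test is a common choice. A minimal sketch under that assumption, with synthetic scores standing in for the GPTZero outputs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic AI-probability scores (%) for 25 abstracts per group; illustrative only.
generated = rng.normal(60, 15, size=25).clip(0, 100)
original = rng.normal(15, 12, size=25).clip(0, 100)

u, p = stats.mannwhitneyu(generated, original, alternative="two-sided")
print(f"U = {u:.0f}, p = {p:.2g}")
print(f"median generated = {np.median(generated):.0f}%, median original = {np.median(original):.0f}%")
```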
Affiliation(s)
- Ryan D Stadler
- Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey, U.S.A.
- Suleiman Y Sudah
- Department of Orthopaedic Surgery, Monmouth Medical Center, Monmouth, New Jersey, U.S.A.
- Michael A Moverman
- Department of Orthopaedics, University of Utah School of Medicine, Salt Lake City, Utah, U.S.A.
- Grant E Garrigues
- Midwest Orthopaedics at Rush University Medical Center, Chicago, Illinois, U.S.A.
- Christopher S Klifto
- Department of Orthopaedic Surgery, Duke University School of Medicine, Durham, North Carolina, U.S.A.
- Jonathan C Levy
- Levy Shoulder Center at Paley Orthopedic & Spine Institute, Boca Raton, Florida, U.S.A.
- Surena Namdari
- Rothman Orthopaedic Institute at Thomas Jefferson University Hospitals, Philadelphia, Pennsylvania, U.S.A.
- Mariano E Menendez
- Department of Orthopaedics, University of California Davis, Sacramento, California, U.S.A.
8. Mavrych V, Ganguly P, Bolgova O. Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis. Clin Anat 2025;38:200-210. [PMID: 39573871] [DOI: 10.1002/ca.24244]
Abstract
The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including medical education, raises questions about their accuracy. The primary aim of our study was to undertake a detailed comparative analysis of the proficiencies and accuracies of seven different LLMs (ChatGPT-4, ChatGPT-3.5-turbo, ChatGPT-3.5, Copilot, PaLM, Bard, and Gemini) in responding to medical multiple-choice questions (MCQs), and in generating clinical scenarios and MCQs for upper limb topics in a Gross Anatomy course for medical students. Selected chatbots were tested, answering 50 USMLE-style MCQs. The questions were randomly selected from the Gross Anatomy course exam database for medical students and reviewed by three independent experts. The results of five successive attempts to answer each set of questions by the chatbots were evaluated in terms of accuracy, relevance, and comprehensiveness. The best result was provided by ChatGPT-4, which answered 60.5% ± 1.9% of questions accurately, then Copilot (42.0% ± 0.0%) and ChatGPT-3.5 (41.0% ± 5.3%), followed by ChatGPT-3.5-turbo (38.5% ± 5.7%). Google PaLM 2 (34.5% ± 4.4%) and Bard (33.5% ± 3.0%) gave the poorest results. The overall performance of GPT-4 was statistically superior (p < 0.05) to those of Copilot, GPT-3.5, GPT-3.5-turbo, PaLM 2, and Bard by 18.6%, 19.5%, 22%, 26%, and 27%, respectively. Each chatbot was then asked to generate a clinical scenario for each of three randomly selected topics (the anatomical snuffbox, supracondylar fracture of the humerus, and the cubital fossa) and three related anatomical MCQs with five options each, and to indicate the correct answers. Two independent experts analyzed and graded the 216 records received (0-5 scale). The best results were recorded for ChatGPT-4, followed by Gemini, ChatGPT-3.5, ChatGPT-3.5-turbo, and Google PaLM 2; Copilot had the lowest grade. Technological progress notwithstanding, LLMs have yet to mature sufficiently to take over the role of teacher or facilitator completely within a Gross Anatomy course; however, they can be valuable tools for medical educators.
Affiliation(s)
- Volodymyr Mavrych
- College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia
- Paul Ganguly
- College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia
- Olena Bolgova
- College of Medicine, Alfaisal University, Riyadh, Kingdom of Saudi Arabia
9. Diniz P, Grimm B, Mouton C, Ley C, Andersen TE, Seil R. High specificity of an AI-powered framework in cross-checking male professional football anterior cruciate ligament tear reports in public databases. Knee Surg Sports Traumatol Arthrosc 2024. [PMID: 39724452] [DOI: 10.1002/ksa.12571]
Abstract
PURPOSE While public databases like Transfermarkt provide valuable data for assessing the impact of anterior cruciate ligament (ACL) injuries in professional footballers, they require robust verification methods due to accuracy concerns. We hypothesised that an artificial intelligence (AI)-powered framework could cross-check ACL tear-related information from large publicly available data sets with high specificity. METHODS The AI-powered framework uses Google Programmable Search Engine to search a curated, multilingual list of websites and OpenAI's GPT to translate search queries, appraise search results and analyse injury-related information in search result items (SRIs). Specificity, the framework's ability to correctly identify texts that do not mention an athlete suffering an ACL tear, was the chosen performance metric, with the SRI as the evaluation unit. A database of ACL tears in male professional footballers from first- and second-tier leagues worldwide (1999-2024) was collected from Transfermarkt.com, and players were randomly selected for appraisal until enough SRIs were obtained to validate the framework's specificity. Player age at injury and time until return-to-play (RTP) were recorded and compared with Union of European Football Associations (UEFA) Elite Club Injury Study data. RESULTS Verification of 231 athletes yielded 1546 SRIs. Human analysis of the SRIs showed that 335 mentioned an ACL tear, corresponding to 83 athletes with ACL tears. The specificity and sensitivity of GPT in identifying mentions of ACL tears in a player were 99.3% and 88.4%, respectively. Mean age at rupture was 26.6 years (standard deviation: 4.6, 95% confidence interval [CI]: 25.6-27.6). Median RTP time was 225 days (interquartile range: 96, 95% CI: 209-251), which is comparable to reports using data from the UEFA Elite Club Injury Study. CONCLUSION This study shows that an AI-powered framework can achieve high specificity in cross-checking ACL tear reports in male professional football from public databases, markedly reducing manual workload and enhancing the reliability of media-based sports medicine research. LEVEL OF EVIDENCE Level III.
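The reported 99.3% specificity and 88.4% sensitivity follow the usual confusion-matrix definitions over the 1546 SRIs. A quick consistency check; the counts below are reverse-engineered from the reported rates for illustration and are not the paper's exact tabulation:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# 335 of 1546 SRIs truly mentioned an ACL tear (so 1211 did not). The reported
# rates are consistent with roughly TP=296, FN=39, TN=1203, FP=8 (illustrative).
print(f"sensitivity = {sensitivity(296, 39):.1%}")  # ~88.4%
print(f"specificity = {specificity(1203, 8):.1%}")  # ~99.3%
```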
Affiliation(s)
- Pedro Diniz
- Department of Orthopaedic Surgery, Centre Hospitalier de Luxembourg - Clinique d'Eich, Luxembourg, Luxembourg
- Luxembourg Institute of Research in Orthopaedics, Sports Medicine and Science (LIROMS), Luxembourg, Luxembourg
- Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg
- Department of Bioengineering, iBB - Institute for Bioengineering and Biosciences, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Bernd Grimm
- Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg
- Caroline Mouton
- Department of Orthopaedic Surgery, Centre Hospitalier de Luxembourg - Clinique d'Eich, Luxembourg, Luxembourg
- Luxembourg Institute of Research in Orthopaedics, Sports Medicine and Science (LIROMS), Luxembourg, Luxembourg
- Christophe Ley
- Department of Mathematics, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Thor Einar Andersen
- Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
- Romain Seil
- Department of Orthopaedic Surgery, Centre Hospitalier de Luxembourg - Clinique d'Eich, Luxembourg, Luxembourg
- Luxembourg Institute of Research in Orthopaedics, Sports Medicine and Science (LIROMS), Luxembourg, Luxembourg
- Luxembourg Institute of Health (LIH), Luxembourg, Luxembourg
10. Hiredesai AN, Martinez CJ, Anderson ML, Howlett CP, Unadkat KD, Noland SS. Is Artificial Intelligence the Future of Radiology? Accuracy of ChatGPT in Radiologic Diagnosis of Upper Extremity Bony Pathology. Hand (N Y) 2024. [PMID: 39641156] [PMCID: PMC11624516] [DOI: 10.1177/15589447241298982]
Abstract
BACKGROUND Artificial intelligence (AI) is a promising tool to aid in diagnostic accuracy and patient communication. Prior literature has shown that ChatGPT answers medical questions and can accurately diagnose surgical conditions. The purpose of this study was to determine the accuracy of ChatGPT 4.0 in evaluating radiologic imaging of common orthopedic upper extremity bony pathologies, including identifying the imaging modality and diagnostic accuracy. METHODS Diagnostic imaging was sourced from an open-source radiology database for 6 common upper extremity bony pathologies: distal radius fracture (DRF), metacarpal fracture (MFX), carpometacarpal osteoarthritis (CMC), humerus fracture (HFX), scaphoid fracture (SFX), and scaphoid nonunion (SN). X-ray, computed tomography (CT), and magnetic resonance imaging (MRI) modalities were included. Fifty images were randomly selected from each pathology where possible. Images were uploaded to ChatGPT 4.0 and queried for imaging modality, laterality, and diagnosis. Each image query was completed in a new ChatGPT search tab. Multinomial linear regression was used to identify variations in ChatGPT's diagnostic accuracy across imaging modalities and medical conditions. RESULTS Overall, ChatGPT provided a diagnosis for 52% of images, with accuracy ranging from 0% to 55%. Diagnostic accuracy was significantly lower for SFX and MFX relative to HFX. ChatGPT was significantly less likely to provide a diagnosis for MRI relative to CT. Diagnostic accuracy ranged from 0% to 40% with regard to imaging modality (X-ray, CT, MRI), though this difference was not statistically significant. CONCLUSIONS ChatGPT's accuracy varied significantly between conditions and imaging modalities, though its iterative learning capabilities suggest potential for future diagnostic utility within hand surgery.
11. Ghanem D. Integrating artificial intelligence in orthopaedic care and surgery: the revolutionary role of ChatGPT, as written with ChatGPT. Int J Surg 2024;110:7593-7597. [PMID: 39453839] [PMCID: PMC11634199] [DOI: 10.1097/js9.0000000000002130]
Affiliation(s)
- Diane Ghanem
- Department of Orthopedic Surgery, American University of Beirut Medical Center, Beirut, Lebanon
- Department of Orthopedic Surgery, The Johns Hopkins Hospital, Baltimore, MD, USA
12. Zhang C, Liu S, Zhou X, Zhou S, Tian Y, Wang S, Xu N, Li W. Examining the Role of Large Language Models in Orthopedics: Systematic Review. J Med Internet Res 2024;26:e59607. [PMID: 39546795] [DOI: 10.2196/59607]
Abstract
BACKGROUND Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent. OBJECTIVE The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges. METHODS PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. The terms, which included variants of "large language model," "generative artificial intelligence," "ChatGPT," and "orthopaedics," were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment. RESULTS A total of 68 studies were selected. The application of LLMs in orthopedics involved the fields of clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to the field of management. Of the 68 studies, only 8 (12%) recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of the LLMs' performance across the different studies. For diagnostic tasks alone, the accuracy ranged from 55% to 93%. When performing disease classification tasks, ChatGPT with GPT-4's accuracy ranged from 2% to 100%. With regard to answering questions in orthopedic examinations, the scores ranged from 45% to 73.6% due to differences in models and test selections. CONCLUSIONS LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be a potential approach to effectively enhance work efficiency at present. More high-quality clinical trials are needed in the future, aiming to identify optimal applications of LLMs and advance orthopedics toward higher efficiency and precision.
Affiliation(s)
- Cheng Zhang
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Shanshan Liu
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Xingyu Zhou
- Peking University Health Science Center, Beijing, China
- Siyu Zhou
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Yinglun Tian
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Shenglin Wang
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Nanfang Xu
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
- Weishi Li
- Department of Orthopaedics, Peking University Third Hospital, Beijing, China
- Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China
- Beijing Key Laboratory of Spinal Disease Research, Beijing, China
13. Dergaa I, Ben Saad H, Glenn JM, Ben Aissa M, Taheri M, Swed S, Guelmami N, Chamari K. A thorough examination of ChatGPT-3.5 potential applications in medical writing: A preliminary study. Medicine (Baltimore) 2024;103:e39757. [PMID: 39465713] [PMCID: PMC11460921] [DOI: 10.1097/md.0000000000039757]
Abstract
Effective communication of scientific knowledge plays a crucial role in the advancement of medical research and health care. Technological advancements have introduced large language models such as Chat Generative Pre-Trained Transformer (ChatGPT), powered by artificial intelligence (AI), which has already shown promise in revolutionizing medical writing. This study aimed to conduct a detailed evaluation of ChatGPT-3.5's role in enhancing various aspects of medical writing. From May 10 to 12, 2023, the authors engaged in a series of interactions with ChatGPT-3.5 to evaluate its effectiveness in various tasks, particularly its application to medical writing, including vocabulary enhancement, text rewriting for plagiarism prevention, hypothesis generation, keyword generation, title generation, article summarization, simplification of medical jargon, transformation of informal text into scientific prose, and data interpretation. The exploration of ChatGPT's functionalities in medical writing revealed its potential in enhancing various aspects of the writing process, demonstrating its efficiency in improving vocabulary usage, suggesting alternative phrasing, and providing grammar enhancements. While the results indicate the effectiveness of ChatGPT (version 3.5), the presence of certain imperfections highlights the current indispensability of human intervention to refine and validate outputs, ensuring accuracy and relevance in medical settings. The integration of AI into medical writing shows significant potential for improving clarity, efficiency, and reliability. This evaluation highlights both the benefits and limitations of using ChatGPT-3.5, emphasizing its ability to enhance vocabulary, prevent plagiarism, generate hypotheses, suggest keywords, summarize articles, simplify medical jargon, and transform informal text into an academic format. However, AI tools should not replace human expertise. It is crucial for medical professionals to ensure thorough human review and validation to maintain the accuracy and relevance of the content if they use AI as a supplementary resource in medical writing. Embracing this symbiotic partnership holds the promise of improving medical research and patient outcomes, and it sets the stage for the fusion of AI and human knowledge to produce a novel approach to medical assessment. Thus, while AI can streamline certain tasks, experienced medical writers and researchers must perform final reviews to uphold high standards in medical communications.
Affiliation(s)
- Ismail Dergaa
- Department of Preventative Health, Primary Health Care Corporation (PHCC), Doha, Qatar
- Helmi Ben Saad
- Farhat HACHED Hospital, Service of Physiology and Functional Explorations, University of Sousse, Sousse, Tunisia
- Heart Failure (LR12SP09) Research Laboratory, Farhat HACHED Hospital, University of Sousse, Sousse, Tunisia
- Faculty of Medicine of Sousse, Laboratory of Physiology, University of Sousse, Sousse, Tunisia
- Jordan M. Glenn
- Department of Health, Exercise Science Research Center Human Performance and Recreation, University of Arkansas, Fayetteville, AR
- Mohamed Ben Aissa
- Department of Human and Social Sciences, Higher Institute of Sport and Physical Education of Kef, University of Jendouba, Jendouba, Tunisia
- Morteza Taheri
- Institute of Future Studies, Imam Khomeini International University, Qazvin, Iran
- Sarya Swed
- Faculty of Medicine, Aleppo University, Aleppo, Syria
- Noomen Guelmami
- Department of Health Sciences, Dipartimento di Scienze della Salute (DISSAL), Postgraduate School of Public Health, University of Genoa, Genoa, Italy
- Karim Chamari
- Naufar, Wellness and Recovery Center, Doha, Qatar
- High Institute of Sport and Physical Education, University of Manouba, Tunis, Tunisia
14. Carroll AN, Storms LA, Malempati C, Shanavas RV, Badarudeen S. Generative Artificial Intelligence and Prompt Engineering: A Primer for Orthopaedic Surgeons. JBJS Rev 2024;12:01874474-202410000-00002. [PMID: 39361780] [DOI: 10.2106/jbjs.rvw.24.00122]
Abstract
» Generative artificial intelligence (AI), a rapidly evolving field, has the potential to revolutionize orthopedic care by enhancing diagnostic accuracy, treatment planning, and patient management through data-driven insights and personalized strategies.
» Unlike traditional AI, generative AI has the potential to generate relevant information for orthopaedic surgeons when instructed through prompts, automating tasks such as literature reviews, streamlining workflows, predicting health outcomes, and improving patient interactions.
» Prompt engineering is essential for crafting effective prompts for large language models (LLMs), ensuring accurate and reliable AI-generated outputs, and promoting ethical decision-making in clinical settings.
» Orthopaedic surgeons can choose between various prompt types, including open-ended, focused, and choice-based prompts, to tailor AI responses for specific clinical tasks and enhance the precision and utility of generated information.
» Understanding the limitations of LLMs, such as token limits, context windows, and hallucinations, is crucial for orthopaedic surgeons to effectively use generative AI while addressing ethical concerns related to bias, privacy, and accountability.
Affiliation(s)
- Amber N Carroll
- College of Medicine, University of Kentucky, Lexington, Kentucky
- Lewis A Storms
- College of Medicine, University of Kentucky, Lexington, Kentucky
- Chaitu Malempati
- Department of Orthopaedic Surgery and Sports Medicine, University of Kentucky, Lexington, Kentucky
- Sameer Badarudeen
- Department of Orthopaedic Surgery and Sports Medicine, University of Kentucky, Lexington, Kentucky
15. Quinn M, Milner JD, Schmitt P, Morrissey P, Lemme N, Marcaccio S, DeFroda S, Tabaddor R, Owens BD. Artificial Intelligence Large Language Models Address Anterior Cruciate Ligament Reconstruction: Superior Clarity and Completeness by Gemini Compared With ChatGPT-4 in Response to American Academy of Orthopaedic Surgeons Clinical Practice Guidelines. Arthroscopy 2024:S0749-8063(24)00736-9. [PMID: 39313138] [DOI: 10.1016/j.arthro.2024.09.020]
Abstract
PURPOSE To assess the ability of ChatGPT-4 and Gemini to generate accurate and relevant responses to the 2022 American Academy of Orthopaedic Surgeons (AAOS) Clinical Practice Guidelines (CPG) for anterior cruciate ligament reconstruction (ACLR). METHODS Responses from ChatGPT-4 and Gemini to prompts derived from all 15 AAOS guidelines were evaluated by 7 fellowship-trained orthopaedic sports medicine surgeons using a structured questionnaire assessing 5 key characteristics on a scale from 1 to 5. The prompts were categorized into 3 areas: diagnosis and preoperative management, surgical timing and technique, and rehabilitation and prevention. Statistical analysis included mean scoring, standard deviation, and 2-sided t tests to compare the performance between the 2 large language models (LLMs). Scores were then evaluated for inter-rater reliability (IRR). RESULTS Overall, both LLMs performed well, with mean scores >4 for the 5 key characteristics. Gemini demonstrated superior performance in overall clarity (4.848 ± 0.36 vs 4.743 ± 0.481, P = .034), but all other characteristics demonstrated nonsignificant differences (P > .05). Gemini also demonstrated superior clarity in the surgical timing and technique (P = .038) and prevention and rehabilitation (P = .044) subcategories. Additionally, Gemini had superior completeness scores in the rehabilitation and prevention subcategory (P = .044), but no statistically significant differences were found among the other subcategories. The overall IRR was found to be 0.71 (moderate). CONCLUSIONS Both Gemini and ChatGPT-4 demonstrate an overall good ability to generate accurate and relevant responses to question prompts based on the 2022 AAOS CPG for ACLR. However, Gemini demonstrated superior clarity in multiple domains in addition to superior completeness for questions pertaining to rehabilitation and prevention. CLINICAL RELEVANCE The current study addresses a gap in the LLM and ACLR literature by comparing the performance of ChatGPT-4 with Gemini, which is growing in popularity, with more than 300 million individual uses in May 2024 alone. Moreover, the results demonstrated superior performance of Gemini in both clarity and completeness, which are critical elements of a tool used by patients for educational purposes. Additionally, the current study uses question prompts based on the AAOS CPG, which may be used as a method of standardization for future investigations of the performance of LLM platforms. Thus, the results of this study may be of interest to both the readership of Arthroscopy and patients.
Affiliation(s)
- Matthew Quinn
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- John D Milner
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- Phillip Schmitt
- The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- Patrick Morrissey
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- Nicholas Lemme
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- Stephen Marcaccio
- Department of Orthopaedic Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, U.S.A.
- Steven DeFroda
- Department of Orthopaedic Surgery, Missouri Orthopaedic Institute, University of Missouri, Columbia, Missouri, U.S.A.
- Ramin Tabaddor
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
- Brett D Owens
- Department of Orthopaedics, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, U.S.A.
16. Fucarino A, Fabbrizio A, Garrido ND, Iuliano E, Reis VM, Sausa M, Vilaça-Alves J, Zimatore G, Baldari C, Macaluso F, Giorgio AD, Cantoia M. Emerging Technologies and Open-Source Platforms for Remote Physical Exercise: Innovations and Opportunities for Healthy Population-A Narrative Review. Healthcare (Basel) 2024;12:1466. [PMID: 39120170] [PMCID: PMC11312124] [DOI: 10.3390/healthcare12151466]
Abstract
The emergence of tele-exercise as a response to the impact of technology on physical activity has opened up new possibilities for promoting physical health. By integrating innovative technologies and open-source platforms, tele-exercise encourages people to stay active. In our latest analysis, we delved into the scientific literature surrounding the use of tele-exercise technologies in training healthy individuals. After conducting an extensive search on the PubMed database using the keywords "tele-exercise" and "physical activity" (from 2020 to 2023), we identified 44 clinical trials that were applicable to tele-exercise, but fewer than 10% of them (9.09%; 4 of the 44 studies analyzed) were aimed at healthy individuals. Our review highlights the potential of tele-exercise to help maintain physical fitness and psychological well-being, especially when traditional fitness facilities are not an option. We also underscore the importance of interoperability, standardization, and the incorporation of biomechanics, exercise physiology, and neuroscience into the development of tele-exercise platforms. Nevertheless, despite these promising benefits, research has shown that there is still a significant gap in the knowledge concerning the definition and evaluation of training parameters for healthy individuals. As a result, we call for further research to establish evidence-based practices for tele-exercise in the healthy population.
Affiliation(s)
- Alberto Fucarino
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Antonio Fabbrizio
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Nuno D. Garrido
- Research Center in Sports Sciences, Health Sciences and Human Development, CIDESD, 5000-801 Vila Real, Portugal
- Enzo Iuliano
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Victor Machado Reis
- Research Center in Sports Sciences, Health Sciences and Human Development, CIDESD, 5000-801 Vila Real, Portugal
- Martina Sausa
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- José Vilaça-Alves
- Research Center in Sports Sciences, Health Sciences and Human Development, CIDESD, 5000-801 Vila Real, Portugal
- Sciences Department, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
- Giovanna Zimatore
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Carlo Baldari
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Filippo Macaluso
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Andrea De Giorgio
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
- Manuela Cantoia
- Department of Theoretical and Applied Sciences, eCampus University, 22060 Novedrate, Italy
17. Johns WL, Martinazzi BJ, Miltenberg B, Nam HH, Hammoud S. ChatGPT Provides Unsatisfactory Responses to Frequently Asked Questions Regarding Anterior Cruciate Ligament Reconstruction. Arthroscopy 2024;40:2067-2079.e1. [PMID: 38311261] [DOI: 10.1016/j.arthro.2024.01.017]
Abstract
PURPOSE To determine whether the free online artificial intelligence platform ChatGPT could accurately, adequately, and appropriately answer questions regarding anterior cruciate ligament (ACL) reconstruction surgery. METHODS A list of 10 questions about ACL surgery was created based on a review of frequently asked questions that appeared on websites of various orthopaedic institutions. Each question was separately entered into ChatGPT (version 3.5), and responses were recorded, scored, and graded independently by 3 authors. The reading level of the ChatGPT response was calculated using the WordCalc software package, and readability was assessed using the Flesch-Kincaid grade level, Simple Measure of Gobbledygook index, Coleman-Liau index, Gunning fog index, and automated readability index. RESULTS Of the 10 frequently asked questions entered into ChatGPT, 6 were deemed as unsatisfactory and requiring substantial clarification; 1, as adequate and requiring moderate clarification; 1, as adequate and requiring minor clarification; and 2, as satisfactory and requiring minimal clarification. The mean DISCERN score was 41 (inter-rater reliability, 0.721), indicating the responses to the questions were average. According to the readability assessments, a full understanding of the ChatGPT responses required 13.4 years of education, which corresponds to the reading level of a college sophomore. CONCLUSIONS Most of the ChatGPT-generated responses were outdated and failed to provide an adequate foundation for patients' understanding regarding their injury and treatment options. The reading level required to understand the responses was too advanced for some patients, leading to potential misunderstanding and misinterpretation of information. ChatGPT lacks the ability to differentiate and prioritize information that is presented to patients. CLINICAL RELEVANCE Recognizing the shortcomings in artificial intelligence platforms may equip surgeons to better set expectations and provide support for patients considering and preparing for ACL reconstruction.
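All five readability indices named in this abstract are available in off-the-shelf libraries, so the scoring step is easy to reproduce. A minimal sketch assuming the third-party textstat package (function names per its documented API; the sample text is a placeholder, not one of the study's responses):

```python
import textstat  # pip install textstat

sample = (
    "Anterior cruciate ligament reconstruction replaces the torn ligament "
    "with a graft and typically requires several months of rehabilitation."
)

# Each index approximates the U.S. school grade level needed to read the text.
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(sample))
print("SMOG index:", textstat.smog_index(sample))
print("Coleman-Liau index:", textstat.coleman_liau_index(sample))
print("Gunning fog index:", textstat.gunning_fog(sample))
print("Automated readability index:", textstat.automated_readability_index(sample))
```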
Affiliation(s)
- William L Johns
- Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, U.S.A.
- Brandon J Martinazzi
- Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, U.S.A.
- Benjamin Miltenberg
- Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, U.S.A.
- Hannah H Nam
- Penn State College of Medicine, Hershey, Pennsylvania, U.S.A.
- Sommer Hammoud
- Rothman Orthopaedic Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania, U.S.A.
18. Yüce A, Yerli M, Misir A, Çakar M. Enhancing patient information texts in orthopaedics: How OpenAI's 'ChatGPT' can help. J Exp Orthop 2024;11:e70019. [PMID: 39291057] [PMCID: PMC11406043] [DOI: 10.1002/jeo2.70019]
Abstract
Purpose The internet has become a primary source for patients seeking healthcare information, but the quality of online information, particularly in orthopaedics, often falls short. Orthopaedic surgeons now have the added responsibility of evaluating and guiding patients to credible online resources. This study aimed to assess ChatGPT's ability to identify deficiencies in patient information texts on total hip arthroplasty websites and to evaluate its potential for enhancing the quality of these texts. Methods In August 2023, 25 websites related to total hip arthroplasty were assessed using a standardized search on Google. Peer-reviewed scientific articles, empty pages, dictionary definitions, and unrelated content were excluded. The remaining 10 websites were evaluated using the hip information scoring system (HISS). ChatGPT was then used to assess these texts, identify deficiencies, and provide recommendations. Results The mean HISS score of the websites was 9.5, indicating low to moderate quality. However, after implementing ChatGPT's suggested improvements, the score increased to 21.5, signifying excellent quality. ChatGPT's recommendations included using simpler language, adding FAQs, incorporating patient experiences, addressing cost and insurance issues, detailing the preoperative and postoperative phases, including references, and emphasizing emotional and psychological support. The study demonstrates that ChatGPT can significantly enhance the quality of patient information. Conclusion ChatGPT's role in elevating patient education regarding total hip arthroplasty is promising. This study sheds light on the potential of ChatGPT as an aid to orthopaedic surgeons in producing high-quality patient information materials. Although it cannot replace human expertise, it offers a valuable means of enhancing the quality of healthcare information available online. Level of Evidence Level IV.
Affiliation(s)
- Ali Yüce
- Department of Orthopedic and Traumatology Prof. Dr. Cemil Taşcıoğlu City Hospital İstanbul Turkey
| | - Mustafa Yerli
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, İstanbul, Turkey
| | - Abdulhamit Misir
- Department of Orthopedic and Traumatology, Göztepe Medical Park Hospital, İstanbul, Turkey
| | - Murat Çakar
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, İstanbul, Turkey
| |
Collapse
|
19
|
Gaudiani MA, Castle JP, Abbas MJ, Pratt BA, Myles MD, Moutzouros V, Lynch TS. ChatGPT-4 Generates More Accurate and Complete Responses to Common Patient Questions About Anterior Cruciate Ligament Reconstruction Than Google's Search Engine. Arthrosc Sports Med Rehabil 2024; 6:100939. [PMID: 39006779 PMCID: PMC11240040 DOI: 10.1016/j.asmr.2024.100939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2023] [Accepted: 03/27/2024] [Indexed: 07/16/2024] Open
Abstract
Purpose To replicate a patient's internet search in order to evaluate the appropriateness of ChatGPT's answers to common patient questions about anterior cruciate ligament reconstruction compared with a Google web search. Methods A Google web search was performed using the term "anterior cruciate ligament reconstruction." The top 20 frequently asked questions and responses were recorded. The prompt "What are the 20 most popular patient questions related to 'anterior cruciate ligament reconstruction?'" was input into ChatGPT, and the questions and responses were recorded. Questions were classified based on the Rothwell system, and responses from both Google web search and ChatGPT were assessed for readability (Flesch-Kincaid Grade Level), correctness, and completeness. Results Three of 20 (15%) questions were similar between Google web search and ChatGPT. The most common question types among the Google web search results were value (8/20, 40%), fact (7/20, 35%), and policy (5/20, 25%). The most common question types among the ChatGPT results were fact (12/20, 60%), policy (6/20, 30%), and value (2/20, 10%). The mean Flesch-Kincaid Grade Level for Google web search responses was significantly lower than for ChatGPT responses (11.8 ± 3.8 vs 14.3 ± 2.2; P = .003). The mean correctness for Google web search answers was 1.47 ± 0.5, and mean completeness was 1.36 ± 0.5. Mean correctness for ChatGPT answers was 1.8 ± 0.4 and mean completeness was 1.9 ± 0.3, both significantly greater than for Google web search answers (P = .03 and P = .0003). Conclusions ChatGPT-4 generated more accurate and complete responses to common patient questions about anterior cruciate ligament reconstruction than Google's search engine. Clinical Relevance The use of artificial intelligence such as ChatGPT is expanding. It is important to understand the quality of this information as well as how the results of ChatGPT queries compare with those from Google web searches.
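A minimal sketch of the reported readability comparison, assuming an independent-samples t-test (the abstract reports only P values, not the test used) and synthetic per-response grade levels drawn to match the reported means and standard deviations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic data matching the reported summary statistics
# (11.8 +/- 3.8 for Google, 14.3 +/- 2.2 for ChatGPT); not the study's data.
google_fkgl = rng.normal(loc=11.8, scale=3.8, size=20)
chatgpt_fkgl = rng.normal(loc=14.3, scale=2.2, size=20)

t_stat, p_value = stats.ttest_ind(google_fkgl, chatgpt_fkgl)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```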
Collapse
Affiliation(s)
- Michael A. Gaudiani
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| | - Joshua P. Castle
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| | - Muhammad J. Abbas
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| | - Brittaney A. Pratt
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| | - Marquisha D. Myles
- Michigan State University College of Human Medicine, Detroit, Michigan, U.S.A
| | - Vasilios Moutzouros
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| | - T. Sean Lynch
- Department of Orthopedic Surgery, Henry Ford Health, Detroit, Michigan, U.S.A
| |
Collapse
|
20
|
Artamonov A, Bachar-Avnieli I, Klang E, Lubovsky O, Atoun E, Bermant A, Rosinsky PJ. Responses From ChatGPT-4 Show Limited Correlation With Expert Consensus Statement on Anterior Shoulder Instability. Arthrosc Sports Med Rehabil 2024; 6:100923. [PMID: 39006799 PMCID: PMC11240044 DOI: 10.1016/j.asmr.2024.100923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Accepted: 02/26/2024] [Indexed: 07/16/2024] Open
Abstract
Purpose To compare the similarity of answers provided by Generative Pretrained Transformer-4 (GPT-4) with those of a consensus statement on diagnosis, nonoperative management, and Bankart repair in anterior shoulder instability (ASI). Methods An expert consensus statement on ASI published by Hurley et al. in 2022 was reviewed, and the questions posed to the expert panel were extracted. GPT-4, the subscription version of ChatGPT, was queried using the same set of questions. Answers provided by GPT-4 were compared with those of the expert panel and subjectively rated for similarity by 2 experienced shoulder surgeons. GPT-4 was then used to rate the similarity of its own responses to the consensus statement, classifying them as low, medium, or high. Rates of similarity as classified by the shoulder surgeons and by GPT-4 were then compared, and interobserver reliability was calculated using weighted κ scores. Results The degree of similarity between the responses of GPT-4 and the ASI consensus statement, as rated by the shoulder surgeons, was high for 25.8%, medium for 45.2%, and low for 29% of questions. GPT-4 assessed its own similarity as high for 48.3%, medium for 41.9%, and low for 9.7% of questions. The surgeons and GPT-4 agreed on the classification of 18 questions (58.1%) and disagreed on 13 questions (41.9%). Conclusions The responses generated by artificial intelligence exhibit limited correlation with an expert statement on the diagnosis and treatment of ASI. Clinical Relevance As the use of artificial intelligence becomes more prevalent, it is important to understand how closely its output resembles content produced by human authors.
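Weighted κ penalizes rater disagreements by their distance on the ordinal low/medium/high scale, so a low-vs-high split costs more than a low-vs-medium one. A minimal sketch using scikit-learn with made-up ratings (the study's raw ratings are not reported in the abstract):

```python
from sklearn.metrics import cohen_kappa_score

# 0 = low, 1 = medium, 2 = high similarity; values are illustrative only
surgeon_ratings = [2, 1, 1, 0, 2, 1, 0, 0, 1, 2, 1, 0]
gpt4_ratings    = [2, 2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 0]

kappa = cohen_kappa_score(surgeon_ratings, gpt4_ratings, weights="linear")
print(f"weighted kappa = {kappa:.2f}")
```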
Collapse
Affiliation(s)
| | - Ira Bachar-Avnieli
- Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel
- Ben-Gurion University, Beer-Sheva, Israel
| | - Eyal Klang
- Sagol AI Hub at ARC Innovation, Sheba Medical Center, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Omri Lubovsky
- Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel
- Ben-Gurion University, Beer-Sheva, Israel
| | - Ehud Atoun
- Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel
- Ben-Gurion University, Beer-Sheva, Israel
| | - Alexander Bermant
- Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel
- Ben-Gurion University, Beer-Sheva, Israel
| | - Philip J Rosinsky
- Orthopedic Department, Barzilai Medical Center, Ashkelon, Israel
- Ben-Gurion University, Beer-Sheva, Israel
| |
Collapse
|
21
|
Meşe İ, Altıntaş Taşlıçay C, Kuzan BN, Kuzan TY, Sivrioğlu AK. Educating the next generation of radiologists: a comparative report of ChatGPT and e-learning resources. Diagn Interv Radiol 2024; 30:163-174. [PMID: 38145370 PMCID: PMC11095068 DOI: 10.4274/dir.2023.232496] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Accepted: 11/29/2023] [Indexed: 12/26/2023]
Abstract
Rapid technological advances have transformed medical education, particularly in radiology, which depends on advanced imaging and visual data. Traditional electronic learning (e-learning) platforms have long served as a cornerstone in radiology education, offering rich visual content, interactive sessions, and peer-reviewed materials. They excel in teaching intricate concepts and techniques that necessitate visual aids, such as image interpretation and procedural demonstrations. However, Chat Generative Pre-Trained Transformer (ChatGPT), an artificial intelligence (AI)-powered language model, has made its mark in radiology education. It can generate learning assessments, create lesson plans, act as a round-the-clock virtual tutor, enhance critical thinking, translate materials for broader accessibility, summarize vast amounts of information, and provide real-time feedback for any subject, including radiology. Concerns have arisen regarding ChatGPT's data accuracy, currency, and potential biases, especially in specialized fields such as radiology. However, the quality, accessibility, and currency of e-learning content can also be imperfect. To enhance the educational journey for radiology residents, the integration of ChatGPT with expert-curated e-learning resources is imperative for ensuring accuracy and reliability and addressing ethical concerns. While AI is unlikely to entirely supplant traditional radiology study methods, the synergistic combination of AI with traditional e-learning can create a holistic educational experience.
Collapse
Affiliation(s)
- İsmail Meşe
- University of Health Sciences Türkiye, Erenköy Mental Health and Neurology Training and Research Hospital, Clinic of Radiology, İstanbul, Türkiye
| | | | - Beyza Nur Kuzan
- Kartal Dr. Lütfi Kırdar City Hospital, Clinic of Radiology, İstanbul, Türkiye
| | - Taha Yusuf Kuzan
- Sancaktepe Şehit Prof. Dr. İlhan Varank Training and Research Hospital, Clinic of Radiology, İstanbul, Türkiye
| | | |
Collapse
|
22
|
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Educ Today 2024; 135:106121. [PMID: 38340639 DOI: 10.1016/j.nedt.2024.106121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 01/05/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed the reporting guidelines of the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT", emerged, accompanied by two main themes, (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT and (2) Controversies and Concerns about ChatGPT in Healthcare Education, Research, and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT, such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. Furthermore, our review recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate for the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, efforts by educators to familiarize themselves with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. Finally, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating ChatGPT into academic writing.
Collapse
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
| | - Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| | - Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
| |
Collapse
|
23
|
Oeding JF, Yang L, Sanchez-Sotelo J, Camp CL, Karlsson J, Samuelsson K, Pearle AD, Ranawat AS, Kelly BT, Pareek A. A practical guide to the development and deployment of deep learning models for the orthopaedic surgeon: Part III, focus on registry creation, diagnosis, and data privacy. Knee Surg Sports Traumatol Arthrosc 2024; 32:518-528. [PMID: 38426614 DOI: 10.1002/ksa.12085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 03/02/2024]
Abstract
Deep learning is a subset of artificial intelligence (AI) with enormous potential to transform orthopaedic surgery. As has already become evident with the deployment of Large Language Models (LLMs) like ChatGPT (OpenAI Inc.), deep learning can rapidly enter clinical and surgical practices. As such, it is imperative that orthopaedic surgeons acquire a deeper understanding of the technical terminology, capabilities and limitations associated with deep learning models. The focus of this series thus far has been providing surgeons with an overview of the steps needed to implement a deep learning-based pipeline, emphasizing some of the important technical details for surgeons to understand as they encounter, evaluate or lead deep learning projects. However, this series would be remiss without providing practical examples of how deep learning models have begun to be deployed and highlighting the areas where the authors feel deep learning may have the most profound potential. While computer vision applications of deep learning were the focus of Parts I and II, due to the enormous impact that natural language processing (NLP) has had in recent months, NLP-based deep learning models are also discussed in this final part of the series. In this review, three applications that the authors believe can be impacted the most by deep learning but with which many surgeons may not be familiar are discussed: (1) registry construction, (2) diagnostic AI and (3) data privacy. Deep learning-based registry construction will be essential for the development of more impactful clinical applications, with diagnostic AI being one of those applications likely to augment clinical decision-making in the near future. As the applications of deep learning continue to grow, the protection of patient information will become increasingly essential; as such, applications of deep learning to enhance data privacy are likely to become more important than ever before. Level of Evidence: Level IV.
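As a toy illustration of the data-privacy step that deep learning-based registry construction implies, the sketch below masks a few obvious identifiers in free-text notes before ingestion. The regular expressions are deliberately simplistic placeholders, not a validated de-identification pipeline and not a method described in the article.

```python
import re

# Simplistic placeholder patterns; real PHI removal requires a validated,
# far more comprehensive pipeline.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def deidentify(note: str) -> str:
    """Replace matched identifiers with bracketed labels before registry ingestion."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label}]", note)
    return note

print(deidentify("MRN: 00123456, seen 03/14/2024, call 555-867-5309."))
# -> "[MRN], seen [DATE], call [PHONE]."
```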
Collapse
Affiliation(s)
- Jacob F Oeding
- School of Medicine, Mayo Clinic Alix School of Medicine, Rochester, Minnesota, USA
- Department of Orthopaedics, Institute of Clinical Sciences, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Linjun Yang
- Orthopedic Surgery Artificial Intelligence Laboratory (OSAIL), Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
| | | | - Christopher L Camp
- Department of Orthopedic Surgery, Mayo Clinic, Rochester, Minnesota, USA
| | - Jón Karlsson
- Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
| | - Kristian Samuelsson
- Department of Orthopaedics, Sahlgrenska University Hospital, Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
| | - Andrew D Pearle
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Anil S Ranawat
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Bryan T Kelly
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| | - Ayoosh Pareek
- Sports Medicine and Shoulder Service, Hospital for Special Surgery, New York, New York, USA
| |
Collapse
|
24
|
Cevik J, Lim B, Seth I, Sofiadellis F, Ross RJ, Cuomo R, Rozen WM. Assessment of the bias of artificial intelligence generated images and large language models on their depiction of a surgeon. ANZ J Surg 2024; 94:287-294. [PMID: 38087912 DOI: 10.1111/ans.18792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 10/22/2023] [Accepted: 11/12/2023] [Indexed: 03/20/2024]
Affiliation(s)
- Jevan Cevik
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Bryan Lim
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Ishith Seth
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| | - Foti Sofiadellis
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
| | - Richard J Ross
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
| | - Roberto Cuomo
- Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, 53100, Italy
| | - Warren M Rozen
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- The Alfred Centre, Central Clinical School at Monash University, 99 Commercial Rd, Melbourne, Victoria, 3004, Australia
| |
Collapse
|
25
|
Ray PP. Letter to the editor 'Evaluating ChatGPT responses in the context of a 53-year-old male with a femoral neck fracture: a qualitative analysis'. Eur J Orthop Surg Traumatol 2024; 34:957-958. [PMID: 37864657 DOI: 10.1007/s00590-023-03766-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/03/2023] [Accepted: 10/09/2023] [Indexed: 10/23/2023]
Affiliation(s)
- Partha Pratim Ray
- Department of Computer Applications, Sikkim University, 6th Mile, PO-Tadong, Gangtok, Sikkim, 737102, India.
| |
Collapse
|
26
|
Ittarat M, Cheungpasitporn W, Chansangpetch S. Personalized Care in Eye Health: Exploring Opportunities, Challenges, and the Road Ahead for Chatbots. J Pers Med 2023; 13:1679. [PMID: 38138906 PMCID: PMC10744965 DOI: 10.3390/jpm13121679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/24/2023] Open
Abstract
In modern eye care, the adoption of ophthalmology chatbots stands out as a pivotal technological progression. These digital assistants offer numerous benefits, such as better access to vital information, heightened patient interaction, and streamlined triage. Recent evaluations have highlighted their performance in both the triage of ophthalmic conditions and ophthalmology knowledge assessment, underscoring their potential and areas for improvement. However, assimilating these chatbots into prevailing healthcare infrastructures brings challenges. These encompass ethical dilemmas, legal compliance, seamless integration with electronic health records (EHR), and fostering effective dialogue with medical professionals. Addressing these challenges necessitates the creation of bespoke standards and protocols for ophthalmology chatbots. Ongoing advances and anticipated innovations position these chatbots to redefine the delivery of eye care. The synergy of artificial intelligence (AI) and machine learning (ML) with chatbots amplifies their diagnostic capability. Additionally, their capacity to adapt linguistically and culturally enables them to serve a global patient demographic. In this article, we explore in detail the use of chatbots in ophthalmology, examining their accuracy, reliability, data protection, security, transparency, potential algorithmic biases, and ethical considerations. We provide a comprehensive review of their roles in the triage of ophthalmic conditions and in knowledge assessment, emphasizing their significance and future potential in the field.
Collapse
Affiliation(s)
- Mantapond Ittarat
- Surin Hospital and Surin Medical Education Center, Suranaree University of Technology, Surin 32000, Thailand
| | | | - Sunee Chansangpetch
- Center of Excellence in Glaucoma, Chulalongkorn University, Bangkok 10330, Thailand
- Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross Society, Bangkok 10330, Thailand
| |
Collapse
|
27
|
Chatterjee S, Bhattacharya M, Pal S, Lee SS, Chakraborty C. ChatGPT and large language models in orthopedics: from education and surgery to research. J Exp Orthop 2023; 10:128. [PMID: 38038796 PMCID: PMC10692045 DOI: 10.1186/s40634-023-00700-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 11/16/2023] [Indexed: 12/02/2023] Open
Abstract
ChatGPT has rapidly gained popularity since its release in November 2022. Currently, large language models (LLMs) and ChatGPT have been applied in various domains of medical science, including cardiology, nephrology, orthopedics, ophthalmology, gastroenterology, and radiology. Researchers are exploring the potential of LLMs and ChatGPT for clinicians and surgeons in every domain. This study discusses how ChatGPT can help orthopedic clinicians and surgeons perform various medical tasks. LLMs and ChatGPT can help the patient community by providing suggestions and diagnostic guidelines. In this study, the use of LLMs and ChatGPT to enhance and expand the field of orthopedics, including orthopedic education, surgery, and research, is explored. Present LLMs have several shortcomings, which are discussed herein. However, next-generation, domain-specific LLMs are expected to be more potent and to transform patients' quality of life.
Collapse
Affiliation(s)
- Srijan Chatterjee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
| | - Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-Si, 24252, Gangwon-Do, Republic of Korea.
| | - Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.
| |
Collapse
|
28
|
Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell 2023; 6:1237704. [PMID: 38028668 PMCID: PMC10644239 DOI: 10.3389/frai.2023.1237704] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
The release of ChatGPT has prompted new thinking about AI-based chatbots and their applications and has drawn huge public attention worldwide. Over the past few months, researchers and doctors have begun considering the promise and applications of AI-related large language models in medicine. This comprehensive review highlights chatbots and ChatGPT and their current roles in medicine. First, the general idea of chatbots, along with their evolution, architecture, and medical uses, is discussed. Second, ChatGPT is discussed with special emphasis on its application in medicine, covering its architecture and training methods, its role in medical diagnosis and treatment, research ethics issues, and a comparison of ChatGPT with other NLP models. The article also discusses the limitations and prospects of ChatGPT. In the future, these large language models and ChatGPT will hold immense promise in healthcare. However, more research is needed in this direction.
Collapse
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
| | - Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | | | - Snehasish Dash
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Sang-Soo Lee
- Institute for Skeletal Aging and Orthopedic Surgery, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, Republic of Korea
| |
Collapse
|
29
|
Madry H. Shaping experimental orthopaedics. J Exp Orthop 2023; 10:95. [PMID: 37743440 PMCID: PMC10518299 DOI: 10.1186/s40634-023-00658-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 09/11/2023] [Indexed: 09/26/2023] Open
Affiliation(s)
- Henning Madry
- Institute of Experimental Orthopaedics, Saarland University, Kirrberger Straße, Building 37, 66421, Homburg, Saar, Germany.
| |
Collapse
|