1
Montagna M, Chiabrando F, De Lorenzo R, Rovere Querini P. Impact of Clinical Decision Support Systems on Medical Students' Case-Solving Performance: Comparison Study with a Focus Group. JMIR Medical Education 2025;11:e55709. [PMID: 40101183] [PMCID: PMC11936302] [DOI: 10.2196/55709]
Abstract
Background Health care practitioners use clinical decision support systems (CDSS) as an aid in the crucial task of clinical reasoning and decision-making. Traditional CDSS are online repositories (ORs) and clinical practice guidelines (CPG). Recently, large language models (LLMs) such as ChatGPT have emerged as potential alternatives. They have proven to be powerful, innovative tools, yet they are not devoid of worrisome risks. Objective This study aims to explore how medical students perform in an evaluated clinical case through the use of different CDSS tools. Methods The authors randomly divided medical students into 3 groups (CPG: n=6, 38%; OR: n=5, 31%; ChatGPT: n=5, 31%) and assigned each group a different type of CDSS for guidance in answering prespecified questions, assessing how students' speed and ability at resolving the same clinical case varied accordingly. External reviewers evaluated all answers based on accuracy and completeness metrics (score: 1-5). The authors analyzed and categorized group scores according to the skill investigated: differential diagnosis, diagnostic workup, and clinical decision-making. Results Answering time showed a trend for the ChatGPT group to be the fastest. The mean scores for completeness were as follows: CPG 4.0, OR 3.7, and ChatGPT 3.8 (P=.49). The mean scores for accuracy were as follows: CPG 4.0, OR 3.3, and ChatGPT 3.7 (P=.02). When scores were aggregated according to the 3 skill domains, differences among the groups emerged more clearly, with the CPG group performing best in nearly all domains and maintaining almost perfect alignment between completeness and accuracy. Conclusions This hands-on session provided valuable insights into the potential perks and associated pitfalls of LLMs in medical education and practice. It suggested a critical need for medical degree courses to teach students how to use LLMs properly, as the potential for misuse is evident and real.
Affiliation(s)
- Marco Montagna
- School of Medicine, Vita-Salute San Raffaele University, Via Olgettina 58, Milan, 20132, Italy
- Filippo Chiabrando
- School of Medicine, Vita-Salute San Raffaele University, Via Olgettina 58, Milan, 20132, Italy
- Rebecca De Lorenzo
- School of Medicine, Vita-Salute San Raffaele University, Via Olgettina 58, Milan, 20132, Italy
- Patrizia Rovere Querini
- School of Medicine, Vita-Salute San Raffaele University, Via Olgettina 58, Milan, 20132, Italy
- Unit of Medical Specialties and Healthcare Continuity, IRCCS San Raffaele Scientific Institute, Milan, Italy
2
Aster A, Laupichler MC, Rockwell-Kollmann T, Masala G, Bala E, Raupach T. ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review. Medical Science Educator 2025;35:555-567. [PMID: 40144083] [PMCID: PMC11933646] [DOI: 10.1007/s40670-024-02206-6]
Abstract
This review aims to provide a summary of all scientific publications on the use of large language models (LLMs) in medical education over the first year of their availability. A scoping literature review was conducted in accordance with the PRISMA recommendations for scoping reviews. Five scientific literature databases were searched using predefined search terms. The search yielded 1509 initial results, of which 145 studies were ultimately included. Most studies assessed LLMs' capabilities in passing medical exams. Some studies discussed advantages, disadvantages, and potential use cases of LLMs. Very few studies conducted empirical research. Many published studies lack methodological rigor. We therefore propose a research agenda to improve the quality of studies on LLMs.
Affiliation(s)
- Alexandra Aster
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Matthias Carl Laupichler
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tamina Rockwell-Kollmann
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Gilda Masala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Ebru Bala
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
- Tobias Raupach
- Institute of Medical Education, University Hospital Bonn, Venusberg-Campus 1, 53127 Bonn, Germany
3
Carulli C, Rossi SMP, Magistrelli L, Annibaldi A, Troncone E. Can Artificial Intelligence Help Orthopaedic Surgeons in the Conservative Management of Knee Osteoarthritis? A Consensus Analysis. J Clin Med 2025;14:690. [PMID: 39941360] [PMCID: PMC11818703] [DOI: 10.3390/jcm14030690]
Abstract
Background: Knee osteoarthritis is a prevalent condition that significantly impacts patients' quality of life. Effective management typically involves a combination of pharmacological and non-pharmacological treatments. However, establishing a consensus on the optimal treatment strategy is crucial for standardizing care. The present study is the result of a rigorous process that combines artificial intelligence with human expertise to improve the reliability of medical recommendations. Methods: A new software platform (Butterfly Decisions, 2021, Italy) was employed to leverage AI-assisted decision-making, facilitating the digitalization of the entire consensus process. The process started with data collection through an online survey including simulated clinical cases of knee osteoarthritis collected by 30 orthopedic surgeons; artificial intelligence (AI) analyzed the collected clinical data and identified the key concepts and relevant patterns. Subsequently, AI generated detailed statements summarizing key concepts extracted from the data and proposed a reformulation of the statements to be discussed during the advisory board's discussion session. The advisory board, composed of four qualified, experienced specialists in knee osteoarthritis, evaluated the statements, providing their agreement levels, confidence, and supporting evidence. The AI tools calculated the degree of certainty and contradiction for each statement based on these evaluations. The literature was critically appraised to ensure an evidence-based evaluation of the proposed treatment statements. Finally, revised versions were proposed to address the feedback, evidence was collected to refine the scientific report, and the board members also evaluated the AI's performance. Results: The consensus analysis revealed a high level of agreement on the need for a multimodal approach to treating knee osteoarthritis. The feedback highlighted the importance of integrating non-pharmacological methods, such as physical therapy and weight management, with Symptomatic Slow-Acting Drugs for Osteoarthritis (SYSADOAs) and pharmacological treatments, such as anti-inflammatory drugs and intra-articular knee injections. The board members found the AI easy to use and understand, and each statement was structured clearly and concisely. Conclusions: The AI-facilitated expert consensus on the conservative management of knee osteoarthritis met with unanimous agreement. AI-assisted decision-making was shown to have excellent analytical capabilities, but algorithms need to be trained by orthopaedic experts with the correct inputs. Additional efforts are still required to evaluate the incorporation of AI into clinical workflows.
Affiliation(s)
- Christian Carulli
- Orthopaedic Clinic, University of Florence, Careggi University Hospital, 50121 Florence, Italy
- Stefano Marco Paolo Rossi
- Department of Life Science, Health, and Health Professions, Università degli Studi Link, 00165 Rome, Italy
- Sezione Chirurgia Protesica ad Indirizzo Robotico, Unità di Traumatologia dello Sport, Fondazione Poliambulanza, 25124 Brescia, Italy
- Luca Magistrelli
- Ortopedia e Traumatologia, APUANE-NOA Hospital, 54100 Massa, Italy
4
Nissen L, Rother JF, Heinemann M, Reimer LM, Jonas S, Raupach T. A randomised cross-over trial assessing the impact of AI-generated individual feedback on written online assignments for medical students. Medical Teacher 2025:1-7. [PMID: 39831699] [DOI: 10.1080/0142159X.2025.2451870]
Abstract
PURPOSE Self-testing has been proven to significantly improve not only simple learning outcomes, but also higher-order skills such as clinical reasoning in medical students. Previous studies have shown that self-testing is especially beneficial when presented with feedback, which raises the question of whether immediate, personalised feedback further enhances this effect. Therefore, we hypothesised that individual feedback has a greater effect on learning outcomes than generic feedback. MATERIALS AND METHODS In a randomised cross-over trial, German medical students were invited to voluntarily answer daily key-feature questions via an app. For half of the items, they received generalised feedback written by an expert, while feedback on the other half was generated immediately by ChatGPT. After the intervention, the students participated in a mandatory exit exam. RESULTS Participants who used the app more frequently achieved better learning outcomes than those who did not, although this finding is only correlational. The individual ChatGPT-generated feedback did not show a greater effect on exit exam scores compared to the expert comment (51.8 ± 22.0% vs. 55.8 ± 22.8%; p = 0.06). CONCLUSION This study provides a proof of concept for delivering personalised feedback on medical questions. Despite the promising results, improved prompting and further development of the application seem necessary to strengthen the possible impact of the personalised feedback. Our study closes a research gap and holds great potential for further use not only in medicine but also in other academic fields.
Affiliation(s)
- Leon Nissen
- Institute for Digital Medicine, University Hospital Bonn, Bonn, Germany
- Marie Heinemann
- Institute for Digital Medicine, University Hospital Bonn, Bonn, Germany
- Lara Marie Reimer
- Institute for Digital Medicine, University Hospital Bonn, Bonn, Germany
- Stephan Jonas
- Institute for Digital Medicine, University Hospital Bonn, Bonn, Germany
- Tobias Raupach
- Institute of Medical Education, University Hospital Bonn, Bonn, Germany
5
Du W, Jin X, Harris JC, Brunetti A, Johnson E, Leung O, Li X, Walle S, Yu Q, Zhou X, Bian F, McKenzie K, Kanathanavanich M, Ozcelik Y, El-Sharkawy F, Koga S. Large language models in pathology: A comparative study of ChatGPT and Bard with pathology trainees on multiple-choice questions. Ann Diagn Pathol 2024;73:152392. [PMID: 39515029] [DOI: 10.1016/j.anndiagpath.2024.152392]
Abstract
Large language models (LLMs), such as ChatGPT and Bard, have shown potential in various medical applications. This study aimed to evaluate the performance of LLMs, specifically ChatGPT and Bard, in pathology by comparing them with pathology trainees, and to assess the consistency of their responses. We selected 150 multiple-choice questions from 15 subspecialties, excluding those with images. Both ChatGPT and Bard were tested on these questions across three separate sessions between June 2023 and January 2024, and their responses were compared with those of 16 pathology trainees (8 junior and 8 senior) from two hospitals. Questions were categorized as easy, intermediate, or difficult based on trainee performance. Consistency and variability in LLM responses were analyzed across the three evaluation sessions. ChatGPT significantly outperformed Bard and the trainees, achieving an average total score of 82.2% compared with Bard's 49.5%, junior trainees' 45.1%, and senior trainees' 56.0%. ChatGPT's performance was notably stronger on difficult questions (63.4%-68.3%) compared with Bard (31.7%-34.1%) and trainees (4.9%-48.8%). For easy questions, ChatGPT (83.1%-91.5%) and trainees (73.7%-100.0%) showed similarly high scores. Consistency analysis revealed that ChatGPT showed a high consistency rate of 80%-85% across the three tests, whereas Bard exhibited greater variability, with consistency rates of 54%-61%. While LLMs show significant promise in pathology education and practice, continued development and human oversight are crucial for reliable clinical application.
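The response-consistency analysis described above can be illustrated with a minimal sketch: a consistency rate taken as the fraction of questions answered identically across the three sessions. The function below is an assumption about how such a rate might be computed, not the authors' code, and the answer lists are hypothetical.

```python
from typing import Sequence

def consistency_rate(runs: Sequence[Sequence[str]]) -> float:
    """Fraction of questions that received the identical answer in every run."""
    identical = sum(1 for answers in zip(*runs) if len(set(answers)) == 1)
    return identical / len(runs[0])

# Hypothetical answers to five questions over three sessions
session_1 = ["A", "C", "B", "D", "A"]
session_2 = ["A", "C", "B", "B", "A"]
session_3 = ["A", "C", "B", "D", "A"]
print(consistency_rate([session_1, session_2, session_3]))  # 0.8
```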
Affiliation(s)
- Wei Du
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Xueting Jin
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Jaryse Carol Harris
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Alessandro Brunetti
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Erika Johnson
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Olivia Leung
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Xingchen Li
- Department of Pathology and Laboratory Medicine, Pennsylvania Hospital, Philadelphia, PA, United States of America
- Selemon Walle
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Qing Yu
- Department of Pathology and Laboratory Medicine, Pennsylvania Hospital, Philadelphia, PA, United States of America
- Xiao Zhou
- Department of Pathology and Laboratory Medicine, Pennsylvania Hospital, Philadelphia, PA, United States of America
- Fang Bian
- Department of Pathology and Laboratory Medicine, Pennsylvania Hospital, Philadelphia, PA, United States of America
- Kajanna McKenzie
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Manita Kanathanavanich
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Yusuf Ozcelik
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Farah El-Sharkawy
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
- Shunsuke Koga
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, United States of America
6
Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, Liang Q, Zhang J, Li X. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Medical Education 2024;24:1372. [PMID: 39593041] [PMCID: PMC11590336] [DOI: 10.1186/s12909-024-06309-x]
Abstract
BACKGROUND This study aimed to evaluate the performance of GPT-3.5, GPT-4, GPT-4o and Google Bard on the United States Medical Licensing Examination (USMLE), the Professional and Linguistic Assessments Board (PLAB), the Hong Kong Medical Licensing Examination (HKMLE) and the National Medical Licensing Examination (NMLE). METHODS This study was conducted in June 2023. Four large language models (LLMs) (GPT-3.5, GPT-4, GPT-4o and Google Bard) were applied to four standardized medical tests (USMLE, PLAB, HKMLE and NMLE). All questions were multiple-choice questions sourced from the question banks of these examinations. RESULTS On USMLE Step 1, Step 2 CK and Step 3, respectively, accuracy rates were 91.5%, 94.2% and 92.7% for GPT-4o; 93.2%, 95.0% and 92.0% for GPT-4; 65.6%, 71.6% and 68.5% for GPT-3.5; and 64.3%, 55.6% and 58.1% for Google Bard. On the PLAB, HKMLE and NMLE, respectively, GPT-4o scored 93.3%, 91.7% and 84.9%; GPT-4 scored 86.7%, 89.6% and 69.8%; GPT-3.5 scored 80.0%, 68.1% and 60.4%; and Google Bard scored 54.2%, 71.7% and 61.3%. There were significant differences in the accuracy rates of the four LLMs across the four medical licensing examinations. CONCLUSION GPT-4o performed better on the medical licensing examinations than the other three LLMs. The performance of the four models on the NMLE needs further improvement. CLINICAL TRIAL NUMBER Not applicable.
Affiliation(s)
- Yikai Chen
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Xiujie Huang
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Fangjie Yang
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Haiming Lin
- Department of Orthopaedics, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515000, China
- School of Dentistry, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Haoyu Lin
- Department of Thyroid Breast Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515000, China
- Zhuoqun Zheng
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Qifeng Liang
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Jinhai Zhang
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
- Xinxin Li
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China
7
Bicknell BT, Butler D, Whalen S, Ricks J, Dixon CJ, Clark AB, Spaedy O, Skelton A, Edupuganti N, Dzubinski L, Tate H, Dyess G, Lindeman B, Lehmann LS. ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis. JMIR Medical Education 2024;10:e63430. [PMID: 39504445] [PMCID: PMC11611793] [DOI: 10.2196/63430]
Abstract
Background Recent studies, including those by the National Board of Medical Examiners, have highlighted the remarkable capabilities of recent large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, there is a gap in detailed analysis of LLM performance in specific medical content areas, thus limiting an assessment of their potential utility in medical education. Objective This study aimed to assess and compare the accuracy of successive ChatGPT versions (GPT-3.5, GPT-4, and GPT-4 Omni) in USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management. Methods This study used 750 clinical vignette-based multiple-choice questions to characterize the performance of successive ChatGPT versions (ChatGPT 3.5 [GPT-3.5], ChatGPT 4 [GPT-4], and ChatGPT 4 Omni [GPT-4o]) across USMLE disciplines, clinical clerkships, and in clinical skills (diagnostics and management). Accuracy was assessed using a standardized protocol, with statistical analyses conducted to compare the models' performances. Results GPT-4o achieved the highest accuracy across 750 multiple-choice questions at 90.4%, outperforming GPT-4 and GPT-3.5, which scored 81.1% and 60.0%, respectively. GPT-4o's highest performances were in social sciences (95.5%), behavioral and neuroscience (94.2%), and pharmacology (93.2%). In clinical skills, GPT-4o's diagnostic accuracy was 92.7% and management accuracy was 88.8%, significantly higher than its predecessors. Notably, both GPT-4o and GPT-4 significantly outperformed the medical student average accuracy of 59.3% (95% CI 58.3-60.3). Conclusions GPT-4o's performance in USMLE disciplines, clinical clerkships, and clinical skills indicates substantial improvements over its predecessors, suggesting significant potential for the use of this technology as an educational aid for medical students. These findings underscore the need for careful consideration when integrating LLMs into medical education, emphasizing the importance of structured curricula to guide their appropriate use and the need for ongoing critical analyses to ensure their reliability and effectiveness.
Affiliation(s)
- Brenton T Bicknell
- UAB Heersink School of Medicine, 1670 University Blvd, Birmingham, AL, 35233, United States
- Danner Butler
- University of South Alabama Whiddon College of Medicine, Mobile, AL, United States
- Sydney Whalen
- University of Illinois College of Medicine, Chicago, IL, United States
- James Ricks
- Harvard Medical School, Boston, MA, United States
- Cory J Dixon
- Alabama College of Osteopathic Medicine, Dothan, AL, United States
- Olivia Spaedy
- Saint Louis University School of Medicine, St. Louis, MO, United States
- Adam Skelton
- UAB Heersink School of Medicine, 1670 University Blvd, Birmingham, AL, 35233, United States
- Neel Edupuganti
- Medical College of Georgia, Augusta University, Augusta, GA, United States
- Lance Dzubinski
- University of Colorado Anschutz Medical Campus School of Medicine, Aurora, CO, United States
- Hudson Tate
- UAB Heersink School of Medicine, 1670 University Blvd, Birmingham, AL, 35233, United States
- Garrett Dyess
- University of South Alabama Whiddon College of Medicine, Mobile, AL, United States
- Brenessa Lindeman
- UAB Heersink School of Medicine, 1670 University Blvd, Birmingham, AL, 35233, United States
- Lisa Soleymani Lehmann
- Harvard Medical School, Boston, MA, United States
- Mass General Brigham, Boston, MA, United States
8
Liu F, Chang X, Zhu Q, Huang Y, Li Y, Wang H. Assessing clinical medicine students' acceptance of large language model: based on technology acceptance model. BMC Medical Education 2024;24:1251. [PMID: 39490999] [PMCID: PMC11533422] [DOI: 10.1186/s12909-024-06232-1]
Abstract
While large language models (LLMs) have demonstrated significant potential in medical education, there is limited understanding of medical students' acceptance of LLMs and the factors influencing their use. This study explores medical students' acceptance of LLMs in learning and examines the factors influencing this acceptance through the lens of the Technology Acceptance Model (TAM). A questionnaire survey conducted among Chinese medical students revealed a high willingness to use LLMs in their studies. The findings suggest that attitudes play a crucial role in predicting medical students' behavioral intentions to use LLMs, mediating the effects of perceived usefulness, perceived ease of use, and perceived risk. Additionally, perceived risk and social influence directly impact behavioral intentions. This study provides compelling evidence supporting the applicability of the TAM to the acceptance of LLMs in medical education, highlighting the necessity for medical students to utilize LLMs as an auxiliary tool in their learning process.
Affiliation(s)
- Fuze Liu
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
- Xiao Chang
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
- Qi Zhu
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
- Yue Huang
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
- Yifei Li
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
- Hai Wang
- Department of Orthopaedic Surgery, Peking Union Medical College Hospital, Peking Union Medical College and Chinese Academy of Medical Sciences, No. 1 Shuaifuyuan, Beijing, 100730, People's Republic of China
9
Lim B, Seth I, Cuomo R, Kenney PS, Ross RJ, Sofiadellis F, Pentangelo P, Ceccaroni A, Alfano C, Rozen WM. Can AI Answer My Questions? Utilizing Artificial Intelligence in the Perioperative Assessment for Abdominoplasty Patients. Aesthetic Plast Surg 2024;48:4712-4724. [PMID: 38898239] [PMCID: PMC11645314] [DOI: 10.1007/s00266-024-04157-0]
Abstract
BACKGROUND Abdominoplasty is a common operation used for a range of cosmetic and functional issues, often in the context of divarication of recti, significant weight loss, or after pregnancy. Despite this, patient-surgeon communication gaps can hinder informed decision-making. The integration of large language models (LLMs) in healthcare offers potential for enhancing patient information. This study evaluated the feasibility of using LLMs for answering perioperative queries. METHODS This study assessed the efficacy of four leading LLMs (OpenAI's ChatGPT-3.5, Anthropic's Claude, Google's Gemini, and Bing's CoPilot) using fifteen unique prompts. All outputs were evaluated for readability using the Flesch-Kincaid score, Flesch Reading Ease score, and Coleman-Liau index. The DISCERN score and a Likert scale were utilized to evaluate quality. Scores were assigned by two plastic surgical residents and then reviewed and discussed until a consensus was reached by five plastic surgeon specialists. RESULTS ChatGPT-3.5 required the highest reading level for comprehension, followed by Gemini, Claude, and then CoPilot. Claude provided the most appropriate and actionable advice. In terms of patient-friendliness, CoPilot outperformed the rest, enhancing engagement and information comprehensiveness. ChatGPT-3.5 and Gemini offered adequate, though unremarkable, advice, employing more professional language. CoPilot uniquely included visual aids and was the only model to use hyperlinks, although these were not especially helpful or acceptable, and it faced limitations in responding to certain queries. CONCLUSION ChatGPT-3.5, Gemini, Claude, and Bing's CoPilot showcased differences in readability and reliability. LLMs offer unique advantages for patient care but require careful selection. Future research should integrate LLM strengths and address weaknesses for optimal patient education. LEVEL OF EVIDENCE V This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.
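The readability metrics named in this and several other entries (Flesch Reading Ease, Flesch-Kincaid Grade Level, Coleman-Liau index) are simple functions of sentence, word, syllable, and letter counts. The sketch below shows the standard formulas; the regex-based tokenization and the syllable heuristic are simplifying assumptions, not the tooling used in the study.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels; real tools use
    # dictionaries or finer rules, so treat the result as approximate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)

    wps = n_words / sentences   # words per sentence
    spw = syllables / n_words   # syllables per word
    return {
        # Flesch Reading Ease: higher means easier to read
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        # Flesch-Kincaid Grade Level: approximate US school grade
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Coleman-Liau index: letters and sentences per 100 words
        "CLI": 0.0588 * (100 * letters / n_words) - 0.296 * (100 * sentences / n_words) - 15.8,
    }

sample = ("Abdominoplasty removes loose skin and fat from the abdomen. "
          "Your surgeon will explain the risks before the operation.")
print({k: round(v, 1) for k, v in readability(sample).items()})
```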
Affiliation(s)
- Bryan Lim
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Ishith Seth
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Roberto Cuomo
- Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, Italy
- Peter Sinkjær Kenney
- Department of Plastic Surgery, Velje Hospital, Beriderbakken 4, 7100, Vejle, Denmark
- Department of Plastic and Breast Surgery, Aarhus University Hospital, Aarhus, Denmark
- Richard J Ross
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Foti Sofiadellis
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Warren Matthew Rozen
- Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
10
Gan W, Ouyang J, Li H, Xue Z, Zhang Y, Dong Q, Huang J, Zheng X, Zhang Y. Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial. J Med Internet Res 2024;26:e57037. [PMID: 39163598] [PMCID: PMC11372336] [DOI: 10.2196/57037]
Abstract
BACKGROUND ChatGPT is a natural language processing model developed by OpenAI, which can be iteratively updated and optimized to accommodate the changing and complex requirements of human verbal communication. OBJECTIVE The study aimed to evaluate ChatGPT's accuracy in answering orthopedics-related multiple-choice questions (MCQs) and assess its short-term effects as a learning aid through a randomized controlled trial. In addition, long-term effects on student performance in other subjects were measured using final examination results. METHODS We first evaluated ChatGPT's accuracy in answering MCQs pertaining to orthopedics across various question formats. Then, 129 undergraduate medical students participated in a randomized controlled study in which the ChatGPT group used ChatGPT as a learning tool, while the control group was prohibited from using artificial intelligence software to support learning. Following a 2-week intervention, the 2 groups' understanding of orthopedics was assessed by an orthopedics test, and variations in the 2 groups' performance in other disciplines were noted through a follow-up at the end of the semester. RESULTS ChatGPT-4.0 answered 1051 orthopedics-related MCQs with a 70.60% (742/1051) accuracy rate, including 71.8% (237/330) accuracy for A1 MCQs, 73.7% (330/448) accuracy for A2 MCQs, 70.2% (92/131) accuracy for A3/4 MCQs, and 58.5% (83/142) accuracy for case analysis MCQs. As of April 7, 2023, a total of 129 individuals participated in the experiment. However, 19 individuals withdrew from the experiment at various phases; thus, as of July 1, 2023, a total of 110 individuals completed the trial and all follow-up work. After the short-term intervention in the students' learning methods, the ChatGPT group answered more questions correctly than the control group (ChatGPT group: mean 141.20, SD 26.68; control group: mean 130.80, SD 25.56; P=.04) on the orthopedics test, particularly on A1 (ChatGPT group: mean 46.57, SD 8.52; control group: mean 42.18, SD 9.43; P=.01), A2 (ChatGPT group: mean 60.59, SD 10.58; control group: mean 56.66, SD 9.91; P=.047), and A3/4 MCQs (ChatGPT group: mean 19.57, SD 5.48; control group: mean 16.46, SD 4.58; P=.002). At the end of the semester, we found that the ChatGPT group performed better on final examinations in surgery (ChatGPT group: mean 76.54, SD 9.79; control group: mean 72.54, SD 8.11; P=.02) and obstetrics and gynecology (ChatGPT group: mean 75.98, SD 8.94; control group: mean 72.54, SD 8.66; P=.04) than the control group. CONCLUSIONS ChatGPT answers orthopedics-related MCQs accurately, and students using it excel in both short-term and long-term assessments. Our findings strongly support ChatGPT's integration into medical education, enhancing contemporary instructional methods. TRIAL REGISTRATION Chinese Clinical Trial Registry Chictr2300071774; https://www.chictr.org.cn/hvshowproject.html?id=225740&v=1.0.
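As a rough illustration of how the reported group comparison maps onto a P value, the sketch below plugs the orthopedics-test means and SDs into a two-sample t test computed from summary statistics. The abstract does not state which test the authors used or the exact group sizes, so the choice of test and the assumed 55-per-arm split of the 110 completers are illustrative assumptions only.

```python
from scipy.stats import ttest_ind_from_stats

# Reported orthopedics-test summary statistics for the two arms.
result = ttest_ind_from_stats(
    mean1=141.20, std1=26.68, nobs1=55,   # ChatGPT group (n assumed)
    mean2=130.80, std2=25.56, nobs2=55,   # control group (n assumed)
    equal_var=False,                      # Welch's t test
)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")  # p comes out near .04
```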
Affiliation(s)
- Wenyi Gan
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Jianfeng Ouyang
- Department of Joint Surgery and Sports Medicine, Zhuhai People's Hospital (Zhuhai Hospital Affiliated With Jinan University), Zhuhai, Guangdong, China
- Hua Li
- Department of Orthopaedics, Beijing Jishuitan Hospital, Beijing, China
- Zhaowen Xue
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Yiming Zhang
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Qiu Dong
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Jiadong Huang
- Jinan University-University of Birmingham Joint Institute, Jinan University, Guangzhou, China
- Xiaofei Zheng
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Yiyi Zhang
- The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, Guangzhou, China
11
Gencer G, Gencer K. A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education. Cureus 2024;16:e66517. [PMID: 39246999] [PMCID: PMC11380914] [DOI: 10.7759/cureus.66517]
Abstract
Background This study aims to evaluate the performance of ChatGPT on the medical specialization exam (MSE) that medical graduates take when choosing their postgraduate specialization and to reveal how artificial intelligence-supported education can increase the quality of medical education and academic success. The research aims to explore the potential applications and advantages of artificial intelligence in medical education and examine ways in which this technology can contribute to student learning and exam preparation. Methodology A total of 240 MSE questions were posed to ChatGPT, 120 of which were basic medical sciences questions and 120 were clinical medical sciences questions. A total of 18,481 people participated in the exam. The performance of medical school graduates in answering these questions correctly was compared with that of ChatGPT-3.5. The average score for ChatGPT-3.5 was calculated by averaging the minimum and maximum scores. Calculations were performed in the R 4.0.2 environment. Results The general average score of graduates ranged from a minimum of 7.51 to a maximum of 81.46 in basic sciences, and from a minimum of 12.51 to a maximum of 80.78 in clinical sciences. ChatGPT, on the other hand, ranged from a minimum of 60.00 to a maximum of 72.00 in basic sciences, and from a minimum of 66.25 to a maximum of 77.00 in clinical sciences. The rate of correct answers in basic medical sciences was 43.03% for graduates and 60.00% for ChatGPT. In clinical medical sciences, the rate of correct answers was 53.29% for graduates and 64.16% for ChatGPT. ChatGPT performed best with a 91.66% correct answer rate in Obstetrics and Gynecology and an 86.36% correct answer rate in Medical Microbiology. The least successful area for ChatGPT was Anatomy, a subfield of basic medical sciences, with a 28.00% correct answer rate. Graduates outperformed ChatGPT in the Anatomy and Physiology subfields. Significant differences were found in all comparisons between ChatGPT and graduates. Conclusions This study shows that artificial intelligence models such as ChatGPT can provide significant advantages to graduates, as ChatGPT scored higher than medical school graduates. Recommended applications include interactive support, private lessons, learning material production, personalized learning plans, self-assessment, motivation boosting, and 24/7 access, among other benefits. As a result, artificial intelligence-supported education can play an important role in improving the quality of medical education and increasing student success.
Affiliation(s)
- Gülcan Gencer
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Afyonkarahisar Health Sciences University, Afyonkarahisar, TUR
- Kerem Gencer
- Department of Computer Engineering, Faculty of Engineering, Afyon Kocatepe University, Afyonkarahisar, TUR
12
Yüce A, Yerli M, Misir A, Çakar M. Enhancing patient information texts in orthopaedics: How OpenAI's 'ChatGPT' can help. J Exp Orthop 2024;11:e70019. [PMID: 39291057] [PMCID: PMC11406043] [DOI: 10.1002/jeo2.70019]
Abstract
Purpose The internet has become a primary source for patients seeking healthcare information, but the quality of online information, particularly in orthopaedics, often falls short. Orthopaedic surgeons now have the added responsibility of evaluating and guiding patients to credible online resources. This study aimed to assess ChatGPT's ability to identify deficiencies in patient information texts related to total hip arthroplasty websites and to evaluate its potential for enhancing the quality of these texts. Methods In August 2023, 25 websites related to total hip arthroplasty were assessed using a standardized search on Google. Peer-reviewed scientific articles, empty pages, dictionary definitions, and unrelated content were excluded. The remaining 10 websites were evaluated using the hip information scoring system (HISS). ChatGPT was then used to assess these texts, identify deficiencies and provide recommendations. Results The mean HISS score of the websites was 9.5, indicating low to moderate quality. However, after implementing ChatGPT's suggested improvements, the score increased to 21.5, signifying excellent quality. ChatGPT's recommendations included using simpler language, adding FAQs, incorporating patient experiences, addressing cost and insurance issues, detailing preoperative and postoperative phases, including references, and emphasizing emotional and psychological support. The study demonstrates that ChatGPT can significantly enhance patient information quality. Conclusion ChatGPT's role in elevating patient education regarding total hip arthroplasty is promising. This study sheds light on the potential of ChatGPT as an aid to orthopaedic surgeons in producing high-quality patient information materials. Although it cannot replace human expertise, it offers a valuable means of enhancing the quality of healthcare information available online. Level of Evidence Level IV.
Affiliation(s)
- Ali Yüce
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, İstanbul, Turkey
- Mustafa Yerli
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, İstanbul, Turkey
- Abdulhamit Misir
- Department of Orthopedic and Traumatology, Göztepe Medical Park Hospital, İstanbul, Turkey
- Murat Çakar
- Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, İstanbul, Turkey
13
Rao SJ, Isath A, Krishnan P, Tangsrivimol JA, Virk HUH, Wang Z, Glicksberg BS, Krittanawong C. ChatGPT: A Conceptual Review of Applications and Utility in the Field of Medicine. J Med Syst 2024;48:59. [PMID: 38836893] [DOI: 10.1007/s10916-024-02075-x]
Abstract
Artificial Intelligence, specifically advanced language models such as ChatGPT, have the potential to revolutionize various aspects of healthcare, medical education, and research. In this narrative review, we evaluate the myriad applications of ChatGPT in diverse healthcare domains. We discuss its potential role in clinical decision-making, exploring how it can assist physicians by providing rapid, data-driven insights for diagnosis and treatment. We review the benefits of ChatGPT in personalized patient care, particularly in geriatric care, medication management, weight loss and nutrition, and physical activity guidance. We further delve into its potential to enhance medical research, through the analysis of large datasets, and the development of novel methodologies. In the realm of medical education, we investigate the utility of ChatGPT as an information retrieval tool and personalized learning resource for medical students and professionals. There are numerous promising applications of ChatGPT that will likely induce paradigm shifts in healthcare practice, education, and research. The use of ChatGPT may come with several benefits in areas such as clinical decision making, geriatric care, medication management, weight loss and nutrition, physical fitness, scientific research, and medical education. Nevertheless, it is important to note that issues surrounding ethics, data privacy, transparency, inaccuracy, and inadequacy persist. Prior to widespread use in medicine, it is imperative to objectively evaluate the impact of ChatGPT in a real-world setting using a risk-based approach.
Affiliation(s)
- Shiavax J Rao
- Department of Medicine, MedStar Union Memorial Hospital, Baltimore, MD, USA
- Ameesh Isath
- Department of Cardiology, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
- Parvathy Krishnan
- Department of Pediatrics, Westchester Medical Center and New York Medical College, Valhalla, NY, USA
- Jonathan A Tangsrivimol
- Division of Neurosurgery, Department of Surgery, Chulabhorn Hospital, Chulabhorn Royal Academy, Bangkok, 10210, Thailand
- Department of Neurological Surgery, Weill Cornell Medicine Brain and Spine Center, New York, NY, 10022, USA
- Hafeez Ul Hassan Virk
- Harrington Heart & Vascular Institute, Case Western Reserve University, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
- Zhen Wang
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
- Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chayakrit Krittanawong
- Cardiology Division, NYU Langone Health and NYU School of Medicine, 550 First Avenue, New York, NY, 10016, USA
14
Pohl NB, Derector E, Rivlin M, Bachoura A, Tosti R, Kachooei AR, Beredjiklian PK, Fletcher DJ. A quality and readability comparison of artificial intelligence and popular health website education materials for common hand surgery procedures. Hand Surgery & Rehabilitation 2024;43:101723. [PMID: 38782361] [DOI: 10.1016/j.hansur.2024.101723]
Abstract
INTRODUCTION ChatGPT and its application in producing patient education materials for orthopedic hand disorders have not been extensively studied. This study evaluated the quality and readability of educational information pertaining to common hand surgeries from patient education websites and information produced by ChatGPT. METHODS Patient education information for four hand surgeries (carpal tunnel release, trigger finger release, Dupuytren's contracture, and ganglion cyst surgery) was extracted from ChatGPT (at a scientific and a fourth-grade reading level), WebMD, and Mayo Clinic. In a blinded and randomized fashion, five fellowship-trained orthopaedic hand surgeons evaluated the quality of the information using modified DISCERN criteria. Readability and reading grade level were assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) equations. RESULTS The Mayo Clinic website scored higher in terms of quality for carpal tunnel release information (p = 0.004). WebMD scored higher for Dupuytren's contracture release (p < 0.001), ganglion cyst surgery (p = 0.003), and overall quality (p < 0.001). Written materials from ChatGPT at a fourth-grade reading level, ChatGPT at a scientific reading level, WebMD, and Mayo Clinic on average exceeded recommended reading grade levels (4th-6th grade) by at least four grade levels (10th, 14th, 13th, and 11th grade, respectively). CONCLUSIONS ChatGPT provides inferior education materials compared to patient-friendly websites. When prompted to provide more easily read materials, ChatGPT generates less robust information compared to patient-friendly websites and does not adequately simplify the educational information. ChatGPT has the potential to improve the quality and readability of patient education materials, but currently, patient-friendly websites provide superior quality at similar reading comprehension levels.
Affiliation(s)
- Nicholas B Pohl
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA.
- Evan Derector
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA
- Michael Rivlin
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA
- Abdo Bachoura
- Department of Orthopaedic Surgery, Rothman Orthopaedics Florida, Orlando, FL, USA
- Rick Tosti
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA
- Amir R Kachooei
- Department of Orthopaedic Surgery, Rothman Orthopaedics Florida, Orlando, FL, USA
- Pedro K Beredjiklian
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA
- Daniel J Fletcher
- Department of Orthopaedic Surgery, Rothman Orthopaedic Institute, Philadelphia, PA, USA
15
Dsouza JM. A Student's Viewpoint on ChatGPT Use and Automation Bias in Medical Education. JMIR Medical Education 2024;10:e57696. [PMID: 38623729] [PMCID: PMC11034419] [DOI: 10.2196/57696]
16
Wu Y, Zheng Y, Feng B, Yang Y, Kang K, Zhao A. Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students. JMIR Medical Education 2024;10:e52483. [PMID: 38598263] [PMCID: PMC11043925] [DOI: 10.2196/52483]
Abstract
ChatGPT (OpenAI), a cutting-edge natural language processing model, holds immense promise for revolutionizing medical education. With its remarkable performance in language-related tasks, ChatGPT offers personalized and efficient learning experiences for medical students and doctors. Through training, it enhances clinical reasoning and decision-making skills, leading to improved case analysis and diagnosis. The model facilitates simulated dialogues, intelligent tutoring, and automated question-answering, enabling the practical application of medical knowledge. However, integrating ChatGPT into medical education raises ethical and legal concerns. Safeguarding patient data and adhering to data protection regulations are critical. Transparent communication with students, physicians, and patients is essential to ensure their understanding of the technology's purpose and implications, as well as the potential risks and benefits. Maintaining a balance between personalized learning and face-to-face interactions is crucial to avoid hindering critical thinking and communication skills. Despite challenges, ChatGPT offers transformative opportunities. Integrating it with problem-based learning, team-based learning, and case-based learning methodologies can further enhance medical education. With proper regulation and supervision, ChatGPT can contribute to a well-rounded learning environment, nurturing skilled and knowledgeable medical professionals ready to tackle health care challenges. By emphasizing ethical considerations and human-centric approaches, ChatGPT's potential can be fully harnessed in medical education, benefiting both students and patients alike.
Affiliation(s)
- Yijun Wu
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Yue Zheng
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Baijie Feng
- West China School of Medicine, Sichuan University, Chengdu, China
- Yuqi Yang
- West China School of Medicine, Sichuan University, Chengdu, China
- Kai Kang
- Cancer Center, West China Hospital, Sichuan University, Chengdu, China
- Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
- Ailin Zhao
- Department of Hematology, West China Hospital, Sichuan University, Chengdu, China
17
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Education Today 2024;135:106121. [PMID: 38340639] [DOI: 10.1016/j.nedt.2024.106121]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed reporting guidelines outlined in the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT" emerged, accompanied by two main themes (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT, (2) Controversies and Concerns about ChatGPT in Healthcare Education Research and Writing, and seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT such as 'ChatGPT hallucinations', its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers, among other considerations. Furthermore, our review also recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate for the use of cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher education curricula to incorporate ChatGPT's potential, educators' need to familiarize themselves with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. Furthermore, data protection measures should be prioritized when employing ChatGPT, and transparent reporting becomes crucial when integrating ChatGPT into academic writing.
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
- Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
18
Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu NY, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea RY, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. Medical Teacher 2024;46:446-470. [PMID: 38423127] [DOI: 10.1080/0142159X.2024.2314198]
Abstract
BACKGROUND Artificial Intelligence (AI) is rapidly transforming healthcare, and there is a critical need for a nuanced understanding of how AI is reshaping teaching, learning, and educational practice in medical education. This review aimed to map the literature regarding AI applications in medical education, core areas of findings, potential candidates for formal systematic review, and gaps for future research. METHODS This rapid scoping review, conducted over 16 weeks, employed Arksey and O'Malley's framework and adhered to STORIES and BEME guidelines. A systematic and comprehensive search across PubMed/MEDLINE, EMBASE, and MedEdPublish was conducted without date or language restrictions. Publications included in the review spanned undergraduate, graduate, and continuing medical education, encompassing both original studies and perspective pieces. Data were charted by multiple author pairs and synthesized into various thematic maps and charts, ensuring a broad and detailed representation of the current landscape. RESULTS The review synthesized 278 publications, with a majority (68%) from North American and European regions. The studies covered diverse AI applications in medical education, such as AI for admissions, teaching, assessment, and clinical reasoning. The review highlighted AI's varied roles, from augmenting traditional educational methods to introducing innovative practices, and underscored the urgent need for ethical guidelines in AI's application in medical education. CONCLUSION The current literature has been charted. The findings underscore the need for ongoing research to explore uncharted areas and address potential risks associated with AI use in medical education. This work serves as a foundational resource for educators, policymakers, and researchers in navigating AI's evolving role in medical education. A framework to support future high-utility reporting, the FACETS framework, is proposed.
Affiliation(s)
- Morris Gordon
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Blackpool Hospitals NHS Foundation Trust, Blackpool, UK
| | - Michelle Daniel
- School of Medicine, University of California, San Diego, San Diego, CA, USA
| | - Aderonke Ajiboye
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
| | - Hussein Uraiby
- Department of Cellular Pathology, University Hospitals of Leicester NHS Trust, Leicester, UK
| | - Nicole Y Xu
- School of Medicine, University of California, San Diego, San Diego, CA, USA
| | - Rangana Bartlett
- Department of Cognitive Science, University of California, San Diego, CA, USA
| | - Janice Hanson
- Department of Medicine and Office of Education, School of Medicine, Washington University in Saint Louis, Saint Louis, MO, USA
| | - Mary Haas
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Maxwell Spadafore
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | | | - Colin Michie
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
| | - Janet Corral
- Department of Medicine, University of Nevada Reno, School of Medicine, Reno, NV, USA
| | - Brian Kwan
- School of Medicine, University of California, San Diego, San Diego, CA, USA
| | - Diana Dolmans
- School of Health Professions Education, Faculty of Health, Maastricht University, Maastricht, the Netherlands
| | - Satid Thammasitboon
- Center for Research, Innovation and Scholarship in Health Professions Education, Baylor College of Medicine, Houston, TX, USA
| |
|
19
|
Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models for generating medical examinations: systematic review. BMC MEDICAL EDUCATION 2024; 24:354. [PMID: 38553693 PMCID: PMC10981304 DOI: 10.1186/s12909-024-05239-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 02/28/2024] [Indexed: 04/01/2024]
Abstract
BACKGROUND Writing multiple choice questions (MCQs) for the purpose of medical exams is challenging. It requires extensive medical knowledge, time and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs. METHODS The authors searched for studies published up to November 2023. Search terms focused on LLM-generated MCQs for medical examinations. Non-English studies, studies outside the year range, and studies not focusing on AI-generated multiple-choice questions were excluded. MEDLINE was used as a search database. Risk of bias was evaluated using a tailored QUADAS-2 tool. RESULTS Overall, eight studies published between April 2023 and October 2023 were included. Six studies used ChatGPT-3.5, while two employed GPT-4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. Another study compared LLM-generated questions with those written by humans. All studies presented faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify. CONCLUSIONS LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations. Two studies were at high risk of bias. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Affiliation(s)
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Ha'Hadas St. 1, Rishon Le Zion, Zefat, 7550598, Israel.
| | - Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Tel-Aviv University School of Medicine, Tel Aviv, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Ramat Gan, Israel
| | - Eli Konen
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Tel-Aviv University School of Medicine, Tel Aviv, Israel
| | - Benjamin S Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Girish Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
20
|
Magalhães Araujo S, Cruz-Correia R. Incorporating ChatGPT in Medical Informatics Education: Mixed Methods Study on Student Perceptions and Experiential Integration Proposals. JMIR MEDICAL EDUCATION 2024; 10:e51151. [PMID: 38506920 PMCID: PMC10993110 DOI: 10.2196/51151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 09/29/2023] [Accepted: 11/10/2023] [Indexed: 03/21/2024]
Abstract
BACKGROUND The integration of artificial intelligence (AI) technologies, such as ChatGPT, in the educational landscape has the potential to enhance the learning experience of medical informatics students and prepare them for using AI in professional settings. The incorporation of AI in classes aims to develop critical thinking by encouraging students to interact with ChatGPT and critically analyze the responses generated by the chatbot. This approach also helps students develop important skills in the field of biomedical and health informatics to enhance their interaction with AI tools. OBJECTIVE The aim of the study is to explore the perceptions of students regarding the use of ChatGPT as a learning tool in their educational context and provide professors with examples of prompts for incorporating ChatGPT into their teaching and learning activities, thereby enhancing the educational experience for students in medical informatics courses. METHODS This study used a mixed methods approach to gain insights from students regarding the use of ChatGPT in education. To accomplish this, a structured questionnaire was applied to evaluate students' familiarity with ChatGPT, gauge their perceptions of its use, and understand their attitudes toward its use in academic and learning tasks. Learning outcomes of 2 courses were analyzed to propose ChatGPT's incorporation in master's programs in medicine and medical informatics. RESULTS The majority of students expressed satisfaction with the use of ChatGPT in education, finding it beneficial for various purposes, including generating academic content, brainstorming ideas, and rewriting text. While some participants raised concerns about potential biases and the need for informed use, the overall perception was positive. Additionally, the study proposed integrating ChatGPT into 2 specific courses in the master's programs in medicine and medical informatics. The incorporation of ChatGPT was envisioned to enhance student learning experiences and assist in project planning, programming code generation, examination preparation, workflow exploration, and technical interview preparation, thus advancing medical informatics education. In medical teaching, it will be used as an assistant for simplifying the explanation of concepts and solving complex problems, as well as for generating clinical narratives and patient simulators. CONCLUSIONS The study's valuable insights into medical faculty students' perspectives and integration proposals for ChatGPT serve as an informative guide for professors aiming to enhance medical informatics education. The research delves into the potential of ChatGPT, emphasizes the necessity of collaboration in academic environments, identifies subject areas with discernible benefits, and underscores its transformative role in fostering innovative and engaging learning experiences. The envisaged proposals hold promise in empowering future health care professionals to work in the rapidly evolving era of digital health care.
Affiliation(s)
- Sabrina Magalhães Araujo
- Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Ricardo Cruz-Correia
- Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
- Department of Community Medicine, Information and Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
- Working Group Education, European Federation for Medical Informatics, Le Mont-sur-Lausanne, Switzerland
| |
|
21
|
Xu X, Chen Y, Miao J. Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review. JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS 2024; 21:6. [PMID: 38486402 PMCID: PMC11035906 DOI: 10.3352/jeehp.2024.21.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Accepted: 03/05/2024] [Indexed: 03/19/2024]
Abstract
BACKGROUND ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored. METHODS A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included. RESULTS ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students. CONCLUSION ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.
Affiliation(s)
- Xiaojun Xu
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
| | - Yixiao Chen
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
| | - Jing Miao
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
| |
|
22
|
Li Y, Li J. Generative artificial intelligence in medical education: way to solve the problems. Postgrad Med J 2024; 100:203-204. [PMID: 38061077 DOI: 10.1093/postmj/qgad116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2023] [Accepted: 11/10/2023] [Indexed: 02/20/2024]
Affiliation(s)
- Yanxing Li
- Department of Clinical Medicine, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710000, People's Republic of China
| | - Jianjun Li
- Department of Cardiology, Changzhi Medical College, Jincheng People's Hospital, Jincheng, Shanxi 048000, People's Republic of China
| |
|
23
|
Levin G, Horesh N, Brezinov Y, Meyer R. Performance of ChatGPT in medical examinations: A systematic review and a meta-analysis. BJOG 2024; 131:378-380. [PMID: 37604703 DOI: 10.1111/1471-0528.17641] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 08/04/2023] [Accepted: 08/05/2023] [Indexed: 08/23/2023]
Affiliation(s)
- Gabriel Levin
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Quebec City, Quebec, Canada
- Faculty of Medicine, Department of Gynecologic Oncology, Hadassah Medical Center, Hebrew University Jerusalem, Jerusalem, Israel
| | - Nir Horesh
- Ellen Leifer Shulman and Steven Shulman Digestive Disease Center, Cleveland Clinic Florida, Weston, Florida, USA
| | - Yoav Brezinov
- Lady Davis Institute for Cancer Research, Jewish General Hospital, McGill University, Quebec City, Quebec, Canada
| | - Raanan Meyer
- Division of Minimally Invasive Gynecologic Surgery, Department of Obstetrics and Gynecology, Cedars Sinai Medical Center, Los Angeles, California, USA
| |
|
24
|
Bečulić H, Begagić E, Skomorac R, Mašović A, Selimović E, Pojskić M. ChatGPT's contributions to the evolution of neurosurgical practice and education: a systematic review of benefits, concerns and limitations. MEDICINSKI GLASNIK : OFFICIAL PUBLICATION OF THE MEDICAL ASSOCIATION OF ZENICA-DOBOJ CANTON, BOSNIA AND HERZEGOVINA 2024; 21:126-131. [PMID: 37950660 DOI: 10.17392/1661-23] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 10/12/2023] [Accepted: 10/14/2023] [Indexed: 11/13/2023]
Abstract
Aim This study provides a comprehensive review of the current literature on the use of ChatGPT, a generative Artificial Intelligence (AI) tool, in neurosurgery. The study examines potential benefits and limitations of ChatGPT in neurosurgical practice and education. Methods The study involved a systematic review of the current literature on the use of AI in neurosurgery, with a focus on ChatGPT. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed to ensure a comprehensive and transparent review process. Thirteen studies met the inclusion criteria and were included in the final analysis. The data extracted from the included studies were analysed and synthesized to provide an overview of the current state of research on the use of ChatGPT in neurosurgery. Results ChatGPT showed potential to complement and enhance neurosurgical practice. However, there are risks and limitations associated with its use, including question format limitations, validation challenges, and algorithmic bias. The study highlights the importance of validating machine-generated content for accuracy and addressing ethical concerns associated with AI technologies. The study also identifies potential benefits of ChatGPT, such as providing personalized treatment plans, supporting surgical planning and navigation, and enhancing large data processing efficiency and accuracy. Conclusion The integration of AI technologies into neurosurgery should be approached with caution and careful consideration of ethical and validation issues. Continued research and development of AI tools in neurosurgery can help us further understand their potential benefits and limitations.
Affiliation(s)
- Hakija Bečulić
- Department of Neurosurgery, Cantonal Hospital Zenica, Zenica, Bosnia and Herzegovina
- Department of Anatomy, School of Medicine, University of Zenica, Zenica, Bosnia and Herzegovina
| | - Emir Begagić
- Department of General Medicine, School of Medicine, University of Zenica, Zenica, Bosnia and Herzegovina
| | - Rasim Skomorac
- Department of Neurosurgery, Cantonal Hospital Zenica, Zenica, Bosnia and Herzegovina
- Department of Surgery, School of Medicine, University of Zenica, Zenica, Bosnia and Herzegovina
| | - Anes Mašović
- Department of Neurosurgery, Cantonal Hospital Zenica, Zenica, Bosnia and Herzegovina
| | - Edin Selimović
- Department of Surgery, School of Medicine, University of Zenica, Zenica, Bosnia and Herzegovina
| | - Mirza Pojskić
- Department of Neurosurgery, University of Marburg, Marburg, Germany
| |
|
25
|
Nguyen T. ChatGPT in Medical Education: A Precursor for Automation Bias? JMIR MEDICAL EDUCATION 2024; 10:e50174. [PMID: 38231545 PMCID: PMC10831594 DOI: 10.2196/50174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 12/11/2023] [Indexed: 01/18/2024]
Abstract
Artificial intelligence (AI) in health care has the promise of providing accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns if these questionable results are used in patient care. As physicians have the duty to provide care based on their clinical judgment in addition to their patients' values and preferences, it is crucial that physicians validate the results from AI. Yet, there are some physicians who exhibit a phenomenon known as automation bias, where there is an assumption from the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate the results, given their trust in AI systems. Several factors impact a user's susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors and a lack of AI education in the medical school curriculum cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which may be some students' first experience with AI, before they use AI in the clinical setting. Therefore, to avoid the problem of automation bias in the long term, in addition to incorporating AI education into the curriculum, as is necessary, the use of ChatGPT in medical education should be limited to certain tasks. Otherwise, having no constraints on what ChatGPT should be used for could lead to automation bias.
Affiliation(s)
- Tina Nguyen
- The University of Texas Medical Branch, Galveston, TX, United States
| |
|
26
|
Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821 PMCID: PMC10719375 DOI: 10.1038/s41598-023-49483-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023] Open
Abstract
The emergence of large language models (LLM) with remarkable performance such as ChatGPT and GPT-4, has led to an unprecedented uptake in the population. One of their most promising and studied applications concerns education due to their ability to understand and generate human-like text, creating a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study is twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning followed by these LLM to answer those questions. A dataset, RheumaMIR, of 145 rheumatology-related questions, extracted from the exams held between 2010 and 2023, was created for that purpose, used as a prompt for the LLM, and was publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy, 66.43%, and clinical reasoning, median (Q1-Q3), 4.5 (2.33-4.67). However, GPT-4 showed better performance with an accuracy score of 93.71% and a median clinical reasoning value of 4.67 (4.5-4.83). These findings suggest that LLM may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
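To make the evaluation metrics above concrete, the following is a minimal Python sketch, using entirely hypothetical question-level data and a simplified spread measure in place of the formal agreement analysis (it is not the authors' code or dataset), that computes an accuracy percentage and a median (Q1-Q3) of 5-point Likert reasoning ratings:

# Illustrative sketch only: hypothetical data, not the RheumaMIR dataset or the study's analysis.
from statistics import median, quantiles

# Each entry: one exam question, whether the model answered it correctly,
# and Likert ratings (1-5) from six hypothetical reviewers.
questions = [
    {"correct": True,  "ratings": [5, 4, 5, 4, 5, 5]},
    {"correct": True,  "ratings": [4, 4, 3, 5, 4, 4]},
    {"correct": False, "ratings": [2, 3, 2, 2, 1, 2]},
    {"correct": True,  "ratings": [5, 5, 4, 5, 5, 4]},
]

# Accuracy: share of questions answered correctly, expressed as a percentage.
accuracy = 100 * sum(q["correct"] for q in questions) / len(questions)

# Median and quartiles of the per-question mean Likert rating.
per_question_mean = [sum(q["ratings"]) / len(q["ratings"]) for q in questions]
med = median(per_question_mean)
q1, _, q3 = quantiles(per_question_mean, n=4)

# Within-question rating spread (max - min): a crude proxy for reviewer agreement.
mean_spread = sum(max(q["ratings"]) - min(q["ratings"]) for q in questions) / len(questions)

print(f"Accuracy: {accuracy:.2f}%")
print(f"Clinical reasoning, median (Q1-Q3): {med:.2f} ({q1:.2f}-{q3:.2f})")
print(f"Mean within-question rating spread: {mean_spread:.2f}")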
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain.
| | - Zulema Rosales-Rosado
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Dalifer Freites-Nuñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Inés Pérez-Sancristóbal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Esperanza Pato-Cour
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | | | - Luis Cabeza-Osorio
- Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
- Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
| | - Lydia Abasolo-Alcázar
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Leticia León-Mateos
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Benjamín Fernández-Gutiérrez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
| | - Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| |
|
27
|
Barrington NM, Gupta N, Musmar B, Doyle D, Panico N, Godbole N, Reardon T, D’Amico RS. A Bibliometric Analysis of the Rise of ChatGPT in Medical Research. Med Sci (Basel) 2023; 11:61. [PMID: 37755165 PMCID: PMC10535733 DOI: 10.3390/medsci11030061] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 09/04/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023] Open
Abstract
The rapid emergence of publicly accessible artificial intelligence platforms such as large language models (LLMs) has led to an equally rapid increase in articles exploring their potential benefits and risks. We performed a bibliometric analysis of ChatGPT literature in medicine and science to better understand publication trends and knowledge gaps. Following title, abstract, and keyword searches of PubMed, Embase, Scopus, and Web of Science databases for ChatGPT articles published in the medical field, articles were screened for inclusion and exclusion criteria. Data were extracted from included articles, with citation counts obtained from PubMed and journal metrics obtained from Clarivate Journal Citation Reports. After screening, 267 articles were included in the study, most of which were editorials or correspondence with an average of 7.5 +/- 18.4 citations per publication. Published articles on ChatGPT were authored largely in the United States, India, and China. The topics discussed included use and accuracy of ChatGPT in research, medical education, and patient counseling. Among non-surgical specialties, radiology published the most ChatGPT-related articles, while plastic surgery published the most articles among surgical specialties. The average citation number among the top 20 most-cited articles was 60.1 +/- 35.3. Among journals with the most ChatGPT-related publications, there were on average 10 +/- 3.7 publications. Our results suggest that managing the inevitable ethical and safety issues that arise with the implementation of LLMs will require further research exploring the capabilities and accuracy of ChatGPT, to generate policies guiding the adoption of artificial intelligence in medicine and science.
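As a small illustration of the descriptive bibliometrics summarized above, the sketch below uses hypothetical article records (not the study's data) to compute the mean +/- standard deviation of citations per publication and a count of publications per country:

# Illustrative sketch only: hypothetical records, not the 267 articles analysed above.
from statistics import mean, stdev
from collections import Counter

articles = [
    {"country": "United States", "citations": 12},
    {"country": "India",         "citations": 3},
    {"country": "United States", "citations": 41},
    {"country": "China",         "citations": 0},
    {"country": "India",         "citations": 7},
]

# Mean +/- standard deviation of citation counts per publication.
citations = [a["citations"] for a in articles]
print(f"Citations per publication: {mean(citations):.1f} +/- {stdev(citations):.1f}")

# Number of publications per (hypothetical) country of origin.
for country, count in Counter(a["country"] for a in articles).most_common():
    print(f"{country}: {count}")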
Affiliation(s)
- Nikki M. Barrington
- Chicago Medical School, Rosalind Franklin University, North Chicago, IL 60064, USA
| | - Nithin Gupta
- School of Osteopathic Medicine, Campbell University, Lillington, NC 27546, USA
| | - Basel Musmar
- Faculty of Medicine and Health Sciences, An-Najah National University, Nablus P.O. Box 7, West Bank, Palestine
| | - David Doyle
- Central Michigan College of Medicine, Mount Pleasant, MI 48858, USA
| | - Nicholas Panico
- Lake Erie College of Osteopathic Medicine, Erie, PA 16509, USA
| | - Nikhil Godbole
- School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Taylor Reardon
- Department of Neurology, Henry Ford Hospital, Detroit, MI 48202, USA
| | - Randy S. D’Amico
- Department of Neurosurgery, Lenox Hill Hospital, New York, NY 10075, USA
| |
|
28
|
Iqbal J, Cortés Jaimes DC, Makineni P, Subramani S, Hemaida S, Thugu TR, Butt AN, Sikto JT, Kaur P, Lak MA, Augustine M, Shahzad R, Arain M. Reimagining Healthcare: Unleashing the Power of Artificial Intelligence in Medicine. Cureus 2023; 15:e44658. [PMID: 37799217 PMCID: PMC10549955 DOI: 10.7759/cureus.44658] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/04/2023] [Indexed: 10/07/2023] Open
Abstract
Artificial intelligence (AI) has opened new medical avenues and revolutionized diagnostic and therapeutic practices, allowing healthcare providers to overcome significant challenges associated with cost, disease management, accessibility, and treatment optimization. Prominent AI technologies such as machine learning (ML) and deep learning (DL) have immensely influenced diagnostics, patient monitoring, novel pharmaceutical discoveries, drug development, and telemedicine. Significant innovations and improvements in disease identification and early intervention have been made using AI-generated algorithms for clinical decision support systems and disease prediction models. AI has remarkably impacted clinical drug trials by amplifying research into drug efficacy, adverse events, and candidate molecular design. AI's precision and analysis regarding patients' genetic, environmental, and lifestyle factors have led to individualized treatment strategies. During the COVID-19 pandemic, AI-assisted telemedicine set a precedent for remote healthcare delivery and patient follow-up. Moreover, AI-generated applications and wearable devices have allowed ambulatory monitoring of vital signs. However, apart from being immensely transformative, AI's contribution to healthcare is subject to ethical and regulatory concerns. AI-backed data protection and algorithm transparency should strictly adhere to ethical principles. Vigorous governance frameworks should be in place before incorporating AI in mental health interventions through AI-operated chatbots, medical education enhancements, and virtual reality-based training. The role of AI in medical decision-making has certain limitations, underscoring the importance of hands-on experience. Therefore, reaching an optimal balance between AI's capabilities and ethical considerations to ensure impartial and neutral performance in healthcare applications is crucial. This narrative review focuses on AI's impact on healthcare and the importance of ethical and balanced incorporation to make use of its full potential.
Affiliation(s)
| | - Diana Carolina Cortés Jaimes
- Epidemiology, Universidad Autónoma de Bucaramanga, Bucaramanga, COL
- Medicine, Pontificia Universidad Javeriana, Bogotá, COL
| | - Pallavi Makineni
- Medicine, All India Institute of Medical Sciences, Bhubaneswar, Bhubaneswar, IND
| | - Sachin Subramani
- Medicine and Surgery, Employees' State Insurance Corporation (ESIC) Medical College, Gulbarga, IND
| | - Sarah Hemaida
- Internal Medicine, Istanbul Okan University, Istanbul, TUR
| | - Thanmai Reddy Thugu
- Internal Medicine, Sri Padmavathi Medical College for Women, Sri Venkateswara Institute of Medical Sciences (SVIMS), Tirupati, IND
| | - Amna Naveed Butt
- Medicine/Internal Medicine, Allama Iqbal Medical College, Lahore, PAK
| | | | - Pareena Kaur
- Medicine, Punjab Institute of Medical Sciences, Jalandhar, IND
| | | | | | - Roheen Shahzad
- Medicine, Combined Military Hospital (CMH) Lahore Medical College and Institute of Dentistry, Lahore, PAK
| | - Mustafa Arain
- Internal Medicine, Civil Hospital Karachi, Karachi, PAK
| |
|