1.
Weisman D, Sugarman A, Huang YM, Gelberg L, Ganz PA, Comulada WS. Development of a GPT-4-Powered Virtual Simulated Patient and Communication Training Platform for Medical Students to Practice Discussing Abnormal Mammogram Results With Patients: Multiphase Study. JMIR Form Res 2025; 9:e65670. PMID: 40246299; PMCID: PMC12046251; DOI: 10.2196/65670.
Abstract
BACKGROUND Standardized patients (SPs) prepare medical students for difficult conversations with patients. Despite their value, SP-based simulation training is constrained by available resources and competing clinical demands. Researchers are turning to artificial intelligence and large language models (LLMs), such as generative pretrained transformers, to create communication training that incorporates virtual simulated patients (VSPs). GPT-4 is an LLM advance that allows developers to design virtual simulation scenarios from text-based prompts instead of relying on branching-path simulations with prescripted dialogue. These nascent development practices are not yet documented in the literature to guide other researchers in building their own simulations. OBJECTIVE This study aims to describe our development process and lessons learned in creating a GPT-4-driven VSP. We designed the VSP to help medical student learners rehearse discussing abnormal mammography results with a patient as a primary care physician (PCP). We aimed to assess GPT-4's ability to generate appropriate VSP responses to learners during spoken conversations and to provide appropriate feedback on learner performance. METHODS A research team comprising physicians, a medical student, an educator, an SP program director, a learning experience designer, and a health care researcher conducted the study. A formative phase with in-depth knowledge user interviews informed development, followed by a development phase to create the virtual training module. The team interviewed 5 medical students, 5 PCPs, and 5 breast cancer survivors. They then developed a VSP using simulation authoring software and provided the GPT-4-enabled VSP with an initial prompt consisting of a scenario description, an emotional state, and expectations for learner dialogue. The prompt was iteratively refined through an agile design process involving repeated cycles of testing, documenting issues, and revising. As an exploratory feature, the simulation used GPT-4 to provide written feedback to learners on their performance communicating with the VSP and their adherence to guidelines for difficult conversations. RESULTS In-depth interviews helped establish the appropriate timing, mode of communication, and protocol for conversations between PCPs and patients during the breast cancer screening process. The scenario simulated a telephone call between a physician and a patient to discuss the abnormal results of a diagnostic mammogram that indicated a need for a biopsy. Preliminary testing was promising: the VSP asked sensible questions about its mammography results and responded to learner inquiries in a voice with appropriate emotional inflections. GPT-4 generated performance feedback that successfully identified strengths and areas for improvement using relevant quotes from the learner-VSP conversation, but it occasionally misidentified learner adherence to communication protocols. CONCLUSIONS GPT-4 streamlined development and enabled more dynamic, humanlike interactions between learners and the VSP than branching-path simulations. As a next step, we will pilot-test the VSP with medical students to evaluate its feasibility and acceptability.
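The prompt-seeding approach this abstract describes can be sketched in code. The following is a minimal, hypothetical illustration, not the authors' actual prompt or platform: the function names, message structure, and prompt wording are all assumptions, showing only how a scenario description, emotional state, and dialogue expectations might be assembled into the seed messages for a GPT-4-style chat API.

```python
# Hypothetical sketch of seeding a virtual simulated patient (VSP) with the
# three prompt components the study names: scenario description, emotional
# state, and expectations for learner dialogue. Wording is illustrative.

def build_vsp_messages(scenario: str, emotional_state: str, expectations: str) -> list:
    """Assemble the initial system prompt for a chat-style LLM API."""
    system_prompt = (
        "You are a virtual simulated patient. Stay in character at all times.\n"
        f"Scenario: {scenario}\n"
        f"Emotional state: {emotional_state}\n"
        f"Expectations for the learner's dialogue: {expectations}"
    )
    return [{"role": "system", "content": system_prompt}]

def add_learner_turn(messages: list, learner_utterance: str) -> list:
    """Append the medical student's utterance as a user turn (non-mutating)."""
    return messages + [{"role": "user", "content": learner_utterance}]

messages = build_vsp_messages(
    scenario="Telephone call: the patient's diagnostic mammogram was abnormal and a biopsy is needed.",
    emotional_state="Anxious, seeking reassurance.",
    expectations="The learner should deliver the news clearly and check the patient's understanding.",
)
messages = add_learner_turn(messages, "Hello, is this a good time to talk about your results?")
```

In an agile refinement cycle like the one described, only the three string arguments would change between test iterations, while the message plumbing stays fixed.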
Affiliation(s)
- Dan Weisman
- UCLA Simulation Center, University of California, Los Angeles, Los Angeles, CA, United States
- Alanna Sugarman
- David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Yue Ming Huang
- UCLA Simulation Center, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Lillian Gelberg
- Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Health Policy and Management, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
- Patricia A Ganz
- David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Health Policy and Management, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
- Warren Scott Comulada
- Department of Health Policy and Management, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
2.
Berikol GB, Kanbakan A, Ilhan B, Doğanay F. Mapping artificial intelligence models in emergency medicine: A scoping review on artificial intelligence performance in emergency care and education. Turk J Emerg Med 2025; 25:67-91. PMID: 40248473; PMCID: PMC12002153; DOI: 10.4103/tjem.tjem_45_25.
Abstract
Artificial intelligence (AI) is increasingly improving processes such as emergency patient care and emergency medicine education. This scoping review aims to map the use and performance of AI models in emergency medicine with respect to AI concepts. The findings show that AI-based medical imaging systems detect disease with 85%-90% accuracy in techniques such as X-ray and computed tomography scans. In addition, AI-supported triage systems were successful in correctly classifying low- and high-urgency patients. In education, large language models have achieved high accuracy rates on emergency medicine exams. However, challenges remain in integrating AI into clinical workflows and in model generalization capacity. These findings demonstrate the potential of newer AI models, but larger-scale studies are still needed.
Affiliation(s)
- Altuğ Kanbakan
- Department of Emergency Medicine, Ufuk University School of Medicine, Ankara, Türkiye
- Buğra Ilhan
- Department of Emergency Medicine, Kırıkkale University School of Medicine, Kırıkkale, Türkiye
- Fatih Doğanay
- Department of Emergency Medicine, University of Health Sciences School of Medicine, İstanbul, Türkiye
3.
Morreale MK, Balon R, Beresin EV, Seritan A, Castillo EG, Thomas LA, Louie AK, Aggarwal R, Guerrero APS, Coverdale J, Brenner AM. Artificial Intelligence and Medical Education, Academic Writing, and Journal Policies: A Focus on Large Language Models. Acad Psychiatry 2025; 49:5-9. PMID: 39384717; DOI: 10.1007/s40596-024-02071-w.
Affiliation(s)
- Eugene V Beresin
- Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Andreea Seritan
- University of California San Francisco, San Francisco, CA, USA
- Enrico G Castillo
- Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Lia A Thomas
- VA North Texas Health Care System, Dallas, TX, USA
- University of Texas Southwestern Medical Center, Dallas, TX, USA
- Adam M Brenner
- University of Texas Southwestern Medical Center, Dallas, TX, USA
4.
Li R, Wu T. Evolution of Artificial Intelligence in Medical Education From 2000 to 2024: Bibliometric Analysis. Interact J Med Res 2025; 14:e63775. PMID: 39883926; PMCID: PMC11826936; DOI: 10.2196/63775.
Abstract
BACKGROUND Incorporating artificial intelligence (AI) into medical education has gained significant attention for its potential to enhance teaching and learning outcomes. However, the field lacks a comprehensive study depicting the academic performance and status of AI in the medical education domain. OBJECTIVE This study aims to analyze the social patterns, productive contributors, knowledge structure, and clusters in the field since the start of the 21st century. METHODS Documents were retrieved from the Web of Science Core Collection database for 2000 to 2024. VOSviewer, InCites, and CiteSpace were used to analyze bibliometric metrics, categorized by country, institution, author, journal, and keyword. The variables analyzed encompassed counts, citations, H-index, impact factor, and collaboration metrics. RESULTS Altogether, 7534 publications were initially retrieved, and 2775 were included for analysis. The annual count and citation of papers have exhibited exponential growth since 2018. The United States emerged as the leading contributor owing to its high productivity and recognition. Stanford University, Johns Hopkins University, the National University of Singapore, Mayo Clinic, the University of Arizona, and the University of Toronto were representative institutions in their respective fields. Cureus, JMIR Medical Education, Medical Teacher, and BMC Medical Education ranked as the four most productive journals. The resulting heat map highlighted several high-frequency keywords, including performance, education, AI, and model. The citation burst times of terms revealed that AI technologies shifted from image processing (2000), augmented reality (2013), and virtual reality (2016) to decision-making (2020) and models (2021). Keywords such as mortality and robotic surgery persisted into 2023, suggesting ongoing recognition of and interest in these areas.
CONCLUSIONS This study provides valuable insights and guidance for researchers interested in educational technology, as well as recommendations for pioneering institutions and journal submissions. Alongside the rapid growth of AI, medical education is expected to benefit even more.
Affiliation(s)
- Rui Li
- Emergency Department, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Tong Wu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
5.
Janumpally R, Nanua S, Ngo A, Youens K. Generative artificial intelligence in graduate medical education. Front Med (Lausanne) 2025; 11:1525604. PMID: 39867924; PMCID: PMC11758457; DOI: 10.3389/fmed.2024.1525604.
Abstract
Generative artificial intelligence (GenAI) is rapidly transforming various sectors, including healthcare and education. This paper explores the potential opportunities and risks of GenAI in graduate medical education (GME). We review the existing literature and provide commentary on how GenAI could impact GME, including five key areas of opportunity: electronic health record (EHR) workload reduction, clinical simulation, individualized education, research and analytics support, and clinical decision support. We then discuss significant risks, including inaccuracy and overreliance on AI-generated content, challenges to authenticity and academic integrity, potential biases in AI outputs, and privacy concerns. As GenAI technology matures, it will likely come to have an important role in the future of GME, but its integration should be guided by a thorough understanding of both its benefits and limitations.
Affiliation(s)
- Kenneth Youens
- Clinical Informatics Fellowship Program, Baylor Scott & White Health, Round Rock, TX, United States
6.
Gupta N, Khatri K, Malik Y, Lakhani A, Kanwal A, Aggarwal S, Dahuja A. Exploring prospects, hurdles, and road ahead for generative artificial intelligence in orthopedic education and training. BMC Med Educ 2024; 24:1544. PMID: 39732679; DOI: 10.1186/s12909-024-06592-8.
Abstract
Generative artificial intelligence (AI), characterized by its ability to generate diverse forms of content including text, images, video, and audio, has revolutionized many fields, including medical education. Generative AI leverages machine learning to create diverse content, enabling personalized learning, enhancing resource accessibility, and facilitating interactive case studies. This narrative review explores the integration of generative AI into orthopedic education and training, highlighting its potential, current challenges, and future trajectory. Recent literature was reviewed to evaluate current applications, identify potential benefits, and outline limitations of integrating generative AI into orthopedic education. Key findings indicate that generative AI holds substantial promise for enhancing orthopedic training through applications such as real-time explanations, adaptive learning materials tailored to individual students' needs, and immersive virtual simulations. Despite this potential, the integration of generative AI into orthopedic education faces significant challenges, including accuracy, bias, inconsistent outputs, ethical and regulatory concerns, and the critical need for human oversight. Although generative AI models such as ChatGPT have shown impressive capabilities, their current performance on orthopedic exams remains suboptimal, highlighting the need for further development to match the complexity of clinical reasoning and knowledge application. Future research should address these challenges by optimizing generative AI models for medical content, exploring best practices for ethical AI use and curriculum integration, and evaluating the long-term impact of these technologies on learning outcomes.
By expanding AI's knowledge base, refining its ability to interpret clinical images, and ensuring reliable, unbiased outputs, generative AI has the potential to revolutionize orthopedic education. This work aims to provide a framework for incorporating generative AI into orthopedic curricula to create a more effective, engaging, and adaptive learning environment for future orthopedic practitioners.
Affiliation(s)
- Nikhil Gupta
- Department of Pharmacology, All India Institute of Medical Sciences, Bathinda, Punjab, 151001, India
- Kavin Khatri
- Department of Orthopedics, Postgraduate Institute of Medical Education and Research (PGIMER) Satellite Centre, Sangrur, Punjab, 148001, India
- Yogender Malik
- Department of Forensic Medicine and Toxicology, Bhagat Phool Singh Govt Medical College for Women, Khanpur Kalan, Sonepat, Haryana, 131305, India
- Amit Lakhani
- Department of Orthopedics, Dr B.R. Ambedkar State Institute of Medical Sciences, Mohali, Punjab, 160055, India
- Abhinav Kanwal
- Department of Pharmacology, All India Institute of Medical Sciences, Bathinda, Punjab, 151001, India
- Sameer Aggarwal
- Department of Orthopedics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, 160012, India
- Anshul Dahuja
- Department of Orthopedics, Guru Gobind Singh Medical College and Hospital, Faridkot, Punjab, 151203, India
7.
Sorin V, Brin D, Barash Y, Konen E, Charney A, Nadkarni G, Klang E. Large Language Models and Empathy: Systematic Review. J Med Internet Res 2024; 26:e52597. PMID: 39661968; DOI: 10.2196/52597.
Abstract
BACKGROUND Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience another being's emotions within oneself. In health care, empathy is fundamental to the interaction between health care professionals and patients. It is a quality believed to be uniquely human and lacking in large language models (LLMs). OBJECTIVE We aimed to review the literature on the capacity of LLMs to demonstrate empathy. METHODS We conducted a literature search of MEDLINE, Google Scholar, PsyArXiv, medRxiv, and arXiv covering December 2022 through February 2024. We included English-language full-length publications that evaluated empathy in LLMs' outputs and excluded papers evaluating other aspects of emotional intelligence that were not specifically empathy. The included studies' results, including the LLMs used, performance on empathy tasks, and limitations of the models, were summarized along with study metadata. RESULTS A total of 12 studies published in 2023 met the inclusion criteria. ChatGPT-3.5 (OpenAI) was evaluated in all studies, with 6 studies comparing it with other LLMs such as GPT-4, LLaMA (Meta), and fine-tuned chatbots. Seven studies focused on empathy within a medical context. The studies reported that LLMs exhibit elements of empathy, including emotion recognition and emotional support, in diverse contexts. Evaluation metrics included automatic metrics such as Recall-Oriented Understudy for Gisting Evaluation and Bilingual Evaluation Understudy, as well as human subjective evaluation. Some studies compared performance on empathy with humans, while others compared different models. In some cases, LLMs were observed to outperform humans on empathy-related tasks; for example, ChatGPT-3.5's responses to patients' questions from social media were preferred over those of humans in 78.6% of cases. Other studies used scores assigned by readers.
One study reported a mean empathy score of 1.84-1.9 (scale 0-2) for its fine-tuned LLM, while another study, evaluating ChatGPT-based chatbots, reported a mean human rating of 3.43 out of 4 for empathetic responses. Other evaluations were based on the levels of emotional awareness scale, on which ChatGPT-3.5 scored higher than humans. Another study evaluated ChatGPT and GPT-4 on soft-skills questions from the United States Medical Licensing Examination, where GPT-4 answered 90% of questions correctly. Noted limitations included repetitive use of empathic phrases, difficulty following initial instructions, overly lengthy responses, sensitivity to prompts, and subjective evaluation metrics influenced by the evaluator's background. CONCLUSIONS LLMs exhibit elements of cognitive empathy, recognizing emotions and providing emotionally supportive responses in various contexts. Since social skills are an integral part of intelligence, these advancements bring LLMs closer to humanlike interactions and expand their potential use in applications requiring emotional intelligence. However, there remains room for improvement both in the performance of these models and in the evaluation strategies used to assess soft skills.
Affiliation(s)
- Vera Sorin
- Department of Radiology, Mayo Clinic, Rochester, MN, United States
- Dana Brin
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat Gan, Israel
- The Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat Gan, Israel
- The Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Israel
- Eli Konen
- Department of Diagnostic Imaging, Sheba Medical Center, Ramat Gan, Israel
- The Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Alexander Charney
- Division of Data-Driven and Digital Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Girish Nadkarni
- Division of Data-Driven and Digital Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Eyal Klang
- Division of Data-Driven and Digital Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
8.
Kıyak YS. Beginner-Level Tips for Medical Educators: Guidance on Selection, Prompt Engineering, and the Use of Artificial Intelligence Chatbots. Med Sci Educ 2024; 34:1571-1576. PMID: 39758489; PMCID: PMC11699172; DOI: 10.1007/s40670-024-02146-1.
Abstract
The integration of artificial intelligence (AI) chatbots, especially large language models (LLMs), holds significant potential for medical education. This article provides ten tips to help medical educators who have limited experience using LLM-based chatbots to support their teaching and assessment practices. These tips cover critical areas such as selecting appropriate models, employing prompt engineering techniques, and optimizing chatbot outputs to meet educational needs. By following these tips, medical educators can leverage the capabilities of AI chatbots to improve the learning experience of students.
Affiliation(s)
- Yavuz Selim Kıyak
- Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Gazi Üniversitesi Hastanesi E Blok 9. Kat, 06500 Beşevler, Ankara, Turkey
9.
Lucas HC, Upperman JS, Robinson JR. A systematic review of large language models and their implications in medical education. Med Educ 2024; 58:1276-1285. PMID: 38639098; DOI: 10.1111/medu.15402.
Abstract
INTRODUCTION In the past year, large language models (LLMs) have generated significant interest and excitement because of their potential to revolutionise various fields, including medical education for aspiring physicians. Although medical students undergo a demanding educational process to become competent health care professionals, the emergence of LLMs presents a promising solution to challenges like information overload, time constraints, and pressure on clinical educators. However, integrating LLMs into medical education raises critical concerns and challenges for educators, professionals, and students. This systematic review aims to explore LLM applications in medical education, specifically their impact on medical students' learning experiences. METHODS A systematic search was performed in PubMed, Web of Science, and Embase for articles discussing the applications of LLMs in medical education, using selected keywords related to LLMs and medical education, from the time of ChatGPT's debut until February 2024. Only full-text articles available in English were reviewed. The credibility of each study was critically appraised by two independent reviewers. RESULTS The systematic search identified 166 studies, of which 40 were relevant to the review. Among these 40 studies, key themes included LLM capabilities, benefits such as personalised learning, and challenges regarding content accuracy. Importantly, 42.5% of the studies specifically evaluated LLMs, including ChatGPT, in novel contexts such as medical exams and clinical/biomedical information, highlighting their potential to replicate human-level performance in medical knowledge. The remaining studies broadly discussed the prospective role of LLMs in medical education, reflecting keen interest in their future potential despite current constraints.
CONCLUSIONS The responsible implementation of LLMs in medical education offers a promising opportunity to enhance learning experiences. However, ensuring information accuracy, emphasising skill-building, and maintaining ethical safeguards are crucial. Continuous critical evaluation and interdisciplinary collaboration are essential for the appropriate integration of LLMs into medical education.
Affiliation(s)
- Jeffrey S Upperman
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jamie R Robinson
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
10.
Jin HK, Lee HE, Kim E. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Med Educ 2024; 24:1013. PMID: 39285377; PMCID: PMC11406751; DOI: 10.1186/s12909-024-05944-8.
Abstract
BACKGROUND ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance on examinations in the medical field. However, an overall evaluation of the potential of the ChatGPT models (ChatGPT-3.5 and GPT-4) across a variety of national health licensing examinations has thus far been lacking. This study aimed to provide a comprehensive assessment of the ChatGPT models' performance on national licensing examinations for medicine, pharmacy, dentistry, and nursing through a meta-analysis. METHODS Following the PRISMA protocol, full-text articles from MEDLINE/PubMed, EMBASE, ERIC, Cochrane Library, Web of Science, and key journals were reviewed from the time of ChatGPT's introduction to February 27, 2024. Studies were eligible if they evaluated the performance of a ChatGPT model (ChatGPT-3.5 or GPT-4); related to national licensing examinations in medicine, pharmacy, dentistry, or nursing; involved multiple-choice questions; and provided data that enabled the calculation of effect size. Two reviewers independently completed data extraction, coding, and quality assessment. The JBI Critical Appraisal Tools were used to assess the quality of the selected articles. The overall effect size and 95% confidence intervals (CIs) were calculated using a random-effects model. RESULTS A total of 23 studies were included in this review, evaluating accuracy on four types of national licensing examinations. The selected articles were in the fields of medicine (n=17), pharmacy (n=3), nursing (n=2), and dentistry (n=1). They reported varying accuracy levels, ranging from 36% to 77% for ChatGPT-3.5 and from 64.4% to 100% for GPT-4. The overall effect size for the percentage of accuracy was 70.1% (95% CI 65%-74.8%), which was statistically significant (p<0.001). Subgroup analyses revealed that GPT-4 demonstrated significantly higher accuracy in providing correct responses than its earlier version, ChatGPT-3.5.
Additionally, in the context of health licensing examinations, the ChatGPT models exhibited proficiency in the following order: pharmacy, medicine, dentistry, and nursing. However, the lack of a broader set of questions, including open-ended and scenario-based questions, and significant heterogeneity were limitations of this meta-analysis. CONCLUSIONS This study sheds light on the accuracy of the ChatGPT models in four national health licensing examinations across various countries and provides a practical basis and theoretical support for future research. Further studies are needed to explore their use in medical and health education with a broader and more diverse range of questions, along with more advanced versions of AI chatbots.
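The random-effects pooling this meta-analysis reports can be sketched in a few lines. The following is an illustrative DerSimonian-Laird implementation with made-up per-study accuracies; it is not the authors' code or data, only a sketch of how per-study proportions and sample sizes yield a pooled estimate with a 95% CI.

```python
# Illustrative sketch: DerSimonian-Laird random-effects pooling of per-study
# accuracy proportions. The study values below are invented for demonstration.
import math

def random_effects_pool(props_ns, z=1.96):
    """Pool (proportion, n) pairs; return (pooled, ci_low, ci_high, tau2)."""
    y = [p for p, _ in props_ns]
    v = [p * (1 - p) / n for p, n in props_ns]          # binomial variance per study
    w = [1 / vi for vi in v]                             # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)              # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]               # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - z * se, pooled + z * se, tau2

# Hypothetical studies: (proportion of questions answered correctly, question count)
studies = [(0.55, 180), (0.70, 120), (0.64, 300), (0.77, 90)]
pooled, lo, hi, tau2 = random_effects_pool(studies)
```

When between-study heterogeneity is large (as the abstract notes), tau2 grows, the weights flatten toward equality, and the confidence interval widens accordingly.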
Affiliation(s)
- Hye Kyung Jin
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Ha Eun Lee
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- EunYoung Kim
- Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
- Division of Licensing of Medicines and Regulatory Science, The Graduate School of Pharmaceutical Management, and Regulatory Science Policy, The Graduate School of Pharmaceutical Regulatory Sciences, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea
11.
Holderried F, Stegemann-Philipps C, Herrmann-Werner A, Festl-Wietek T, Holderried M, Eickhoff C, Mahling M. A Language Model-Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study. JMIR Med Educ 2024; 10:e59213. PMID: 39150749; PMCID: PMC11364946; DOI: 10.2196/59213.
Abstract
BACKGROUND Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI) such as large language models (LLMs) enhancing their realism and potential to provide feedback. OBJECTIVE In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model to provide structured feedback on medical students' performance in history taking with a simulated patient. METHODS We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients' responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students' interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback. RESULTS Most of the study's participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4's role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed "almost perfect" agreement (Cohen κ=0.832). Less agreement (κ<0.6) detected for 8 out of 45 feedback categories highlighted topics about which the model's assessments were overly specific or diverged from human judgement. CONCLUSIONS The GPT model was effective in providing structured feedback on history-taking dialogs provided by medical students. 
Although we identified some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. Our findings thus support the careful integration of AI-driven feedback mechanisms into medical training and highlight important considerations when LLMs are used in that context.
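The "almost perfect" agreement reported above is Cohen's κ, a chance-corrected interrater agreement statistic. As a minimal illustrative sketch (the ratings below are invented toy data, not the study's), κ for two raters can be computed as:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters marking 10 feedback items as covered (1) or missed (0)
a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
print(round(cohen_kappa(a, b), 3))  # → 0.783
```

Note that raw agreement here is 0.9, yet κ is only 0.783: κ discounts the agreement expected by chance under each rater's label frequencies, which is why it, rather than simple percent agreement, is the statistic reported in the study above.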
Affiliation(s)
- Friederike Holderried
- Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Tübingen, Germany
- Anne Herrmann-Werner
- Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Tübingen, Germany
- Teresa Festl-Wietek
- Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Tübingen, Germany
- Martin Holderried
- Department of Medical Development, Process and Quality Management, University Hospital Tübingen, Tübingen, Germany
- Carsten Eickhoff
- Institute for Applied Medical Informatics, University of Tübingen, Tübingen, Germany
- Moritz Mahling
- Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Tübingen, Germany
- Department of Medical Development, Process and Quality Management, University Hospital Tübingen, Tübingen, Germany
12
Tong W, Zhang X, Zeng H, Pan J, Gong C, Zhang H. Reforming China's Secondary Vocational Medical Education: Adapting to the Challenges and Opportunities of the AI Era. JMIR MEDICAL EDUCATION 2024; 10:e48594. [PMID: 39149865 PMCID: PMC11337726 DOI: 10.2196/48594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2023] [Revised: 06/03/2024] [Accepted: 06/11/2024] [Indexed: 08/17/2024]
Abstract
China's secondary vocational medical education is essential for training primary health care personnel and enhancing public health responses. This education system currently faces challenges, primarily due to its emphasis on knowledge acquisition that overshadows the development and application of skills, especially in the context of emerging artificial intelligence (AI) technologies. This article delves into the impact of AI on medical practices and uses this analysis to suggest reforms for the vocational medical education system in China. AI is found to significantly enhance diagnostic capabilities, therapeutic decision-making, and patient management. However, it also brings about concerns such as potential job losses and necessitates the adaptation of medical professionals to new technologies. Proposed reforms include a greater focus on critical thinking, hands-on experiences, skill development, medical ethics, and integrating humanities and AI into the curriculum. These reforms require ongoing evaluation and sustained research to effectively prepare medical students for future challenges in the field.
Affiliation(s)
- Wenting Tong
- Department of Pharmacy, Gannan Healthcare Vocational College, Ganzhou, China
- Xiaowen Zhang
- Department of Rehabilitation and Elderly Care, Gannan Healthcare Vocational College, Ganzhou, China
- Haiping Zeng
- Department of Gastrointestinal Surgery, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, China
- Department of Gastrointestinal Surgery, First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
- Jianping Pan
- Scientific Research Division, Gannan Healthcare Vocational College, Ganzhou, China
- Chao Gong
- Student Work Division, Gannan Healthcare Vocational College, Ganzhou, China
- Hui Zhang
- Department of Rehabilitation and Elderly Care, Gannan Healthcare Vocational College, Ganzhou, China
- Department of Infertility and Sexual Medicine, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Department of Urology, Dongguan Hospital Affiliated to Guangzhou University of Chinese Medicine, 22 Songshanhu Avenue, Guangdong Province, Dongguan, 523080, China
13
Burke-Garcia A, Soskin Hicks R. Scaling the Idea of Opinion Leadership to Address Health Misinformation: The Case for "Health Communication AI". JOURNAL OF HEALTH COMMUNICATION 2024; 29:396-399. [PMID: 38832662 DOI: 10.1080/10810730.2024.2357575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2024]
Abstract
There is strong evidence of the impact of opinion leaders in health promotion programs. Early work by Burke-Garcia suggests that social media influencers are the opinion leaders of the digital age as they come from the communities they influence, have built trust with them, and may be useful in combating misinformation by disseminating credible and timely health information and prompting consideration of health behaviors. AI has contributed to the spread of misinformation, but it can also be a vital part of the solution, informing and educating in real time and at scale. Personalized, empathetic messaging is crucial, though, and research supports that individuals are drawn to empathetic AI responses and prefer them to human responses in some digital environments. This mimics what we know about influencers and how they approach communicating with their followers. Blending what we know about social media influencers as opinion leaders with the power and scale of AI can enable us to address the spread of misinformation. This paper reviews the knowledge base and proposes the development of something we term "Health Communication AI" - perhaps the newest form of opinion leader - to fight health misinformation.
Affiliation(s)
- A Burke-Garcia
- Public Health Department, NORC at the University of Chicago, Bethesda, Maryland, USA
- R Soskin Hicks
- Public Health Department, NORC at the University of Chicago, Bethesda, Maryland, USA
14
Preiksaitis C, Ashenburg N, Bunney G, Chu A, Kabeer R, Riley F, Ribeira R, Rose C. The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Med Inform 2024; 12:e53787. [PMID: 38728687 PMCID: PMC11127144 DOI: 10.2196/53787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Revised: 12/20/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024] Open
Abstract
BACKGROUND Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. OBJECTIVE Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. METHODS Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. RESULTS A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. 
We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. CONCLUSIONS LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians' AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied.
Affiliation(s)
- Carl Preiksaitis
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Nicholas Ashenburg
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Gabrielle Bunney
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Andrew Chu
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Rana Kabeer
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Fran Riley
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Ryan Ribeira
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Christian Rose
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
15
Bonfitto GR, Roletto A, Savardi M, Fasulo SV, Catania D, Signoroni A. Harnessing ChatGPT dialogues to address claustrophobia in MRI - A radiographers' education perspective. Radiography (Lond) 2024; 30:737-744. [PMID: 38428198 DOI: 10.1016/j.radi.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2023] [Revised: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 03/03/2024]
Abstract
INTRODUCTION The healthcare sector invests significantly in communication skills training, but not always with satisfactory results. Recently, generative large language models have shown promising results in medical education. This study aims to use ChatGPT to simulate radiographer-patient conversations about the critical moment of claustrophobia management during MRI, exploring how artificial intelligence can improve radiographers' communication skills. METHODS This study exploits specifically designed prompts on ChatGPT-3.5 and ChatGPT-4 to generate simulated conversations between virtual claustrophobic patients and six radiographers with varying levels of work experience, focusing on the models' differences in size and language generation capabilities. Success rates and responses were analysed. The strategies radiographers used to convince virtual patients to undergo MRI despite claustrophobia were also evaluated. RESULTS A total of 60 simulations were conducted, achieving a success rate of 96.7% (58/60). ChatGPT-3.5 exhibited errors in 40% (12/30) of the simulations, while ChatGPT-4 showed no errors. In terms of radiographers' communication during the simulations, out of 164 responses, 70.2% (115/164) were categorized as "Supportive Instructions," followed by "Music Therapy" at 18.3% (30/164). Experts mainly used "Supportive Instructions" (82.2%, 51/62) and "Breathing Techniques" (9.7%, 6/62). Intermediate participants favoured "Music Therapy" (26%, 13/50), while Beginner participants frequently utilized "Mild Sedation" (15.4%, 8/52). CONCLUSION The simulation of clinical scenarios via ChatGPT proves valuable in assessing and testing radiographers' communication skills, especially in managing claustrophobic patients during MRI. This pilot study highlights the potential of ChatGPT in preclinical training, recognizing different training needs at different levels of professional experience.
IMPLICATIONS FOR PRACTICE This study is relevant in radiography practice, where AI is increasingly widespread, as it explores a new way to improve the training of radiographers.
Affiliation(s)
- G R Bonfitto
- Department of Information Engineering, University of Brescia, Via Branze 38, 25123 Brescia, Italy; IRCCS Ospedale San Raffaele, Via Olgettina 60, 20132 Milano, Italy.
- A Roletto
- Department of Mechanical and Industrial Engineering, Università degli Studi di Brescia, Via Branze 38, 25123 Brescia, Italy; IRCCS Ospedale San Raffaele, Via Olgettina 60, 20132 Milano, Italy.
- M Savardi
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Viale Europa 11, 25121, Brescia, Italy.
- S V Fasulo
- IRCCS Ospedale San Raffaele, Via Olgettina 60, 20132 Milano, Italy.
- D Catania
- IRCCS Ospedale San Raffaele, Via Olgettina 60, 20132 Milano, Italy.
- A Signoroni
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Viale Europa 11, 25121, Brescia, Italy.
16
Rengers TA, Thiels CA, Salehinejad H. Academic Surgery in the Era of Large Language Models: A Review. JAMA Surg 2024; 159:445-450. [PMID: 38353991 DOI: 10.1001/jamasurg.2023.6496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/11/2024]
Abstract
Importance This review aims to assess the benefits and risks of implementing large language model (LLM) solutions in an academic surgical setting. Observations The integration of LLMs and artificial intelligence (AI) into surgical practice has generated international attention with the emergence of OpenAI's ChatGPT and Google's Bard. From an administrative standpoint, LLMs have the potential to revolutionize academic practices by reducing administrative burdens and improving efficiency. LLMs have the potential to facilitate surgical research by increasing writing efficiency, building predictive models, and aiding in large dataset analysis. From a clinical standpoint, LLMs can enhance efficiency by triaging patient concerns and generating automated responses. However, challenges exist, such as the need for improved LLM generalization performance, validating content, and addressing ethical concerns. In addition, patient privacy, potential bias in training, and legal responsibility are important considerations that require attention. Research and precautionary measures are necessary to ensure safe and unbiased use of LLMs in surgery. Conclusions and Relevance Although limitations exist, LLMs hold promise for enhancing surgical efficiency while still prioritizing patient care. The authors recommend that the academic surgical community further investigate the potential applications of LLMs while being cautious about potential harms.
Affiliation(s)
- Timothy A Rengers
- Mayo Clinic Alix School of Medicine, Mayo Clinic, Rochester, Minnesota
- Cornelius A Thiels
- Division of Hepatobiliary and Pancreas Surgery, Mayo Clinic, Rochester, Minnesota
- Hojjat Salehinejad
- Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota
17
Gordon M, Daniel M, Ajiboye A, Uraiby H, Xu NY, Bartlett R, Hanson J, Haas M, Spadafore M, Grafton-Clarke C, Gasiea RY, Michie C, Corral J, Kwan B, Dolmans D, Thammasitboon S. A scoping review of artificial intelligence in medical education: BEME Guide No. 84. MEDICAL TEACHER 2024; 46:446-470. [PMID: 38423127 DOI: 10.1080/0142159x.2024.2314198] [Citation(s) in RCA: 59] [Impact Index Per Article: 59.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Accepted: 01/31/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND Artificial Intelligence (AI) is rapidly transforming healthcare, and there is a critical need for a nuanced understanding of how AI is reshaping teaching, learning, and educational practice in medical education. This review aimed to map the literature regarding AI applications in medical education, core areas of findings, potential candidates for formal systematic review and gaps for future research. METHODS This rapid scoping review, conducted over 16 weeks, employed Arksey and O'Malley's framework and adhered to STORIES and BEME guidelines. A systematic and comprehensive search across PubMed/MEDLINE, EMBASE, and MedEdPublish was conducted without date or language restrictions. Publications included in the review spanned undergraduate, graduate, and continuing medical education, encompassing both original studies and perspective pieces. Data were charted by multiple author pairs and synthesized into various thematic maps and charts, ensuring a broad and detailed representation of the current landscape. RESULTS The review synthesized 278 publications, with a majority (68%) from North American and European regions. The studies covered diverse AI applications in medical education, such as AI for admissions, teaching, assessment, and clinical reasoning. The review highlighted AI's varied roles, from augmenting traditional educational methods to introducing innovative practices, and underscores the urgent need for ethical guidelines in AI's application in medical education. CONCLUSION The current literature has been charted. The findings underscore the need for ongoing research to explore uncharted areas and address potential risks associated with AI use in medical education. This work serves as a foundational resource for educators, policymakers, and researchers in navigating AI's evolving role in medical education. A framework to support future high utility reporting is proposed, the FACETS framework.
Affiliation(s)
- Morris Gordon
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Blackpool Hospitals NHS Foundation Trust, Blackpool, UK
- Michelle Daniel
- School of Medicine, University of California, San Diego, San Diego, CA, USA
- Aderonke Ajiboye
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Hussein Uraiby
- Department of Cellular Pathology, University Hospitals of Leicester NHS Trust, Leicester, UK
- Nicole Y Xu
- School of Medicine, University of California, San Diego, San Diego, CA, USA
- Rangana Bartlett
- Department of Cognitive Science, University of California, San Diego, CA, USA
- Janice Hanson
- Department of Medicine and Office of Education, School of Medicine, Washington University in Saint Louis, Saint Louis, MO, USA
- Mary Haas
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Maxwell Spadafore
- Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Colin Michie
- School of Medicine and Dentistry, University of Central Lancashire, Preston, UK
- Janet Corral
- Department of Medicine, University of Nevada Reno, School of Medicine, Reno, NV, USA
- Brian Kwan
- School of Medicine, University of California, San Diego, San Diego, CA, USA
- Diana Dolmans
- School of Health Professions Education, Faculty of Health, Maastricht University, Maastricht, the Netherlands
- Satid Thammasitboon
- Center for Research, Innovation and Scholarship in Health Professions Education, Baylor College of Medicine, Houston, TX, USA
18
Xu X, Chen Y, Miao J. Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review. JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS 2024; 21:6. [PMID: 38486402 PMCID: PMC11035906 DOI: 10.3352/jeehp.2024.21.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Accepted: 03/05/2024] [Indexed: 03/19/2024]
Abstract
BACKGROUND ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored. METHODS A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included. RESULTS ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students. CONCLUSION ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.
Affiliation(s)
- Xiaojun Xu
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
- Yixiao Chen
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
- Jing Miao
- Division of Hematology/Oncology, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Centre for Child Health, Zhejiang, China
19
Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024; 17:817-826. [PMID: 38476626 PMCID: PMC10929156 DOI: 10.2147/ijgm.s456659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Affiliation(s)
- Yonglin Mu
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Dawei He
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
20
Park J. Medical students' patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study. JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS 2023; 20:29. [PMID: 38096895 PMCID: PMC10725745 DOI: 10.3352/jeehp.2023.20.29] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/22/2023] [Accepted: 10/29/2023] [Indexed: 12/18/2023]
Abstract
PURPOSE This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students' perceptions of ChatGPT as a feedback tool in the classroom. METHODS The study included 99 2nd-year pre-medical students who participated in a "Leadership and Communication" course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students' perceptions of ChatGPT's feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023. RESULTS The students responded by indicating that ChatGPT's feedback was helpful, and revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents expressed agreement with the use of ChatGPT during class. The most common response concerning the appropriate context of using ChatGPT's feedback was "after the first round of discussion, for revisions." There was a significant difference in satisfaction with ChatGPT's feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were "providing answers to questions" and "summarizing information," and the worst disadvantage was "producing information without supporting evidence." CONCLUSION The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.
Affiliation(s)
- Janghee Park
- Department of Medical Education, Soonchunhyang University College of Medicine, Cheonan, Korea
21
Preiksaitis C, Rose C. Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review. JMIR MEDICAL EDUCATION 2023; 9:e48785. [PMID: 37862079 PMCID: PMC10625095 DOI: 10.2196/48785] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/07/2023] [Revised: 07/28/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023]
Abstract
BACKGROUND Generative artificial intelligence (AI) technologies are increasingly being utilized across various fields, with considerable interest and concern regarding their potential application in medical education. These technologies, such as ChatGPT and Bard, can generate new content and have a wide range of possible applications. OBJECTIVE This study aimed to synthesize the potential opportunities and limitations of generative AI in medical education. It sought to identify prevalent themes within recent literature regarding potential applications and challenges of generative AI in medical education and use these to guide future areas for exploration. METHODS We conducted a scoping review, following the framework by Arksey and O'Malley, of English language articles published from 2022 onward that discussed generative AI in the context of medical education. A literature search was performed using PubMed, Web of Science, and Google Scholar databases. We screened articles for inclusion, extracted data from relevant studies, and completed a quantitative and qualitative synthesis of the data. RESULTS Thematic analysis revealed diverse potential applications for generative AI in medical education, including self-directed learning, simulation scenarios, and writing assistance. However, the literature also highlighted significant challenges, such as issues with academic integrity, data accuracy, and potential detriments to learning. Based on these themes and the current state of the literature, we propose the following 3 key areas for investigation: developing learners' skills to evaluate AI critically, rethinking assessment methodology, and studying human-AI interactions. CONCLUSIONS The integration of generative AI in medical education presents exciting opportunities, alongside considerable challenges.
There is a need to develop new skills and competencies related to AI as well as thoughtful, nuanced approaches to examine the growing use of generative AI in medical education.
Affiliation(s)
- Carl Preiksaitis
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States
- Christian Rose
- Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, CA, United States