1
Cook DA, Overgaard J, Pankratz VS, Del Fiol G, Aakre CA. Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback. J Med Internet Res 2025; 27:e68486. PMID: 39854611; PMCID: PMC12008702; DOI: 10.2196/68486.
Abstract
BACKGROUND Virtual patients (VPs) are computer screen-based simulations of patient-clinician encounters. VP use is limited by cost and low scalability. OBJECTIVE We aimed to show that VPs powered by large language models (LLMs) can generate authentic dialogues, accurately represent patient preferences, and provide personalized feedback on clinical performance. We also explored using LLMs to rate the quality of dialogues and feedback. METHODS We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI's generative pretrained transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough diagnosis and diabetes management), each with permutations representing different patient preferences, we created 60 conversations (dialogues plus feedback): 48 with a human clinician and 12 "self-chat" dialogues with GPT role-playing both the VP and clinician. Primary outcomes were dialogue authenticity and feedback quality, rated using novel instruments for which we conducted a validation study collecting evidence of content, internal structure (reproducibility), relations with other variables, and response process. Each conversation was rated by 3 physicians and by GPT. Secondary outcomes included user experience, bias, patient preferences represented in the dialogues, and conversation features that influenced authenticity. RESULTS The average cost per conversation was US $0.51 for GPT-4.0-Turbo and US $0.02 for GPT-3.5-Turbo. Mean (SD) conversation ratings, maximum 6, were overall dialogue authenticity 4.7 (0.7), overall user experience 4.9 (0.7), and average feedback quality 4.7 (0.6). For dialogues created using GPT-4.0-Turbo, physician ratings of patient preferences aligned with intended preferences in 20 to 47 of 48 dialogues (42%-98%). 
Subgroup comparisons revealed higher ratings for dialogues using GPT-4.0-Turbo versus GPT-3.5-Turbo and for human-generated versus self-chat dialogues. Feedback ratings were similar for human-generated versus GPT-generated ratings, whereas authenticity ratings were lower. We did not perceive bias in any conversation. Dialogue features that detracted from authenticity included that GPT was verbose or used atypical vocabulary (93/180, 51.7% of conversations), was overly agreeable (n=56, 31%), repeated the question as part of the response (n=47, 26%), was easily convinced by clinician suggestions (n=35, 19%), or was not disaffected by poor clinician performance (n=32, 18%). For feedback, detractors included excessively positive feedback (n=42, 23%), failure to mention important weaknesses or strengths (n=41, 23%), or factual inaccuracies (n=39, 22%). Regarding validation of dialogue and feedback scores, items were meticulously developed (content evidence), and we confirmed expected relations with other variables (higher ratings for advanced LLMs and human-generated dialogues). Reproducibility was suboptimal, due largely to variation in LLM performance rather than rater idiosyncrasies. CONCLUSIONS LLM-powered VPs can simulate patient-clinician dialogues, demonstrably represent patient preferences, and provide personalized performance feedback. This approach is scalable, globally accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings.
Affiliation(s)
- David A Cook
- Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN, United States
- Multidisciplinary Simulation Center, Mayo Clinic College of Medicine and Science, Rochester, MN, United States
- Joshua Overgaard
- Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN, United States
- V Shane Pankratz
- Health Sciences Center, University of New Mexico, Albuquerque, NM, United States
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Chris A Aakre
- Division of General Internal Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN, United States
2
Gin BC, O'Sullivan PS, Hauer KE, Abdulnour RE, Mackenzie M, Ten Cate O, Boscardin CK. Entrustment and EPAs for Artificial Intelligence (AI): A Framework to Safeguard the Use of AI in Health Professions Education. Acad Med 2025; 100:264-272. PMID: 39761533; DOI: 10.1097/acm.0000000000005930.
Abstract
In this article, the authors propose a repurposing of the concept of entrustment to help guide the use of artificial intelligence (AI) in health professions education (HPE). Entrustment can help identify and mitigate the risks of incorporating generative AI tools with limited transparency about their accuracy, source material, and disclosure of bias into HPE practice. With AI's growing role in education-related activities, like automated medical school application screening and feedback quality and content appraisal, there is a critical need for a trust-based approach to ensure these technologies are beneficial and safe. Drawing parallels with HPE's entrustment concept, which assesses a trainee's readiness to perform clinical tasks, or entrustable professional activities, the authors propose assessing the trustworthiness of AI tools to perform an HPE-related task across 3 characteristics: ability (competence to perform tasks accurately), integrity (transparency and honesty), and benevolence (alignment with ethical principles). The authors draw on existing theories of entrustment decision-making to envision a structured way to decide on AI's role and level of engagement in HPE-related tasks, including proposing an AI-specific entrustment scale. Identifying tasks that AI could be entrusted with provides a focus around which considerations of trustworthiness and entrustment decision-making may be synthesized, making explicit the risks associated with AI use and identifying strategies to mitigate these risks. Responsible, trustworthy, and ethical use of AI requires health professions educators to develop safeguards for using it in teaching, learning, and practice: guardrails that can be operationalized by applying the entrustment concept to AI. Without such safeguards, HPE practice stands to be shaped by the oncoming wave of AI innovations tied to commercial motivations, rather than modeled after HPE principles: principles rooted in the trust and transparency that are built together with trainees and patients.
3
Hallquist E, Gupta I, Montalbano M, Loukas M. Applications of Artificial Intelligence in Medical Education: A Systematic Review. Cureus 2025; 17:e79878. PMID: 40034416; PMCID: PMC11872247; DOI: 10.7759/cureus.79878.
Abstract
Artificial intelligence (AI) models, like Chat Generative Pre-Trained Transformer (OpenAI, San Francisco, CA), have recently gained significant popularity due to their ability to make autonomous decisions and engage in complex interactions. To fully harness the potential of these learning machines, users must understand their strengths and limitations. As AI tools become increasingly prevalent in our daily lives, it is essential to explore how this technology has been used so far in healthcare and medical education, as well as the areas of medicine where it can be applied. This paper systematically reviews the published literature in the PubMed database from its inception up to June 6, 2024, focusing on studies that used AI at some level in medical education, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Several papers were identified in which AI was used to generate medical exam questions, produce clinical scripts for diseases, improve the diagnostic and clinical skills of students and clinicians, serve as a learning aid, and automate analysis tasks such as screening residency applications. AI shows promise at various levels and in different areas of medical education, and our paper highlights some of these areas. This review also emphasizes the importance of educators and students understanding AI's principles, capabilities, and limitations before integration. In conclusion, AI has potential in medical education, but more research is needed to explore additional areas of application, address current gaps in knowledge, and establish its future potential in training healthcare professionals.
Affiliation(s)
- Eric Hallquist
- Department of Family Medicine, Prevea Shawano Avenue Health Center, Green Bay, USA
- Ishank Gupta
- Department of Anatomical Sciences, St. George's University School of Medicine, St. George, GRD
- Michael Montalbano
- Department of Anatomical Sciences, St. George's University School of Medicine, St. George, GRD
- Marios Loukas
- Department of Anatomical Sciences, St. George's University School of Medicine, St. George, GRD
- Department of Clinical Anatomy, Mayo Clinic, Rochester, USA
4
Fernandes RD, de Vries I, McEwen L, Mann S, Phillips T, Zevin B. Evaluating the Quality of Narrative Feedback for Entrustable Professional Activities in a Surgery Residency Program. Ann Surg 2024; 280:916-924. PMID: 38660808; DOI: 10.1097/sla.0000000000006308.
Abstract
OBJECTIVE To assess the quality of narrative feedback given to surgical residents during the first 5 years of competency-based medical education implementation. BACKGROUND Competency-based medical education requires ongoing formative assessments and feedback on learners' performance. METHODS We conducted a retrospective cross-sectional study using assessments of entrustable professional activities (EPAs) in the Surgical Foundations curriculum at Queen's University from 2017 to 2022. Two raters independently evaluated the quality of narrative feedback using the Quality of Assessment of Learning score (0-5). RESULTS A total of 3900 EPA assessments were completed over 5 years. Of these assessments, 57% (2229/3900) had narrative feedback documented, with a mean Quality of Assessment of Learning score of 2.16 ± 1.49. Of these, 1614 (72.4%) provided evidence about the resident's performance, 951 (42.7%) provided suggestions for improvement, and 499/2229 (22.4%) connected suggestions to the evidence. There was no meaningful change in narrative feedback quality over time (r = 0.067, P = 0.002). Variables associated with lower quality of narrative feedback included: attending role (2.04 ± 1.48) compared with medical student (3.13 ± 1.12, P < 0.001) and clinical fellow (2.47 ± 1.54, P < 0.001); concordant specialties between assessor and learner (2.06 ± 1.50 vs 2.21 ± 1.49, P = 0.025); completion of the assessment 1 month or more after the encounter versus 1 week (1.85 ± 1.48 vs 2.23 ± 1.49, P < 0.001); and resident entrusted versus not entrusted to perform the assessed EPA (2.13 ± 1.45 vs 2.35 ± 1.66, P = 0.008). The quality of narrative feedback was similar for assessments completed under direct and indirect observation (2.18 ± 1.47 vs 2.06 ± 1.54, P = 0.153). CONCLUSIONS Just over half of the EPA assessments of surgery residents contained narrative feedback, with overall fair quality. There was no meaningful change in the quality of feedback over 5 years.
These findings prompt future research and faculty development.
Affiliation(s)
- Ingrid de Vries
- Faculty of Education, Queen's University, Kingston, ON, Canada
- Laura McEwen
- Postgraduate Medical Education, Queen's University, Kingston, ON, Canada
- Department of Pediatrics, Queen's University, Kingston, ON, Canada
- Steve Mann
- Department of Surgery, Queen's University, Kingston, ON, Canada
- Boris Zevin
- Department of Surgery, Queen's University, Kingston, ON, Canada
5
Huang G, Li Y, Jameel S, Long Y, Papanastasiou G. From explainable to interpretable deep learning for natural language processing in healthcare: How far from reality? Comput Struct Biotechnol J 2024; 24:362-373. PMID: 38800693; PMCID: PMC11126530; DOI: 10.1016/j.csbj.2024.05.004.
Abstract
Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretable Artificial Intelligence" (XIAI) is introduced to distinguish XAI from IAI. Different models are further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms are the most prevalent emerging IAI technique. The use of IAI is growing, distinguishing it from XAI. The major challenges identified are that most XIAI does not explore "global" modelling processes, the lack of best practices, and the lack of systematic evaluation and benchmarks. One important opportunity is to use attention mechanisms to enhance multi-modal XIAI for personalized medicine. Additionally, combining DL with causal logic holds promise. Our discussion encourages the integration of XIAI in Large Language Models (LLMs) and domain-specific smaller models. In conclusion, XIAI adoption in healthcare requires dedicated in-house expertise. Collaboration with domain experts, end-users, and policymakers can lead to ready-to-use XIAI methods across NLP and medical tasks. While challenges exist, XIAI techniques offer a valuable foundation for interpretable NLP algorithms in healthcare.
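The review above identifies attention mechanisms as the most prevalent emerging IAI technique. As a minimal sketch (toy vectors invented here, not drawn from the review), attention weights are a softmax over scaled dot-products, and attention-based interpretability amounts to reading those weights as per-token importance scores:

```python
import numpy as np

def attention_weights(query, keys):
    """Softmax of scaled dot-products: one weight per key token."""
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)
    exp = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical clinical-note tokens with random embeddings (illustration only)
tokens = ["patient", "denies", "chest", "pain"]
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))    # one key vector per token
query = rng.normal(size=8)        # e.g., a classifier's pooled query vector

w = attention_weights(query, keys)
for tok, wt in zip(tokens, w):
    print(f"{tok:>8s}  {wt:.3f}")  # higher weight = more "attended" token
```

The weights form a probability distribution over tokens, which is why they are so often surfaced as a built-in (if contested) explanation of what the model attended to.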
Affiliation(s)
- Guangming Huang
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Yingya Li
- Harvard Medical School and Boston Children's Hospital, Boston, 02115, United States
- Shoaib Jameel
- Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Yunfei Long
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
6
Gin BC, Ten Cate O, O'Sullivan PS, Boscardin C. Assessing supervisor versus trainee viewpoints of entrustment through cognitive and affective lenses: an artificial intelligence investigation of bias in feedback. Adv Health Sci Educ Theory Pract 2024; 29:1571-1592. PMID: 38388855; PMCID: PMC11549112; DOI: 10.1007/s10459-024-10311-9.
Abstract
The entrustment framework redirects assessment from considering only trainees' competence to decision-making about their readiness to perform clinical tasks independently. Since trainees and supervisors both contribute to entrustment decisions, we examined the cognitive and affective factors that underlie their negotiation of trust, and whether trainee demographic characteristics may bias them. Using a document analysis approach, we adapted large language models (LLMs) to examine feedback dialogs (N = 24,187, each with an associated entrustment rating) between medical student trainees and their clinical supervisors. We compared how trainees and supervisors differentially documented feedback dialogs about similar tasks by identifying qualitative themes and quantitatively assessing their correlation with entrustment ratings. Supervisors' themes predominantly reflected skills related to patient presentations, while trainees' themes were broader, including clinical performance and personal qualities. To examine affect, we trained an LLM to measure feedback sentiment. On average, trainees used more negative language (5.3% lower probability of positive sentiment, p < 0.05) compared to supervisors, while documenting higher entrustment ratings (+0.08 on a 1-4 scale, p < 0.05). We also found biases tied to demographic characteristics: trainees' documentation reflected more positive sentiment in the case of male trainees (+1.3%, p < 0.05) and of trainees underrepresented in medicine (UIM) (+1.3%, p < 0.05). Entrustment ratings did not appear to reflect these biases, whether documented by the trainee or the supervisor. As such, bias appeared to influence the emotive language trainees used to document entrustment more than the degree of entrustment they experienced. Mitigating these biases is nonetheless important because they may affect trainees' assimilation into their roles and formation of trusting relationships.
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, 550 16th St Floor 4, UCSF Box 0110, San Francisco, CA, 94158, USA.
- Olle Ten Cate
- Utrecht Center for Research and Development of Health Professions Education, University Medical Center, Utrecht, the Netherlands
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Patricia S O'Sullivan
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Surgery, University of California San Francisco, San Francisco, USA
- Christy Boscardin
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Anesthesia, University of California San Francisco, San Francisco, USA
7
Dabbagh A, Madadi F, Larijani B. Role of AI in Competency-Based Medical Education: Using EPA as the Magicbox. Arch Iran Med 2024; 27:633-635. PMID: 39534999; PMCID: PMC11558609; DOI: 10.34172/aim.31795.
Affiliation(s)
- Ali Dabbagh
- Anesthesiology Research Center, Shahid Modarres Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Department of Anesthesiology, School of Medicine, Shahid Modarres Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Firoozeh Madadi
- Anesthesiology Research Center, Shahid Modarres Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Department of Anesthesiology, School of Medicine, Ayatollah Taleghani Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Bagher Larijani
- Endocrinology Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
8
Kerth JL, Bosse HM. The Future of Postgraduate Training? An Update on the Use of Entrustable Professional Activities in Pediatric Professional Training. Acad Pediatr 2024; 24:1035-1037. PMID: 38971522; DOI: 10.1016/j.acap.2024.06.017.
Affiliation(s)
- Janna-Lina Kerth
- Department of General Pediatrics, Pediatric Cardiology and Neonatology, Medical Faculty, University Children's Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany.
- Hans Martin Bosse
- Department of General Pediatrics, Pediatric Cardiology and Neonatology, Medical Faculty, University Children's Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
9
de Laat JM, van der Horst-Schrivers AN, Appelman-Dijkstra NM, Bisschop PH, Dreijerink KM, Drent ML, van de Klauw MM, de Ranitz WL, Stades AM, Stikkelbroeck NM, Timmers HJ, ten Cate O. Assessment of Entrustable Professional Activities Among Dutch Endocrine Supervisors. J CME 2024; 13:2360137. PMID: 38831939; PMCID: PMC11146265; DOI: 10.1080/28338073.2024.2360137.
Abstract
Entrustable Professional Activities (EPAs) are an important tool to support individualisation of medical training in a competency-based setting and are increasingly implemented in clinical speciality training for endocrinologists. This study aims to assess interrater agreement and factors that potentially impact EPA scores. Five known factors that affect entrustment decisions in health professions training (capability, integrity, reliability, humility, agency) were used in this study. We conducted a case-vignette study using standardised written cases. Case vignettes (n = 6) on the topics of thyroid disease, pituitary disease, adrenal disease, calcium and bone disorders, diabetes mellitus, and gonadal disorders were written by two endocrinologists and a medical education expert and assessed by endocrinologists experienced in the supervision of residents in training. The primary outcome was the interrater agreement of entrustment decisions for endocrine EPAs among raters. Secondary outcomes included the dichotomous interrater agreement (entrusted vs non-entrusted) and an exploration of factors that impact decision-making. The study protocol was registered and approved by the Ethical Review Board of the Netherlands Association for Medical Education (NVMO-ERB # 2020.2.5). Nine endocrinologists from six different academic regions participated. Overall, the Fleiss kappa measure of agreement was 0.11 (95% CI: 0.03-0.22) for the EPA level and 0.24 (95% CI: 0.11-0.37) for the entrustment decision. Of the five features that impacted the entrustment decision, capability was ranked as the most important by a majority of raters (56%-67%) in every case. There is considerable discrepancy between the EPA levels assigned by different raters. These findings emphasise the need to base entrustment decisions on multiple observations, made by a team of supervisors and enriched with factors other than direct medical competence.
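The agreement statistic reported above, Fleiss' kappa, can be computed directly from a subjects-by-categories count table. A minimal sketch with invented counts (6 vignettes, 9 raters, 4 entrustment levels; not the study's data):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa. ratings: per-subject category counts, each row summing
    to the number of raters."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # proportion of all assignments falling in each category
    p_cat = [sum(row[j] for row in ratings) / (n_subjects * n_raters)
             for j in range(n_categories)]
    # per-subject agreement: fraction of rater pairs that agree
    p_subj = [(sum(c * c for c in row) - n_raters) /
              (n_raters * (n_raters - 1)) for row in ratings]
    p_bar = sum(p_subj) / n_subjects   # observed agreement
    p_e = sum(p * p for p in p_cat)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# 6 vignettes x 4 entrustment levels, 9 raters each (invented counts)
counts = [
    [5, 3, 1, 0],
    [2, 4, 3, 0],
    [0, 3, 5, 1],
    [1, 2, 4, 2],
    [4, 4, 1, 0],
    [0, 2, 3, 4],
]
print(round(fleiss_kappa(counts), 3))
```

Values near 0 indicate agreement barely above chance, which is how a kappa of 0.11 for the EPA level should be read.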
Affiliation(s)
- Joanne M. de Laat
- Department of Internal Medicine, Division of Endocrinology, Radboud University Medical Center, Nijmegen, The Netherlands
- Utrecht Center for Research and Development of Health Professions Education, University Medical Center Utrecht, Utrecht, The Netherlands
- Peter H. Bisschop
- Department of Endocrinology and Metabolism, Amsterdam UMC, Location Academic Medical Center, Amsterdam, The Netherlands
- Koen M.A. Dreijerink
- Department of Internal Medicine, Amsterdam UMC, Location VU University Medical Center, Amsterdam, The Netherlands
- Madeleine L. Drent
- Department of Internal Medicine, Amsterdam UMC, Location VU University Medical Center, Amsterdam, The Netherlands
- Melanie M. van de Klauw
- Department of Endocrinology, University Medical Center Groningen, Groningen, The Netherlands
- Wendela L. de Ranitz
- Department of Endocrinology, University Medical Center Utrecht, Utrecht, The Netherlands
- Aline M.E. Stades
- Department of Endocrinology, University Medical Center Utrecht, Utrecht, The Netherlands
- Nike M.M.L. Stikkelbroeck
- Department of Internal Medicine, Division of Endocrinology, Radboud University Medical Center, Nijmegen, The Netherlands
- Henri J.L.M. Timmers
- Department of Internal Medicine, Division of Endocrinology, Radboud University Medical Center, Nijmegen, The Netherlands
- Olle ten Cate
- Utrecht Center for Research and Development of Health Professions Education, University Medical Center Utrecht, Utrecht, The Netherlands
10
Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med Teach 2023; 45:565-573. PMID: 36862064; DOI: 10.1080/0142159x.2023.2180340.
Abstract
The use of Artificial Intelligence (AI) in medical education has the potential to facilitate complicated tasks and improve efficiency. For example, AI could help automate the assessment of written responses, or provide feedback on medical image interpretations with excellent reliability. While applications of AI in learning, instruction, and assessment are growing, further exploration is still required. Few conceptual or methodological guides exist for medical educators wishing to evaluate or engage in AI research. In this Guide, we aim to: 1) describe practical considerations involved in reading and conducting studies in medical education using AI, 2) define basic terminology, and 3) identify which medical education problems and data are ideally suited for using AI.
Affiliation(s)
- Martin G Tolsgaard
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Department of Obstetrics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Martin V Pusic
- Department of Pediatrics, Harvard University, Boston, MA, USA
- Brian Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, USA
- Morten Bo Svendsen
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Mark D Syer
- School of Computing, Queen's University, Kingston, Canada
- Ryan Brydges
- Allan Waters Family Simulation Centre, St. Michael's Hospital, Unity Health Toronto & Department of Medicine, University of Toronto, Toronto, Canada
- Christy K Boscardin
- Department of Medicine and Anesthesia, University of California San Francisco, San Francisco, CA, USA
11
Masters K. Ethical use of Artificial Intelligence in Health Professions Education: AMEE Guide No. 158. Med Teach 2023; 45:574-584. PMID: 36912253; DOI: 10.1080/0142159x.2023.2186203.
Abstract
Health Professions Education (HPE) has benefitted from the advances in Artificial Intelligence (AI) and is set to benefit more in the future. Just as any technological advance opens discussions about ethics, so the implications of AI for HPE ethics need to be identified, anticipated, and accommodated so that HPE can utilise AI without compromising crucial ethical principles. Rather than focussing on AI technology, this Guide focuses on the ethical issues likely to face HPE teachers and administrators as they encounter and use AI systems in their teaching environment. While many of the ethical principles may be familiar to readers in other contexts, they will be viewed in light of AI, and some unfamiliar issues will be introduced. They include data gathering, anonymity, privacy, consent, data ownership, security, bias, transparency, responsibility, autonomy, and beneficence. In the Guide, each topic explains the concept and its importance and gives some indication of how to cope with its complexities. Ideas are drawn from personal experience and the relevant literature. In most topics, further reading is suggested so that readers may further explore the concepts at their leisure. The aim is for HPE teachers and decision-makers at all levels to be alert to these issues and to take proactive action to be prepared to deal with the ethical problems and opportunities that AI usage presents to HPE.
Affiliation(s)
- Ken Masters
- Medical Education and Informatics Department, College of Medicine and Health Sciences, Sultan Qaboos University, Muscat, Sultanate of Oman
12
Gin BC. Evolving natural language processing towards a subjectivist inductive paradigm. Med Educ 2023; 57:384-387. PMID: 36739578; DOI: 10.1111/medu.15024.
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, California, USA
13
Maimone C, Dolan BM, Green MM, Sanguino SM, Garcia PM, O'Brien CL. Utilizing Natural Language Processing of Narrative Feedback to Develop a Predictive Model of Pre-Clerkship Performance: Lessons Learned. Perspect Med Educ 2023; 12:141-148. PMID: 37151853; PMCID: PMC10162355; DOI: 10.5334/pme.40.
Abstract
Background Natural language processing is a promising technique that can be used to create efficiencies in the review of narrative feedback to learners. The Feinberg School of Medicine has implemented formal review of pre-clerkship narrative feedback since 2014 through its portfolio assessment system but this process requires considerable time and effort. This article describes how natural language processing was used to build a predictive model of pre-clerkship student performance that can be utilized to assist competency committee reviews. Approach The authors took an iterative and inductive approach to the analysis, which allowed them to identify characteristics of narrative feedback that are both predictive of performance and useful to faculty reviewers. Words and phrases were manually grouped into topics that represented concepts illustrating student performance. Topics were reviewed by experienced reviewers, tested for consistency across time, and checked to ensure they did not demonstrate bias. Outcomes Sixteen topic groups of words and phrases were found to be predictive of performance. The best-fitting model used a combination of topic groups, word counts, and categorical ratings. The model had an AUC value of 0.92 on the training data and 0.88 on the test data. Reflection A thoughtful, careful approach to using natural language processing was essential. Given the idiosyncrasies of narrative feedback in medical education, standard natural language processing packages were not adequate for predicting student outcomes. Rather, employing qualitative techniques including repeated member checking and iterative revision resulted in a useful and salient predictive model.
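The AUC values reported above have a simple rank-based definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one (ties counted half). A minimal sketch with invented scores and labels (not the study's data or model):

```python
def auc(scores, labels):
    """Rank-statistic AUC: P(random positive outscores random negative),
    with ties counted as half a win (equivalent to the Mann-Whitney U)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for 8 students; 1 = outcome of interest
scores = [0.91, 0.85, 0.78, 0.64, 0.55, 0.42, 0.30, 0.12]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

An AUC of 0.5 means the scores are no better than chance at separating the two groups, while values like the 0.88 reported on the test data indicate strong discrimination.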
Affiliation(s)
- Christina Maimone
- Associate director of research data services, Northwestern IT Research Computing Services, Northwestern University, Evanston, Illinois, USA
- Brigid M. Dolan
- Associate professor of medicine and medical education and director of assessment, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marianne M. Green
- Raymond H. Curry, MD Professor of Medical Education, professor of medicine, and vice dean for education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Sandra M. Sanguino
- Associate professor of pediatrics and senior associate dean of medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Patricia M. Garcia
- Professor of obstetrics and gynecology and medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Celia Laird O'Brien
- Assistant professor of medical education and assistant dean of program evaluation and accreditation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA