1
Huang G, Li Y, Jameel S, Long Y, Papanastasiou G. From explainable to interpretable deep learning for natural language processing in healthcare: How far from reality? Comput Struct Biotechnol J 2024; 24:362-373. [PMID: 38800693] [PMCID: PMC11126530] [DOI: 10.1016/j.csbj.2024.05.004] [Received: 11/10/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] Open Access
Abstract
Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretable Artificial Intelligence" (XIAI) is introduced to distinguish XAI from IAI. Different models are further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms are the most prevalent emerging IAI technique. The use of IAI is growing, distinguishing it from XAI. The major challenges identified are that most XIAI does not explore "global" modelling processes, and that best practices, systematic evaluation, and benchmarks are lacking. One important opportunity is to use attention mechanisms to enhance multi-modal XIAI for personalized medicine. Additionally, combining DL with causal logic holds promise. Our discussion encourages the integration of XIAI in Large Language Models (LLMs) and domain-specific smaller models. In conclusion, XIAI adoption in healthcare requires dedicated in-house expertise. Collaboration with domain experts, end-users, and policymakers can lead to ready-to-use XIAI methods across NLP and medical tasks. While challenges exist, XIAI techniques offer a valuable foundation for interpretable NLP algorithms in healthcare.
Affiliation(s)
- Guangming Huang
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Yingya Li
- Harvard Medical School and Boston Children's Hospital, Boston, 02115, United States
- Shoaib Jameel
- Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom
- Yunfei Long
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, United Kingdom
2
Gin BC, Ten Cate O, O'Sullivan PS, Boscardin C. Assessing supervisor versus trainee viewpoints of entrustment through cognitive and affective lenses: an artificial intelligence investigation of bias in feedback. Adv Health Sci Educ Theory Pract 2024. [PMID: 38388855] [DOI: 10.1007/s10459-024-10311-9] [Received: 08/01/2023] [Accepted: 01/21/2024]
Abstract
The entrustment framework redirects assessment from considering only trainees' competence to decision-making about their readiness to perform clinical tasks independently. Since trainees and supervisors both contribute to entrustment decisions, we examined the cognitive and affective factors that underlie their negotiation of trust, and whether trainee demographic characteristics may bias them. Using a document analysis approach, we adapted large language models (LLMs) to examine feedback dialogs (N = 24,187, each with an associated entrustment rating) between medical student trainees and their clinical supervisors. We compared how trainees and supervisors differentially documented feedback dialogs about similar tasks by identifying qualitative themes and quantitatively assessing their correlation with entrustment ratings. Supervisors' themes predominantly reflected skills related to patient presentations, while trainees' themes were broader, including clinical performance and personal qualities. To examine affect, we trained an LLM to measure feedback sentiment. On average, trainees used more negative language (5.3% lower probability of positive sentiment, p < 0.05) compared to supervisors, while documenting higher entrustment ratings (+ 0.08 on a 1-4 scale, p < 0.05). We also found biases tied to demographic characteristics: trainees' documentation reflected more positive sentiment in the case of male trainees (+ 1.3%, p < 0.05) and of trainees underrepresented in medicine (UIM) (+ 1.3%, p < 0.05). Entrustment ratings did not appear to reflect these biases, whether documented by trainee or supervisor. As such, bias appeared to influence the emotive language trainees used to document entrustment more than the degree of entrustment they experienced. Mitigating these biases is nonetheless important because they may affect trainees' assimilation into their roles and formation of trusting relationships.
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, 550 16th St Floor 4, UCSF Box 0110, San Francisco, CA, 94158, USA.
- Olle Ten Cate
- Utrecht Center for Research and Development of Health Professions Education, University Medical Center, Utrecht, the Netherlands
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Patricia S O'Sullivan
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Surgery, University of California San Francisco, San Francisco, USA
- Christy Boscardin
- Department of Medicine, University of California San Francisco, San Francisco, USA
- Department of Anesthesia, University of California San Francisco, San Francisco, USA
3
Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide No. 156. Med Teach 2023; 45:565-573. [PMID: 36862064] [DOI: 10.1080/0142159X.2023.2180340]
Abstract
The use of Artificial Intelligence (AI) in medical education has the potential to facilitate complicated tasks and improve efficiency. For example, AI could help automate assessment of written responses or provide reliable feedback on medical image interpretations. While applications of AI in learning, instruction, and assessment are growing, further exploration is still required. Few conceptual or methodological guides exist for medical educators wishing to evaluate or engage in AI research. In this guide, we aim to: 1) describe practical considerations involved in reading and conducting studies in medical education using AI, 2) define basic terminology, and 3) identify which medical education problems and data are ideally suited for using AI.
Collapse
Affiliation(s)
- Martin G Tolsgaard
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Department of Obstetrics, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
- Martin V Pusic
- Department of Pediatrics, Harvard University, Boston, MA, USA
- Brian Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, USA
- Morten Bo Svendsen
- Copenhagen Academy for Medical Education and Simulation (CAMES), Copenhagen, Denmark
- Mark D Syer
- School of Computing, Queen's University, Kingston, Canada
- Ryan Brydges
- Allan Waters Family Simulation Centre, St. Michael's Hospital, Unity Health Toronto & Department of Medicine, University of Toronto, Toronto, Canada
- Christy K Boscardin
- Department of Medicine and Anesthesia, University of California San Francisco, San Francisco, CA, USA
4
Masters K. Ethical use of Artificial Intelligence in Health Professions Education: AMEE Guide No. 158. Med Teach 2023; 45:574-584. [PMID: 36912253] [DOI: 10.1080/0142159X.2023.2186203]
Abstract
Health Professions Education (HPE) has benefitted from the advances in Artificial Intelligence (AI) and is set to benefit more in the future. Just as any technological advance opens discussions about ethics, so the implications of AI for HPE ethics need to be identified, anticipated, and accommodated so that HPE can utilise AI without compromising crucial ethical principles. Rather than focussing on AI technology, this Guide focuses on the ethical issues likely to face HPE teachers and administrators as they encounter and use AI systems in their teaching environment. While many of the ethical principles may be familiar to readers in other contexts, they will be viewed in light of AI, and some unfamiliar issues will be introduced. They include data gathering, anonymity, privacy, consent, data ownership, security, bias, transparency, responsibility, autonomy, and beneficence. For each topic, the Guide explains the concept and its importance and gives some indication of how to cope with its complexities. Ideas are drawn from personal experience and the relevant literature. For most topics, further reading is suggested so that readers may explore the concepts at their leisure. The aim is for HPE teachers and decision-makers at all levels to be alert to these issues and to take proactive action to be prepared to deal with the ethical problems and opportunities that AI usage presents to HPE.
Affiliation(s)
- Ken Masters
- Medical Education and Informatics Department, College of Medicine and Health Sciences, Sultan Qaboos University, Muscat, Sultanate of Oman
5
Gin BC. Evolving natural language processing towards a subjectivist inductive paradigm. Med Educ 2023; 57:384-387. [PMID: 36739578] [DOI: 10.1111/medu.15024] [Received: 01/15/2023] [Accepted: 01/31/2023]
Affiliation(s)
- Brian C Gin
- Department of Pediatrics, University of California San Francisco, San Francisco, California, USA
6
Maimone C, Dolan BM, Green MM, Sanguino SM, Garcia PM, O’Brien CL. Utilizing Natural Language Processing of Narrative Feedback to Develop a Predictive Model of Pre-Clerkship Performance: Lessons Learned. Perspect Med Educ 2023; 12:141-148. [PMID: 37151853] [PMCID: PMC10162355] [DOI: 10.5334/pme.40] [Received: 10/20/2022] [Accepted: 04/19/2023]
Abstract
Background: Natural language processing is a promising technique that can be used to create efficiencies in the review of narrative feedback to learners. The Feinberg School of Medicine has implemented formal review of pre-clerkship narrative feedback since 2014 through its portfolio assessment system, but this process requires considerable time and effort. This article describes how natural language processing was used to build a predictive model of pre-clerkship student performance that can be utilized to assist competency committee reviews.
Approach: The authors took an iterative and inductive approach to the analysis, which allowed them to identify characteristics of narrative feedback that are both predictive of performance and useful to faculty reviewers. Words and phrases were manually grouped into topics that represented concepts illustrating student performance. Topics were reviewed by experienced reviewers, tested for consistency across time, and checked to ensure they did not demonstrate bias.
Outcomes: Sixteen topic groups of words and phrases were found to be predictive of performance. The best-fitting model used a combination of topic groups, word counts, and categorical ratings. The model had an AUC value of 0.92 on the training data and 0.88 on the test data.
Reflection: A thoughtful, careful approach to using natural language processing was essential. Given the idiosyncrasies of narrative feedback in medical education, standard natural language processing packages were not adequate for predicting student outcomes. Rather, employing qualitative techniques including repeated member checking and iterative revision resulted in a useful and salient predictive model.
Affiliation(s)
- Christina Maimone
- Associate director of research data services, Northwestern IT Research Computing Services, Northwestern University, Evanston, Illinois, USA
- Brigid M. Dolan
- Associate professor of medicine and medical education and director of assessment, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marianne M. Green
- Raymond H. Curry, MD Professor of Medical Education, professor of medicine, and vice dean for education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Sandra M. Sanguino
- Associate professor of pediatrics and senior associate dean of medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Patricia M. Garcia
- Professor of obstetrics and gynecology and medical education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Celia Laird O’Brien
- Assistant professor of medical education and assistant dean of program evaluation and accreditation, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA