1
Bazerbachi F, Murad F, Kubiliun N, Adams MA, Shahidi N, Visrodia K, Essex E, Raju G, Greenberg C, Day LW, Elmunzer BJ. Video recording in GI endoscopy. VideoGIE 2025; 10:67-80. PMID: 40012896; PMCID: PMC11852952; DOI: 10.1016/j.vgie.2024.09.013.
Abstract
The current approach to procedure reporting in endoscopy aims to capture essential findings and interventions but inherently sacrifices the rich detail and nuance of the entire endoscopic experience. Endoscopic video recording (EVR) provides a complete archive of the procedure, extending the utility of the encounter beyond diagnosis and intervention, and potentially adding significant value to the care of the patient and the field in general. This white paper outlines the potential of EVR in clinical care, quality improvement, education, and artificial intelligence-driven innovation, and addresses critical considerations surrounding technology, regulation, ethics, and privacy. As with other medical imaging modalities, growing adoption of EVR is inevitable, and proactive engagement of professional societies and practitioners is essential to harness the full potential of this technology toward improving clinical care, education, and research.
Affiliation(s)
- Fateh Bazerbachi: CentraCare, Interventional Endoscopy Program, St Cloud Hospital, St Cloud, Minnesota, USA; Division of Gastroenterology, Hepatology and Nutrition, University of Minnesota, Minneapolis, Minnesota, USA
- Faris Murad: Illinois Masonic Medical Center, Center for Advanced Care, Chicago, Illinois, USA
- Nisa Kubiliun: Division of Digestive and Liver Diseases, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Megan A Adams: Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Institute for Healthcare Policy and Innovation, Ann Arbor, Michigan, USA
- Neal Shahidi: Division of Gastroenterology, University of British Columbia, Vancouver, British Columbia, Canada
- Kavel Visrodia: Columbia University Irving Medical Center - New York Presbyterian Hospital, New York, New York, USA
- Eden Essex: American Society for GI Endoscopy, Downers Grove, Illinois, USA
- Gottumukkala Raju: Division of Internal Medicine, Department of Gastroenterology, Hepatology and Nutrition, MD Anderson Cancer Center, Houston, Texas, USA
- Caprice Greenberg: Department of Surgery, University of North Carolina, Chapel Hill, North Carolina, USA
- Lukejohn W Day: Division of Gastroenterology, Department of Medicine, University of California San Francisco, San Francisco, California, USA
- B Joseph Elmunzer: Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
2
Roberts C, Burgess A, Mossman K, Kumar K. Professional judgement: a social practice perspective on a multiple mini-interview for specialty training selection. BMC Medical Education 2025; 25:18. PMID: 39754259; DOI: 10.1186/s12909-024-06535-3.
Abstract
BACKGROUND Interviewers' judgements play a critical role in competency-based assessments for selection, such as the multiple mini-interview (MMI). Much of the published research focuses on the psychometrics of selection and the impact of rater subjectivity. Within the context of selection for entry into specialty postgraduate training, we used an interpretivist and socio-constructivist approach to explore how and why interviewers make judgements in high-stakes selection settings whilst taking part in an MMI. METHODS We explored MMI interviewers' work processes through an institutional observational approach, based on the notion that interviewers' judgements are socially constructed and mediated by multiple factors. We gathered data through document analysis and observations of interviewer training, candidate interactions with interviewers, and interviewer meetings. Interviews included informal encounters in a large selection centre. Data analysis balanced description and explicit interpretation of the meanings and functions of the interviewers' actions and behaviours. RESULTS Three themes were developed from the data showing how interviewers make professional judgements: 'Balancing the interplay of rules and agency', 'Participating in moderation and shared meaning making', and 'A culture of reflexivity and professional growth'. Interviewers balanced following institutional rules with making judgement choices based on personal expertise and knowledge. They engaged in dialogue, moderation, and shared meaning making with fellow interviewers, which enabled them to consider multiple perspectives on each candidate's performance. Interviewers engaged in self-evaluation and reflection throughout, with professional learning and growth as primary care physicians and supervisors emerging as an outcome. CONCLUSION This study offers insights into the judgement-making processes of interviewers in high-stakes MMI contexts, highlighting the balance between structured protocols and personal expertise within a socially constructed framework. By linking MMI practices to the broader work-based assessment literature, we contribute to advancing the design and implementation of more valid and fair selection tools for postgraduate training. Additionally, the study underscores the dual benefit of MMIs: not only as a selection tool but also as a platform for interviewers' professional growth. These insights offer practical implications for refining future MMI practices and improving the fairness of high-stakes selection processes.
Affiliation(s)
- Chris Roberts: School of Medicine and Population Health, Division of Medicine, The University of Sheffield, Sheffield, UK
- Annette Burgess: Sydney Medical School - Education Office, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Karyn Mossman: Sydney Medical School - Northern Clinical School, The University of Sydney, Sydney, NSW, Australia
- Koshila Kumar: Division of Learning and Teaching, Charles Sturt University, Bathurst, NSW, Australia; College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
3
Dziadzko M, Varvinskiy A, Di Loreto R, Scipioni H, Ateleanu B, Klimek M, Berger-Estilita J. Examiner workload comparison: three structured oral examination formats for the European diploma in anaesthesiology and intensive care. Medical Education Online 2024; 29:2364990. PMID: 38848480; PMCID: PMC11164053; DOI: 10.1080/10872981.2024.2364990.
Abstract
The COVID-19 pandemic triggered transformations in academic medicine, with the rapid adoption of remote teaching and online assessments. Whilst virtual environments show promise in evaluating medical knowledge, their impact on examiner workload is unclear. This study explores examiners' workload during different formats of the European Diploma in Anaesthesiology and Intensive Care Part 2 Structured Oral Examination. We hypothesise that online exams result in a lower examiner workload than traditional face-to-face methods. We also investigate the structure of that workload and its correlation with examiner characteristics and marking performance. In 2023, examiner workload for three examination formats (face-to-face, hybrid, online) was prospectively evaluated using the NASA TLX instrument. The impact of examiner demographics, candidate scoring agreement, and examination scores on workload was analysed. The overall NASA TLX score from 215 workload measurements in 142 examiners was high at 59.61 ± 14.13. The online examination had a statistically higher workload (61.65 ± 12.84) than the hybrid format but not the face-to-face format. The primary contributors to workload were mental demand, temporal demand, and effort. Online exams were associated with elevated frustration. Male examiners and those spending more time on exam preparation experienced a higher workload. Holding multiple diploma specialties and familiarity with the European Diploma in Anaesthesiology and Intensive Care exams were protective against high workload. Perceived workload did not impact marking agreement or examination scores across any format. Examiners experience a high workload. Online exams are not systematically associated with decreased workload, likely due to frustration. Despite workload differences, no impact on examiners' performance or examination scores was found. The hybrid examination mode, combining face-to-face and online, was associated with a minor but statistically significant workload reduction. This hybrid approach may offer a more balanced and efficient examination process while maintaining integrity, cost savings, and increased accessibility for candidates.
Affiliation(s)
- Mikhail Dziadzko: Department of Anesthesia, Intensive Care and Pain Management, Hospices Civils de Lyon, Hôpital de la Croix Rousse, Lyon, France; Research on Healthcare Performance (RESHAPE) U1290-INSERM, Université Claude Bernard Lyon 1, Lyon, France
- Andrey Varvinskiy: South Devon Healthcare NHS Foundation Trust, Department of Anesthesia and Intensive Care, Torquay, UK
- Rodolphe Di Loreto: European Society of Anaesthesiology and Intensive Care, Examinations Office, Brussels, Belgium
- Hugues Scipioni: European Society of Anaesthesiology and Intensive Care, Examinations Office, Brussels, Belgium
- Bazil Ateleanu: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Department of Anaesthesia, University Hospital of Wales, Cardiff, UK
- Markus Klimek: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Department of Anaesthesiology, Erasmus University Medical Centre, Rotterdam, Netherlands
- Joana Berger-Estilita: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Institute for Medical Education, University of Bern, Bern, Switzerland; Hirslanden Hospital Group, Institute of Anaesthesiology and Intensive Care, Salem Spital, Bern, Switzerland; CINTESIS - Centre for Health Technology and Services Research, Faculty of Medicine, Porto, Portugal
4
Smith SE, McColgan-Smith S, Stewart F, Mardon J, Tallentire VR. Beyond reliability: assessing rater competence when using a behavioural marker system. Adv Simul (Lond) 2024; 9:55. PMID: 39736776; DOI: 10.1186/s41077-024-00329-9.
Abstract
BACKGROUND Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable, and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS - pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect. METHODS Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute-agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test. RESULTS The ICC for experienced faculty raters was good at 0.60 (0.48-0.72) and for near-peer raters was poor at 0.38 (0.27-0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between the abilities of clinically experienced versus near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077). CONCLUSIONS Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures for other assessments can be helpfully applied to behavioural marker systems. When using behavioural marker systems for assessment, educators should adopt such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.
Affiliation(s)
- Julie Mardon: Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley, Larbert, UK
5
Meguerdichian MJ, Trottier DG, Campbell-Taylor K, Bentley S, Bryant K, Kolbe M, Grant V, Cheng A. When common cognitive biases impact debriefing conversations. Adv Simul (Lond) 2024; 9:48. PMID: 39695901; DOI: 10.1186/s41077-024-00324-0.
Abstract
Healthcare debriefing is a cognitively demanding conversation after a simulation or clinical experience that promotes reflection, underpinned by psychological safety and attention to learner needs. The process of debriefing requires mental processing that engages both "fast", unconscious thinking and "slow", intentional thinking to navigate the conversation. "Fast" thinking has the potential to surface cognitive biases that impact reflection and may negatively influence debriefer behaviors, debriefing strategies, and debriefing foundations. As a result, cognitive biases risk undermining learning outcomes from debriefing conversations. As the use of healthcare simulation expands, the need for faculty development specific to the roles bias plays is imperative. In this article, we aim to build awareness of common cognitive biases that may surface in debriefing conversations so that debriefers can begin the hard work of identifying and attending to their potential detrimental impacts.
Affiliation(s)
- Michael J Meguerdichian: Institute for Simulation and Advanced Learning, 1400 Pelham Parkway S, Bronx, NY, 10461, USA; Department of Emergency Medicine, NYC Health+Hospitals: Harlem Hospital Center, 506 Malcolm X Blvd, New York, NY, USA
- Dana George Trottier: Institute for Simulation and Advanced Learning, 1400 Pelham Parkway S, Bronx, NY, 10461, USA
- Suzanne Bentley: Icahn School of Medicine at Mt. Sinai, Gustave L. Levy Pl, Elmhurst Hospital Center, 79-01 Broadway, Queens, New York, NY, 10029, USA
- Kellie Bryant: National League for Nursing, 2600 Virginia Ave NW, Washington, D.C., 20037, USA
- Michaela Kolbe: Simulation Centre, University Hospital Zurich, Zurich, Switzerland
- Vincent Grant: eSim Provincial Simulation Program for Alberta Health Services, Alberta, Canada
- Adam Cheng: Department of Pediatrics and Emergency Medicine, University of Calgary, 28 Oki Drive NW, Calgary, Canada
6
Wood TJ, Daniels VJ, Pugh D, Touchie C, Halman S, Humphrey-Murto S. Implicit versus explicit first impressions in performance-based assessment: will raters overcome their first impressions when learner performance changes? Advances in Health Sciences Education: Theory and Practice 2024; 29:1155-1168. PMID: 38010576; DOI: 10.1007/s10459-023-10302-2.
Abstract
First impressions can influence rater-based judgments, but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims: first, to assess whether first impressions affect raters' judgments when workplace performance changes; second, to determine whether explicitly stating these impressions affects subsequent ratings compared to implicitly formed first impressions. Physician raters viewed six videos where learner performance either changed (Strong to Weak or Weak to Strong) or remained consistent. Raters were assigned to two groups. Group one (n = 23, Explicit) made a first impression global rating (FIGR), then scored learners using the Mini-CEX. Group two (n = 22, Implicit) scored learners at the end of the video solely with the Mini-CEX. For the Explicit group, in the Strong to Weak condition, the FIGR (M = 5.94) was higher than the Mini-CEX global rating (GR) (M = 3.02, p < .001). In the Weak to Strong condition, the FIGR (M = 2.44) was lower than the Mini-CEX GR (M = 3.96, p < .001). There was no difference between the FIGR and the Mini-CEX GR in the consistent condition (M = 6.61 and M = 6.65 respectively, p = .84). There were no statistically significant differences in any of the conditions when comparing both groups' Mini-CEX GRs. Therefore, raters adjusted their judgments based on the learners' performances. Furthermore, raters who made their first impressions explicit showed similar rater bias to raters who followed a more naturalistic process.
Affiliation(s)
- Timothy J Wood: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada
- Vijay J Daniels: Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada
- Debra Pugh: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Medical Council of Canada, Ottawa, Canada
- Claire Touchie: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Samantha Halman: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Susan Humphrey-Murto: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
7
Khawaji B, Masuadi E, Alraddadi A, Khan MA, Aga SS, Al-Jifree H, Magzoub ME. Tutor assessment of medical students in problem-based learning sessions. Journal of Education and Health Promotion 2024; 13:237. PMID: 39297122; PMCID: PMC11410280; DOI: 10.4103/jehp.jehp_1413_23.
Abstract
BACKGROUND Problem-based learning (PBL) is a method of learning that has been adopted in the curricula of different disciplines for more than 30 years. Assessment of students in PBL sessions in medical schools is fundamental to ensuring that students attain the expected outcomes of those sessions and to providing the feedback that helps them develop and encourages their learning. This study investigated the inter-rater reliability of tutor assessment of medical students' performance in PBL tutorial sessions. MATERIALS AND METHODS This study was conducted in the College of Medicine (COM) in the academic year 2021-2022. The study involved ten raters (tutors) of both genders who assessed 33 students in three separate PBL tutorial sessions. The PBL sessions were prerecorded and shown to the ten raters for their assessment. RESULTS Male raters gave higher scores to students than female raters. In addition, the investigation showed low inter-rater reliability and poor agreement among the raters in assessing students' performance in PBL tutorial sessions. CONCLUSION This study suggests that PBL tutor assessment should be reviewed and evaluated, with consideration given to using assessment domains and criteria of performance. We therefore recommend that 360-degree assessment, including tutor, self, and peer assessment, be used to provide effective feedback to students in PBL tutorial sessions.
Affiliation(s)
- Bader Khawaji: Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Jeddah, Saudi Arabia; King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Emad Masuadi: College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Abdulrahman Alraddadi: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia; Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Riyadh, Saudi Arabia
- Muhammad Anwar Khan: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia; Department of Medical Education, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia
- Syed Sameer Aga: Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Jeddah, Saudi Arabia; King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Hatim Al-Jifree: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia; Department of Oncology, King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Mohi Eldin Magzoub: College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
8
Sahi N, Humphrey-Murto S, Brennan EE, O'Brien M, Hall AK. Current use of simulation for EPA assessment in emergency medicine. Can J Emerg Med 2024; 26:179-187. PMID: 38374281; DOI: 10.1007/s43678-024-00649-9.
Abstract
OBJECTIVE Approximately five years ago, the Royal College emergency medicine programs in Canada implemented a competency-based paradigm and introduced Entrustable Professional Activities (EPAs), units of professional activity used to assess trainees. Many competency-based medical education (CBME) curricula involve assessing for entrustment through observations of EPAs. While EPAs are frequently assessed in clinical settings, simulation is also used. This study aimed to characterize the use of simulation for EPA assessment. METHODS An interview guide was jointly developed by all study authors, following best practices for survey development. National interviews were conducted with program directors or assistant program directors of Royal College emergency medicine programs across Canada. Interviews were conducted, recorded, and transcribed over Microsoft Teams using its transcription service. Sample transcripts were analyzed for theme development, and themes were then reviewed by co-authors to ensure they were representative of the participants' views. RESULTS A 64.7% response rate was achieved. Simulation has been widely adopted by EM training programs, and all interviewees supported its use for EPA assessment for many reasons; however, program directors acknowledged limitations. Thematic analysis revealed six major themes: widespread support for the use of simulation for EPA assessment, concerns regarding the potential for EPA assessment to become a "tick-box" exercise, logistical barriers limiting the use of simulation for EPA assessment, varied perceptions about the authenticity of using simulation for EPA assessment, the potential for simulation-based EPA assessment to compromise learner psychological safety, and suggestions for optimizing the use of simulation for EPA assessment. CONCLUSIONS Our findings offer insight for other programs and specialties on how simulation for EPA assessment can best be utilized. Programs should draw on these findings when considering using simulation for EPA assessment.
Affiliation(s)
- Nidhi Sahi: Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, ON, Canada
- Susan Humphrey-Murto: Department of Medicine, University of Ottawa, Ottawa, ON, Canada; Tier 2 Research Chair in Medical Education and Fellowship Director, Medical Education Research, University of Ottawa, Ottawa, ON, Canada
- Erin E Brennan: Department of Emergency Medicine, Queen's University, Kingston, ON, Canada
- Michael O'Brien: Emergency Medicine, The Ottawa Hospital, Ottawa, ON, Canada; Department of Innovation in Medical Education, University of Ottawa, Ottawa, ON, Canada
- Andrew K Hall: Department of Emergency Medicine, University of Ottawa, Ottawa, ON, Canada; Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
9
Yang D, Draganov PV, Pohl H, Aihara H, Jeyalingam T, Khashab M, Liu N, Hasan MK, Jawaid S, Othman M, Al-Haddad M, DeWitt JM, Triggs JR, Wang AY, Bechara R, Sethi A, Law R, Aadam AA, Kumta N, Sharma N, Hayat M, Zhang Y, Yi F, Elmunzer BJ. Development and initial validation of a video-based peroral endoscopic myotomy assessment tool. Gastrointest Endosc 2024; 99:177-185. PMID: 37500019; DOI: 10.1016/j.gie.2023.07.032.
Abstract
BACKGROUND AND AIMS Video analysis has emerged as a potential strategy for performance assessment and improvement. We aimed to develop a video-based skill assessment tool for peroral endoscopic myotomy (POEM). METHODS POEM was deconstructed into basic procedural components through video analysis by an expert panel. A modified Delphi approach and 2 validation exercises were conducted to refine the POEM assessment tool (POEMAT). Twelve assessors used the final POEMAT version to grade 10 videos. Fully crossed generalizability (G) studies investigated the contributions of assessors, endoscopists' performance, and technical elements to reliability. G coefficients below .5 were considered unreliable, those between .5 and .7 modestly reliable, and those above .7 indicative of satisfactory reliability. RESULTS After task deconstruction, discussions, and the modified Delphi process, the final POEMAT comprised 9 technical elements. G analysis showed low variance attributable to endoscopist performance (.8%-24.9%) and high inter-rater variability (range, 63.2%-90.1%). The G score was modestly reliable (≥.60) for "submucosal tunneling" and "myotomy" and satisfactorily reliable (≥.70) for "active hemostasis" and "mucosal closure." CONCLUSIONS We developed and established initial content and response-process validity evidence for the POEMAT. Future steps include appraisal of the tool using a wider range of POEM videos to establish and improve its discriminative validity.
Affiliation(s)
- Dennis Yang: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- Peter V Draganov: Division of Gastroenterology and Hepatology, University of Florida, Gainesville, Florida, USA
- Heiko Pohl: Veterans Affairs Medical Center, White River Junction, Vermont, USA; Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Hiroyuki Aihara: Division of Gastroenterology, Hepatology and Endoscopy, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Thurarshen Jeyalingam: Division of Gastroenterology and Hepatology, University of Toronto, Toronto, Ontario, Canada
- Mouen Khashab: Division of Gastroenterology and Hepatology, Johns Hopkins Hospital, Baltimore, Maryland, USA
- Nanlong Liu: Division of Gastroenterology, University of Louisville, Louisville, Kentucky, USA
- Muhammad K Hasan: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- Salmaan Jawaid: Division of Gastroenterology, Baylor College of Medicine, Houston, Texas, USA
- Mohamed Othman: Division of Gastroenterology, Baylor College of Medicine, Houston, Texas, USA
- Mohamed Al-Haddad: Department of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- John M DeWitt: Department of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Joseph R Triggs: Division of Gastroenterology, Fox Chase Cancer Center, Temple Health, Philadelphia, Pennsylvania, USA
- Andrew Y Wang: Division of Gastroenterology and Hepatology, University of Virginia, Charlottesville, Virginia, USA
- Robert Bechara: Division of Gastroenterology and GI Diseases Research Unit, Queen's University, Kingston, Ontario, Canada
- Amrita Sethi: Division of Digestive and Liver Diseases, Columbia University Irving Medical Center, Presbyterian Hospital, New York, New York, USA
- Ryan Law: Division of Gastroenterology and Hepatology, Mayo Clinic, Minneapolis, Minnesota, USA
- Aziz A Aadam: Division of Gastroenterology and Hepatology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Nikhil Kumta: Henry D. Janowitz Division of Gastroenterology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Neil Sharma: Division of Interventional Oncology and Surgical Endoscopy (IOSE), Parkview Cancer Institute, Fort Wayne, Indiana, USA
- Maham Hayat: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- YiYang Zhang: Center for Collaborative Research, AdventHealth Research Institute, Orlando, Florida, USA
- Fanchao Yi: Center for Collaborative Research, AdventHealth Research Institute, Orlando, Florida, USA
- B Joseph Elmunzer: Department of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
10
Fu Y, Zhang W, Zhang S, Hua D, Xu D, Huang H. Applying a video recording, video-based rating method in OSCEs. Medical Education Online 2023; 28:2187949. PMID: 36883331; PMCID: PMC10013518; DOI: 10.1080/10872981.2023.2187949.
Abstract
INTRODUCTION Objective structured clinical examination (OSCE) results can be affected by low homogeneity of examiners, the non-retrospectiveness of test results, and the examiner-cohort effect. In China, where many students participate in medical qualification examinations, this issue is particularly significant. This study aimed to develop a video recording, video-based rating method and to compare the reliability of video and on-site ratings, to enhance the quality assurance of OSCEs. METHODS The subjects of this study were clinical students one year after graduation participating in the clinical skills portion of the National Medical Licensing Examination. The participants were from four cities in Jiangsu province. Participants were randomly allocated to on-site and video rating groups to evaluate the consistency of the rating methods. We verified the reliability of the recording equipment and the evaluability of the video recordings. Moreover, we compared the consistency and equivalence of the two rating methods and analyzed the impact of video recording on scores. RESULTS The reliability of the recording equipment and the evaluability of the video recordings were high. Evaluation consistency between experts and examiners was acceptable, and there was no difference in evaluation results (P = 0.61). There was good consistency between video and on-site rating; however, a difference between the two rating methods was detected: the scores of students in the video-based rating group were lower than those of all students (P < 0.00). CONCLUSIONS Video-based rating could be reliable and offer advantages over on-site rating. The video recording, video-based rating method could provide greater content validity based on its traceability and the ability to view details. Video recording with video-based rating offers a promising method for improving the effectiveness and fairness of OSCEs.
Affiliation(s)
- Yu Fu: Oral and Maxillofacial Surgery Medicine, Affiliated Hospital of Stomatology, Nanjing Medical University, Nanjing, China
- Wenjuan Zhang: Examination Management Department, National Medical Examination Center, Beijing, China
- Saiyi Zhang: Examination Management Department, National Medical Examination Center, Beijing, China
- Dong Hua: Department of Biomedical Engineering and Information, Nanjing Medical University, Nanjing, Jiangsu, China
- Di Xu: Department of Medical Simulation Center, Nanjing Medical University, Nanjing, Jiangsu, China
- Hua Huang: Department of Medical Simulation Center, Nanjing Medical University, Nanjing, Jiangsu, China
11
Thornby KA, Brazeau GA, Chen AMH. Reducing Student Workload Through Curricular Efficiency. American Journal of Pharmaceutical Education 2023; 87:100015. PMID: 37597906; DOI: 10.1016/j.ajpe.2022.12.002.
Abstract
OBJECTIVE This integrative review examines the current literature assessing student workload, the outcomes of increased workload and cognitive load, and approaches to evaluating and reducing student workload. Recommendations to better inform curriculum planning efforts are presented, along with a call to action to address the dilemma of student workload and curricular efficiency efforts. FINDINGS The literature supports that perceptions of heavy workload can influence students' approach to learning and lead to the adoption of surface learning rather than a deep approach that involves higher-order processing and critical thinking. Additionally, ongoing evidence suggests that workload expansion affects student well-being and potential burnout in professional programs, and specifically that students perceive workload as directly related to their well-being and satisfaction. Intentional planning by faculty and programs can address this issue by streamlining classroom content, reducing lecture time, and modifying preclass work to allow for efficient learning. Even if the curriculum is lecture-based, workload perceptions can be improved by developing clearer guidance to set expectations for learners, by intentionality in classroom design, and by creating opportunities for student engagement. SUMMARY Cognitive overload is multifactorial and complicated, given the increased standards of professional education accreditation and licensure requirements. As the Academy deliberately considers methods to improve curricular efficiency, there is an opportunity to focus on curriculum delivery with an appropriate balance of breadth and depth of instruction to ensure effective assessment and a manageable cognitive load.
Affiliation(s)
- Krisy-Ann Thornby: Palm Beach Atlantic University, Lloyd L. Gregory School of Pharmacy, West Palm Beach, FL, USA
- Gayle A Brazeau: Marshall University, School of Pharmacy, Huntington, WV, USA; Editor, American Journal of Pharmaceutical Education, Arlington, VA, USA
- Aleda M H Chen: Cedarville University, School of Pharmacy, Cedarville, OH, USA
12
Klusmann D, Knorr M, Hampe W. Exploring the relationships between first impressions and MMI ratings: a pilot study. Advances in Health Sciences Education: Theory and Practice 2023; 28:519-536. PMID: 36053344; PMCID: PMC10169880; DOI: 10.1007/s10459-022-10151-5.
Abstract
The phenomenon of first impression is well researched in social psychology, but less so in the study of OSCEs and the multiple mini-interview (MMI). To explore its bearing on the MMI method, we included a rating of first impression in the MMI for student selection conducted in 2012 at the University Medical Center Hamburg-Eppendorf, Germany (196 applicants, 26 pairs of raters) and analyzed how it was related to MMI performance ratings made by (a) the same rater and (b) a different rater. First impression was assessed immediately after an applicant entered the test room. Each MMI task took 5 minutes and was rated immediately afterwards. Internal consistency was α = .71 for first impression and α = .69 for MMI performance. First impression and MMI performance correlated at r = .49. Both measures weakly predicted performance in two OSCEs for communication skills assessed 18 months later. MMI performance did not increment prediction above the contribution of first impression, and vice versa. Prediction was independent of whether or not the rater who rated first impression also rated MMI performance. The correlation between first impression and MMI performance is in line with the results of corresponding social psychological studies, showing that judgements based on minimal information moderately predict behavioral measures. It is also in accordance with the notion that raters often blend the specific assessment task outlined in the MMI instructions with the self-imposed question of whether a candidate would fit the role of a medical doctor.
Affiliation(s)
- Dietrich Klusmann: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
- Mirjana Knorr: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
- Wolfgang Hampe: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
13
Gonzalez PR, Paravattil B, Wilby KJ. Mental effort in the assessment of critical reflection: Implications for assessment quality and scoring. Currents in Pharmacy Teaching & Learning 2022; 14:830-834. PMID: 35914842; DOI: 10.1016/j.cptl.2022.06.016.
Abstract
INTRODUCTION Critical reflection is a mainstay in the training of health professionals, yet assessment of reflection is commonly described as difficult, taxing, and resulting in inconsistent scoring across assessors. At the same time, there is evidence from experiential and simulation settings that assessors' mental effort may explain assessor variability, which could be a target for simplifications in assessment design. Assessors' mental effort during the assessment of reflection is currently unknown. This study aimed to determine the reliability of rubric scoring of critical reflection, the variation in pass-fail rates, and the relationship between reflection scores and assessors' perceived mental effort. METHODS Eleven assessors were recruited to assess six reflection assignments using a published rubric. Mental effort was measured using the Paas scale for each assignment assessed and was correlated with the rubric scores for each assignment. RESULTS Findings showed inconsistency in scoring between assessors, resulting in varying pass rates for each assignment (55-100%). All assignments demonstrated negative correlations between rubric scores and perceived mental effort (r = -0.115 to -0.649). CONCLUSIONS Findings support the notion that more work should be done to optimize the assessment of critical reflection. Future studies should focus on disentangling the influence of scoring tools, assignment structures, and writing quality on mental effort.
Affiliation(s)
- Kyle John Wilby: College of Pharmacy, Faculty of Health, Dalhousie University, 5968 College Street, Halifax, Nova Scotia, Canada
14
Malau-Aduli BS. Patient involvement in assessment: How useful is it? Medical Education 2022; 56:590-592. PMID: 35298852; PMCID: PMC9311839; DOI: 10.1111/medu.14802.
Abstract
The author unveils a strategy for enquiry that can facilitate identification of best practices for involving real patients in OSCE and WBA competency-based assessments.
Affiliation(s)
- Bunmi S. Malau-Aduli: College of Medicine and Dentistry, James Cook University, Townsville, Queensland, Australia
15
Homer M. Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes. Advances in Health Sciences Education: Theory and Practice 2022; 27:457-473. PMID: 35230590; PMCID: PMC9117341; DOI: 10.1007/s10459-022-10096-9.
Abstract
Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade. The exam data come from 442 separate administrations of an 18-station OSCE for international medical graduates who want to work in the National Health Service in the UK. We find that variation due to examiner is approximately twice as large for domain scores as it is for grades (16% vs. 8%), with smaller residual variance in the former (67% vs. 76%). Combined estimates of exam-level (relative) reliability across all data are 0.75 and 0.69 for domain scores and grades respectively. The correlation between two separate estimates of stringency for individual examiners (one for grades and one for domain scores) is relatively high (r = 0.76), implying that examiners are generally quite consistent in their stringency between these two assessments of performance. Cluster analysis indicates that examiners fall into two broad groups, characterised as hawks or doves on both measures. At the exam level, correcting for examiner stringency produces systematically lower cut-scores under borderline regression standard setting than using the raw marks. In turn, such a correction would produce higher pass rates, although meaningful direct comparisons are challenging to make. As in other studies, this work shows that OSCEs and other standardised performance assessments are subject to substantial variation in examiner stringency and require sufficient domain sampling to ensure that the quality of pass/fail decision-making is at least adequate. More, perhaps qualitative, work is needed to better understand how examiners might score similarly (or differently) between the awarding of station-level domain scores and global grades. The potential systematic bias of borderline regression evidenced for the first time here, with sources of error producing cut-scores higher than they should be, also needs more investigation.
Affiliation(s)
- Matt Homer: School of Medicine, Leeds Institute of Medical Education, University of Leeds, Leeds, LS2 9JT, UK
16
Swanberg M, Woodson-Smith S, Pangaro L, Torre D, Maggio L. Factors and Interactions Influencing Direct Observation: A Literature Review Guided by Activity Theory. Teaching and Learning in Medicine 2022; 34:155-166. PMID: 34238091; DOI: 10.1080/10401334.2021.1931871.
Abstract
Phenomenon: Ensuring that future physicians are competent to practice medicine is necessary for high-quality patient care and safety. The shift toward competency-based education has placed renewed emphasis on direct observation via workplace-based assessments in authentic patient care contexts. Despite this interest, and despite multiple studies focused on improving direct observation, challenges regarding the objectivity of this assessment approach remain underexplored and unresolved. Approach: We conducted a literature review of direct observation in authentic patient contexts by systematically searching the databases PubMed, Embase, Web of Science, and ERIC. Included studies comprised original research conducted in the patient care context with authentic patients, either as a live encounter or a video recording of an actual encounter, which focused on factors affecting the direct observation of undergraduate medical education (UME) or graduate medical education (GME) trainees. Because the patient care context adds factors that contribute to the cognitive load of the learner and of the clinician-observer, we focused our question on such contexts, which are most useful in judgments about advancement to the next level of training or practice. We excluded articles or published abstracts not conducted in the patient care context (e.g., OSCEs) or those involving simulation, allied health professionals, or non-UME/GME trainees. We also excluded studies focused on end-of-rotation evaluations and in-training evaluation reports. We extracted key data from the studies and used Activity Theory as a lens to identify factors affecting these observations and the interactions between them. Activity Theory provides a framework to understand and analyze complex human activities, the systems in which people work, and the interactions or tensions between multiple associated factors. Findings: Nineteen articles were included in the analysis; 13 involved GME learners and 6 UME learners. Of the 19, six studies were set in the operating room and four in the emergency department. Using Activity Theory, we discovered that while numerous studies focus on rater and tool influences, very few study the impact of social elements: the rules that govern how the activity happens, the environment and members of the community involved in the activity, and how completion of the activity is divided up among the members of the community. Insights: Viewing direct observation via workplace-based assessment through the lens of Activity Theory may enable educators to implement curricular changes to improve direct observation for assessment. Activity Theory may also allow researchers to design studies focused on the identified underexplored interactions and influences in relation to direct observation.
Affiliation(s)
- Margaret Swanberg: Department of Neurology, Uniformed Services University, Bethesda, Maryland, USA
- Sarah Woodson-Smith: Department of Neurology, Naval Medical Center Portsmouth, Portsmouth, Virginia, USA
- Louis Pangaro: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA
- Dario Torre: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA; Center for Health Professions Education, Uniformed Services University, Bethesda, Maryland, USA
- Lauren Maggio: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA; Center for Health Professions Education, Uniformed Services University, Bethesda, Maryland, USA
17
Fyfe M, Horsburgh J, Blitz J, Chiavaroli N, Kumar S, Cleland J. The do's, don'ts and don't knows of redressing differential attainment related to race/ethnicity in medical schools. Perspectives on Medical Education 2022; 11:1-14. PMID: 34964930; PMCID: PMC8714874; DOI: 10.1007/s40037-021-00696-3.
Abstract
INTRODUCTION Systematic and structural inequities in power and privilege create differential attainment, whereby differences in average levels of performance are observed between students from different socio-demographic groups. This paper reviews the international evidence on differential attainment related to ethnicity/race in medical school, drawing together the key messages from research to date to provide guidance for educators to operationalize and enact change, and to identify areas for further research. METHODS The authors first identified areas of conceptual importance within differential attainment (learning, assessment, and systems/institutional factors), which were then the focus of a targeted review of the literature on differential attainment related to ethnicity/race in medical education and, where available and relevant, literature from higher education more generally. Each author then conducted a review of the literature and proposed guidelines based on their experience and the research literature. The guidelines were iteratively reviewed and refined by all authors until we reached consensus on the Do's, Don'ts and Don't Knows. RESULTS We present 13 guidelines with a summary of the research evidence for each. The guidelines address assessment practices (assessment design, assessment formats, use of assessments, and post-hoc analysis) and educational systems and cultures (student experience, learning environment, faculty diversity, and diversity practices). CONCLUSIONS Differential attainment related to ethnicity/race is a complex, systemic problem reflective of unequal norms and practices within broader society and evident throughout assessment practices, the learning environment, and student experiences at medical school. Currently, the strongest empirical evidence is around assessment processes themselves. There is emerging evidence of minoritized students facing discrimination and having different learning experiences in medical school, but more studies are needed. There is a pressing need for research on how to effectively redress systemic issues within our medical schools, particularly related to inequity in teaching and learning.
Affiliation(s)
- Molly Fyfe: Medical Education Innovation and Research Centre, Imperial College London, London, UK
- Jo Horsburgh: Medical Education Innovation and Research Centre, Imperial College London, London, UK; Centre for Higher Education Research and Scholarship, Imperial College London, London, UK
- Julia Blitz: Centre for Health Professions Education, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Sonia Kumar: Medical Education Innovation and Research Centre, Imperial College London, London, UK
- Jennifer Cleland: Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
18
Fleming M, Vautour D, McMullen M, Cofie N, Dalgarno N, Phelan R, Mizubuti GB. Examining the accuracy of residents' self-assessments and faculty assessment behaviours in anesthesiology. Canadian Medical Education Journal 2021; 12:17-26. PMID: 34567302; PMCID: PMC8463238; DOI: 10.36834/cmej.70697.
Abstract
BACKGROUND Residents' accurate self-assessment and clinical judgment are essential for optimizing their clinical skills development. Evidence from the medical literature suggests that residents generally do poorly at self-assessing their performance, often due to factors relating to learners' personal backgrounds and cultures, the specific contexts of the learning environment, and rater bias or inaccuracies. We evaluated the accuracy of anesthesiology residents' self-assessed Global Entrustment scores and determined whether differences between faculty and resident scores varied by resident seniority, faculty leniency, and/or year of assessment. METHODS We employed variance components modeling techniques and analyzed 329 pairs of faculty and self-assessed entrustment scores among 43 faculty assessors and 15 residents. Using faculty scores as the gold standard, we compared faculty scores with residents' scores (x_i(faculty) - x_i(resident)) and determined residents' accuracy, including over- and under-confidence. RESULTS The results indicate that residents were over- and under-confident in 10.9% and 54.4% of the assessments respectively, but were more consistent in their individual self-assessments (ρ = 0.70) than faculty assessors. Faculty scores were significantly higher (α = 0.396; z = 4.39; p < 0.001) than residents' self-assessed scores. Being a lenient/dovish (β = 0.121, z = 3.16, p < 0.01) or a neutral (β = 0.137, z = 3.57, p < 0.001) faculty assessor predicted a higher likelihood of resident under-confidence. Senior residents were significantly less likely to be under-confident than junior residents (β = -0.182, z = -2.45, p < 0.05). The accuracy of self-assessments did not vary significantly during the two years of the study period. CONCLUSIONS The majority of residents' self-assessments were inaccurate. Our findings may help identify the sources of such inaccuracies.
Collapse
Affiliation(s)
- Melinda Fleming
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Danika Vautour
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Michael McMullen
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Nicholas Cofie
- Faculty of Health Sciences, Queen's University, Ontario, Canada
| | - Nancy Dalgarno
- Faculty of Health Sciences, Queen's University, Ontario, Canada
| | - Rachel Phelan
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Glenio B Mizubuti
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| |
Collapse
|
19
|
Humphrey-Murto S, Shaw T, Touchie C, Pugh D, Cowley L, Wood TJ. Are raters influenced by prior information about a learner? A review of assimilation and contrast effects in assessment. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:1133-1156. [PMID: 33566199 DOI: 10.1007/s10459-021-10032-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/17/2020] [Accepted: 01/25/2021] [Indexed: 06/12/2023]
Abstract
Understanding which factors can impact rater judgments in assessments is important to ensure quality ratings. One such factor is whether prior performance information (PPI) about learners influences subsequent decision making. The information can be acquired directly, when the rater sees the same learner, or different learners over multiple performances, or indirectly, when the rater is provided with external information about the same learner prior to rating a performance (i.e., learner handover). The purpose of this narrative review was to summarize and highlight key concepts from multiple disciplines regarding the influence of PPI on subsequent ratings, discuss implications for assessment and provide a common conceptualization to inform research. Key findings include (a) assimilation (rater judgments are biased towards the PPI) occurs with indirect PPI and contrast (rater judgments are biased away from the PPI) with direct PPI; (b) negative PPI appears to have a greater effect than positive PPI; (c) when viewing multiple performances, context effects of indirect PPI appear to diminish over time; and (d) context effects may occur with any level of target performance. Furthermore, some raters are not susceptible to context effects, but it is unclear what factors are predictive. Rater expertise and training do not consistently reduce effects. Making raters more accountable, providing specific standards and reducing rater cognitive load may reduce context effects. Theoretical explanations for these findings will be discussed.
Collapse
Affiliation(s)
- Susan Humphrey-Murto
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada.
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.
| | - Tammy Shaw
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-General Campus, Ottawa, ON, Canada
| | - Claire Touchie
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada
- Medical Council of Canada, Ottawa, ON, Canada
| | - Debra Pugh
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada
- Medical Council of Canada, Ottawa, ON, Canada
| | - Lindsay Cowley
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
| | - Timothy J Wood
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
| |
Collapse
|
20
|
Effects of a Resident's Reputation on Laparoscopic Skills Assessment. Obstet Gynecol 2021; 138:16-20. [PMID: 34259459 DOI: 10.1097/aog.0000000000004426] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2020] [Accepted: 02/18/2021] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To quantify the effect of a resident's reputation on the assessment of their laparoscopic skills. METHODS Faculty gynecologists were randomized to receive one of three hypothetical resident scenarios: a resident with high, average, or low surgical skills. All participants were then asked to view the same video of a resident performing a laparoscopic salpingo-oophorectomy, with only the accompanying resident description differing, and to provide an assessment using a modified OSATS (Objective Structured Assessment of Technical Skills) and a global assessment scale. RESULTS From September 6, 2020, to October 20, 2020, a total of 43 faculty gynecologic surgeons were recruited to complete the study. Assessment scores on the modified OSATS (out of 20) and global assessment (out of 5) differed significantly according to resident description, where the high-performing resident scored highest (median scores of 15 and 4, respectively), followed by the average-performing resident (13 and 3), and finally, the low-performing resident (11 and 3) (P=.008 and .043, respectively). CONCLUSION Faculty assessment of residents in gynecologic surgery is influenced by the assessor's knowledge of the resident's past performance. This knowledge introduces bias that artificially increases scores given to those residents with favorable reputations and decreases scores given to those with reputed surgical skill deficits. These data quantify the effect of such bias in the assessment of residents in the workplace and serve as an impetus to explore systems-level interventions to mitigate bias.
Collapse
|
21
|
Valentine N, Durning S, Shanahan EM, Schuwirth L. Fairness in human judgement in assessment: a hermeneutic literature review and conceptual framework. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:713-738. [PMID: 33123837 DOI: 10.1007/s10459-020-10002-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 10/19/2020] [Indexed: 06/11/2023]
Abstract
Human judgement is widely used in workplace-based assessment despite criticism that it does not meet standards of objectivity. There is an ongoing push within the literature to better embrace subjective human judgement in assessment not as a 'problem' to be corrected psychometrically but as legitimate perceptions of performance. Taking a step back and changing perspectives to focus on the fundamental underlying value of fairness in assessment may help re-set the traditional objective approach and provide a more relevant way to determine the appropriateness of subjective human judgements. Changing focus to look at what is 'fair' human judgement in assessment, rather than what is 'objective' human judgement in assessment, allows for the embracing of many different perspectives and the legitimising of human judgement in assessment. However, this requires addressing the question: what makes human judgements fair in health professions assessment? This is not a straightforward question with a single unambiguously 'correct' answer. In this hermeneutic literature review we aimed to produce a scholarly knowledge synthesis and understanding of the factors, definitions and key questions associated with fairness in human judgement in assessment, and a resulting conceptual framework, with a view to informing further research. The complex construct of fair human judgement could be conceptualised through values (credibility, fitness for purpose, transparency and defensibility), which are upheld at an individual level by characteristics of fair human judgement (narrative, boundaries, expertise, agility and evidence) and at a systems level by procedures (procedural fairness, documentation, multiple opportunities, multiple assessors, validity evidence), which help translate fairness in human judgement from concepts into practical components.
Collapse
Affiliation(s)
- Nyoli Valentine
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia.
| | - Steven Durning
- Center for Health Professions Education, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Ernst Michael Shanahan
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia
| | - Lambert Schuwirth
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia
| |
Collapse
|
22
|
Elmunzer BJ, Walsh CM, Guiton G, Serrano J, Chak A, Edmundowicz S, Kwon RS, Mullady D, Papachristou GI, Elta G, Baron TH, Yachimski P, Fogel E, Draganov PV, Taylor J, Scheiman J, Singh V, Varadarajulu S, Willingham FF, Cote G, Cotton PB, Simon V, Spitzer R, Keswani R, Wani S. Development and initial validation of an instrument for video-based assessment of technical skill in ERCP. Gastrointest Endosc 2021; 93:914-923. [PMID: 32739484 PMCID: PMC8961206 DOI: 10.1016/j.gie.2020.07.055] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Accepted: 07/24/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND AIMS The accurate measurement of technical skill in ERCP is essential for endoscopic training, quality assurance, and coaching of this procedure. Hypothesizing that technical skill can be measured by analysis of ERCP videos, we aimed to develop and validate a video-based ERCP skill assessment tool. METHODS Based on review of procedural videos, the task of ERCP was deconstructed into its basic components by an expert panel that developed an initial version of the Bethesda ERCP Skill Assessment Tool (BESAT). Subsequently, 2 modified Delphi panels and 3 validation exercises were conducted with the goal of iteratively refining the tool. Fully crossed generalizability studies investigated the contributions of assessors, ERCP performance, and technical elements to reliability. RESULTS Twenty-nine technical elements were initially generated from task deconstruction. Ultimately, after iterative refinement, the tool comprised 6 technical elements and 11 subelements. The developmental process achieved consistent improvements in the performance characteristics of the tool with every iteration. For the most recent version of the tool, BESAT-v4, the generalizability coefficient (a reliability index) was .67. Most variance in BESAT scores (43.55%) was attributed to differences in endoscopists' skill, indicating that the tool can reliably differentiate between endoscopists based on video analysis. CONCLUSIONS Video-based assessment of ERCP skill appears to be feasible with a novel instrument that demonstrates favorable validity evidence. Future steps include determining whether the tool can discriminate between endoscopists of varying experience levels and predict important outcomes in clinical practice.
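As a rough illustration of the reliability index reported above, here is a minimal sketch of a relative generalizability coefficient for a fully crossed person-by-rater design, estimated from a two-way ANOVA decomposition; the data are hypothetical and the design is simplified relative to the study's G-studies:

```python
# Minimal sketch, hypothetical data: relative G coefficient for a fully
# crossed person (endoscopist) x rater design, via two-way ANOVA mean squares.
import numpy as np

# rows = endoscopists (objects of measurement), cols = assessors
scores = np.array([
    [3.0, 3.5, 3.2],
    [4.1, 4.4, 4.0],
    [2.5, 2.9, 2.7],
    [3.8, 3.6, 3.9],
])
n_p, n_r = scores.shape
grand = scores.mean()

ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()   # persons
ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()   # raters
ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r      # interaction + error

ms_p = ss_p / (n_p - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

var_p = max((ms_p - ms_res) / n_r, 0.0)   # person (true-score) variance
g_coef = var_p / (var_p + ms_res / n_r)   # relative error over n_r raters
print(f"relative G coefficient with {n_r} raters: {g_coef:.2f}")
```

The coefficient printed here is an index analogous in spirit to the .67 reported for BESAT-v4; the paper's fully crossed studies additionally quantify rater and element variance.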
Collapse
Affiliation(s)
- B. Joseph Elmunzer
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Catharine M Walsh
- Division of Gastroenterology, Hepatology, and Nutrition, Learning Institute and Research Institute, Hospital for Sick Children, Toronto, Canada
| | - Gretchen Guiton
- Department of Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Jose Serrano
- National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
| | - Amitabh Chak
- Division of Gastroenterology and Liver Disease, Case Western Reserve University, Cleveland, OH, USA
| | - Steven Edmundowicz
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Richard S. Kwon
- Division of Gastroenterology, University of Michigan, Ann Arbor, MI, USA
| | - Daniel Mullady
- Division of Gastroenterology, Washington University School of Medicine, St Louis, Missouri, USA
| | - Georgios I. Papachristou
- Division of Gastroenterology, Hepatology, and Nutrition, Ohio State University Wexner Medical Center, Columbus, OH, USA
| | - Grace Elta
- Division of Gastroenterology, University of Michigan, Ann Arbor, MI, USA
| | - Todd H. Baron
- Division of Gastroenterology and Hepatology, University of North Carolina, Chapel Hill, NC, USA
| | - Patrick Yachimski
- Division of Gastroenterology, Vanderbilt University, Nashville, TN, USA
| | - Evan Fogel
- Division of Gastroenterology and Hepatology, Indiana University, Indianapolis, IN, USA
| | - Peter V. Draganov
- Division of Gastroenterology, Hepatology, and Nutrition, University of Florida, Gainesville, FL, USA
| | - Jason Taylor
- Division of Gastroenterology and Hepatology, Saint Louis University, Saint Louis, MO, USA
| | - James Scheiman
- Division of Gastroenterology and Hepatology, University of Virginia, Charlottesville, VA, USA
| | - Vikesh Singh
- Division of Gastroenterology, Johns Hopkins Medical Institutions, Baltimore, MD, USA
| | | | | | - Gregory Cote
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Peter B. Cotton
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Violette Simon
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Rebecca Spitzer
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Rajesh Keswani
- Division of Gastroenterology, Northwestern University, Chicago, IL, USA
| | - Sachin Wani
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | |
Collapse
|
23
|
Malau-Aduli BS, Hays RB, D'Souza K, Smith AM, Jones K, Turner R, Shires L, Smith J, Saad S, Richmond C, Celenza A, Sen Gupta T. Examiners' decision-making processes in observation-based clinical examinations. MEDICAL EDUCATION 2021; 55:344-353. [PMID: 32810334 DOI: 10.1111/medu.14357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/19/2020] [Revised: 08/08/2020] [Accepted: 08/14/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND Objective structured clinical examinations (OSCEs) are commonly used to assess the clinical skills of health professional students. Examiner judgement is one acknowledged source of variation in candidate marks. This paper reports an exploration of examiner decision making to better characterise the cognitive processes and workload associated with making judgements of clinical performance in exit-level OSCEs. METHODS Fifty-five examiners for exit-level OSCEs at five Australian medical schools completed a NASA Task Load Index (TLX) measure of cognitive load and participated in focus group interviews immediately after the OSCE session. Discussions focused on how decisions were made for borderline and clear pass candidates. Interviews were transcribed, coded and thematically analysed. NASA TLX results were quantitatively analysed. RESULTS Examiners self-reported higher cognitive workload levels when assessing a borderline candidate in comparison with a clear pass candidate. Further analysis revealed five major themes considered by examiners when marking candidate performance in an OSCE: (a) use of marking criteria as a source of reassurance; (b) difficulty adhering to the marking sheet under certain conditions; (c) demeanour of candidates; (d) patient safety, and (e) calibration using a mental construct of the 'mythical [prototypical] intern'. Examiners demonstrated particularly high mental demand when assessing borderline compared to clear pass candidates. CONCLUSIONS Examiners demonstrate that judging candidate performance is a complex, cognitively difficult task, particularly when performance is of borderline or lower standard. At programme exit level, examiners intuitively want to rate candidates against a construct of a prototypical graduate when marking criteria appear not to describe both what a passing candidate should demonstrate when completing clinical tasks and how they should demonstrate it. This construct should be shared, agreed upon and aligned with marking criteria to best guide examiner training and calibration. Achieving this integration may improve the accuracy and consistency of examiner judgements and reduce cognitive workload.
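For context, NASA TLX ratings are often summarized with an unweighted "raw TLX" (the mean of the six subscale ratings); the sketch below assumes that scoring variant with hypothetical ratings, and is not drawn from the study's data:

```python
# Minimal sketch, assumed "raw TLX" scoring (mean of six 0-100 subscales);
# ratings are hypothetical, not the study's data.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort",
             "frustration")

def raw_tlx(ratings: dict) -> float:
    """Raw (unweighted) TLX: mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

borderline = {"mental": 80, "physical": 25, "temporal": 60,
              "performance": 55, "effort": 70, "frustration": 50}
clear_pass = {"mental": 45, "physical": 20, "temporal": 40,
              "performance": 30, "effort": 40, "frustration": 20}

print(f"borderline candidate raw TLX: {raw_tlx(borderline):.1f}")
print(f"clear pass candidate raw TLX: {raw_tlx(clear_pass):.1f}")
```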
Collapse
Affiliation(s)
- Bunmi S Malau-Aduli
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Richard B Hays
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Karen D'Souza
- School of Medicine, Deakin University, Geelong, VIC, Australia
| | - Amy M Smith
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Karina Jones
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Richard Turner
- School of Medicine, University of Tasmania, Hobart, TAS, Australia
| | - Lizzi Shires
- School of Medicine, University of Tasmania, Hobart, TAS, Australia
| | - Jane Smith
- Medical Program, Bond University, Gold Coast, QLD, Australia
| | - Shannon Saad
- School of Medicine, Notre Dame University, Sydney, NSW, Australia
| | | | - Antonio Celenza
- School of Medicine, University of Western Australia, Perth, WA, Australia
| | - Tarun Sen Gupta
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| |
Collapse
|
24
|
Roy M, Wojcik J, Bartman I, Smee S. Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:313-328. [PMID: 32816242 DOI: 10.1007/s10459-020-09987-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 08/17/2020] [Indexed: 06/11/2023]
Abstract
In Canada, high-stakes objective structured clinical examinations (OSCEs) administered by the Medical Council of Canada have relied exclusively on physician examiners (PEs) for scoring. Prior research has looked at using standardized patients (SPs) to replace PEs. This paper reports on two studies that implement and evaluate an SP scoring tool to augment PE scoring. The unique aspect of this study is that it explores the benefits of combining SP and PE scores. SP focus groups developed rating scales for four dimensions they labelled: Listening, Communication, Empathy/Rapport, and Global Impression. In Study I, 43 SPs from one site of a national PE-scored OSCE rated 60 examinees with the initial SP rating scales. In Study II, 137 SPs used slightly revised rating scales with optional narrative comments to score 275 examinees at two sites. Examinees were blinded to SP scoring and SP ratings did not count. Separate PE and SP scoring was examined using descriptive statistics and correlations. Combinations of SP and PE scoring were assessed using pass rates, reliability, and decision consistency and accuracy indices. In Study II, SP and PE comments were examined. SPs showed greater variability in their scoring, and rated examinees lower than PEs on common elements, resulting in slightly lower pass rates when combined. There was a moderate tendency for both SPs and PEs to make negative comments for the same examinee but for different reasons. We argue that SPs and PEs assess performance from different perspectives, and that combining scores from both augments the overall reliability of scores and pass/fail decisions. There is potential to provide examinees with feedback comments from each group.
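A minimal sketch of the kind of score combination at issue follows; the weighting, cut score, and data are hypothetical, not the Medical Council of Canada's algorithm:

```python
# Minimal sketch, hypothetical weighting and cut score (not the MCC's
# algorithm): blending physician examiner (PE) and standardized patient (SP)
# station scores before a pass/fail decision.
def combined_score(pe: float, sp: float, sp_weight: float = 0.25) -> float:
    """Weighted blend of PE and SP scores, both expressed on 0-100."""
    return (1 - sp_weight) * pe + sp_weight * sp

PASS_CUT = 60.0
examinees = [("A", 72.0, 55.0), ("B", 61.0, 48.0), ("C", 58.0, 70.0)]

for name, pe, sp in examinees:
    total = combined_score(pe, sp)
    verdict = "pass" if total >= PASS_CUT else "fail"
    print(f"{name}: PE={pe}, SP={sp}, combined={total:.1f} -> {verdict}")
```

In this toy example, candidate B passes on the PE score alone but fails once the lower SP score is blended in, mirroring the slightly lower combined pass rates reported above.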
Collapse
Affiliation(s)
- Marguerite Roy
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada.
| | - Josée Wojcik
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| | - Ilona Bartman
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| | - Sydney Smee
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| |
Collapse
|
25
|
Leenstra NF, Jung OC, Cnossen F, Jaarsma ADC, Tulleken JE. Development and Evaluation of the Taxonomy of Trauma Leadership Skills-Shortened for Observation and Reflection in Training: A Practical Tool for Observing and Reflecting on Trauma Leadership Performance. Simul Healthc 2021; 16:37-45. [PMID: 32732816 PMCID: PMC7850591 DOI: 10.1097/sih.0000000000000474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
INTRODUCTION Trauma leadership skills are increasingly being addressed in trauma courses, but few resources are available to systematically observe and debrief trainees' performances. The authors therefore translated their previously developed, extensive Taxonomy of Trauma Leadership Skills (TTLS) into a practical observation tool that is tailored to the vocabulary of clinician instructors and their workflow and workload during simulation-based training. METHODS From 2016 to 2018, the TTLS underwent practical evaluation in a two-stage iterative process. In the first stage, testing panels of trauma specialists observed excerpts from videotaped simulations and indicated from the list of elements which behaviors they felt were being shown. Ambiguities and redundancies were addressed by rephrasing or combining elements. In the second stage, successive iterations of the tool were used in actual scenario training to observe and debrief trainees' performances. The instructors' recommendations resulted in further improvements in clarity, ease of use, and usefulness, until no new suggestions were raised. RESULTS The resultant "TTLS-Shortened for Observation and Reflection in Training" was given a simpler structure and more concrete, self-explanatory benchmarks. It contains 6 skill categories for evaluation, each with 4 to 6 benchmark behaviors. CONCLUSIONS The TTLS-Shortened for Observation and Reflection in Training is an important addition to other trauma assessment tools because of its specific focus on leadership skills. It helps set concrete performance expectations, simplify note taking, and target observations and debriefings. One central challenge was striking a balance between conciseness and specificity. The authors reflect on how the decisions behind the resulting structure ease and support the conduct of observations and performance debriefings.
Collapse
|
26
|
Wilby KJ, Paravattil B. Cognitive load theory: Implications for assessment in pharmacy education. Res Social Adm Pharm 2020; 17:1645-1649. [PMID: 33358136 DOI: 10.1016/j.sapharm.2020.12.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2019] [Revised: 11/09/2020] [Accepted: 12/15/2020] [Indexed: 11/28/2022]
Abstract
The concept of mental workload is well studied from a learner's perspective but has yet to be better understood from the perspective of an assessor. Mental workload is largely associated with cognitive load theory, which describes three different types of load. Intrinsic load deals with the complexity of the task, extraneous load describes distractors to the task at hand, and germane load focuses on the development of schemas in working memory for future recall. Studies from medical education show that all three types of load are relevant when considering rater-based assessment (e.g., Objective Structured Clinical Examinations (OSCEs) or experiential training). Assessments with high intrinsic and extraneous load may interfere with assessors' attention and working memory and result in poorer-quality assessment. Reducing these loads within assessment tasks should therefore be a priority for pharmacy educators. This commentary aims to provide a theoretical overview of mental workload in assessment, outline research findings from the medical education context, and propose strategies to be considered for reducing mental workload in rater-based assessments relevant to pharmacy education. Suggestions for future research are also addressed.
Collapse
Affiliation(s)
- Kyle John Wilby
- School of Pharmacy, University of Otago, PO Box 56, Dunedin, 9054, New Zealand.
| | | |
Collapse
|
27
|
Hyde C, Yardley S, Lefroy J, Gay S, McKinley RK. Clinical assessors' working conceptualisations of undergraduate consultation skills: a framework analysis of how assessors make expert judgements in practice. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2020; 25:845-875. [PMID: 31997115 PMCID: PMC7471149 DOI: 10.1007/s10459-020-09960-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 01/18/2020] [Indexed: 06/10/2023]
Abstract
Undergraduate clinical assessors make expert, multifaceted judgements of consultation skills in concert with medical school OSCE grading rubrics. Assessors are not cognitive machines: their judgements are made in the light of prior experience and social interactions with students. It is important to understand assessors' working conceptualisations of consultation skills and whether they could be used to develop assessment tools for undergraduate assessment. This study aimed to identify the working conceptualisations that assessors use while assessing undergraduate medical students' consultation skills, and to develop assessment tools based on assessors' working conceptualisations and natural language for undergraduate consultation skills. In semi-structured interviews, 12 experienced assessors from a UK medical school populated a blank assessment scale with personally meaningful descriptors while describing how they made judgements of students' consultation skills (at exit standard). A two-step iterative thematic framework analysis was performed drawing on constructionism and interactionism. Five domains were found within working conceptualisations of consultation skills: Application of knowledge; Manner with patients; Getting it done; Safety; and Overall impression. Three mechanisms of judgement about student behaviour were identified: observations, inferences and feelings. Assessment tools drawing on participants' conceptualisations and natural language were generated, including 'grade descriptors' for common conceptualisations in each domain by mechanism of judgement, matched to grading rubrics of Fail, Borderline, Pass and Very good. Utilising working conceptualisations to develop assessment tools is feasible and potentially useful. Work is needed to test the impact on assessment quality.
Collapse
Affiliation(s)
- Catherine Hyde
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| | - Sarah Yardley
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK.
- Palliative Care Service, Central and North West London NHS Foundation Trust, St Pancras Hospital, 5th Floor South Wing, 4 St. Pancras Way, London, NW1 0PE, UK.
| | - Janet Lefroy
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| | - Simon Gay
- University of Leicester School of Medicine, Leicester, UK
| | - Robert K McKinley
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| |
Collapse
|
28
|
Homer M, Fuller R, Hallam J, Pell G. Shining a spotlight on scoring in the OSCE: Checklists and item weighting. MEDICAL TEACHER 2020; 42:1037-1042. [PMID: 32608303 DOI: 10.1080/0142159x.2020.1781072] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Introduction: There has been a long-running debate about the validity of item-based checklist scoring of performance assessments like OSCEs. In recent years, the conception of a checklist has developed from its dichotomous inception into a more 'key-features' and/or chunked approach, where 'items' have the potential to become weighted differently, but the literature does not always reflect these broader conceptions. Methods: We consider theoretical, design and (clinically trained) assessor issues related to differential item weighting in checklist scoring of OSCE stations. Using empirical evidence, this work also compares candidate decisions and psychometric quality under different item-weighting approaches (i.e. a simple 'unweighted' scheme versus a differentially weighted one). Results: Different weighting schemes affect approximately 30% of the key borderline group of candidates, and 3% of candidates overall. We also find that measures of overall assessment quality are a little better under the differentially weighted scoring system. Discussion and conclusion: Differentially weighted modern checklists can contribute to valid assessment outcomes, and bring a range of additional benefits to the assessment. Judgment about the weighting of particular items should be considered a key design consideration during station development and must align with clinical assessor expectations of the relative importance of sub-tasks.
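To make the two scoring schemes concrete, here is a minimal sketch with hypothetical checklist items and weights, showing how differential weighting can move a borderline candidate across a cut score:

```python
# Minimal sketch, hypothetical items and weights: the same checklist scored
# unweighted versus differentially weighted, moving a borderline candidate
# across the cut score.
items = [  # (item completed?, weight reflecting clinical importance)
    (True, 3),   # key safety step done
    (True, 2),
    (False, 3),  # key step missed
    (True, 1),
    (True, 1),
]

unweighted = sum(done for done, _ in items) / len(items)
weighted = sum(w for done, w in items if done) / sum(w for _, w in items)

CUT = 0.75
print(f"unweighted: {unweighted:.2f} -> {'pass' if unweighted >= CUT else 'fail'}")
print(f"weighted:   {weighted:.2f} -> {'pass' if weighted >= CUT else 'fail'}")
```

Here the candidate passes the unweighted checklist (0.80) but fails the weighted one (0.70) because a heavily weighted key step was missed, the kind of borderline movement the paper quantifies.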
Collapse
Affiliation(s)
- Matt Homer
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| | - Richard Fuller
- School of Medicine, University of Liverpool, Liverpool, UK
| | - Jennifer Hallam
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| | - Godfrey Pell
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| |
Collapse
|
29
|
Prediger S, Schick K, Fincke F, Fürstenberg S, Oubaid V, Kadmon M, Berberat PO, Harendza S. Validation of a competence-based assessment of medical students' performance in the physician's role. BMC MEDICAL EDUCATION 2020; 20:6. [PMID: 31910843 PMCID: PMC6947905 DOI: 10.1186/s12909-019-1919-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Accepted: 12/22/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND Assessing the competence of advanced undergraduate medical students based on performance in the clinical context is the ultimate, yet challenging, goal for medical educators to provide constructive alignment between undergraduate medical training and the professional work of physicians. Therefore, we designed and validated a performance-based 360-degree assessment for competences of advanced undergraduate medical students. METHODS This study was conducted in three steps: 1) Ten facets of competence considered to be most important for beginning residents were determined by a ranking study with 102 internists and 100 surgeons. 2) Based on these facets of competence, we developed a 360-degree assessment simulating a first day of residency. Advanced undergraduate medical students (year 5 and 6) participated in the physician's role. Additionally, knowledge was assessed by a multiple-choice test. The assessment was performed twice (t1 and t2) and included three phases: a consultation hour, a patient management phase, and a patient handover. Sixty-seven (t1) and eighty-nine (t2) undergraduate medical students participated. 3) The participants completed the Group Assessment of Performance (GAP) test for flight school applicants to assess medical students' facets of competence in a non-medical context for validation purposes. We aimed to provide a validity argument for our newly designed assessment based on Messick's six aspects of validation: (1) content validity, (2) substantive/cognitive validity, (3) structural validity, (4) generalizability, (5) external validity, and (6) consequential validity. RESULTS Our assessment proved to be well operationalised, enabling undergraduate medical students to demonstrate their competences at the higher levels of Bloom's taxonomy. Its generalisability was underscored by its authenticity with respect to workplace reality and its underlying facets of competence relevant for beginning residents. The moderate concordance with facets of competence of the validated GAP test provides arguments of convergent validity for our assessment. Since five aspects of Messick's validation approach could be defended, our competence-based 360-degree assessment format shows good arguments for its validity. CONCLUSION According to these validation arguments, our assessment instrument seems to be a good option to assess competence in advanced undergraduate medical students in a summative or formative way. Developments towards assessment of postgraduate medical trainees should be explored.
Collapse
Affiliation(s)
- Sarah Prediger
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Kristina Schick
- TUM Medical Education Center, School of Medicine, Technical University of Munich, Munich, Germany
| | - Fabian Fincke
- Department of Medical Education and Educational Research, Faculty of Medicine and Health Science, University of Oldenburg, Oldenburg, Germany
| | - Sophie Fürstenberg
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | | | - Martina Kadmon
- Faculty of Medicine, University of Augsburg, Deanery, Augsburg, Germany
| | - Pascal O. Berberat
- TUM Medical Education Center, School of Medicine, Technical University of Munich, Munich, Germany
| | - Sigrid Harendza
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| |
Collapse
|
30
|
Paravattil B, Wilby KJ. Optimizing assessors' mental workload in rater-based assessment: a critical narrative review. PERSPECTIVES ON MEDICAL EDUCATION 2019; 8:339-345. [PMID: 31728841 PMCID: PMC6904389 DOI: 10.1007/s40037-019-00535-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
INTRODUCTION Rater-based assessment has resulted in high cognitive demands for assessors within the education of health professionals. Rating quality may be influenced by the mental workload required of assessors to complete rating tasks. The objective of this review was to explore interventions or strategies aimed at measuring and reducing mental workload for improvement in assessment outcomes in health professions education. METHODS A critical narrative review was conducted for English-language articles using the databases PubMed, EMBASE, and Google Scholar from inception until November 2018. Articles were eligible if they reported results of interventions aimed at measuring or reducing mental workload in rater-based assessment. RESULTS A total of six articles were included in the review. All studies were conducted in simulation settings (OSCEs or videotaped interactions). Of the four studies that measured mental workload, none found any reduction in mental workload, as measured by objective secondary-task performance, after interventions of assessor training or reductions in the number of competency dimensions assessed. Reductions in competency dimensions, however, did result in improvements in assessment quality across three studies. DISCUSSION The concept of mental workload in assessment in medical education needs further exploration, including investigation into valid measures of assessors' mental workload. It appears that adjusting raters' focus may be a valid strategy to improve assessment outcomes. Future research should be designed to inform how best to reduce load in assessments to improve quality, while balancing the type and quantity of data needed for judgments.
Collapse
Affiliation(s)
| | - Kyle John Wilby
- School of Pharmacy, University of Otago, Dunedin, New Zealand.
| |
Collapse
|
31
|
van Andel CEE, Born MP, Themmen APN, Stegers-Jager KM. Broadly sampled assessment reduces ethnicity-related differences in clinical grades. MEDICAL EDUCATION 2019; 53:264-275. [PMID: 30680783 PMCID: PMC6590164 DOI: 10.1111/medu.13790] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 09/10/2018] [Accepted: 11/09/2018] [Indexed: 05/30/2023]
Abstract
CONTEXT Ethnicity-related differences in clinical grades exist. Broad sampling in assessment of clinical competencies involves multiple assessments used by multiple assessors across multiple moments. Broad sampling in assessment potentially reduces irrelevant variance and may therefore mitigate ethnic disparities in clinical grades. OBJECTIVES Research question 1 (RQ1): to assess whether the relationship between students' ethnicity and clinical grades is weaker in a broadly sampled versus a global assessment. Research question 2 (RQ2): to assess whether larger ethnicity-related differences in grades occur when supervisors are given the opportunity to deviate from the broadly sampled assessment score. METHODS Students' ethnicity was classified as Turkish/Moroccan/African, Surinamese/Antillean, Asian, Western, or native Dutch. RQ1: 1667 students (74.3% native Dutch) who entered medical school between 2002 and 2004 (global assessment, 818 students) or between 2008 and 2010 (broadly sampled assessment, 849 students) were included. The main outcome measure was whether or not students received a grade of 8 or higher (on a scale from 1 to 10) at least three times across five clerkships. RQ2: 849 students (72.4% native Dutch) who were assessed by broad sampling were included. The main outcome measure was the number of grade points by which supervisors had deviated from broadly sampled scores. Both analyses were adjusted for gender, age, (im)migration status and average bachelor grade. RESULTS Research question 1: ethnicity-related differences in clinical grades were smaller in broadly sampled than in global assessment, also after adjustment. More specifically, native Dutch students had reduced probabilities (from 0.87 to 0.65) of receiving a grade of 8 or higher at least three times in five clerkships in broadly sampled as compared with global assessment, whereas Surinamese (from 0.03 to 0.51) and Asian students (from 0.21 to 0.30) had increased probabilities. Research question 2: when supervisors were allowed to deviate from original grades, ethnicity-related differences in clinical grades were reintroduced. CONCLUSIONS Broadly sampled assessment reduces ethnicity-related differences in grades.
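The adjusted probabilities above come from regression modelling; as a rough illustration (synthetic data, simplified covariates, not the authors' model), adjusted group effects of this kind can be estimated with a logistic regression:

```python
# Minimal sketch, synthetic data and simplified covariates (not the authors'
# model): adjusted group effects on a "high grade" outcome via logistic
# regression with an ethnicity x assessment-format interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "ethnicity": rng.choice(["dutch", "surinamese", "asian"], size=n),
    "assessment": rng.choice(["global", "broad"], size=n),
    "bachelor_gpa": rng.normal(7.0, 0.5, size=n),
})
# Synthetic outcome: a group gap under global assessment only.
logit = (-6.0 + 0.8 * df["bachelor_gpa"]
         + np.where((df["ethnicity"] == "dutch")
                    & (df["assessment"] == "global"), 1.0, 0.0))
df["high_grade"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit(
    "high_grade ~ C(ethnicity) * C(assessment) + bachelor_gpa", data=df
).fit(disp=False)
print(model.summary().tables[1])
```

The interaction terms correspond to the ethnicity-by-format differences of interest; with real data, the adjustment covariates (gender, age, migration status, average bachelor grade) would enter the formula the same way.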
Collapse
Affiliation(s)
| | - Marise Ph Born
- Department of Psychology, Erasmus University Rotterdam, Rotterdam, the Netherlands
| | - Axel P N Themmen
- Institute of Medical Education Research Rotterdam, Erasmus MC, Rotterdam, the Netherlands
- Department of Internal Medicine, Erasmus University Rotterdam, Rotterdam, the Netherlands
| | | |
Collapse
|
32
|
Lee V, Brain K, Martin J. From opening the 'black box' to looking behind the curtain: cognition and context in assessor-based judgements. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2019; 24:85-102. [PMID: 30302670 DOI: 10.1007/s10459-018-9851-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Accepted: 09/06/2018] [Indexed: 06/08/2023]
Abstract
The increasing use of direct observation tools to assess routine performance has resulted in the growing reliance on assessor-based judgements in the workplace. However, we have a limited understanding of how assessors make judgements and formulate ratings in real world contexts. The current research on assessor cognition has largely focused on the cognitive domain but the contextual factors are equally important, and both are closely interconnected. This study aimed to explore the perceived cognitive and contextual factors influencing Mini-CEX assessor judgements in the Emergency Department setting. We used a conceptual framework of assessor-based judgement to develop a sequential mixed methods study. We analysed and integrated survey and focus group results to illustrate self-reported cognitive and contextual factors influencing assessor judgements. We used situated cognition theory as a sensitizing lens to explore the interactions between people and their environment. The major factors highlighted through our mixed methods study were: clarity of the assessment, reliance on and variable approach to overall impression (gestalt), role tension especially when giving constructive feedback, prior knowledge of the trainee and case complexity. We identified prevailing tensions between participants (assessors and trainees), interactions (assessment and feedback) and setting. The two practical implications of our research are the need to broaden assessor training to incorporate both cognitive and contextual domains, and the need to develop a more holistic understanding of assessor-based judgements in real world contexts to better inform future research and development in workplace-based assessments.
Collapse
Affiliation(s)
- Victor Lee
- Department of Emergency Medicine, Austin Health, P.O. Box 5555, Heidelberg, VIC, 3084, Australia.
| | | | - Jenepher Martin
- Eastern Health Clinical School, Monash University and Deakin University, Box Hill, VIC, Australia
| |
Collapse
|
33
|
Wood TJ, Pugh D, Touchie C, Chan J, Humphrey-Murto S. Can physician examiners overcome their first impression when examinee performance changes? ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2018; 23:721-732. [PMID: 29556923 DOI: 10.1007/s10459-018-9823-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2017] [Accepted: 03/15/2018] [Indexed: 06/08/2023]
Abstract
There is an increasing focus on factors that influence the variability of rater-based judgments. First impressions are one such factor. First impressions are judgments about people that are made quickly and are based on little information. Under some circumstances, these judgments can be predictive of subsequent decisions. A concern for both examinees and test administrators is whether the relationship remains stable when the performance of the examinee changes. That is, once a first impression is formed, to what degree will an examiner be willing to modify it? The purpose of this study is to determine the degree to which first impressions influence final ratings when the performance of examinees changes within the context of an objective structured clinical examination (OSCE). Physician examiners (n = 29) viewed seven videos of examinees (i.e., actors) performing a physical exam on a single OSCE station. They rated the examinees' clinical abilities on a six-point global rating scale after 60 s (first impression or FIGR). They then observed the examinee for the remainder of the station and provided a final global rating (GRS). For three of the videos, the examinees' performance remained consistent throughout. For two videos, examinee performance changed from initially strong to weak and for two videos, performance changed from initially weak to strong. The mean FIGR rating for the Consistent condition (M = 4.80) and the Strong to Weak condition (M = 4.87) were higher compared to their respective GRS ratings (M = 3.93, M = 2.73), with a greater decline for the Strong to Weak condition. The mean FIGR rating for the Weak to Strong condition (M = 3.60) was lower than the corresponding mean GRS (M = 4.81). This pattern of findings suggests that raters were willing to change their judgments based on examinee performance. Future work should explore the impact of making a first impression judgment explicit versus implicit and the role of context on the relationship between a first impression and a subsequent judgment.
Collapse
Affiliation(s)
- Timothy J Wood
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, PMC 102H, 850 Peter Morand Crescent, Ottawa, ON, K1G-573, Canada.
| | - Debra Pugh
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Claire Touchie
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - James Chan
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Susan Humphrey-Murto
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, PMC 102H, 850 Peter Morand Crescent, Ottawa, ON, K1G-573, Canada
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| |
Collapse
|
34
|
Tavares W, Sadowski A, Eva KW. Asking for Less and Getting More: The Impact of Broadening a Rater's Focus in Formative Assessment. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2018; 93:1584-1590. [PMID: 29794523 DOI: 10.1097/acm.0000000000002294] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
PURPOSE There may be unintended consequences of broadening the competencies across which health professions trainees are assessed. This study was conducted to determine whether such broadening influences the formative guidance assessors provide to trainees and to test whether sequential collection of competency-specific assessment can overcome setbacks of simultaneous collection. METHOD We used a randomized between-subjects experimental design, conducted in Toronto and Halifax, Canada, in 2016-2017 with paramedic educators experienced in observing/rating, in which observers' focus was manipulated. In the simultaneous condition, participants rated four unscripted (i.e., spontaneously generated) clinical performances using a six-dimension global rating scale and provided feedback. In three sequential conditions, participants were asked to rate the same performances and provide feedback but for only two of the six dimensions. Participants from these conditions were randomly merged to create a "full score" and set of feedback statements for each candidate. RESULTS Eighty-seven raters completed the study: 23 in the simultaneous condition and 21 or 22 for each pair of dimensions in the sequential conditions. After randomly merging participants, there were 21 "full scores" in the sequential condition. Compared with the sequential condition, participants in the simultaneous condition demonstrated reductions in the amount of unique feedback provided, increased likelihood of ignoring some dimensions of performance, lessened variety of feedback, and reduced reliability. CONCLUSIONS Sequential or distributed assessment strategies in which raters are asked to focus on less may provide more effective assessment by overcoming the unintended consequences of asking raters to spread their attention thinly over many dimensions of competence.
Collapse
Affiliation(s)
- Walter Tavares
- W. Tavares is scientist and assistant professor, Wilson Centre, Department of Medicine and Post-MD Education, University Health Network/University of Toronto Faculty of Medicine, Toronto, Ontario, Canada, and clinician scientist, Department of Community and Health Services, Paramedic and Senior Services, Regional Municipality of York, Newmarket, Ontario, Canada. A. Sadowski is a research associate, Wilson Centre, University Health Network/University of Toronto Faculty of Medicine, Toronto, Ontario, Canada. K.W. Eva is senior scientist, Centre for Health Education Scholarship, and professor, Department of Medicine, University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada
| | | | | |
Collapse
|
35
|
Eva KW. Cognitive Influences on Complex Performance Assessment: Lessons from the Interplay between Medicine and Psychology. JOURNAL OF APPLIED RESEARCH IN MEMORY AND COGNITION 2018. [DOI: 10.1016/j.jarmac.2018.03.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
36
|
Scaffidi MA, Grover SC, Carnahan H, Yu JJ, Yong E, Nguyen GC, Ling SC, Khanna N, Walsh CM. A prospective comparison of live and video-based assessments of colonoscopy performance. Gastrointest Endosc 2018; 87:766-775. [PMID: 28859953 DOI: 10.1016/j.gie.2017.08.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 08/20/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND AND AIMS Colonoscopy performance is typically assessed by a supervisor in the clinical setting. There are limitations of this approach, however, because it allows for rater bias and increases supervisor workload demand during the procedure. Video-based assessment of recorded procedures has been proposed as a complementary means by which to assess colonoscopy performance. This study sought to investigate the reliability, validity, and feasibility of video-based assessments of competence in performing colonoscopy compared with live assessment. METHODS Novice (<50 previous colonoscopies), intermediate (50-500), and experienced (>1000) endoscopists from 5 hospitals participated. Two views of each colonoscopy were videotaped: an endoscopic (intraluminal) view and a recording of the endoscopist's hand movements. Recorded procedures were independently assessed by 2 blinded experts using the Gastrointestinal Endoscopy Competency Assessment Tool (GiECAT), a validated procedure-specific assessment tool comprising a global rating scale (GRS) and checklist (CL). Live ratings were conducted by a non-blinded expert endoscopist. Outcomes included agreement between live and blinded video-based ratings of clinical colonoscopies, intra-rater reliability, inter-rater reliability and discriminative validity of video-based assessments, and perceived ease of assessment. RESULTS Forty endoscopists participated (20 novices, 10 intermediates, and 10 experienced). There was good agreement between the live and video-based ratings (total, intra-class correlation [ICC] = 0.847; GRS, ICC = 0.868; CL, ICC = 0.749). Intra-rater reliability was excellent (total, ICC = 0.99; GRS, ICC = 0.99; CL, ICC = 0.98). Inter-rater reliability between the 2 blinded video-based raters was high (total, ICC = 0.91; GRS, ICC = 0.918; CL, ICC = 0.862). GiECAT total, GRS, and CL scores differed significantly among novice, intermediate, and experienced endoscopists (P < .001). Video-based assessments were perceived as "fairly easy," although live assessments were rated as significantly easier (P < .001). CONCLUSIONS Video-based assessments of colonoscopy procedures using the GiECAT have strong evidence of reliability and validity. In addition, assessments using videos were feasible, although live assessments were easier.
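For reference, the agreement statistics reported above are intra-class correlations; below is a minimal sketch of a two-way random-effects, absolute-agreement, single-measures ICC(2,1) on hypothetical live and video ratings (not the study's data or exact model):

```python
# Minimal sketch, hypothetical ratings: two-way random-effects, absolute-
# agreement, single-measures ICC(2,1) between live and video-based raters.
import numpy as np

# rows = endoscopists, cols = [live rating, blinded video rating]
ratings = np.array([
    [18.0, 17.5],
    [25.0, 24.0],
    [12.0, 13.0],
    [30.0, 29.5],
    [21.0, 20.0],
])
n, k = ratings.shape
grand = ratings.mean()

ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # endoscopists
ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # rating methods
ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
)
print(f"ICC(2,1), live vs video: {icc_2_1:.3f}")
```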
Collapse
Affiliation(s)
- Michael A Scaffidi
- Division of Gastroenterology, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Samir C Grover
- Division of Gastroenterology, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Heather Carnahan
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | - Jeffrey J Yu
- Wilson Centre, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Elaine Yong
- Division of Gastroenterology, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Geoffrey C Nguyen
- Division of Gastroenterology, Mount Sinai Hospital University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Simon C Ling
- Department of Paediatrics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Division of Gastroenterology, Hepatology and Nutrition, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
| | - Nitin Khanna
- Division of Gastroenterology, St. Joseph's Health Centre, University of Western Ontario, London, Ontario, Canada
| | - Catharine M Walsh
- Department of Paediatrics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Division of Gastroenterology, Hepatology and Nutrition, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Wilson Centre, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
37
|
Gomez-Garibello C, Young M. Emotions and assessment: considerations for rater-based judgements of entrustment. MEDICAL EDUCATION 2018; 52:254-262. [PMID: 29119582 DOI: 10.1111/medu.13476] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/06/2017] [Revised: 03/03/2017] [Accepted: 09/08/2017] [Indexed: 06/07/2023]
Abstract
CONTEXT Assessment is subject to increasing scrutiny as medical education transitions towards a competency-based medical education (CBME) model. Traditional perspectives on the roles of assessment emphasise high-stakes, summative assessment, whereas CBME argues for formative assessment. Revisiting conceptualisations about the roles and formats of assessment in medical education provides opportunities to examine understandings and expectations of the assessment of learners. The act of the rater generating scores might be considered an exclusively cognitive exercise; however, current literature has drawn attention to the notion of raters as measurement instruments, thereby attributing additional factors to their decision-making processes, such as social considerations and intuition. However, the literature has not comprehensively examined the influence of raters' emotions during assessment. In this narrative review, we explore the influence of raters' emotions in the assessment of learners. METHODS We summarise existing literature that describes the role of emotions in assessment broadly, and rater-based assessment specifically, across a variety of fields. The literature related to emotions and assessment is examined from different perspectives, including those of educational context, decision making and rater cognition. We use the concept of entrustable professional activities (EPAs) to contextualise a discussion of the ways in which raters' emotions may have meaningful impacts on the decisions they make in clinical settings. This review summarises findings from different perspectives and identifies areas for consideration for the role of emotion in rater-based assessment, and areas for future research. CONCLUSIONS We identify and discuss three different interpretations of the influence of raters' emotions during assessments: (i) emotions lead to biased decision making; (ii) emotions contribute random noise to assessment, and (iii) emotions constitute legitimate sources of information that contribute to assessment decisions. We discuss these three interpretations in terms of areas for future research and implications for assessment.
Collapse
Affiliation(s)
- Carlos Gomez-Garibello
- Centre for Medical Education, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| | - Meredith Young
- Centre for Medical Education, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| |
Collapse
|
38
|
Wilbur K. Does faculty development influence the quality of in-training evaluation reports in pharmacy? BMC MEDICAL EDUCATION 2017; 17:222. [PMID: 29157239 PMCID: PMC5697106 DOI: 10.1186/s12909-017-1054-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Accepted: 11/02/2017] [Indexed: 06/02/2023]
Abstract
BACKGROUND In-training evaluation reports (ITERs) of student workplace-based learning are completed by clinical supervisors across various health disciplines. However, outside of medicine, the quality of submitted workplace-based assessments is largely uninvestigated. This study assessed the quality of ITERs in pharmacy and whether clinical supervisors could be trained to complete higher-quality reports. METHODS A random sample of ITERs submitted in a pharmacy program during 2013-2014 was evaluated. These ITERs served as a historical control (control group 1) for comparison with ITERs submitted in 2015-2016 by clinical supervisors who participated in an interactive faculty development workshop (intervention group) and those who did not (control group 2). Two trained independent raters scored the ITERs using the Completed Clinical Evaluation Report Rating (CCERR), a previously validated nine-item scale assessing report quality. The scoring scale for each item is anchored at 1 ("not at all") and 5 ("exemplary"), with 3 categorized as "acceptable". RESULTS The mean CCERR score for reports completed after the workshop (22.9 ± 3.39) was not significantly different from that of prospective control group 2 (22.7 ± 3.63, p = 0.84) and was worse than that of historical control group 1 (37.9 ± 8.21, p = 0.001). Mean item scores were below acceptable thresholds for 5 of the 9 domains in control group 1, including supervisor-documented evidence of specific examples to clearly explain weaknesses and concrete recommendations for student improvement. Mean item scores were below acceptable thresholds for 6 and 7 of the 9 domains in control group 2 and the intervention group, respectively. CONCLUSIONS This study is the first to use the CCERR to evaluate ITER quality outside of medicine. Findings demonstrate low baseline CCERR scores in a pharmacy program that were not demonstrably changed by a faculty development workshop, but strategies are identified to augment future rater training.
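As an aid to interpreting these totals, the arithmetic implied by the scale description is worth spelling out (a reconstruction from the anchors above, not a calculation reported in the paper): nine items scored 1 to 5 give a total between 9 and 45, and a report rated "acceptable" (3) on every item totals 27, so the post-workshop and prospective-control means of 22.9 and 22.7 both sit below that benchmark.
\[
\mathrm{CCERR}_{\min} = 9 \times 1 = 9, \qquad \mathrm{CCERR}_{\text{acceptable}} = 9 \times 3 = 27, \qquad \mathrm{CCERR}_{\max} = 9 \times 5 = 45.
\]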
Affiliation(s)
- Kerry Wilbur
- College of Pharmacy, Qatar University, PO Box 2713, Doha, Qatar.
39
Kogan JR, Hatala R, Hauer KE, Holmboe E. Guidelines: the do's, don'ts and don't knows of direct observation of clinical skills in medical education. Perspectives on Medical Education 2017;6:286-305. [PMID: 28956293] [PMCID: PMC5630537] [DOI: 10.1007/s40037-017-0376-7]
Abstract
INTRODUCTION Direct observation of clinical skills is a key assessment strategy in competency-based medical education. The guidelines presented in this paper synthesize the literature on direct observation of clinical skills. The goal is to provide a practical list of Do's, Don'ts and Don't Knows about direct observation for supervisors who teach learners in the clinical setting and for educational leaders who are responsible for clinical training programs. METHODS We built consensus through an iterative approach in which each author, based on their medical education and research knowledge and expertise, independently developed a list of Do's, Don'ts, and Don't Knows about direct observation of clinical skills. Lists were compiled, discussed and revised. We then sought and compiled evidence to support each guideline and determine the strength of each guideline. RESULTS A final set of 33 Do's, Don'ts and Don't Knows is presented along with a summary of evidence for each guideline. Guidelines focus on two groups: individual supervisors and the educational leaders responsible for clinical training programs. Guidelines address recommendations for how to focus direct observation, select an assessment tool, promote high quality assessments, conduct rater training, and create a learning culture conducive to direct observation. CONCLUSIONS High frequency, high quality direct observation of clinical skills can be challenging. These guidelines offer important evidence-based Do's and Don'ts that can help improve the frequency and quality of direct observation. Improving direct observation requires focus not just on individual supervisors and their learners, but also on the organizations and cultures in which they work and train. Additional research to address the Don't Knows can help educators realize the full potential of direct observation in competency-based education.
Affiliation(s)
- Jennifer R Kogan
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
- Rose Hatala
- University of British Columbia, Vancouver, British Columbia, Canada
- Karen E Hauer
- University of California San Francisco, San Francisco, CA, USA
- Eric Holmboe
- Accreditation Council for Graduate Medical Education, Chicago, IL, USA
40
Gingerich A, Ramlo SE, van der Vleuten CPM, Eva KW, Regehr G. Inter-rater variability as mutual disagreement: identifying raters' divergent points of view. Advances in Health Sciences Education 2017;22:819-838. [PMID: 27651046] [DOI: 10.1007/s10459-016-9711-8]
Abstract
Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting 'idiosyncratic rater variance' is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical assessments have used open response formats to gather raters' comments and justifications. This design choice allows participants to use idiosyncratic response styles that could result in a distorted representation of the underlying rater cognition and skew subsequent analyses. In this study we explored rater variability using the structured response format of Q methodology. Physician raters viewed video-recorded clinical performances and provided Mini Clinical Evaluation Exercise (Mini-CEX) assessment ratings through a web-based system. They then shared their assessment impressions by sorting statements that described the most salient aspects of the clinical performance onto a forced quasi-normal distribution ranging from "most consistent with my impression" to "most contrary to my impression". Analysis of the resulting Q-sorts revealed distinct points of view for each performance shared by multiple physicians. The points of view corresponded with the ratings physicians assigned to the performance. Each point of view emphasized different aspects of the performance with either rapport-building and/or medical expertise skills being most salient. It was rare for the points of view to diverge based on disagreements regarding the interpretation of a specific aspect of the performance. As a result, physicians' divergent points of view on a given clinical performance cannot be easily reconciled into a single coherent assessment judgment that is impacted by measurement error. If inter-rater variability does not wholly reflect error of measurement, it is problematic for our current measurement models and poses challenges for how we are to adequately analyze performance assessment ratings.
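For readers unfamiliar with Q methodology, the forced quasi-normal distribution described above is simply a grid of fixed column quotas into which every statement must be sorted. The sketch below is illustrative only; the column range (-4 to +4) and the 25-statement deck size are assumptions, not details reported in the study.

    # A hypothetical forced Q-sort grid: keys run from "most contrary to
    # my impression" (-4) to "most consistent with my impression" (+4);
    # each value is the number of statements that must be placed in that
    # column, which imposes the quasi-normal shape.
    q_grid = {-4: 1, -3: 2, -2: 3, -1: 4, 0: 5, 1: 4, 2: 3, 3: 2, 4: 1}

    # Every statement is placed exactly once, so the quotas sum to the deck size.
    assert sum(q_grid.values()) == 25

Because all raters complete the identical grid, their sorts can be correlated and factor-analysed by person, which is how shared points of view are recovered.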
Affiliation(s)
- Andrea Gingerich
- Northern Medical Program, University of Northern British Columbia, 3333 University Way, Prince George, BC, V2N 4Z9, Canada.
- Susan E Ramlo
- Department of Engineering and Science Technology, University of Akron, Akron, OH, USA
- Kevin W Eva
- Centre for Health Education Scholarship, University of British Columbia, Vancouver, BC, Canada
- Glenn Regehr
- Centre for Health Education Scholarship, University of British Columbia, Vancouver, BC, Canada
41
Lee V, Brain K, Martin J. Factors influencing mini-CEX rater judgments and their practical implications: a systematic literature review. Academic Medicine 2017;92:880-887. [PMID: 28030422] [DOI: 10.1097/acm.0000000000001537]
Abstract
PURPOSE At present, little is known about how mini-clinical evaluation exercise (mini-CEX) raters translate their observations into judgments and ratings. The authors of this systematic literature review aim both to identify the factors influencing mini-CEX rater judgments in the medical education setting and to translate these findings into practical implications for clinician assessors. METHOD The authors searched for internal and external factors influencing mini-CEX rater judgments in the medical education setting from 1980 to 2015 using the Ovid MEDLINE, PsycINFO, ERIC, PubMed, and Scopus databases. They extracted the following information from each study: country of origin, educational level, study design and setting, type of observation, occurrence of rater training, provision of feedback to the trainee, research question, and identified factors influencing rater judgments. The authors also conducted a quality assessment for each study. RESULTS Seventeen articles met the inclusion criteria. The authors identified both internal and external factors that influence mini-CEX rater judgments. They subcategorized the internal factors into intrinsic rater factors, judgment-making factors (conceptualization, interpretation, attention, and impressions), and scoring factors (scoring integration and domain differentiation). CONCLUSIONS The current theories of rater-based judgment have not helped clinicians resolve the issues of rater idiosyncrasy, bias, gestalt, and conflicting contextual factors; therefore, the authors believe the most important solution is to increase the justification of rater judgments through the use of specific narrative and contextual comments, which are more informative for trainees. Finally, more real-world research is required to bridge the gap between the theory and practice of rater cognition.
Affiliation(s)
- Victor Lee
- V. Lee is codirector of emergency medicine training, Department of Emergency Medicine, Austin Health, Heidelberg, Victoria, Australia. K. Brain is a doctor at South West Healthcare, Warrnambool, Victoria, Australia. J. Martin is associate professor and director, Medical Student Programs, Monash University and Deakin University, Eastern Health Clinical School, Box Hill, Victoria, Australia.
42
Scarff CE, Corderoy RM, Bearman M. In-training assessments: 'The difficulty is trying to balance reality and really tell the truth'. Australasian Journal of Dermatology 2016;59:e15-e22. [PMID: 27995625] [DOI: 10.1111/ajd.12555]
Abstract
BACKGROUND In-training assessments (ITAs) aim to evaluate trainees' progress and give valuable feedback on their performance. Many factors can affect supervisors during their completion of assessments, and these can influence the final results recorded. METHODS This is the second part of a study of supervisors of the Australasian College of Dermatologists (ACD); it presents qualitative data on their opinions of the ACD ITA process and the influences on their ITA ratings. RESULTS Supervisors noted the benefits of this assessment tool, together with many limitations. Potential influences upon supervisor ratings included the relationship between the supervisor and trainee and the level of honesty in completing and delivering the assessment. CONCLUSIONS Many factors influence supervisors in the completion of the ITA. These include the impact of interpersonal relationships and concerns about the consequences of delivering a negative assessment, which sometimes lead supervisors to modify the assessment they deliver to the trainee. Further research is needed into honesty in assessment judgements.
Affiliation(s)
- Catherine E Scarff
- Health Professions Education and Educational Research, Monash University, Melbourne, Victoria, Australia
- Robert M Corderoy
- Educational Development, Planning and Innovation, Australasian College of Dermatologists, Sydney, New South Wales, Australia
- Margaret Bearman
- Health Professions Education and Educational Research, Monash University, Melbourne, Victoria, Australia
43
Castanelli DJ, Jowsey T, Chen Y, Weller JM. Perceptions of purpose, value, and process of the mini-Clinical Evaluation Exercise in anesthesia training. Canadian Journal of Anesthesia 2016;63:1345-1356. [DOI: 10.1007/s12630-016-0740-9]
44
St-Onge C, Chamberland M, Lévesque A, Varpio L. Expectations, observations, and the cognitive processes that bind them: expert assessment of examinee performance. Advances in Health Sciences Education 2016;21:627-642. [PMID: 26620923] [DOI: 10.1007/s10459-015-9656-3]
Abstract
Performance-based assessment (PBA) is a valued assessment approach in medical education, be it in a clerkship, residency, or practice context. Raters are intrinsic to PBA, and its increased use has led to a growing interest in rater cognition. Although several researchers have tackled factors that may influence variability in rater judgment, the critical examination of how raters observe performance, and how they translate those observations into judgments, is only now being investigated. The purpose of this study was to qualitatively investigate the cognitive processes of raters and to create a framework that conceptualizes those processes when raters assess a complex performance. We conducted semi-structured interviews with 11 faculty members (nominated as excellent assessors) from a Department of Medicine to investigate how raters observe, interpret, and translate performance into judgments. The transcribed verbal protocols were analyzed using constructivist grounded theory in order to develop a theoretical model of raters' assessment processes. Several themes emerged from the data and were grouped into three macro-level themes describing how raters balance two sources of data, (1) external sources of information and (2) internal/personal sources of information, by relying on specific cognitive processes to assess an examinee's performance. The results demonstrate that assessment is a difficult cognitive task involving the nuanced use of specific cognitive processes to weigh external and internal data against each other. Our data clearly draw attention to the constant struggle between objectivity and subjectivity observed in assessment, as illustrated by the importance raters give to nuancing the examinee's observed performance.
Affiliation(s)
- Christina St-Onge
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada.
- Martine Chamberland
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Annie Lévesque
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Lara Varpio
- Uniformed Services University of the Health Sciences, Bethesda, MD, USA
45
Yeung E, Kulasegaram K, Woods N, Dubrowski A, Hodges B, Carnahan H. Validity of a new assessment rubric for a short-answer test of clinical reasoning. BMC Medical Education 2016;16:192. [PMID: 27461249] [PMCID: PMC4962495] [DOI: 10.1186/s12909-016-0714-1]
Abstract
BACKGROUND The validity of high-stakes decisions derived from assessment results is of primary concern to candidates and certifying institutions in the health professions. In the field of orthopaedic manual physical therapy (OMPT), there is a dearth of documented validity evidence to support the certification process, particularly for short-answer tests. To address this need, we examined the internal structure of the Case History Assessment Tool (CHAT), a new assessment rubric developed to appraise written responses to a short-answer test of clinical reasoning in post-graduate OMPT certification in Canada. METHODS Fourteen physical therapy students (novices) and 16 physical therapists with minimal and substantial OMPT training, respectively, completed a mock examination. Four pairs of examiners (n = 8) appraised the written responses using the CHAT. We conducted separate generalizability studies (G studies) for all participants and also by level of OMPT training. Internal consistency was calculated for test questions with more than 2 assessment items. Decision studies were also conducted to determine the optimal application of the CHAT for OMPT certification. RESULTS The overall reliability of CHAT scores was moderate; however, reliability estimates for the novice group suggest that the scale could not adequately accommodate novices' scores. Internal consistency estimates indicate item redundancies for several test questions, which will require further investigation. CONCLUSION Future validity studies should consider discriminating the clinical reasoning competence of OMPT trainees strictly at the post-graduate level. Although rater variance was low, the large variance attributed to error sources not incorporated in our G studies warrants further investigation into other threats to validity. Future examination of examiner stringency is also warranted.
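The abstract does not name the internal-consistency statistic used; Cronbach's alpha is the conventional choice for questions with multiple scored items and is shown here only as a plausible reading, not as the authors' confirmed method:
\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
\]
where k is the number of items on a question, \(\sigma^{2}_{Y_i}\) is the variance of item i, and \(\sigma^{2}_{X}\) is the variance of the question total. Very high values would be consistent with the item redundancies the authors report.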
Affiliation(s)
- Euson Yeung
- Department of Rehabilitation Sciences, University of Toronto, 160-500 University Avenue, Toronto, ON M5G 1V7 Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Kulamakan Kulasegaram
- Department of Family and Community Medicine, University of Toronto, Toronto, Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Nicole Woods
- Department of Surgery, University of Toronto, Toronto, Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Adam Dubrowski
- Division of Emergency Medicine, Memorial University of Newfoundland, St John’s, Canada
- Brian Hodges
- Faculty of Medicine, University of Toronto, Toronto, Canada
- Wilson Centre for Research in Education, Richard and Elizabeth Currie Chair in Health Professions Education Research, University Health Network, Toronto, Canada
- Heather Carnahan
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St John’s, Canada
46
Byrne A, Soskova T, Dawkins J, Coombes L. A pilot study of marking accuracy and mental workload as measures of OSCE examiner performance. BMC Medical Education 2016;16:191. [PMID: 27455964] [PMCID: PMC4960857] [DOI: 10.1186/s12909-016-0708-z]
Abstract
BACKGROUND The Objective Structured Clinical Examination (OSCE) is now a standard assessment format, and while examiner training is seen as essential to assure quality, there appear to be no widely accepted measures of examiner performance. METHODS The objective of this study was to determine whether the routine training provided to examiners improved their accuracy and reduced their mental workload. Accuracy was defined as the difference between the rating of each examiner and that of an expert group, expressed as the mean error per item. At the same time, the mental workload of each examiner was measured using a previously validated secondary-task methodology. RESULTS Training was not associated with an improvement in accuracy (p = 0.547), and there was no detectable effect on mental workload. However, accuracy improved after exposure to the same scenario (p < 0.001) and was greater when marking an excellent rather than a borderline performance. CONCLUSIONS This study suggests that the method of training OSCE examiners studied is not effective in improving their performance, but that average item accuracy and mental workload appear to be valid methods of assessing examiner performance.
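The accuracy measure defined above can be written out explicitly. This is a reconstruction of the stated definition, and the use of the absolute difference is our assumption; the paper may use signed error:
\[
\text{accuracy}_j = \frac{1}{n}\sum_{i=1}^{n} \lvert x_{ij} - e_i \rvert
\]
where \(x_{ij}\) is examiner j's rating on item i, \(e_i\) is the expert-group rating for that item, and n is the number of items; lower values indicate closer agreement with the expert panel.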
Affiliation(s)
- Aidan Byrne
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Department of Anaesthesia, Morriston Hospital, Swansea, SA6 6NL UK
- Tereza Soskova
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Jayne Dawkins
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Lee Coombes
- Cardiff University, School of Medicine, Cardiff, UK
47
Tavares W, Eva KW. Impact of rating demands on rater-based assessments of clinical competence. Education for Primary Care 2014;25:308-318. [DOI: 10.1080/14739879.2014.11730760]
48
Lee M, Wimmers PF. Validation of a performance assessment instrument in problem-based learning tutorials using two cohorts of medical students. Advances in Health Sciences Education 2016;21:341-357. [PMID: 26307371] [DOI: 10.1007/s10459-015-9632-y]
Abstract
Although problem-based learning (PBL) has been widely used in medical schools, few studies have attended to the assessment of PBL processes using validated instruments. This study examined reliability and validity for an instrument assessing PBL performance in four domains: Problem Solving, Use of Information, Group Process, and Professionalism. Two cohorts of medical students (N = 310) participated in the study, with 2 years of archived PBL evaluation data rated by a total of 158 faculty raters. Analyses based on generalizability theory were conducted to examine reliability. Validity was examined by following the Standards for Educational and Psychological Testing to evaluate content validity, response processes, construct validity, predictive validity, and the relationship to the variable of training. For construct validity, correlations of PBL scores with six other outcome measures were examined: the Medical College Admission Test, United States Medical Licensing Examination (USMLE) Step 1, National Board of Medical Examiners (NBME) Comprehensive Basic Science Examination, NBME Comprehensive Clinical Science Examination, Clinical Performance Examination, and USMLE Step 2 Clinical Knowledge. Predictive validity was examined by using PBL scores to predict five medical school outcomes. The highest percentage of PBL total score variance was associated with students (60%), indicating that students in the study differed in their PBL performance. The generalizability and dependability coefficients were moderately high (Eρ² = .68, Φ = .60), showing the instrument is reliable for ranking students and identifying competent PBL performers. The patterns of correlations between PBL domain scores and the outcome measures partially support construct validity. PBL performance ratings as a whole significantly (p < .01) predicted all the major medical school achievements. Second-year PBL scores were significantly higher than those of the first year, indicating a training effect. These psychometric findings support the reliability and many aspects of the validity of PBL performance assessment using the instrument.
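The two coefficients quoted in the abstract have standard forms in generalizability theory (the formulas below are textbook definitions, not reproduced from the paper): Eρ² supports relative, rank-ordering decisions, while Φ supports absolute decisions.
\[
E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{\delta}} = .68,
\qquad
\Phi = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{\Delta}} = .60
\]
Here \(\sigma^{2}_{p}\) is the student (person) variance, \(\sigma^{2}_{\delta}\) the relative error variance, and \(\sigma^{2}_{\Delta}\) the absolute error variance, which additionally includes rater and occasion main effects; since \(\sigma^{2}_{\Delta} \ge \sigma^{2}_{\delta}\), Φ never exceeds Eρ², consistent with the reported values.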
Affiliation(s)
- Ming Lee
- Center for Educational Development and Research, David Geffen School of Medicine at University of California, Los Angeles, PO Box 951722, 60-051, Los Angeles, CA, USA.
- Paul F Wimmers
- Center for Educational Development and Research, David Geffen School of Medicine at University of California, Los Angeles, PO Box 951722, 60-051, Los Angeles, CA, USA
49
Gauthier G, St-Onge C, Tavares W. Rater cognition: review and integration of research findings. Medical Education 2016;50:511-522. [PMID: 27072440] [DOI: 10.1111/medu.12973]
Abstract
BACKGROUND Given the complexity of competency frameworks, the associated skills and abilities, and the contexts in which they are to be assessed in competency-based education (CBE), there is an increased reliance on rater judgements when considering trainee performance. This increased dependence on rater-based assessment has led to the emergence of rater cognition as a field of research in health professions education. The topic, however, is often conceptualised and ultimately investigated using many different perspectives and theoretical frameworks. Critically analysing how researchers think about, study and discuss rater cognition or the judgement processes in assessment frameworks may provide meaningful and efficient directions for how the field continues to explore the topic. METHODS We conducted a critical and integrative review of the literature to explore common conceptualisations and unified terminology associated with rater cognition research. We identified 1045 articles on rater-based assessment in health professions education using Scopus, Medline and ERIC; 78 articles were included in our review. RESULTS We propose a three-phase framework of observation, processing and integration. We situate nine specific mechanisms and sub-mechanisms described across the literature within these phases: (i) generating automatic impressions about the person; (ii) formulating high-level inferences; (iii) focusing on different dimensions of competencies; (iv) categorising through well-developed schemata based on (a) a personal concept of competence, (b) comparison with various exemplars and (c) task and context specificity; (v) weighting and synthesising information differently; (vi) producing narrative judgements; and (vii) translating narrative judgements into scales. CONCLUSION Our review allowed us to identify common underlying conceptualisations of observed rater mechanisms and to propose a comprehensive, although complex, framework for the dynamic and contextual nature of the rating process. This framework could help bridge the gap between researchers adopting different perspectives when studying rater cognition and enable the interpretation of contradictory findings about rater performance by determining which mechanism is enabled or disabled in any given context.
Affiliation(s)
- Christina St-Onge
- Médecine interne, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Walter Tavares
- Division of Emergency Medicine, McMaster University, Hamilton, Ontario, Canada
- Centennial College, School of Community and Health Studies, Toronto, Ontario, Canada
- ORNGE Transport Medicine, Faculty of Medicine, Mississauga, Ontario, Canada
50
Kudláček M, Frömel K, Jakubec L, Groffik D. Compensation for adolescents' school mental load by physical activity on weekend days. International Journal of Environmental Research and Public Health 2016;13:E308. [PMID: 27005652] [PMCID: PMC4808971] [DOI: 10.3390/ijerph13030308]
Abstract
INTRODUCTION AND OBJECTIVE Increasing mental load and inadequate stress management significantly affect the efficiency, success and safety of the educational/working process in adolescents. The objective of this study is to determine the extent to which adolescents compensate for their school mental load with physical activity (PA) on weekend days and, thus, to contribute to the objective measurement of mental load in natural working conditions. METHODS A cross-sectional study was conducted between September 2013 and April 2014. A set of different methods was employed: a self-administered questionnaire (the IPAQ long form) and objective measurement with pedometers and accelerometers (ActiTrainers). These were distributed to 548 students from 17 high schools. Participants' mental load was assessed based on the difference between PA intensity and/or physical inactivity and heart rate range. RESULTS The participants with the highest mental load during school lessons do not compensate for this load with PA on weekend days. CONCLUSIONS Adolescents need to be encouraged to be aware of their subjective mental load and to intentionally compensate for it with PA on weekend days. This process of habit formation must be supported by sufficient physical literacy among students and teachers, and by changes in the school program.
Affiliation(s)
- Michal Kudláček
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Department of Leisure Studies, Faculty of Physical Culture, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Karel Frömel
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Lukáš Jakubec
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Dorota Groffik
- The Jerzy Kukuczka Academy of Physical Education, Katowice 40-065, Poland.