1
Bazerbachi F, Murad F, Kubiliun N, Adams MA, Shahidi N, Visrodia K, Essex E, Raju G, Greenberg C, Day LW, Elmunzer BJ. Video recording in GI endoscopy. VideoGIE 2025; 10:67-80. PMID: 40012896; PMCID: PMC11852952; DOI: 10.1016/j.vgie.2024.09.013.
Abstract
The current approach to procedure reporting in endoscopy aims to capture essential findings and interventions but inherently sacrifices the rich detail and nuance of the entire endoscopic experience. Endoscopic video recording (EVR) provides a complete archive of the procedure, extending the utility of the encounter beyond diagnosis and intervention, and potentially adding significant value to the care of the patient and the field in general. This white paper outlines the potential of EVR in clinical care, quality improvement, education, and artificial intelligence-driven innovation, and addresses critical considerations surrounding technology, regulation, ethics, and privacy. As with other medical imaging modalities, growing adoption of EVR is inevitable, and proactive engagement of professional societies and practitioners is essential to harness the full potential of this technology toward improving clinical care, education, and research.
Affiliation(s)
- Fateh Bazerbachi: CentraCare, Interventional Endoscopy Program, St Cloud Hospital, St Cloud, Minnesota, USA; Division of Gastroenterology, Hepatology and Nutrition, University of Minnesota, Minneapolis, Minnesota, USA
- Faris Murad: Illinois Masonic Medical Center, Center for Advanced Care, Chicago, Illinois, USA
- Nisa Kubiliun: Division of Digestive and Liver Diseases, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Megan A Adams: Division of Gastroenterology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Institute for Healthcare Policy and Innovation, Ann Arbor, Michigan, USA
- Neal Shahidi: Division of Gastroenterology, University of British Columbia, Vancouver, British Columbia, Canada
- Kavel Visrodia: Columbia University Irving Medical Center - New York Presbyterian Hospital, New York, New York, USA
- Eden Essex: American Society for GI Endoscopy, Downers Grove, Illinois, USA
- Gottumukkala Raju: Division of Internal Medicine, Department of Gastroenterology, Hepatology and Nutrition, MD Anderson Cancer Center, Houston, Texas, USA
- Caprice Greenberg: Department of Surgery, University of North Carolina, Chapel Hill, North Carolina, USA
- Lukejohn W Day: Division of Gastroenterology, Department of Medicine, University of California San Francisco, San Francisco, California, USA
- B Joseph Elmunzer: Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
2
Roberts C, Burgess A, Mossman K, Kumar K. Professional judgement: a social practice perspective on a multiple mini-interview for specialty training selection. BMC Medical Education 2025; 25:18. PMID: 39754259; DOI: 10.1186/s12909-024-06535-3.
Abstract
BACKGROUND Interviewers' judgements play a critical role in competency-based assessments for selection, such as the multiple mini-interview (MMI). Much of the published research focuses on the psychometrics of selection and the impact of rater subjectivity. Within the context of selection for entry into specialty postgraduate training, we used an interpretivist and socio-constructivist approach to explore how and why interviewers make judgements in high-stakes selection settings whilst taking part in an MMI. METHODS We explored MMI interviewers' work processes through an institutional observational approach, based on the notion that interviewers' judgements are socially constructed and mediated by multiple factors. We gathered data through document analysis and observations of interviewer training, candidate interactions with interviewers, and interviewer meetings. Interviews included informal encounters in a large selection centre. Data analysis balanced description and explicit interpretation of the meanings and functions of the interviewers' actions and behaviours. RESULTS Three themes were developed from the data showing how interviewers make professional judgements: 'Balancing the interplay of rules and agency', 'Participating in moderation and shared meaning making', and 'A culture of reflexivity and professional growth'. Interviewers balanced following institutional rules with making judgement choices based on personal expertise and knowledge. They engaged in dialogue, moderation, and shared meaning making with fellow interviewers, which enabled them to consider multiple perspectives on each candidate's performance. Interviewers engaged in self-evaluation and reflection throughout, with professional learning and growth as primary care physicians and supervisors emerging as an outcome. CONCLUSION This study offers insights into the judgement-making processes of interviewers in high-stakes MMI contexts, highlighting the balance between structured protocols and personal expertise within a socially constructed framework. By linking MMI practices to the broader work-based assessment literature, we contribute to advancing the design and implementation of more valid and fair selection tools for postgraduate training. Additionally, the study underscores the dual benefit of MMIs: not only as a selection tool but also as a platform for interviewers' professional growth. These insights offer practical implications for refining future MMI practices and improving the fairness of high-stakes selection processes.
Affiliation(s)
- Chris Roberts: School of Medicine and Population Health, Division of Medicine, The University of Sheffield, Sheffield, UK
- Annette Burgess: Sydney Medical School - Education Office, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Karyn Mossman: Sydney Medical School - Northern Clinical School, The University of Sydney, Sydney, NSW, Australia
- Koshila Kumar: Division of Learning and Teaching, Charles Sturt University, Bathurst, NSW, Australia; College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
3
Dziadzko M, Varvinskiy A, Di Loreto R, Scipioni H, Ateleanu B, Klimek M, Berger-Estilita J. Examiner workload comparison: three structured oral examination formats for the European diploma in anaesthesiology and intensive care. Medical Education Online 2024; 29:2364990. PMID: 38848480; PMCID: PMC11164053; DOI: 10.1080/10872981.2024.2364990.
Abstract
The COVID-19 pandemic triggered transformations in academic medicine, with the rapid adoption of remote teaching and online assessments. Whilst virtual environments show promise in evaluating medical knowledge, their impact on examiner workload is unclear. This study explores examiners' workload during different formats of the European Diploma in Anaesthesiology and Intensive Care Part 2 Structured Oral Examination. We hypothesise that online exams result in a lower examiner workload than traditional face-to-face methods. We also investigate the structure of that workload and its correlation with examiner characteristics and marking performance. In 2023, examiner workload for three examination formats (face-to-face, hybrid, online) was prospectively evaluated using the NASA TLX instrument. The impact of examiner demographics, candidate scoring agreement, and examination scores on workload was analysed. The overall NASA TLX score from 215 workload measurements in 142 examiners was high at 59.61 ± 14.13. The online examination had a statistically higher workload (61.65 ± 12.84) than the hybrid format but not the face-to-face format. The primary contributors to workload were mental demand, temporal demand, and effort. Online exams were associated with elevated frustration. Male examiners and those spending more time on exam preparation experienced a higher workload. Holding multiple diploma specialties and familiarity with the European Diploma in Anaesthesiology and Intensive Care exams were protective against high workload. Perceived workload did not impact marking agreement or examination scores across any format. Examiners experience a high workload. Online exams are not systematically associated with decreased workload, likely due to frustration. Despite workload differences, no impact on examiners' performance or examination scores was found. The hybrid examination mode, combining face-to-face and online, was associated with a minor but statistically significant workload reduction. This hybrid approach may offer a more balanced and efficient examination process while maintaining integrity, cost savings, and increased accessibility for candidates.
Affiliation(s)
- Mikhail Dziadzko: Department of Anesthesia, Intensive Care and Pain Management, Hospices Civils de Lyon, Hôpital de la Croix Rousse, Lyon, France; Research on Healthcare Performance (RESHAPE) U1290-INSERM, Université Claude Bernard Lyon 1, Lyon, France
- Andrey Varvinskiy: South Devon Healthcare NHS Foundation Trust, Department of Anesthesia and Intensive Care, Torquay, UK
- Rodolphe Di Loreto: European Society of Anaesthesiology and Intensive Care, Examinations Office, Brussels, Belgium
- Hugues Scipioni: European Society of Anaesthesiology and Intensive Care, Examinations Office, Brussels, Belgium
- Bazil Ateleanu: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Department of Anaesthesia, University Hospital of Wales, Cardiff, UK
- Markus Klimek: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Department of Anaesthesiology, Erasmus University Medical Centre, Rotterdam, Netherlands
- Joana Berger-Estilita: European Society of Anaesthesiology and Intensive Care, Examinations Committee, Brussels, Belgium; Institute for Medical Education, University of Bern, Bern, Switzerland; Hirslanden Hospital Group, Institute of Anaesthesiology and Intensive Care, Salem Spital, Bern, Switzerland; CINTESIS - Centre for Health Technology and Services Research, Faculty of Medicine, Porto, Portugal
4
Smith SE, McColgan-Smith S, Stewart F, Mardon J, Tallentire VR. Beyond reliability: assessing rater competence when using a behavioural marker system. Adv Simul (Lond) 2024; 9:55. PMID: 39736776; DOI: 10.1186/s41077-024-00329-9.
Abstract
BACKGROUND Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable, and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS - pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect. METHODS Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute-agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test. RESULTS The ICC for experienced faculty raters was good at 0.60 (0.48-0.72) and for near-peer raters was poor at 0.38 (0.27-0.54). Of the experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between the abilities of clinically experienced versus near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077). CONCLUSIONS Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures for other assessments can be helpfully applied to behavioural marker systems. When using behavioural marker systems for assessment, educators should adopt such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.
Affiliation(s)
- Julie Mardon: Scottish Centre for Simulation and Clinical Human Factors, NHS Forth Valley, Larbert, UK
5
Meguerdichian MJ, Trottier DG, Campbell-Taylor K, Bentley S, Bryant K, Kolbe M, Grant V, Cheng A. When common cognitive biases impact debriefing conversations. Adv Simul (Lond) 2024; 9:48. PMID: 39695901; DOI: 10.1186/s41077-024-00324-0.
Abstract
Healthcare debriefing is a cognitively demanding conversation after a simulation or clinical experience that promotes reflection, underpinned by psychological safety and attention to learner needs. The process of debriefing requires mental processing that engages both "fast", unconscious thinking and "slow", intentional thinking to navigate the conversation. "Fast" thinking has the potential to surface cognitive biases that impact reflection and may negatively influence debriefer behaviors, debriefing strategies, and debriefing foundations. As a result, cognitive biases risk undermining learning outcomes from debriefing conversations. As the use of healthcare simulation expands, the need for faculty development specific to the roles bias plays is imperative. In this article, we aim to build awareness of common cognitive biases that may surface in debriefing conversations so that debriefers can begin the hard work of identifying and attending to their potential detrimental impacts.
Affiliation(s)
- Michael J Meguerdichian: Institute for Simulation and Advanced Learning, 1400 Pelham Parkway S, Bronx, NY, 10461, USA; Department of Emergency Medicine, NYC Health+Hospitals: Harlem Hospital Center, 506 Malcolm X Blvd, New York, NY, USA
- Dana George Trottier: Institute for Simulation and Advanced Learning, 1400 Pelham Parkway S, Bronx, NY, 10461, USA
- Suzanne Bentley: Icahn School of Medicine at Mt. Sinai, Gustave L. Levy Pl, Elmhurst Hospital Center, 79-01 Broadway, Queens, New York, NY, 10029, USA
- Kellie Bryant: National League for Nursing, 2600 Virginia Ave NW, Washington, D.C., 20037, USA
- Michaela Kolbe: Simulation Centre, University Hospital Zurich, Zurich, Switzerland
- Vincent Grant: eSim Provincial Simulation Program for Alberta Health Services, Alberta, Canada
- Adam Cheng: Department of Pediatrics and Emergency Medicine, University of Calgary, 28 Oki Drive NW, Calgary, Canada
6
Wood TJ, Daniels VJ, Pugh D, Touchie C, Halman S, Humphrey-Murto S. Implicit versus explicit first impressions in performance-based assessment: will raters overcome their first impressions when learner performance changes? Advances in Health Sciences Education: Theory and Practice 2024; 29:1155-1168. PMID: 38010576; DOI: 10.1007/s10459-023-10302-2.
Abstract
First impressions can influence rater-based judgments, but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims: first, to assess whether first impressions affect raters' judgments when workplace performance changes; second, to determine whether explicitly stating these impressions affects subsequent ratings compared to implicitly formed first impressions. Physician raters viewed six videos where learner performance either changed (Strong to Weak or Weak to Strong) or remained consistent. Raters were assigned to two groups. Group one (n = 23, Explicit) made a first impression global rating (FIGR), then scored learners using the Mini-CEX. Group two (n = 22, Implicit) scored learners at the end of the video solely with the Mini-CEX. For the Explicit group, in the Strong to Weak condition, the FIGR (M = 5.94) was higher than the Mini-CEX global rating (GR) (M = 3.02, p < .001). In the Weak to Strong condition, the FIGR (M = 2.44) was lower than the Mini-CEX GR (M = 3.96, p < .001). There was no difference between the FIGR and the Mini-CEX GR in the consistent condition (M = 6.61 and M = 6.65 respectively, p = .84). There were no statistically significant differences in any of the conditions when comparing both groups' Mini-CEX GRs. Therefore, raters adjusted their judgments based on the learners' performances. Furthermore, raters who made their first impressions explicit showed similar rater bias to raters who followed a more naturalistic process.
Affiliation(s)
- Timothy J Wood: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada
- Vijay J Daniels: Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada
- Debra Pugh: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Medical Council of Canada, Ottawa, Canada
- Claire Touchie: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Samantha Halman: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Susan Humphrey-Murto: Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G 5Z3, Canada; Department of Medicine, The Ottawa Hospital, Ottawa, Canada
7
Khawaji B, Masuadi E, Alraddadi A, Khan MA, Aga SS, Al-Jifree H, Magzoub ME. Tutor assessment of medical students in problem-based learning sessions. Journal of Education and Health Promotion 2024; 13:237. PMID: 39297122; PMCID: PMC11410280; DOI: 10.4103/jehp.jehp_1413_23.
Abstract
BACKGROUND Problem-based learning (PBL) is a method of learning that has been adopted in the curricula of different disciplines for more than 30 years. Assessment of students in PBL sessions in medical schools is fundamental to ensuring that students attain the expected outcomes of those sessions and to providing the feedback that helps them develop and encourages their learning. This study investigated the inter-rater reliability of tutor assessment of medical students' performance in PBL tutorial sessions. MATERIALS AND METHODS This study was conducted in the College of Medicine (COM) in the academic year 2021-2022. The study involved ten raters (tutors) of both genders who assessed 33 students in three separate PBL tutorial sessions. The PBL sessions were prerecorded and shown to the ten raters for their assessment. RESULTS Male raters gave higher scores to students than female raters. In addition, the investigation showed low inter-rater reliability and poor agreement among the raters in assessing students' performance in PBL tutorial sessions. CONCLUSION This study suggests that PBL tutor assessment should be reviewed and evaluated, with consideration given to using assessment domains and criteria of performance. We therefore recommend that 360-degree assessment, including tutor, self, and peer assessment, be used to provide effective feedback to students in PBL tutorial sessions.
Affiliation(s)
- Bader Khawaji: Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Jeddah, Saudi Arabia; King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Emad Masuadi: College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Abdulrahman Alraddadi: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia; Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Riyadh, Saudi Arabia
- Muhammad Anwar Khan: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia; Department of Medical Education, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia
- Syed Sameer Aga: Department of Basic Medical Sciences, College of Medicine, King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), Jeddah, Saudi Arabia; King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Hatim Al-Jifree: King Abdullah International Medical Research Center (KAIMRC), Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia; Department of Oncology, King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Mohi Eldin Magzoub: College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
8
Sahi N, Humphrey-Murto S, Brennan EE, O'Brien M, Hall AK. Current use of simulation for EPA assessment in emergency medicine. Can J Emerg Med 2024; 26:179-187. PMID: 38374281; DOI: 10.1007/s43678-024-00649-9.
Abstract
OBJECTIVE Approximately five years ago, the Royal College emergency medicine programs in Canada implemented a competency-based paradigm and introduced Entrustable Professional Activities (EPAs), units of professional activity used to assess trainees. Many competency-based medical education (CBME) curricula involve assessing for entrustment through observations of EPAs. While EPAs are frequently assessed in clinical settings, simulation is also used. This study aimed to characterize the use of simulation for EPA assessment. METHODS An interview guide was jointly developed by all study authors, following best practices for survey development. National interviews were conducted with program directors or assistant program directors of Royal College emergency medicine programs across Canada. Interviews were conducted, recorded, and transcribed over Microsoft Teams using its transcription service. Sample transcripts were analyzed for theme development, and themes were then reviewed by co-authors to ensure they were representative of the participants' views. RESULTS A 64.7% response rate was achieved. Simulation has been widely adopted by EM training programs, and all interviewees supported its use for EPA assessment for many reasons; however, program directors acknowledged limitations. Thematic analysis revealed six major themes: widespread support for the use of simulation for EPA assessment, concerns regarding the potential for EPA assessment to become a "tick-box" exercise, logistical barriers limiting the use of simulation for EPA assessment, varied perceptions about the authenticity of using simulation for EPA assessment, the potential for simulation-based EPA assessment to compromise learner psychological safety, and suggestions for optimizing the use of simulation for EPA assessment. CONCLUSIONS Our findings offer insight for other programs and specialties on how simulation for EPA assessment can best be utilized. Programs should draw on these findings when considering using simulation for EPA assessment.
Affiliation(s)
- Nidhi Sahi: Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, ON, Canada
- Susan Humphrey-Murto: Department of Medicine, University of Ottawa, Ottawa, ON, Canada; Tier 2 Research Chair in Medical Education and Fellowship Director, Medical Education Research, University of Ottawa, Ottawa, ON, Canada
- Erin E Brennan: Department of Emergency Medicine, Queen's University, Kingston, ON, Canada
- Michael O'Brien: Emergency Medicine, The Ottawa Hospital, Ottawa, ON, Canada; Department of Innovation in Medical Education, University of Ottawa, Ottawa, ON, Canada
- Andrew K Hall: Department of Emergency Medicine, University of Ottawa, Ottawa, ON, Canada; Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
9
Yang D, Draganov PV, Pohl H, Aihara H, Jeyalingam T, Khashab M, Liu N, Hasan MK, Jawaid S, Othman M, Al-Haddad M, DeWitt JM, Triggs JR, Wang AY, Bechara R, Sethi A, Law R, Aadam AA, Kumta N, Sharma N, Hayat M, Zhang Y, Yi F, Elmunzer BJ. Development and initial validation of a video-based peroral endoscopic myotomy assessment tool. Gastrointest Endosc 2024; 99:177-185. PMID: 37500019; DOI: 10.1016/j.gie.2023.07.032.
Abstract
BACKGROUND AND AIMS Video analysis has emerged as a potential strategy for performance assessment and improvement. We aimed to develop a video-based skill assessment tool for peroral endoscopic myotomy (POEM). METHODS POEM was deconstructed into basic procedural components through video analysis by an expert panel. A modified Delphi approach and 2 validation exercises were conducted to refine the POEM assessment tool (POEMAT). Twelve assessors used the final POEMAT version to grade 10 videos. Fully crossed generalizability (G) studies investigated the contributions of assessors, endoscopists' performance, and technical elements to reliability. G coefficients below .5 were considered unreliable, those between .5 and .7 modestly reliable, and those above .7 indicative of satisfactory reliability. RESULTS After task deconstruction, discussions, and the modified Delphi process, the final POEMAT comprised 9 technical elements. G analysis showed low variance attributable to endoscopist performance (.8%-24.9%) and high inter-rater variability (range, 63.2%-90.1%). The G score was modestly reliable (≥.60) for "submucosal tunneling" and "myotomy" and satisfactorily reliable (≥.70) for "active hemostasis" and "mucosal closure." CONCLUSIONS We developed and established initial content and response-process validity evidence for the POEMAT. Future steps include appraisal of the tool using a wider range of POEM videos to establish and improve its discriminative validity.
Affiliation(s)
- Dennis Yang: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- Peter V Draganov: Division of Gastroenterology and Hepatology, University of Florida, Gainesville, Florida, USA
- Heiko Pohl: Veterans Affairs Medical Center, White River Junction, Vermont, USA; Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Hiroyuki Aihara: Division of Gastroenterology, Hepatology and Endoscopy, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Thurarshen Jeyalingam: Division of Gastroenterology and Hepatology, University of Toronto, Toronto, Ontario, Canada
- Mouen Khashab: Division of Gastroenterology and Hepatology, Johns Hopkins Hospital, Baltimore, Maryland, USA
- Nanlong Liu: Division of Gastroenterology, University of Louisville, Louisville, Kentucky, USA
- Muhammad K Hasan: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- Salmaan Jawaid: Division of Gastroenterology, Baylor College of Medicine, Houston, Texas, USA
- Mohamed Othman: Division of Gastroenterology, Baylor College of Medicine, Houston, Texas, USA
- Mohamed Al-Haddad: Department of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- John M DeWitt: Department of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Joseph R Triggs: Division of Gastroenterology, Fox Chase Cancer Center, Temple Health, Philadelphia, Pennsylvania, USA
- Andrew Y Wang: Division of Gastroenterology and Hepatology, University of Virginia, Charlottesville, Virginia, USA
- Robert Bechara: Division of Gastroenterology and GI Diseases Research Unit, Queen's University, Kingston, Ontario, Canada
- Amrita Sethi: Division of Digestive and Liver Diseases, Columbia University Irving Medical Center, Presbyterian Hospital, New York, New York, USA
- Ryan Law: Division of Gastroenterology and Hepatology, Mayo Clinic, Minneapolis, Minnesota, USA
- Aziz A Aadam: Division of Gastroenterology and Hepatology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Nikhil Kumta: Henry D. Janowitz Division of Gastroenterology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Neil Sharma: Division of Interventional Oncology and Surgical Endoscopy (IOSE), Parkview Cancer Institute, Fort Wayne, Indiana, USA
- Maham Hayat: Center for Interventional Endoscopy, AdventHealth, Orlando, Florida, USA
- YiYang Zhang: Center for Collaborative Research, AdventHealth Research Institute, Orlando, Florida, USA
- Fanchao Yi: Center for Collaborative Research, AdventHealth Research Institute, Orlando, Florida, USA
- B Joseph Elmunzer: Department of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, South Carolina, USA
10
Fu Y, Zhang W, Zhang S, Hua D, Xu D, Huang H. Applying a video recording, video-based rating method in OSCEs. Medical Education Online 2023; 28:2187949. PMID: 36883331; PMCID: PMC10013518; DOI: 10.1080/10872981.2023.2187949.
Abstract
INTRODUCTION Objective structured clinical examination (OSCE) results can be affected by low homogeneity of examiners, the non-retrospectiveness of test results, and the examiner-cohort effect. In China, where many students participate in medical qualification examinations, this issue is particularly significant. This study aimed to develop a video recording, video-based rating method and to compare the reliability of video and on-site ratings, to enhance the quality assurance of OSCEs. METHODS The subjects of this study were clinical students one year after graduation participating in the clinical skills portion of the National Medical Licensing Examination. The participants were from four cities in Jiangsu province. Participants were randomly allocated to on-site and video rating groups to evaluate the consistency of the rating methods. We verified the reliability of the recording equipment and the evaluability of the video recordings. Moreover, we compared the consistency and equivalence of the two rating methods and analyzed the impact of video recording on scores. RESULTS The reliability of the recording equipment and the evaluability of the video recordings were high. Evaluation consistency between experts and examiners was acceptable, and there was no difference in evaluation results (P = 0.61). There was good consistency between video and on-site rating; however, a difference between the two rating methods was detected: the scores of students in the video-based rating group were lower than those of all students (P < 0.00). CONCLUSIONS Video-based rating could be reliable and offer advantages over on-site rating. The video recording, video-based rating method could provide greater content validity based on its traceability and the ability to view details. Video recording with video-based rating offers a promising method for improving the effectiveness and fairness of OSCEs.
Affiliation(s)
- Yu Fu: Oral and Maxillofacial Surgery Medicine, Affiliated Hospital of Stomatology, Nanjing Medical University, Nanjing, China
- Wenjuan Zhang: Examination Management Department, National Medical Examination Center, Beijing, China
- Saiyi Zhang: Examination Management Department, National Medical Examination Center, Beijing, China
- Dong Hua: Department of Biomedical Engineering and Information, Nanjing Medical University, Nanjing, Jiangsu, China
- Di Xu: Department of Medical Simulation Center, Nanjing Medical University, Nanjing, Jiangsu, China
- Hua Huang: Department of Medical Simulation Center, Nanjing Medical University, Nanjing, Jiangsu, China
11
Thornby KA, Brazeau GA, Chen AMH. Reducing Student Workload Through Curricular Efficiency. American Journal of Pharmaceutical Education 2023; 87:100015. PMID: 37597906; DOI: 10.1016/j.ajpe.2022.12.002.
Abstract
OBJECTIVE This integrative review examines the current literature assessing student workload, the outcomes of increased workload and cognitive load, and approaches to evaluating and reducing student workload. Recommendations to better inform curriculum planning efforts are presented, along with a call to action to address the dilemma of student workload and curricular efficiency efforts. FINDINGS The literature supports that perceptions of heavy workload can influence students' approach to learning and lead to the adoption of surface learning rather than a deep approach that involves higher-order processing and critical thinking. Additionally, ongoing evidence suggests that workload expansion affects student well-being and potential burnout in professional programs, and specifically that students perceive workload as directly related to their well-being and satisfaction. Intentional planning by faculty and programs can address this issue by streamlining classroom content, reducing lecture time, and modifying preclass work to allow for efficient learning. Even if the curriculum is lecture-based, workload perceptions can be improved by developing clearer guidance to set expectations for learners, by intentionality in classroom design, and by creating opportunities for student engagement. SUMMARY Cognitive overload is multifactorial and complicated, given the increased standards of professional education accreditation and licensure requirements. As the Academy deliberately considers methods to improve curricular efficiency, there is an opportunity to focus on curriculum delivery with an appropriate balance of breadth and depth of instruction to ensure effective assessment and a manageable cognitive load.
Affiliation(s)
- Krisy-Ann Thornby: Palm Beach Atlantic University, Lloyd L. Gregory School of Pharmacy, West Palm Beach, FL, USA
- Gayle A Brazeau: Marshall University, School of Pharmacy, Huntington, WV, USA; Editor, American Journal of Pharmaceutical Education, Arlington, VA, USA
- Aleda M H Chen: Cedarville University, School of Pharmacy, Cedarville, OH, USA
12
Klusmann D, Knorr M, Hampe W. Exploring the relationships between first impressions and MMI ratings: a pilot study. Advances in Health Sciences Education: Theory and Practice 2023; 28:519-536. PMID: 36053344; PMCID: PMC10169880; DOI: 10.1007/s10459-022-10151-5.
Abstract
The phenomenon of first impression is well researched in social psychology, but less so in the study of OSCEs and the multiple mini-interview (MMI). To explore its bearing on the MMI method, we included a rating of first impression in the MMI for student selection conducted in 2012 at the University Medical Center Hamburg-Eppendorf, Germany (196 applicants, 26 pairs of raters) and analyzed how it was related to MMI performance ratings made by (a) the same rater and (b) a different rater. First impression was assessed immediately after an applicant entered the test room. Each MMI task took 5 minutes and was rated immediately afterwards. Internal consistency was α = .71 for first impression and α = .69 for MMI performance. First impression and MMI performance correlated at r = .49. Both measures weakly predicted performance in two OSCEs for communication skills assessed 18 months later. MMI performance did not increment prediction above the contribution of first impression, and vice versa. Prediction was independent of whether or not the rater who rated first impression also rated MMI performance. The correlation between first impression and MMI performance is in line with the results of corresponding social psychological studies, showing that judgements based on minimal information moderately predict behavioral measures. It is also in accordance with the notion that raters often blend the specific assessment task outlined in the MMI instructions with the self-imposed question of whether a candidate would fit the role of a medical doctor.
Affiliation(s)
- Dietrich Klusmann: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
- Mirjana Knorr: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
- Wolfgang Hampe: Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
13
Gonzalez PR, Paravattil B, Wilby KJ. Mental effort in the assessment of critical reflection: Implications for assessment quality and scoring. Currents in Pharmacy Teaching & Learning 2022; 14:830-834. PMID: 35914842; DOI: 10.1016/j.cptl.2022.06.016.
Abstract
INTRODUCTION Critical reflection is a mainstay in the training of health professionals, yet assessment of reflection is commonly described as difficult, taxing, and resulting in inconsistent scoring across assessors. At the same time, there is evidence from experiential and simulation settings that assessors' mental effort may explain assessor variability, which could be a target for simplifications in assessment design. Assessors' mental effort during the assessment of reflection is currently unknown. This study aimed to determine the reliability of rubric scoring of critical reflection, the variation in pass-fail rates, and the relationship between reflection scores and assessors' perceived mental effort. METHODS Eleven assessors were recruited to assess six reflection assignments using a published rubric. Mental effort was measured using the Paas scale for each assignment assessed and was correlated with the rubric scores for each assignment. RESULTS Findings showed inconsistency in scoring between assessors, resulting in varying pass rates for each assignment (55-100%). All assignments demonstrated negative correlations between rubric scores and perceived mental effort (r = -0.115 to -0.649). CONCLUSIONS Findings support the notion that more work should be done to optimize the assessment of critical reflection. Future studies should focus on disentangling the influence of scoring tools, assignment structures, and writing quality on mental effort.
Affiliation(s)
- Kyle John Wilby: College of Pharmacy, Faculty of Health, Dalhousie University, 5968 College Street, Halifax, Nova Scotia, Canada
14
Malau-Aduli BS. Patient involvement in assessment: How useful is it? Medical Education 2022; 56:590-592. PMID: 35298852; PMCID: PMC9311839; DOI: 10.1111/medu.14802.
Abstract
The author unveils a strategy for enquiry that can facilitate identification of best practices for involving real patients in OSCE and WBA competency-based assessments.
Affiliation(s)
- Bunmi S. Malau-Aduli: College of Medicine and Dentistry, James Cook University, Townsville, Queensland, Australia
15
Homer M. Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes. Advances in Health Sciences Education: Theory and Practice 2022; 27:457-473. PMID: 35230590; PMCID: PMC9117341; DOI: 10.1007/s10459-022-10096-9.
Abstract
Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade. The exam data come from 442 separate administrations of an 18-station OSCE for international medical graduates who want to work in the National Health Service in the UK. We find that variation due to examiner is approximately twice as large for domain scores as it is for grades (16% vs. 8%), with smaller residual variance in the former (67% vs. 76%). Combined estimates of exam-level (relative) reliability across all data are 0.75 and 0.69 for domain scores and grades respectively. The correlation between two separate estimates of stringency for individual examiners (one for grades and one for domain scores) is relatively high (r = 0.76), implying that examiners are generally quite consistent in their stringency between these two assessments of performance. Cluster analysis indicates that examiners fall into two broad groups, characterised as hawks or doves on both measures. At the exam level, correcting for examiner stringency produces systematically lower cut-scores under borderline regression standard setting than using the raw marks. In turn, such a correction would produce higher pass rates, although meaningful direct comparisons are challenging to make. As in other studies, this work shows that OSCEs and other standardised performance assessments are subject to substantial variation in examiner stringency and require sufficient domain sampling to ensure that the quality of pass/fail decision-making is at least adequate. More, perhaps qualitative, work is needed to better understand how examiners might score similarly (or differently) between the awarding of station-level domain scores and global grades. The potential systematic bias of borderline regression evidenced for the first time here, with sources of error producing cut-scores higher than they should be, also needs more investigation.
Affiliation(s)
- Matt Homer: School of Medicine, Leeds Institute of Medical Education, University of Leeds, Leeds, LS2 9JT, UK
16
Swanberg M, Woodson-Smith S, Pangaro L, Torre D, Maggio L. Factors and Interactions Influencing Direct Observation: A Literature Review Guided by Activity Theory. Teaching and Learning in Medicine 2022; 34:155-166. PMID: 34238091; DOI: 10.1080/10401334.2021.1931871.
Abstract
Phenomenon: Ensuring that future physicians are competent to practice medicine is necessary for high-quality patient care and safety. The shift toward competency-based education has placed renewed emphasis on direct observation via workplace-based assessments in authentic patient care contexts. Despite this interest, and despite multiple studies focused on improving direct observation, challenges regarding the objectivity of this assessment approach remain underexplored and unresolved. Approach: We conducted a literature review of direct observation in authentic patient contexts by systematically searching the databases PubMed, Embase, Web of Science, and ERIC. Included studies comprised original research conducted in the patient care context with authentic patients, either as a live encounter or a video recording of an actual encounter, which focused on factors affecting the direct observation of undergraduate medical education (UME) or graduate medical education (GME) trainees. Because the patient care context adds factors that contribute to the cognitive load of the learner and of the clinician-observer, we focused our question on such contexts, which are most useful in judgments about advancement to the next level of training or practice. We excluded articles or published abstracts not conducted in the patient care context (e.g., OSCEs) or those involving simulation, allied health professionals, or non-UME/GME trainees. We also excluded studies focused on end-of-rotation evaluations and in-training evaluation reports. We extracted key data from the studies and used Activity Theory as a lens to identify factors affecting these observations and the interactions between them. Activity Theory provides a framework to understand and analyze complex human activities, the systems in which people work, and the interactions or tensions between multiple associated factors. Findings: Nineteen articles were included in the analysis; 13 involved GME learners and 6 UME learners. Of the 19, six studies were set in the operating room and four in the emergency department. Using Activity Theory, we discovered that while numerous studies focus on rater and tool influences, very few study the impact of social elements: the rules that govern how the activity happens, the environment and members of the community involved in the activity, and how completion of the activity is divided up among the members of the community. Insights: Viewing direct observation via workplace-based assessment through the lens of Activity Theory may enable educators to implement curricular changes to improve direct observation for assessment. Activity Theory may also allow researchers to design studies focused on the identified underexplored interactions and influences in relation to direct observation.
Affiliation(s)
- Margaret Swanberg: Department of Neurology, Uniformed Services University, Bethesda, Maryland, USA
- Sarah Woodson-Smith: Department of Neurology, Naval Medical Center Portsmouth, Portsmouth, Virginia, USA
- Louis Pangaro: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA
- Dario Torre: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA; Center for Health Professions Education, Uniformed Services University, Bethesda, Maryland, USA
- Lauren Maggio: Department of Medicine, Uniformed Services University, Bethesda, Maryland, USA; Center for Health Professions Education, Uniformed Services University, Bethesda, Maryland, USA
17
Fyfe M, Horsburgh J, Blitz J, Chiavaroli N, Kumar S, Cleland J. The do's, don'ts and don't knows of redressing differential attainment related to race/ethnicity in medical schools. Perspectives on Medical Education 2022; 11:1-14. PMID: 34964930; PMCID: PMC8714874; DOI: 10.1007/s40037-021-00696-3.
Abstract
INTRODUCTION Systematic and structural inequities in power and privilege create differential attainment, whereby differences in average levels of performance are observed between students from different socio-demographic groups. This paper reviews the international evidence on differential attainment related to ethnicity/race in medical school, drawing together the key messages from research to date to provide guidance for educators to operationalize and enact change, and to identify areas for further research. METHODS The authors first identified areas of conceptual importance within differential attainment (learning, assessment, and systems/institutional factors), which were then the focus of a targeted review of the literature on differential attainment related to ethnicity/race in medical education and, where available and relevant, literature from higher education more generally. Each author then conducted a review of the literature and proposed guidelines based on their experience and the research literature. The guidelines were iteratively reviewed and refined by all authors until we reached consensus on the Do's, Don'ts and Don't Knows. RESULTS We present 13 guidelines with a summary of the research evidence for each. The guidelines address assessment practices (assessment design, assessment formats, use of assessments, and post-hoc analysis) and educational systems and cultures (student experience, learning environment, faculty diversity, and diversity practices). CONCLUSIONS Differential attainment related to ethnicity/race is a complex, systemic problem reflective of unequal norms and practices within broader society and evident throughout assessment practices, the learning environment, and student experiences at medical school. Currently, the strongest empirical evidence is around assessment processes themselves. There is emerging evidence of minoritized students facing discrimination and having different learning experiences in medical school, but more studies are needed. There is a pressing need for research on how to effectively redress systemic issues within our medical schools, particularly related to inequity in teaching and learning.
Affiliation(s)
- Molly Fyfe: Medical Education Innovation and Research Centre, Imperial College London, London, UK
- Jo Horsburgh: Medical Education Innovation and Research Centre, Imperial College London, London, UK; Centre for Higher Education Research and Scholarship, Imperial College London, London, UK
- Julia Blitz: Centre for Health Professions Education, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa
- Sonia Kumar: Medical Education Innovation and Research Centre, Imperial College London, London, UK
- Jennifer Cleland: Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
18
Fleming M, Vautour D, McMullen M, Cofie N, Dalgarno N, Phelan R, Mizubuti GB. Examining the accuracy of residents' self-assessments and faculty assessment behaviours in anesthesiology. Canadian Medical Education Journal 2021; 12:17-26. PMID: 34567302; PMCID: PMC8463238; DOI: 10.36834/cmej.70697.
Abstract
BACKGROUND Residents' accurate self-assessment and clinical judgment are essential for optimizing their clinical skills development. Evidence from the medical literature suggests that residents generally do poorly at self-assessing their performance, often due to factors relating to learners' personal backgrounds and cultures, the specific contexts of the learning environment, and rater bias or inaccuracies. We evaluated the accuracy of anesthesiology residents' self-assessed Global Entrustment scores and determined whether differences between faculty and resident scores varied by resident seniority, faculty leniency, and/or year of assessment. METHODS We employed variance components modeling techniques and analyzed 329 pairs of faculty and self-assessed entrustment scores among 43 faculty assessors and 15 residents. Using faculty scores as the gold standard, we compared faculty scores with residents' scores (x_i(faculty) - x_i(resident)) and determined residents' accuracy, including over- and under-confidence. RESULTS The results indicate that residents were over- and under-confident in 10.9% and 54.4% of the assessments respectively, but were more consistent in their individual self-assessments (ρ = 0.70) than faculty assessors. Faculty scores were significantly higher (α = 0.396; z = 4.39; p < 0.001) than residents' self-assessed scores. Being a lenient/dovish (β = 0.121, z = 3.16, p < 0.01) or a neutral (β = 0.137, z = 3.57, p < 0.001) faculty assessor predicted a higher likelihood of resident under-confidence. Senior residents were significantly less likely to be under-confident than junior residents (β = -0.182, z = -2.45, p < 0.05). The accuracy of self-assessments did not vary significantly during the two years of the study period. CONCLUSIONS The majority of residents' self-assessments were inaccurate. Our findings may help identify the sources of such inaccuracies.
Collapse
Affiliation(s)
- Melinda Fleming
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Danika Vautour
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Michael McMullen
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Nicholas Cofie
- Faculty of Health Sciences, Queen's University, Ontario, Canada
| | - Nancy Dalgarno
- Faculty of Health Sciences, Queen's University, Ontario, Canada
| | - Rachel Phelan
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| | - Glenio B Mizubuti
- Department of Anesthesiology and Perioperative Medicine, Kingston Health Sciences Centre
| |
Collapse
|
19
|
Humphrey-Murto S, Shaw T, Touchie C, Pugh D, Cowley L, Wood TJ. Are raters influenced by prior information about a learner? A review of assimilation and contrast effects in assessment. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:1133-1156. [PMID: 33566199 DOI: 10.1007/s10459-021-10032-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/17/2020] [Accepted: 01/25/2021] [Indexed: 06/12/2023]
Abstract
Understanding which factors can impact rater judgments in assessments is important to ensure quality ratings. One such factor is whether prior performance information (PPI) about learners influences subsequent decision making. The information can be acquired directly, when the rater sees the same learner, or different learners over multiple performances, or indirectly, when the rater is provided with external information about the same learner prior to rating a performance (i.e., learner handover). The purpose of this narrative review was to summarize and highlight key concepts from multiple disciplines regarding the influence of PPI on subsequent ratings, discuss implications for assessment and provide a common conceptualization to inform research. Key findings include (a) assimilation (rater judgments are biased towards the PPI) occurs with indirect PPI and contrast (rater judgments are biased away from the PPI) with direct PPI; (b) negative PPI appears to have a greater effect than positive PPI; (c) when viewing multiple performances, context effects of indirect PPI appear to diminish over time; and (d) context effects may occur with any level of target performance. Furthermore, some raters are not susceptible to context effects, but it is unclear what factors are predictive. Rater expertise and training do not consistently reduce effects. Making raters more accountable, providing specific standards and reducing rater cognitive load may reduce context effects. Theoretical explanations for these findings will be discussed.
Collapse
Affiliation(s)
- Susan Humphrey-Murto
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada.
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.
| | - Tammy Shaw
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-General Campus, Ottawa, ON, Canada
| | - Claire Touchie
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada
- Medical Council of Canada, Ottawa, ON, Canada
| | - Debra Pugh
- Department of Medicine, Faculty of Medicine, The Ottawa Hospital-Riverside Campus, University of Ottawa, 1967 Riverside Drive, Box 67, Ottawa, ON, Canada
- Medical Council of Canada, Ottawa, ON, Canada
| | - Lindsay Cowley
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
| | - Timothy J Wood
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
| |
Collapse
|
20
|
Effects of a Resident's Reputation on Laparoscopic Skills Assessment. Obstet Gynecol 2021; 138:16-20. [PMID: 34259459 DOI: 10.1097/aog.0000000000004426] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2020] [Accepted: 02/18/2021] [Indexed: 11/26/2022]
Abstract
OBJECTIVE To quantify the effect of a resident's reputation on the assessment of their laparoscopic skills. METHODS Faculty gynecologists were randomized to receive one of three hypothetical resident scenarios: a resident with high, average, or low surgical skills. All participants were then asked to view the same video of a resident performing a laparoscopic salpingo-oophorectomy, with only the accompanying resident description differing, and to provide an assessment using a modified OSATS (Objective Structured Assessment of Technical Skills) and a global assessment scale. RESULTS From September 6, 2020, to October 20, 2020, a total of 43 faculty gynecologic surgeons were recruited to complete the study. Assessment scores on the modified OSATS (out of 20) and global assessment (out of 5) differed significantly according to resident description, where the high-performing resident scored highest (median scores of 15 and 4, respectively), followed by the average-performing resident (13 and 3), and finally, the low-performing resident (11 and 3) (P=.008 and .043, respectively). CONCLUSION Faculty assessment of residents in gynecologic surgery is influenced by the assessor's knowledge of the resident's past performance. This knowledge introduces bias that artificially increases scores given to those residents with favorable reputations and decreases scores given to those with reputed surgical skill deficits. These data quantify the effect of such bias in the assessment of residents in the workplace and serve as an impetus to explore systems-level interventions to mitigate bias.
Collapse
|
21
|
Valentine N, Durning S, Shanahan EM, Schuwirth L. Fairness in human judgement in assessment: a hermeneutic literature review and conceptual framework. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:713-738. [PMID: 33123837 DOI: 10.1007/s10459-020-10002-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 10/19/2020] [Indexed: 06/11/2023]
Abstract
Human judgement is widely used in workplace-based assessment despite criticism that it does not meet standards of objectivity. There is an ongoing push within the literature to better embrace subjective human judgement in assessment not as a 'problem' to be corrected psychometrically but as legitimate perceptions of performance. Taking a step back and changing perspectives to focus on the fundamental underlying value of fairness in assessment may help re-set the traditional objective approach and provide a more relevant way to determine the appropriateness of subjective human judgements. Changing focus to look at what is 'fair' human judgement in assessment, rather than what is 'objective' human judgement in assessment, allows for the embracing of many different perspectives and the legitimising of human judgement in assessment. However, this requires addressing the question: what makes human judgements fair in health professions assessment? This is not a straightforward question with a single unambiguously 'correct' answer. In this hermeneutic literature review we aimed to produce a scholarly knowledge synthesis and understanding of the factors, definitions and key questions associated with fairness in human judgement in assessment, and a resulting conceptual framework, with a view to informing further research. The complex construct of fair human judgement could be conceptualised through values (credibility, fitness for purpose, transparency and defensibility), which are upheld at an individual level by characteristics of fair human judgement (narrative, boundaries, expertise, agility and evidence) and at a systems level by procedures (procedural fairness, documentation, multiple opportunities, multiple assessors, validity evidence), which help translate fairness in human judgement from concepts into practical components.
Collapse
Affiliation(s)
- Nyoli Valentine
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia.
| | - Steven Durning
- Center for Health Professions Education, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Ernst Michael Shanahan
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia
| | - Lambert Schuwirth
- Prideaux Health Professions Education, Flinders University, Bedford Park 5042, SA, Australia
| |
Collapse
|
22
|
Elmunzer BJ, Walsh CM, Guiton G, Serrano J, Chak A, Edmundowicz S, Kwon RS, Mullady D, Papachristou GI, Elta G, Baron TH, Yachimski P, Fogel E, Draganov PV, Taylor J, Scheiman J, Singh V, Varadarajulu S, Willingham FF, Cote G, Cotton PB, Simon V, Spitzer R, Keswani R, Wani S. Development and initial validation of an instrument for video-based assessment of technical skill in ERCP. Gastrointest Endosc 2021; 93:914-923. [PMID: 32739484 PMCID: PMC8961206 DOI: 10.1016/j.gie.2020.07.055] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Accepted: 07/24/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND AIMS The accurate measurement of technical skill in ERCP is essential for endoscopic training, quality assurance, and coaching of this procedure. Hypothesizing that technical skill can be measured by analysis of ERCP videos, we aimed to develop and validate a video-based ERCP skill assessment tool. METHODS Based on review of procedural videos, the task of ERCP was deconstructed into its basic components by an expert panel that developed an initial version of the Bethesda ERCP Skill Assessment Tool (BESAT). Subsequently, 2 modified Delphi panels and 3 validation exercises were conducted with the goal of iteratively refining the tool. Fully crossed generalizability studies investigated the contributions of assessors, ERCP performance, and technical elements to reliability. RESULTS Twenty-nine technical elements were initially generated from task deconstruction. Ultimately, after iterative refinement, the tool comprised 6 technical elements and 11 subelements. The developmental process achieved consistent improvements in the performance characteristics of the tool with every iteration. For the most recent version of the tool, BESAT-v4, the generalizability coefficient (a reliability index) was .67. Most variance in BESAT scores (43.55%) was attributed to differences in endoscopists' skill, indicating that the tool can reliably differentiate between endoscopists based on video analysis. CONCLUSIONS Video-based assessment of ERCP skill appears to be feasible with a novel instrument that demonstrates favorable validity evidence. Future steps include determining whether the tool can discriminate between endoscopists of varying experience levels and predict important outcomes in clinical practice.
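As a rough illustration of the reliability index reported above, here is a minimal sketch of a relative generalizability coefficient for a fully crossed person-by-rater design, estimated from a two-way ANOVA decomposition; the data are hypothetical and the design is simplified relative to the study's G-studies:

```python
# Minimal sketch, hypothetical data: relative G coefficient for a fully
# crossed person (endoscopist) x rater design, via two-way ANOVA mean squares.
import numpy as np

# rows = endoscopists (objects of measurement), cols = assessors
scores = np.array([
    [3.0, 3.5, 3.2],
    [4.1, 4.4, 4.0],
    [2.5, 2.9, 2.7],
    [3.8, 3.6, 3.9],
])
n_p, n_r = scores.shape
grand = scores.mean()

ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()   # persons
ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()   # raters
ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r      # interaction + error

ms_p = ss_p / (n_p - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

var_p = max((ms_p - ms_res) / n_r, 0.0)   # person (true-score) variance
g_coef = var_p / (var_p + ms_res / n_r)   # relative error over n_r raters
print(f"relative G coefficient with {n_r} raters: {g_coef:.2f}")
```

The coefficient printed here is an index analogous in spirit to the .67 reported for BESAT-v4; the paper's fully crossed studies additionally quantify rater and element variance.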
Collapse
Affiliation(s)
- B. Joseph Elmunzer
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Catharine M Walsh
- Division of Gastroenterology, Hepatology, and Nutrition, Learning Institute and Research Institute, Hospital for Sick Children, Toronto, Canada
| | - Gretchen Guiton
- Department of Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Jose Serrano
- National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
| | - Amitabh Chak
- Division of Gastroenterology and Liver Disease, Case Western Reserve University, Cleveland, OH, USA
| | - Steven Edmundowicz
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Richard S. Kwon
- Division of Gastroenterology, University of Michigan, Ann Arbor, MI, USA
| | - Daniel Mullady
- Division of Gastroenterology, Washington University School of Medicine, St Louis, Missouri, USA
| | - Georgios I. Papachristou
- Division of Gastroenterology, Hepatology, and Nutrition, Ohio State University Wexner Medical Center, Columbus, OH, USA
| | - Grace Elta
- Division of Gastroenterology, University of Michigan, Ann Arbor, MI, USA
| | - Todd H. Baron
- Division of Gastroenterology and Hepatology, University of North Carolina, Chapel Hill, NC, USA
| | - Patrick Yachimski
- Division of Gastroenterology, Vanderbilt University, Nashville, TN, USA
| | - Evan Fogel
- Division of Gastroenterology and Hepatology, Indiana University, Indianapolis, IN, USA
| | - Peter V. Draganov
- Division of Gastroenterology, Hepatology, and Nutrition, University of Florida, Gainesville, FL, USA
| | - Jason Taylor
- Division of Gastroenterology and Hepatology, Saint Louis University, Saint Louis, MO, USA
| | - James Scheiman
- Division of Gastroenterology and Hepatology, University of Virginia, Charlottesville, VA, USA
| | - Vikesh Singh
- Division of Gastroenterology, Johns Hopkins Medical Institutions, Baltimore, MD, USA
| | | | | | - Gregory Cote
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Peter B. Cotton
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Violette Simon
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Rebecca Spitzer
- Division of Gastroenterology and Hepatology, Medical University of South Carolina, Charleston, SC, USA
| | - Rajesh Keswani
- Division of Gastroenterology, Northwestern University, Chicago, IL, USA
| | - Sachin Wani
- Division of Gastroenterology and Hepatology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | |
Collapse
|
23
|
Malau-Aduli BS, Hays RB, D'Souza K, Smith AM, Jones K, Turner R, Shires L, Smith J, Saad S, Richmond C, Celenza A, Sen Gupta T. Examiners' decision-making processes in observation-based clinical examinations. MEDICAL EDUCATION 2021; 55:344-353. [PMID: 32810334 DOI: 10.1111/medu.14357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/19/2020] [Revised: 08/08/2020] [Accepted: 08/14/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND Objective structured clinical examinations (OSCEs) are commonly used to assess the clinical skills of health professional students. Examiner judgement is one acknowledged source of variation in candidate marks. This paper reports an exploration of examiner decision making to better characterise the cognitive processes and workload associated with making judgements of clinical performance in exit-level OSCEs. METHODS Fifty-five examiners for exit-level OSCEs at five Australian medical schools completed a NASA Task Load Index (TLX) measure of cognitive load and participated in focus group interviews immediately after the OSCE session. Discussions focused on how decisions were made for borderline and clear pass candidates. Interviews were transcribed, coded and thematically analysed. NASA TLX results were quantitatively analysed. RESULTS Examiners self-reported higher cognitive workload levels when assessing a borderline candidate in comparison with a clear pass candidate. Further analysis revealed five major themes considered by examiners when marking candidate performance in an OSCE: (a) use of marking criteria as a source of reassurance; (b) difficulty adhering to the marking sheet under certain conditions; (c) demeanour of candidates; (d) patient safety, and (e) calibration using a mental construct of the 'mythical [prototypical] intern'. Examiners demonstrated particularly high mental demand when assessing borderline compared to clear pass candidates. CONCLUSIONS Examiners demonstrate that judging candidate performance is a complex, cognitively difficult task, particularly when performance is of borderline or lower standard. At programme exit level, examiners intuitively want to rate candidates against a construct of a prototypical graduate when marking criteria appear not to describe both what a passing candidate should demonstrate when completing clinical tasks and how they should demonstrate it. This construct should be shared, agreed upon and aligned with marking criteria to best guide examiner training and calibration. Achieving this integration may improve the accuracy and consistency of examiner judgements and reduce cognitive workload.
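For context, NASA TLX ratings are often summarized with an unweighted "raw TLX" (the mean of the six subscale ratings); the sketch below assumes that scoring variant with hypothetical ratings, and is not drawn from the study's data:

```python
# Minimal sketch, assumed "raw TLX" scoring (mean of six 0-100 subscales);
# ratings are hypothetical, not the study's data.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort",
             "frustration")

def raw_tlx(ratings: dict) -> float:
    """Raw (unweighted) TLX: mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

borderline = {"mental": 80, "physical": 25, "temporal": 60,
              "performance": 55, "effort": 70, "frustration": 50}
clear_pass = {"mental": 45, "physical": 20, "temporal": 40,
              "performance": 30, "effort": 40, "frustration": 20}

print(f"borderline candidate raw TLX: {raw_tlx(borderline):.1f}")
print(f"clear pass candidate raw TLX: {raw_tlx(clear_pass):.1f}")
```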
Collapse
Affiliation(s)
- Bunmi S Malau-Aduli
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Richard B Hays
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Karen D'Souza
- School of Medicine, Deakin University, Geelong, VIC, Australia
| | - Amy M Smith
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Karina Jones
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| | - Richard Turner
- School of Medicine, University of Tasmania, Hobart, TAS, Australia
| | - Lizzi Shires
- School of Medicine, University of Tasmania, Hobart, TAS, Australia
| | - Jane Smith
- Medical Program, Bond University, Gold Coast, QLD, Australia
| | - Shannon Saad
- School of Medicine, Notre Dame University, Sydney, NSW, Australia
| | | | - Antonio Celenza
- School of Medicine, University of Western Australia, Perth, WA, Australia
| | - Tarun Sen Gupta
- College of Medicine and Dentistry, James Cook University, Townsville, QLD, Australia
| |
Collapse
|
24
|
Roy M, Wojcik J, Bartman I, Smee S. Augmenting physician examiner scoring in objective structured clinical examinations: including the standardized patient perspective. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2021; 26:313-328. [PMID: 32816242 DOI: 10.1007/s10459-020-09987-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 08/17/2020] [Indexed: 06/11/2023]
Abstract
In Canada, high-stakes objective structured clinical examinations (OSCEs) administered by the Medical Council of Canada have relied exclusively on physician examiners (PEs) for scoring. Prior research has looked at using standardized patients (SPs) to replace PEs. This paper reports on two studies that implement and evaluate an SP scoring tool to augment PE scoring. The unique aspect of this study is that it explores the benefits of combining SP and PE scores. SP focus groups developed rating scales for four dimensions they labelled: Listening, Communication, Empathy/Rapport, and Global Impression. In Study I, 43 SPs from one site of a national PE-scored OSCE rated 60 examinees with the initial SP rating scales. In Study II, 137 SPs used slightly revised rating scales with optional narrative comments to score 275 examinees at two sites. Examinees were blinded to SP scoring and SP ratings did not count. Separate PE and SP scoring was examined using descriptive statistics and correlations. Combinations of SP and PE scoring were assessed using pass rates, reliability, and decision consistency and accuracy indices. In Study II, SP and PE comments were examined. SPs showed greater variability in their scoring, and rated examinees lower than PEs on common elements, resulting in slightly lower pass rates when combined. There was a moderate tendency for both SPs and PEs to make negative comments for the same examinee but for different reasons. We argue that SPs and PEs assess performance from different perspectives, and that combining scores from both augments the overall reliability of scores and pass/fail decisions. There is potential to provide examinees with feedback comments from each group.
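A minimal sketch of the kind of score combination at issue follows; the weighting, cut score, and data are hypothetical, not the Medical Council of Canada's algorithm:

```python
# Minimal sketch, hypothetical weighting and cut score (not the MCC's
# algorithm): blending physician examiner (PE) and standardized patient (SP)
# station scores before a pass/fail decision.
def combined_score(pe: float, sp: float, sp_weight: float = 0.25) -> float:
    """Weighted blend of PE and SP scores, both expressed on 0-100."""
    return (1 - sp_weight) * pe + sp_weight * sp

PASS_CUT = 60.0
examinees = [("A", 72.0, 55.0), ("B", 61.0, 48.0), ("C", 58.0, 70.0)]

for name, pe, sp in examinees:
    total = combined_score(pe, sp)
    verdict = "pass" if total >= PASS_CUT else "fail"
    print(f"{name}: PE={pe}, SP={sp}, combined={total:.1f} -> {verdict}")
```

In this toy example, candidate B passes on the PE score alone but fails once the lower SP score is blended in, mirroring the slightly lower combined pass rates reported above.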
Collapse
Affiliation(s)
- Marguerite Roy
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada.
| | - Josée Wojcik
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| | - Ilona Bartman
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| | - Sydney Smee
- Medical Council of Canada, 1021 Thomas Spratt Place, Ottawa, ON, K1G 5L5, Canada
| |
Collapse
|
25
|
Leenstra NF, Jung OC, Cnossen F, Jaarsma ADC, Tulleken JE. Development and Evaluation of the Taxonomy of Trauma Leadership Skills-Shortened for Observation and Reflection in Training: A Practical Tool for Observing and Reflecting on Trauma Leadership Performance. Simul Healthc 2021; 16:37-45. [PMID: 32732816 PMCID: PMC7850591 DOI: 10.1097/sih.0000000000000474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
INTRODUCTION Trauma leadership skills are increasingly being addressed in trauma courses, but few resources are available to systematically observe and debrief trainees' performances. The authors therefore translated their previously developed, extensive Taxonomy of Trauma Leadership Skills (TTLS) into a practical observation tool that is tailored to the vocabulary of clinician instructors and their workflow and workload during simulation-based training. METHODS From 2016 to 2018, the TTLS underwent practical evaluation in a two-stage iterative process. In the first stage, testing panels of trauma specialists observed excerpts from videotaped simulations and indicated from the list of elements which behaviors they felt were being shown. Ambiguities and redundancies were addressed by rephrasing or combining elements. In the second stage, successive iterations of the tool were used in actual scenario training to observe and debrief trainees' performances. The instructors' recommendations resulted in further improvements in clarity, ease of use, and usefulness, until no new suggestions were raised. RESULTS The resultant "TTLS-Shortened for Observation and Reflection in Training" was given a simpler structure and more concrete, self-explanatory benchmarks. It contains 6 skill categories for evaluation, each with 4 to 6 benchmark behaviors. CONCLUSIONS The TTLS-Shortened for Observation and Reflection in Training is an important addition to other trauma assessment tools because of its specific focus on leadership skills. It helps set concrete performance expectations, simplify note taking, and target observations and debriefings. One central challenge was striking a balance between conciseness and specificity. The authors reflect on how the decisions behind the resulting structure ease and support the conduct of observations and performance debriefings.
Collapse
|
26
|
Wilby KJ, Paravattil B. Cognitive load theory: Implications for assessment in pharmacy education. Res Social Adm Pharm 2020; 17:1645-1649. [PMID: 33358136 DOI: 10.1016/j.sapharm.2020.12.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2019] [Revised: 11/09/2020] [Accepted: 12/15/2020] [Indexed: 11/28/2022]
Abstract
The concept of mental workload is well studied from a learner's perspective but has yet to be better understood from the perspective of an assessor. Mental workload is largely associated with cognitive load theory, which describes three different types of load. Intrinsic load deals with the complexity of the task, extraneous load describes distractors to the task at hand, and germane load focuses on the development of schemas in working memory for future recall. Studies from medical education show that all three types of load are relevant when considering rater-based assessment (e.g., Objective Structured Clinical Examinations (OSCEs) or experiential training). Assessments with high intrinsic and extraneous load may interfere with assessors' attention and working memory and result in poorer-quality assessment. Reducing these loads within assessment tasks should therefore be a priority for pharmacy educators. This commentary aims to provide a theoretical overview of mental workload in assessment, outline research findings from the medical education context, and propose strategies to be considered for reducing mental workload in rater-based assessments relevant to pharmacy education. Suggestions for future research are also addressed.
Collapse
Affiliation(s)
- Kyle John Wilby
- School of Pharmacy, University of Otago, PO Box 56, Dunedin, 9054, New Zealand.
| | | |
Collapse
|
27
|
Hyde C, Yardley S, Lefroy J, Gay S, McKinley RK. Clinical assessors' working conceptualisations of undergraduate consultation skills: a framework analysis of how assessors make expert judgements in practice. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2020; 25:845-875. [PMID: 31997115 PMCID: PMC7471149 DOI: 10.1007/s10459-020-09960-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Accepted: 01/18/2020] [Indexed: 06/10/2023]
Abstract
Undergraduate clinical assessors make expert, multifaceted judgements of consultation skills in concert with medical school OSCE grading rubrics. Assessors are not cognitive machines: their judgements are made in the light of prior experience and social interactions with students. It is important to understand assessors' working conceptualisations of consultation skills and whether they could be used to develop assessment tools for undergraduate assessment. This study aimed to identify the working conceptualisations that assessors use while assessing undergraduate medical students' consultation skills, and to develop assessment tools based on assessors' working conceptualisations and natural language for undergraduate consultation skills. In semi-structured interviews, 12 experienced assessors from a UK medical school populated a blank assessment scale with personally meaningful descriptors while describing how they made judgements of students' consultation skills (at exit standard). A two-step iterative thematic framework analysis was performed drawing on constructionism and interactionism. Five domains were found within working conceptualisations of consultation skills: Application of knowledge; Manner with patients; Getting it done; Safety; and Overall impression. Three mechanisms of judgement about student behaviour were identified: observations, inferences and feelings. Assessment tools drawing on participants' conceptualisations and natural language were generated, including 'grade descriptors' for common conceptualisations in each domain by mechanism of judgement, matched to grading rubrics of Fail, Borderline, Pass and Very good. Utilising working conceptualisations to develop assessment tools is feasible and potentially useful. Work is needed to test the impact on assessment quality.
Collapse
Affiliation(s)
- Catherine Hyde
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| | - Sarah Yardley
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK.
- Palliative Care Service, Central and North West London NHS Foundation Trust, St Pancras Hospital, 5th Floor South Wing, 4 St. Pancras Way, London, NW1 0PE, UK.
| | - Janet Lefroy
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| | - Simon Gay
- University of Leicester School of Medicine, Leicester, UK
| | - Robert K McKinley
- School of Medicine, Keele University, Keele, Staffordshire, ST5 5BG, UK
| |
Collapse
|
28
|
Homer M, Fuller R, Hallam J, Pell G. Shining a spotlight on scoring in the OSCE: Checklists and item weighting. MEDICAL TEACHER 2020; 42:1037-1042. [PMID: 32608303 DOI: 10.1080/0142159x.2020.1781072] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Introduction: There has been a long-running debate about the validity of item-based checklist scoring of performance assessments like OSCEs. In recent years, the conception of a checklist has developed from its dichotomous inception into a more 'key-features' and/or chunked approach, where 'items' have the potential to become weighted differently, but the literature does not always reflect these broader conceptions. Methods: We consider theoretical, design and (clinically trained) assessor issues related to differential item weighting in checklist scoring of OSCE stations. Using empirical evidence, this work also compares candidate decisions and psychometric quality under different item-weighting approaches (i.e. a simple 'unweighted' scheme versus a differentially weighted one). Results: Different weighting schemes affect approximately 30% of the key borderline group of candidates, and 3% of candidates overall. We also find that measures of overall assessment quality are a little better under the differentially weighted scoring system. Discussion and conclusion: Differentially weighted modern checklists can contribute to valid assessment outcomes, and bring a range of additional benefits to the assessment. Judgment about the weighting of particular items should be considered a key design consideration during station development and must align with clinical assessor expectations of the relative importance of sub-tasks.
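To make the two scoring schemes concrete, here is a minimal sketch with hypothetical checklist items and weights, showing how differential weighting can move a borderline candidate across a cut score:

```python
# Minimal sketch, hypothetical items and weights: the same checklist scored
# unweighted versus differentially weighted, moving a borderline candidate
# across the cut score.
items = [  # (item completed?, weight reflecting clinical importance)
    (True, 3),   # key safety step done
    (True, 2),
    (False, 3),  # key step missed
    (True, 1),
    (True, 1),
]

unweighted = sum(done for done, _ in items) / len(items)
weighted = sum(w for done, w in items if done) / sum(w for _, w in items)

CUT = 0.75
print(f"unweighted: {unweighted:.2f} -> {'pass' if unweighted >= CUT else 'fail'}")
print(f"weighted:   {weighted:.2f} -> {'pass' if weighted >= CUT else 'fail'}")
```

Here the candidate passes the unweighted checklist (0.80) but fails the weighted one (0.70) because a heavily weighted key step was missed, the kind of borderline movement the paper quantifies.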
Collapse
Affiliation(s)
- Matt Homer
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| | - Richard Fuller
- School of Medicine, University of Liverpool, Liverpool, UK
| | - Jennifer Hallam
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| | - Godfrey Pell
- Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK
| |
Collapse
|
29
|
Prediger S, Schick K, Fincke F, Fürstenberg S, Oubaid V, Kadmon M, Berberat PO, Harendza S. Validation of a competence-based assessment of medical students' performance in the physician's role. BMC MEDICAL EDUCATION 2020; 20:6. [PMID: 31910843 PMCID: PMC6947905 DOI: 10.1186/s12909-019-1919-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Accepted: 12/22/2019] [Indexed: 05/04/2023]
Abstract
BACKGROUND Assessing the competence of advanced undergraduate medical students based on performance in the clinical context is the ultimate, yet challenging, goal for medical educators to provide constructive alignment between undergraduate medical training and the professional work of physicians. Therefore, we designed and validated a performance-based 360-degree assessment for competences of advanced undergraduate medical students. METHODS This study was conducted in three steps: 1) Ten facets of competence considered to be most important for beginning residents were determined by a ranking study with 102 internists and 100 surgeons. 2) Based on these facets of competence, we developed a 360-degree assessment simulating a first day of residency. Advanced undergraduate medical students (year 5 and 6) participated in the physician's role. Additionally, knowledge was assessed by a multiple-choice test. The assessment was performed twice (t1 and t2) and included three phases: a consultation hour, a patient management phase, and a patient handover. Sixty-seven (t1) and eighty-nine (t2) undergraduate medical students participated. 3) The participants completed the Group Assessment of Performance (GAP) test for flight school applicants to assess medical students' facets of competence in a non-medical context for validation purposes. We aimed to provide a validity argument for our newly designed assessment based on Messick's six aspects of validation: (1) content validity, (2) substantive/cognitive validity, (3) structural validity, (4) generalizability, (5) external validity, and (6) consequential validity. RESULTS Our assessment proved to be well operationalised, enabling undergraduate medical students to demonstrate their competences at the higher levels of Bloom's taxonomy. Its generalisability was underscored by its authenticity with respect to workplace reality and its underlying facets of competence relevant for beginning residents. The moderate concordance with facets of competence of the validated GAP test provides arguments of convergent validity for our assessment. Since five aspects of Messick's validation approach could be defended, our competence-based 360-degree assessment format shows good arguments for its validity. CONCLUSION According to these validation arguments, our assessment instrument seems to be a good option to assess competence in advanced undergraduate medical students in a summative or formative way. Developments towards assessment of postgraduate medical trainees should be explored.
Collapse
Affiliation(s)
- Sarah Prediger
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Kristina Schick
- TUM Medical Education Center, School of Medicine, Technical University of Munich, Munich, Germany
| | - Fabian Fincke
- Department of Medical Education and Educational Research, Faculty of Medicine and Health Science, University of Oldenburg, Oldenburg, Germany
| | - Sophie Fürstenberg
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | | | - Martina Kadmon
- Faculty of Medicine, University of Augsburg, Deanery, Augsburg, Germany
| | - Pascal O. Berberat
- TUM Medical Education Center, School of Medicine, Technical University of Munich, Munich, Germany
| | - Sigrid Harendza
- III. Department of Internal Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| |
Collapse
|
30
|
Paravattil B, Wilby KJ. Optimizing assessors' mental workload in rater-based assessment: a critical narrative review. PERSPECTIVES ON MEDICAL EDUCATION 2019; 8:339-345. [PMID: 31728841 PMCID: PMC6904389 DOI: 10.1007/s40037-019-00535-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
INTRODUCTION Rater-based assessment has resulted in high cognitive demands for assessors within the education of health professionals. Rating quality may be influenced by the mental workload required of assessors to complete rating tasks. The objective of this review was to explore interventions or strategies aimed at measuring and reducing mental workload for improvement in assessment outcomes in health professions education. METHODS A critical narrative review was conducted for English-language articles using the databases PubMed, EMBASE, and Google Scholar from inception until November 2018. Articles were eligible if they reported results of interventions aimed at measuring or reducing mental workload in rater-based assessment. RESULTS A total of six articles were included in the review. All studies were conducted in simulation settings (OSCEs or videotaped interactions). Of the four studies that measured mental workload, none found any reduction in mental workload, as measured by objective secondary-task performance, after interventions of assessor training or reductions in the number of competency dimensions assessed. Reductions in competency dimensions, however, did result in improvements in assessment quality across three studies. DISCUSSION The concept of mental workload in assessment in medical education needs further exploration, including investigation into valid measures of assessors' mental workload. It appears that adjusting raters' focus may be a valid strategy to improve assessment outcomes. Future research should be designed to inform how best to reduce load in assessments to improve quality, while balancing the type and quantity of data needed for judgments.
Collapse
Affiliation(s)
| | - Kyle John Wilby
- School of Pharmacy, University of Otago, Dunedin, New Zealand.
| |
Collapse
|
31
|
van Andel CEE, Born MP, Themmen APN, Stegers-Jager KM. Broadly sampled assessment reduces ethnicity-related differences in clinical grades. MEDICAL EDUCATION 2019; 53:264-275. [PMID: 30680783 PMCID: PMC6590164 DOI: 10.1111/medu.13790] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 09/10/2018] [Accepted: 11/09/2018] [Indexed: 05/30/2023]
Abstract
CONTEXT Ethnicity-related differences in clinical grades exist. Broad sampling in assessment of clinical competencies involves multiple assessments used by multiple assessors across multiple moments. Broad sampling in assessment potentially reduces irrelevant variance and may therefore mitigate ethnic disparities in clinical grades. OBJECTIVES Research question 1 (RQ1): to assess whether the relationship between students' ethnicity and clinical grades is weaker in a broadly sampled versus a global assessment. Research question 2 (RQ2): to assess whether larger ethnicity-related differences in grades occur when supervisors are given the opportunity to deviate from the broadly sampled assessment score. METHODS Students' ethnicity was classified as Turkish/Moroccan/African, Surinamese/Antillean, Asian, Western, or native Dutch. RQ1: 1667 students (74.3% native Dutch) who entered medical school between 2002 and 2004 (global assessment, 818 students) or between 2008 and 2010 (broadly sampled assessment, 849 students) were included. The main outcome measure was whether or not students received a grade of 8 or higher (on a scale from 1 to 10) at least three times across five clerkships. RQ2: 849 students (72.4% native Dutch) who were assessed by broad sampling were included. The main outcome measure was the number of grade points by which supervisors had deviated from broadly sampled scores. Both analyses were adjusted for gender, age, (im)migration status and average bachelor grade. RESULTS Research question 1: ethnicity-related differences in clinical grades were smaller in broadly sampled than in global assessment, also after adjustment. More specifically, native Dutch students had reduced probabilities (from 0.87 to 0.65) of receiving a grade of 8 or higher at least three times in five clerkships in broadly sampled as compared with global assessment, whereas Surinamese (from 0.03 to 0.51) and Asian students (from 0.21 to 0.30) had increased probabilities. Research question 2: when supervisors were allowed to deviate from original grades, ethnicity-related differences in clinical grades were reintroduced. CONCLUSIONS Broadly sampled assessment reduces ethnicity-related differences in grades.
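The adjusted probabilities above come from regression modelling; as a rough illustration (synthetic data, simplified covariates, not the authors' model), adjusted group effects of this kind can be estimated with a logistic regression:

```python
# Minimal sketch, synthetic data and simplified covariates (not the authors'
# model): adjusted group effects on a "high grade" outcome via logistic
# regression with an ethnicity x assessment-format interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "ethnicity": rng.choice(["dutch", "surinamese", "asian"], size=n),
    "assessment": rng.choice(["global", "broad"], size=n),
    "bachelor_gpa": rng.normal(7.0, 0.5, size=n),
})
# Synthetic outcome: a group gap under global assessment only.
logit = (-6.0 + 0.8 * df["bachelor_gpa"]
         + np.where((df["ethnicity"] == "dutch")
                    & (df["assessment"] == "global"), 1.0, 0.0))
df["high_grade"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit(
    "high_grade ~ C(ethnicity) * C(assessment) + bachelor_gpa", data=df
).fit(disp=False)
print(model.summary().tables[1])
```

The interaction terms correspond to the ethnicity-by-format differences of interest; with real data, the adjustment covariates (gender, age, migration status, average bachelor grade) would enter the formula the same way.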
Collapse
Affiliation(s)
| | - Marise Ph Born
- Department of Psychology, Erasmus University Rotterdam, Rotterdam, the Netherlands
| | - Axel P N Themmen
- Institute of Medical Education Research Rotterdam, Erasmus MC, Rotterdam, the Netherlands
- Department of Internal Medicine, Erasmus University Rotterdam, Rotterdam, the Netherlands
| | | |
Collapse
|
32
|
Lee V, Brain K, Martin J. From opening the 'black box' to looking behind the curtain: cognition and context in assessor-based judgements. ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2019; 24:85-102. [PMID: 30302670 DOI: 10.1007/s10459-018-9851-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Accepted: 09/06/2018] [Indexed: 06/08/2023]
Abstract
The increasing use of direct observation tools to assess routine performance has resulted in the growing reliance on assessor-based judgements in the workplace. However, we have a limited understanding of how assessors make judgements and formulate ratings in real world contexts. The current research on assessor cognition has largely focused on the cognitive domain but the contextual factors are equally important, and both are closely interconnected. This study aimed to explore the perceived cognitive and contextual factors influencing Mini-CEX assessor judgements in the Emergency Department setting. We used a conceptual framework of assessor-based judgement to develop a sequential mixed methods study. We analysed and integrated survey and focus group results to illustrate self-reported cognitive and contextual factors influencing assessor judgements. We used situated cognition theory as a sensitizing lens to explore the interactions between people and their environment. The major factors highlighted through our mixed methods study were: clarity of the assessment, reliance on and variable approach to overall impression (gestalt), role tension especially when giving constructive feedback, prior knowledge of the trainee and case complexity. We identified prevailing tensions between participants (assessors and trainees), interactions (assessment and feedback) and setting. The two practical implications of our research are the need to broaden assessor training to incorporate both cognitive and contextual domains, and the need to develop a more holistic understanding of assessor-based judgements in real world contexts to better inform future research and development in workplace-based assessments.
Collapse
Affiliation(s)
- Victor Lee
- Department of Emergency Medicine, Austin Health, P.O. Box 5555, Heidelberg, VIC, 3084, Australia.
| | | | - Jenepher Martin
- Eastern Health Clinical School, Monash University and Deakin University, Box Hill, VIC, Australia
| |
Collapse
|
33
|
Wood TJ, Pugh D, Touchie C, Chan J, Humphrey-Murto S. Can physician examiners overcome their first impression when examinee performance changes? ADVANCES IN HEALTH SCIENCES EDUCATION : THEORY AND PRACTICE 2018; 23:721-732. [PMID: 29556923 DOI: 10.1007/s10459-018-9823-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2017] [Accepted: 03/15/2018] [Indexed: 06/08/2023]
Abstract
There is an increasing focus on factors that influence the variability of rater-based judgments. First impressions are one such factor. First impressions are judgments about people that are made quickly and are based on little information. Under some circumstances, these judgments can be predictive of subsequent decisions. A concern for both examinees and test administrators is whether the relationship remains stable when the performance of the examinee changes. That is, once a first impression is formed, to what degree will an examiner be willing to modify it? The purpose of this study is to determine the degree to which first impressions influence final ratings when the performance of examinees changes within the context of an objective structured clinical examination (OSCE). Physician examiners (n = 29) viewed seven videos of examinees (i.e., actors) performing a physical exam on a single OSCE station. They rated the examinees' clinical abilities on a six-point global rating scale after 60 s (first impression or FIGR). They then observed the examinee for the remainder of the station and provided a final global rating (GRS). For three of the videos, the examinees' performance remained consistent throughout. For two videos, examinee performance changed from initially strong to weak and for two videos, performance changed from initially weak to strong. The mean FIGR rating for the Consistent condition (M = 4.80) and the Strong to Weak condition (M = 4.87) were higher compared to their respective GRS ratings (M = 3.93, M = 2.73), with a greater decline for the Strong to Weak condition. The mean FIGR rating for the Weak to Strong condition (M = 3.60) was lower than the corresponding mean GRS (M = 4.81). This pattern of findings suggests that raters were willing to change their judgments based on examinee performance. Future work should explore the impact of making a first impression judgment explicit versus implicit and the role of context on the relationship between a first impression and a subsequent judgment.
Collapse
Affiliation(s)
- Timothy J Wood
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, PMC 102H, 850 Peter Morand Crescent, Ottawa, ON, K1G-573, Canada.
| | - Debra Pugh
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Claire Touchie
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - James Chan
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| | - Susan Humphrey-Murto
- Department of Innovation in Medical Education, Faculty of Medicine, University of Ottawa, PMC 102H, 850 Peter Morand Crescent, Ottawa, ON, K1G-573, Canada
- Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Canada
| |
Collapse
|
34
|
Tavares W, Sadowski A, Eva KW. Asking for Less and Getting More: The Impact of Broadening a Rater's Focus in Formative Assessment. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2018; 93:1584-1590. [PMID: 29794523 DOI: 10.1097/acm.0000000000002294] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
PURPOSE There may be unintended consequences of broadening the competencies across which health professions trainees are assessed. This study was conducted to determine whether such broadening influences the formative guidance assessors provide to trainees and to test whether sequential collection of competency-specific assessment can overcome setbacks of simultaneous collection. METHOD We used a randomized between-subjects experimental design, conducted in Toronto and Halifax, Canada, in 2016-2017 with paramedic educators experienced in observing/rating, in which observers' focus was manipulated. In the simultaneous condition, participants rated four unscripted (i.e., spontaneously generated) clinical performances using a six-dimension global rating scale and provided feedback. In three sequential conditions, participants were asked to rate the same performances and provide feedback but for only two of the six dimensions. Participants from these conditions were randomly merged to create a "full score" and set of feedback statements for each candidate. RESULTS Eighty-seven raters completed the study: 23 in the simultaneous condition and 21 or 22 for each pair of dimensions in the sequential conditions. After randomly merging participants, there were 21 "full scores" in the sequential condition. Compared with the sequential condition, participants in the simultaneous condition demonstrated reductions in the amount of unique feedback provided, increased likelihood of ignoring some dimensions of performance, lessened variety of feedback, and reduced reliability. CONCLUSIONS Sequential or distributed assessment strategies in which raters are asked to focus on less may provide more effective assessment by overcoming the unintended consequences of asking raters to spread their attention thinly over many dimensions of competence.
Collapse
Affiliation(s)
- Walter Tavares
- W. Tavares is scientist and assistant professor, Wilson Centre, Department of Medicine and Post-MD Education, University Health Network/University of Toronto Faculty of Medicine, Toronto, Ontario, Canada, and clinician scientist, Department of Community and Health Services, Paramedic and Senior Services, Regional Municipality of York, Newmarket, Ontario, Canada. A. Sadowski is a research associate, Wilson Centre, University Health Network/University of Toronto Faculty of Medicine, Toronto, Ontario, Canada. K.W. Eva is senior scientist, Centre for Health Education Scholarship, and professor, Department of Medicine, University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada
| | | | | |
Collapse
|
35
|
Eva KW. Cognitive Influences on Complex Performance Assessment: Lessons from the Interplay between Medicine and Psychology. JOURNAL OF APPLIED RESEARCH IN MEMORY AND COGNITION 2018. [DOI: 10.1016/j.jarmac.2018.03.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
36
|
Scaffidi MA, Grover SC, Carnahan H, Yu JJ, Yong E, Nguyen GC, Ling SC, Khanna N, Walsh CM. A prospective comparison of live and video-based assessments of colonoscopy performance. Gastrointest Endosc 2018; 87:766-775. [PMID: 28859953 DOI: 10.1016/j.gie.2017.08.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 08/20/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND AND AIMS Colonoscopy performance is typically assessed by a supervisor in the clinical setting. There are limitations of this approach, however, because it allows for rater bias and increases supervisor workload demand during the procedure. Video-based assessment of recorded procedures has been proposed as a complementary means by which to assess colonoscopy performance. This study sought to investigate the reliability, validity, and feasibility of video-based assessments of competence in performing colonoscopy compared with live assessment. METHODS Novice (<50 previous colonoscopies), intermediate (50-500), and experienced (>1000) endoscopists from 5 hospitals participated. Two views of each colonoscopy were videotaped: an endoscopic (intraluminal) view and a recording of the endoscopist's hand movements. Recorded procedures were independently assessed by 2 blinded experts using the Gastrointestinal Endoscopy Competency Assessment Tool (GiECAT), a validated procedure-specific assessment tool comprising a global rating scale (GRS) and checklist (CL). Live ratings were conducted by a non-blinded expert endoscopist. Outcomes included agreement between live and blinded video-based ratings of clinical colonoscopies, intra-rater reliability, inter-rater reliability and discriminative validity of video-based assessments, and perceived ease of assessment. RESULTS Forty endoscopists participated (20 novices, 10 intermediates, and 10 experienced). There was good agreement between the live and video-based ratings (total, intra-class correlation [ICC] = 0.847; GRS, ICC = 0.868; CL, ICC = 0.749). Intra-rater reliability was excellent (total, ICC = 0.99; GRS, ICC = 0.99; CL, ICC = 0.98). Inter-rater reliability between the 2 blinded video-based raters was high (total, ICC = 0.91; GRS, ICC = 0.918; CL, ICC = 0.862). GiECAT total, GRS, and CL scores differed significantly among novice, intermediate, and experienced endoscopists (P < .001). Video-based assessments were perceived as "fairly easy," although live assessments were rated as significantly easier (P < .001). CONCLUSIONS Video-based assessments of colonoscopy procedures using the GiECAT have strong evidence of reliability and validity. In addition, assessments using videos were feasible, although live assessments were easier.
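For reference, the agreement statistics reported above are intra-class correlations; below is a minimal sketch of a two-way random-effects, absolute-agreement, single-measures ICC(2,1) on hypothetical live and video ratings (not the study's data or exact model):

```python
# Minimal sketch, hypothetical ratings: two-way random-effects, absolute-
# agreement, single-measures ICC(2,1) between live and video-based raters.
import numpy as np

# rows = endoscopists, cols = [live rating, blinded video rating]
ratings = np.array([
    [18.0, 17.5],
    [25.0, 24.0],
    [12.0, 13.0],
    [30.0, 29.5],
    [21.0, 20.0],
])
n, k = ratings.shape
grand = ratings.mean()

ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # endoscopists
ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # rating methods
ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols

ms_rows = ss_rows / (n - 1)
ms_cols = ss_cols / (k - 1)
ms_err = ss_err / ((n - 1) * (k - 1))

icc_2_1 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
)
print(f"ICC(2,1), live vs video: {icc_2_1:.3f}")
```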
Collapse
Affiliation(s)
- Michael A Scaffidi
- Division of Gastroenterology, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Samir C Grover
- Division of Gastroenterology, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Heather Carnahan
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | - Jeffrey J Yu
- Wilson Centre, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Elaine Yong
- Division of Gastroenterology, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Geoffrey C Nguyen
- Division of Gastroenterology, Mount Sinai Hospital University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Simon C Ling
- Department of Paediatrics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Division of Gastroenterology, Hepatology and Nutrition, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
| | - Nitin Khanna
- Division of Gastroenterology, St. Joseph's Health Centre, University of Western Ontario, London, Ontario, Canada
| | - Catharine M Walsh
- Department of Paediatrics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Division of Gastroenterology, Hepatology and Nutrition, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Wilson Centre, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
37
|
Gomez-Garibello C, Young M. Emotions and assessment: considerations for rater-based judgements of entrustment. MEDICAL EDUCATION 2018; 52:254-262. [PMID: 29119582 DOI: 10.1111/medu.13476] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/06/2017] [Revised: 03/03/2017] [Accepted: 09/08/2017] [Indexed: 06/07/2023]
Abstract
CONTEXT Assessment is subject to increasing scrutiny as medical education transitions towards a competency-based medical education (CBME) model. Traditional perspectives on the roles of assessment emphasise high-stakes, summative assessment, whereas CBME argues for formative assessment. Revisiting conceptualisations about the roles and formats of assessment in medical education provides opportunities to examine understandings and expectations of the assessment of learners. The act of the rater generating scores might be considered an exclusively cognitive exercise; however, current literature has drawn attention to the notion of raters as measurement instruments, thereby attributing additional factors to their decision-making processes, such as social considerations and intuition. However, the literature has not comprehensively examined the influence of raters' emotions during assessment. In this narrative review, we explore the influence of raters' emotions in the assessment of learners. METHODS We summarise existing literature that describes the role of emotions in assessment broadly, and rater-based assessment specifically, across a variety of fields. The literature related to emotions and assessment is examined from different perspectives, including those of educational context, decision making and rater cognition. We use the concept of entrustable professional activities (EPAs) to contextualise a discussion of the ways in which raters' emotions may have meaningful impacts on the decisions they make in clinical settings. This review summarises findings from different perspectives and identifies areas for consideration for the role of emotion in rater-based assessment, and areas for future research. CONCLUSIONS We identify and discuss three different interpretations of the influence of raters' emotions during assessments: (i) emotions lead to biased decision making; (ii) emotions contribute random noise to assessment, and (iii) emotions constitute legitimate sources of information that contribute to assessment decisions. We discuss these three interpretations in terms of areas for future research and implications for assessment.
Collapse
Affiliation(s)
- Carlos Gomez-Garibello
- Centre for Medical Education, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| | - Meredith Young
- Centre for Medical Education, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| |
Collapse
|
38
|
Wilbur K. Does faculty development influence the quality of in-training evaluation reports in pharmacy? BMC MEDICAL EDUCATION 2017; 17:222. [PMID: 29157239 PMCID: PMC5697106 DOI: 10.1186/s12909-017-1054-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Accepted: 11/02/2017] [Indexed: 06/02/2023]
Abstract
BACKGROUND In-training evaluation reports (ITERs) of student workplace-based learning are completed by clinical supervisors across various health disciplines. However, outside of medicine, the quality of submitted workplace-based assessments is largely uninvestigated. This study assessed the quality of ITERs in pharmacy and whether clinical supervisors could be trained to complete higher-quality reports. METHODS A random sample of ITERs submitted in a pharmacy program during 2013-2014 was evaluated. These ITERs served as a historical control (control group 1) for comparison with ITERs submitted in 2015-2016 by clinical supervisors who participated in an interactive faculty development workshop (intervention group) and those who did not (control group 2). Two trained independent raters scored the ITERs using the Completed Clinical Evaluation Report Rating (CCERR), a previously validated nine-item scale assessing report quality. The scoring scale for each item is anchored at 1 ("not at all") and 5 ("exemplary"), with 3 categorized as "acceptable". RESULTS The mean CCERR score for reports completed after the workshop (22.9 ± 3.39) was not significantly different from that of prospective control group 2 (22.7 ± 3.63, p = 0.84) and was worse than that of historical control group 1 (37.9 ± 8.21, p = 0.001). Mean item scores were below acceptable thresholds for 5 of the 9 domains in control group 1, including supervisor-documented evidence of specific examples to clearly explain weaknesses and concrete recommendations for student improvement. Mean item scores were below acceptable thresholds for 6 and 7 of the 9 domains in control group 2 and the intervention group, respectively. CONCLUSIONS This study is the first to use the CCERR to evaluate ITER quality outside of medicine. Findings demonstrate low baseline CCERR scores in a pharmacy program that were not demonstrably changed by a faculty development workshop, but strategies are identified to augment future rater training.
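As an aid to interpreting these totals, the arithmetic implied by the scale description is worth spelling out (a reconstruction from the anchors above, not a calculation reported in the paper): nine items scored 1 to 5 give a total between 9 and 45, and a report rated "acceptable" (3) on every item totals 27, so the post-workshop and prospective-control means of 22.9 and 22.7 both sit below that benchmark.
\[
\mathrm{CCERR}_{\min} = 9 \times 1 = 9, \qquad \mathrm{CCERR}_{\text{acceptable}} = 9 \times 3 = 27, \qquad \mathrm{CCERR}_{\max} = 9 \times 5 = 45.
\]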
Affiliation(s)
- Kerry Wilbur
- College of Pharmacy, Qatar University, PO Box 2713, Doha, Qatar.
39
Kogan JR, Hatala R, Hauer KE, Holmboe E. Guidelines: the do's, don'ts and don't knows of direct observation of clinical skills in medical education. Perspectives on Medical Education 2017;6:286-305. [PMID: 28956293] [PMCID: PMC5630537] [DOI: 10.1007/s40037-017-0376-7]
Abstract
INTRODUCTION Direct observation of clinical skills is a key assessment strategy in competency-based medical education. The guidelines presented in this paper synthesize the literature on direct observation of clinical skills. The goal is to provide a practical list of Do's, Don'ts and Don't Knows about direct observation for supervisors who teach learners in the clinical setting and for educational leaders who are responsible for clinical training programs. METHODS We built consensus through an iterative approach in which each author, based on their medical education and research knowledge and expertise, independently developed a list of Do's, Don'ts, and Don't Knows about direct observation of clinical skills. Lists were compiled, discussed and revised. We then sought and compiled evidence to support each guideline and determine the strength of each guideline. RESULTS A final set of 33 Do's, Don'ts and Don't Knows is presented along with a summary of evidence for each guideline. Guidelines focus on two groups: individual supervisors and the educational leaders responsible for clinical training programs. Guidelines address recommendations for how to focus direct observation, select an assessment tool, promote high quality assessments, conduct rater training, and create a learning culture conducive to direct observation. CONCLUSIONS High frequency, high quality direct observation of clinical skills can be challenging. These guidelines offer important evidence-based Do's and Don'ts that can help improve the frequency and quality of direct observation. Improving direct observation requires focus not just on individual supervisors and their learners, but also on the organizations and cultures in which they work and train. Additional research to address the Don't Knows can help educators realize the full potential of direct observation in competency-based education.
Affiliation(s)
- Jennifer R Kogan
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA.
- Rose Hatala
- University of British Columbia, Vancouver, British Columbia, Canada
- Karen E Hauer
- University of California San Francisco, San Francisco, CA, USA
- Eric Holmboe
- Accreditation Council for Graduate Medical Education, Chicago, IL, USA
40
Gingerich A, Ramlo SE, van der Vleuten CPM, Eva KW, Regehr G. Inter-rater variability as mutual disagreement: identifying raters' divergent points of view. Advances in Health Sciences Education 2017;22:819-838. [PMID: 27651046] [DOI: 10.1007/s10459-016-9711-8]
Abstract
Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting 'idiosyncratic rater variance' is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical assessments have used open response formats to gather raters' comments and justifications. This design choice allows participants to use idiosyncratic response styles that could result in a distorted representation of the underlying rater cognition and skew subsequent analyses. In this study we explored rater variability using the structured response format of Q methodology. Physician raters viewed video-recorded clinical performances and provided Mini Clinical Evaluation Exercise (Mini-CEX) assessment ratings through a web-based system. They then shared their assessment impressions by sorting statements that described the most salient aspects of the clinical performance onto a forced quasi-normal distribution ranging from "most consistent with my impression" to "most contrary to my impression". Analysis of the resulting Q-sorts revealed distinct points of view for each performance shared by multiple physicians. The points of view corresponded with the ratings physicians assigned to the performance. Each point of view emphasized different aspects of the performance with either rapport-building and/or medical expertise skills being most salient. It was rare for the points of view to diverge based on disagreements regarding the interpretation of a specific aspect of the performance. As a result, physicians' divergent points of view on a given clinical performance cannot be easily reconciled into a single coherent assessment judgment that is impacted by measurement error. If inter-rater variability does not wholly reflect error of measurement, it is problematic for our current measurement models and poses challenges for how we are to adequately analyze performance assessment ratings.
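For readers unfamiliar with Q methodology, the forced quasi-normal distribution described above is simply a grid of fixed column quotas into which every statement must be sorted. The sketch below is illustrative only; the column range (-4 to +4) and the 25-statement deck size are assumptions, not details reported in the study.

    # A hypothetical forced Q-sort grid: keys run from "most contrary to
    # my impression" (-4) to "most consistent with my impression" (+4);
    # each value is the number of statements that must be placed in that
    # column, which imposes the quasi-normal shape.
    q_grid = {-4: 1, -3: 2, -2: 3, -1: 4, 0: 5, 1: 4, 2: 3, 3: 2, 4: 1}

    # Every statement is placed exactly once, so the quotas sum to the deck size.
    assert sum(q_grid.values()) == 25

Because all raters complete the identical grid, their sorts can be correlated and factor-analysed by person, which is how shared points of view are recovered.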
Affiliation(s)
- Andrea Gingerich
- Northern Medical Program, University of Northern British Columbia, 3333 University Way, Prince George, BC, V2N 4Z9, Canada.
- Susan E Ramlo
- Department of Engineering and Science Technology, University of Akron, Akron, OH, USA
- Kevin W Eva
- Centre for Health Education Scholarship, University of British Columbia, Vancouver, BC, Canada
- Glenn Regehr
- Centre for Health Education Scholarship, University of British Columbia, Vancouver, BC, Canada
41
Lee V, Brain K, Martin J. Factors influencing mini-CEX rater judgments and their practical implications: a systematic literature review. Academic Medicine 2017;92:880-887. [PMID: 28030422] [DOI: 10.1097/acm.0000000000001537]
Abstract
PURPOSE At present, little is known about how mini-clinical evaluation exercise (mini-CEX) raters translate their observations into judgments and ratings. The authors of this systematic literature review aim both to identify the factors influencing mini-CEX rater judgments in the medical education setting and to translate these findings into practical implications for clinician assessors. METHOD The authors searched for internal and external factors influencing mini-CEX rater judgments in the medical education setting from 1980 to 2015 using the Ovid MEDLINE, PsycINFO, ERIC, PubMed, and Scopus databases. They extracted the following information from each study: country of origin, educational level, study design and setting, type of observation, occurrence of rater training, provision of feedback to the trainee, research question, and identified factors influencing rater judgments. The authors also conducted a quality assessment for each study. RESULTS Seventeen articles met the inclusion criteria. The authors identified both internal and external factors that influence mini-CEX rater judgments. They subcategorized the internal factors into intrinsic rater factors, judgment-making factors (conceptualization, interpretation, attention, and impressions), and scoring factors (scoring integration and domain differentiation). CONCLUSIONS The current theories of rater-based judgment have not helped clinicians resolve the issues of rater idiosyncrasy, bias, gestalt, and conflicting contextual factors; therefore, the authors believe the most important solution is to increase the justification of rater judgments through the use of specific narrative and contextual comments, which are more informative for trainees. Finally, more real-world research is required to bridge the gap between the theory and practice of rater cognition.
Affiliation(s)
- Victor Lee
- V. Lee is codirector of emergency medicine training, Department of Emergency Medicine, Austin Health, Heidelberg, Victoria, Australia. K. Brain is a doctor at South West Healthcare, Warrnambool, Victoria, Australia. J. Martin is associate professor and director, Medical Student Programs, Monash University and Deakin University, Eastern Health Clinical School, Box Hill, Victoria, Australia.
42
Scarff CE, Corderoy RM, Bearman M. In-training assessments: 'The difficulty is trying to balance reality and really tell the truth'. Australasian Journal of Dermatology 2016;59:e15-e22. [PMID: 27995625] [DOI: 10.1111/ajd.12555]
Abstract
BACKGROUND In-training assessments (ITAs) aim to evaluate trainees' progress and give valuable feedback on their performance. Many factors can affect supervisors during their completion of assessments, and these can influence the final results recorded. METHODS This is the second part of a study of supervisors of the Australasian College of Dermatologists (ACD); it presents qualitative data on their opinions of the ACD ITA process and the influences on their ITA ratings. RESULTS Supervisors noted the benefits of this assessment tool, together with many limitations. Potential influences upon supervisor ratings included the relationship between the supervisor and trainee and the level of honesty in completing and delivering the assessment. CONCLUSIONS Many factors influence supervisors in the completion of the ITA. These include the impact of interpersonal relationships and concerns about the consequences of delivering a negative assessment, which sometimes lead supervisors to modify the assessment they deliver to the trainee. Further research is needed into honesty in assessment judgements.
Affiliation(s)
- Catherine E Scarff
- Health Professions Education and Educational Research, Monash University, Melbourne, Victoria, Australia
- Robert M Corderoy
- Educational Development, Planning and Innovation, Australasian College of Dermatologists, Sydney, New South Wales, Australia
- Margaret Bearman
- Health Professions Education and Educational Research, Monash University, Melbourne, Victoria, Australia
43
Castanelli DJ, Jowsey T, Chen Y, Weller JM. Perceptions of purpose, value, and process of the mini-Clinical Evaluation Exercise in anesthesia training. Canadian Journal of Anesthesia 2016;63:1345-1356. [DOI: 10.1007/s12630-016-0740-9]
44
St-Onge C, Chamberland M, Lévesque A, Varpio L. Expectations, observations, and the cognitive processes that bind them: expert assessment of examinee performance. Advances in Health Sciences Education 2016;21:627-642. [PMID: 26620923] [DOI: 10.1007/s10459-015-9656-3]
Abstract
Performance-based assessment (PBA) is a valued assessment approach in medical education, be it in a clerkship, residency, or practice context. Raters are intrinsic to PBA, and its increased use has led to a growing interest in rater cognition. Although several researchers have tackled factors that may influence variability in rater judgment, the critical examination of how raters observe performance, and how they translate those observations into judgments, is only now being investigated. The purpose of this study was to qualitatively investigate the cognitive processes of raters and to create a framework that conceptualizes those processes when raters assess a complex performance. We conducted semi-structured interviews with 11 faculty members (nominated as excellent assessors) from a Department of Medicine to investigate how raters observe, interpret, and translate performance into judgments. The transcribed verbal protocols were analyzed using constructivist grounded theory in order to develop a theoretical model of raters' assessment processes. Several themes emerged from the data and were grouped into three macro-level themes describing how raters balance two sources of data, (1) external sources of information and (2) internal/personal sources of information, by relying on specific cognitive processes to assess an examinee's performance. The results demonstrate that assessment is a difficult cognitive task involving the nuanced use of specific cognitive processes to weigh external and internal data against each other. Our data clearly draw attention to the constant struggle between objectivity and subjectivity observed in assessment, as illustrated by the importance raters give to nuancing the examinee's observed performance.
Affiliation(s)
- Christina St-Onge
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada.
- Martine Chamberland
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Annie Lévesque
- Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC, Canada
- Lara Varpio
- Uniformed Services University of the Health Sciences, Bethesda, MD, USA
45
Yeung E, Kulasegaram K, Woods N, Dubrowski A, Hodges B, Carnahan H. Validity of a new assessment rubric for a short-answer test of clinical reasoning. BMC Medical Education 2016;16:192. [PMID: 27461249] [PMCID: PMC4962495] [DOI: 10.1186/s12909-016-0714-1]
Abstract
BACKGROUND The validity of high-stakes decisions derived from assessment results is of primary concern to candidates and certifying institutions in the health professions. In the field of orthopaedic manual physical therapy (OMPT), there is a dearth of documented validity evidence to support the certification process, particularly for short-answer tests. To address this need, we examined the internal structure of the Case History Assessment Tool (CHAT), a new assessment rubric developed to appraise written responses to a short-answer test of clinical reasoning in post-graduate OMPT certification in Canada. METHODS Fourteen physical therapy students (novices) and 16 physical therapists with minimal and substantial OMPT training, respectively, completed a mock examination. Four pairs of examiners (n = 8) appraised the written responses using the CHAT. We conducted separate generalizability studies (G studies) for all participants and also by level of OMPT training. Internal consistency was calculated for test questions with more than 2 assessment items. Decision studies were also conducted to determine the optimal application of the CHAT for OMPT certification. RESULTS The overall reliability of CHAT scores was moderate; however, reliability estimates for the novice group suggest that the scale could not adequately accommodate novices' scores. Internal consistency estimates indicate item redundancies for several test questions, which will require further investigation. CONCLUSION Future validity studies should consider discriminating the clinical reasoning competence of OMPT trainees strictly at the post-graduate level. Although rater variance was low, the large variance attributed to error sources not incorporated in our G studies warrants further investigation into other threats to validity. Future examination of examiner stringency is also warranted.
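The abstract does not name the internal-consistency statistic used; Cronbach's alpha is the conventional choice for questions with multiple scored items and is shown here only as a plausible reading, not as the authors' confirmed method:
\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
\]
where k is the number of items on a question, \(\sigma^{2}_{Y_i}\) is the variance of item i, and \(\sigma^{2}_{X}\) is the variance of the question total. Very high values would be consistent with the item redundancies the authors report.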
Affiliation(s)
- Euson Yeung
- Department of Rehabilitation Sciences, University of Toronto, 160-500 University Avenue, Toronto, ON M5G 1V7 Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Kulamakan Kulasegaram
- Department of Family and Community Medicine, University of Toronto, Toronto, Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Nicole Woods
- Department of Surgery, University of Toronto, Toronto, Canada
- The Wilson Centre for Research in Education, University Health Network, Toronto, Canada
- Adam Dubrowski
- Division of Emergency Medicine, Memorial University of Newfoundland, St John’s, Canada
- Brian Hodges
- Faculty of Medicine, University of Toronto, Toronto, Canada
- Wilson Centre for Research in Education, Richard and Elizabeth Currie Chair in Health Professions Education Research, University Health Network, Toronto, Canada
- Heather Carnahan
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St John’s, Canada
46
Byrne A, Soskova T, Dawkins J, Coombes L. A pilot study of marking accuracy and mental workload as measures of OSCE examiner performance. BMC Medical Education 2016;16:191. [PMID: 27455964] [PMCID: PMC4960857] [DOI: 10.1186/s12909-016-0708-z]
Abstract
BACKGROUND The Objective Structured Clinical Examination (OSCE) is now a standard assessment format, and while examiner training is seen as essential to assure quality, there appear to be no widely accepted measures of examiner performance. METHODS The objective of this study was to determine whether the routine training provided to examiners improved their accuracy and reduced their mental workload. Accuracy was defined as the difference between the rating of each examiner and that of an expert group, expressed as the mean error per item. At the same time, the mental workload of each examiner was measured using a previously validated secondary-task methodology. RESULTS Training was not associated with an improvement in accuracy (p = 0.547), and there was no detectable effect on mental workload. However, accuracy improved after exposure to the same scenario (p < 0.001) and was greater when marking an excellent rather than a borderline performance. CONCLUSIONS This study suggests that the method of training OSCE examiners studied is not effective in improving their performance, but that average item accuracy and mental workload appear to be valid methods of assessing examiner performance.
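The accuracy measure defined above can be written out explicitly. This is a reconstruction of the stated definition, and the use of the absolute difference is our assumption; the paper may use signed error:
\[
\text{accuracy}_j = \frac{1}{n}\sum_{i=1}^{n} \lvert x_{ij} - e_i \rvert
\]
where \(x_{ij}\) is examiner j's rating on item i, \(e_i\) is the expert-group rating for that item, and n is the number of items; lower values indicate closer agreement with the expert panel.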
Affiliation(s)
- Aidan Byrne
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Department of Anaesthesia, Morriston Hospital, Swansea, SA6 6NL UK
- Tereza Soskova
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Jayne Dawkins
- Abertawe Bro Morgannwg University Local Health Board, Swansea, UK
- Lee Coombes
- Cardiff University, School of Medicine, Cardiff, UK
47
Tavares W, Eva KW. Impact of rating demands on rater-based assessments of clinical competence. Education for Primary Care 2014;25:308-318. [DOI: 10.1080/14739879.2014.11730760]
48
Lee M, Wimmers PF. Validation of a performance assessment instrument in problem-based learning tutorials using two cohorts of medical students. Advances in Health Sciences Education 2016;21:341-357. [PMID: 26307371] [DOI: 10.1007/s10459-015-9632-y]
Abstract
Although problem-based learning (PBL) has been widely used in medical schools, few studies have attended to the assessment of PBL processes using validated instruments. This study examined reliability and validity for an instrument assessing PBL performance in four domains: Problem Solving, Use of Information, Group Process, and Professionalism. Two cohorts of medical students (N = 310) participated in the study, with 2 years of archived PBL evaluation data rated by a total of 158 faculty raters. Analyses based on generalizability theory were conducted to examine reliability. Validity was examined by following the Standards for Educational and Psychological Testing to evaluate content validity, response processes, construct validity, predictive validity, and the relationship to the variable of training. For construct validity, correlations of PBL scores with six other outcome measures were examined: the Medical College Admission Test, United States Medical Licensing Examination (USMLE) Step 1, National Board of Medical Examiners (NBME) Comprehensive Basic Science Examination, NBME Comprehensive Clinical Science Examination, Clinical Performance Examination, and USMLE Step 2 Clinical Knowledge. Predictive validity was examined by using PBL scores to predict five medical school outcomes. The highest percentage of PBL total score variance was associated with students (60%), indicating that students in the study differed in their PBL performance. The generalizability and dependability coefficients were moderately high (Eρ² = .68, Φ = .60), showing the instrument is reliable for ranking students and identifying competent PBL performers. The patterns of correlations between PBL domain scores and the outcome measures partially support construct validity. PBL performance ratings as a whole significantly (p < .01) predicted all the major medical school achievements. Second-year PBL scores were significantly higher than those of the first year, indicating a training effect. These psychometric findings support the reliability and many aspects of the validity of PBL performance assessment using the instrument.
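The two coefficients quoted in the abstract have standard forms in generalizability theory (the formulas below are textbook definitions, not reproduced from the paper): Eρ² supports relative, rank-ordering decisions, while Φ supports absolute decisions.
\[
E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{\delta}} = .68,
\qquad
\Phi = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{\Delta}} = .60
\]
Here \(\sigma^{2}_{p}\) is the student (person) variance, \(\sigma^{2}_{\delta}\) the relative error variance, and \(\sigma^{2}_{\Delta}\) the absolute error variance, which additionally includes rater and occasion main effects; since \(\sigma^{2}_{\Delta} \ge \sigma^{2}_{\delta}\), Φ never exceeds Eρ², consistent with the reported values.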
Affiliation(s)
- Ming Lee
- Center for Educational Development and Research, David Geffen School of Medicine at University of California, Los Angeles, PO Box 951722, 60-051, Los Angeles, CA, USA.
- Paul F Wimmers
- Center for Educational Development and Research, David Geffen School of Medicine at University of California, Los Angeles, PO Box 951722, 60-051, Los Angeles, CA, USA
49
Gauthier G, St-Onge C, Tavares W. Rater cognition: review and integration of research findings. Medical Education 2016;50:511-522. [PMID: 27072440] [DOI: 10.1111/medu.12973]
Abstract
BACKGROUND Given the complexity of competency frameworks, the associated skills and abilities, and the contexts in which they are to be assessed in competency-based education (CBE), there is an increased reliance on rater judgements when considering trainee performance. This increased dependence on rater-based assessment has led to the emergence of rater cognition as a field of research in health professions education. The topic, however, is often conceptualised and ultimately investigated using many different perspectives and theoretical frameworks. Critically analysing how researchers think about, study and discuss rater cognition or the judgement processes in assessment frameworks may provide meaningful and efficient directions for how the field continues to explore the topic. METHODS We conducted a critical and integrative review of the literature to explore common conceptualisations and unified terminology associated with rater cognition research. We identified 1045 articles on rater-based assessment in health professions education using Scopus, Medline and ERIC; 78 articles were included in our review. RESULTS We propose a three-phase framework of observation, processing and integration. We situate nine specific mechanisms and sub-mechanisms described across the literature within these phases: (i) generating automatic impressions about the person; (ii) formulating high-level inferences; (iii) focusing on different dimensions of competencies; (iv) categorising through well-developed schemata based on (a) a personal concept of competence, (b) comparison with various exemplars and (c) task and context specificity; (v) weighting and synthesising information differently; (vi) producing narrative judgements; and (vii) translating narrative judgements into scales. CONCLUSION Our review allowed us to identify common underlying conceptualisations of observed rater mechanisms and to propose a comprehensive, although complex, framework for the dynamic and contextual nature of the rating process. This framework could help bridge the gap between researchers adopting different perspectives when studying rater cognition and enable the interpretation of contradictory findings about rater performance by determining which mechanism is enabled or disabled in any given context.
Affiliation(s)
- Christina St-Onge
- Médecine interne, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Walter Tavares
- Division of Emergency Medicine, McMaster University, Hamilton, Ontario, Canada
- Centennial College, School of Community and Health Studies, Toronto, Ontario, Canada
- ORNGE Transport Medicine, Faculty of Medicine, Mississauga, Ontario, Canada
50
Kudláček M, Frömel K, Jakubec L, Groffik D. Compensation for adolescents' school mental load by physical activity on weekend days. International Journal of Environmental Research and Public Health 2016;13:E308. [PMID: 27005652] [PMCID: PMC4808971] [DOI: 10.3390/ijerph13030308]
Abstract
INTRODUCTION AND OBJECTIVE Increasing mental load and inadequate stress management significantly affect the efficiency, success and safety of the educational/working process in adolescents. The objective of this study is to determine the extent to which adolescents compensate for their school mental load with physical activity (PA) on weekend days and, thus, to contribute to the objective measurement of mental load in natural working conditions. METHODS A cross-sectional study was conducted between September 2013 and April 2014. A set of different methods was employed: a self-administered questionnaire (the IPAQ long form) and objective measurement with pedometers and accelerometers (ActiTrainers). These were distributed to 548 students from 17 high schools. Participants' mental load was assessed based on the difference between PA intensity and/or physical inactivity and heart rate range. RESULTS The participants with the highest mental load during school lessons do not compensate for this load with PA on weekend days. CONCLUSIONS Adolescents need to be encouraged to be aware of their subjective mental load and to intentionally compensate for it with PA on weekend days. This process of habit formation must be supported by sufficient physical literacy among students and teachers, and by changes in the school program.
Affiliation(s)
- Michal Kudláček
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Department of Leisure Studies, Faculty of Physical Culture, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Karel Frömel
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Lukáš Jakubec
- Faculty of Physical Culture, Institute of Active Lifestyle, Palacký University Olomouc, Olomouc 77111, Czech Republic.
- Dorota Groffik
- The Jerzy Kukuczka Academy of Physical Education, Katowice 40-065, Poland.