1
Schauber SK, Olsen AO, Werner EL, Magelssen M. Inconsistencies in rater-based assessments mainly affect borderline candidates: but using simple heuristics might improve pass-fail decisions. Adv Health Sci Educ Theory Pract 2024;29:1749-1767. PMID: 38649529; PMCID: PMC11549209; DOI: 10.1007/s10459-024-10328-0.
Abstract
INTRODUCTION Research in various areas indicates that expert judgment can be highly inconsistent, yet expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments, where disagreement between examiners can have far-reaching consequences. The literature suggests that inconsistencies in ratings depend on the level of performance a candidate displays, but this possibility has not previously been addressed deliberately and with appropriate statistical methods. Adopting the theoretical lens of ecological rationality, we evaluate whether easily implementable strategies can enhance decision making in real-world assessment contexts. METHODS We address two objectives. First, we investigate how rater consistency depends on performance level: we recorded videos of mock exams, had examiners (N=10) evaluate four students' performances, and compared inconsistencies in performance ratings between examiner pairs using a bootstrapping procedure. Second, we provide an approach that aids decision making by implementing simple heuristics. RESULTS We found that discrepancies were largely a function of the level of performance the candidates showed: lower performances were rated more inconsistently than excellent performances. Furthermore, our analyses indicated that the use of simple heuristics might improve decisions in examiner pairs. DISCUSSION Inconsistencies in performance judgments continue to be a matter of concern, and we provide empirical evidence that they are related to candidate performance. We discuss the implications for research and the advantages of adopting the perspective of ecological rationality, and point to directions both for further research and for the development of assessment practices.
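The abstract does not specify the bootstrap implementation, but the core idea can be sketched. A minimal illustration in Python, assuming entirely hypothetical ratings; the candidate labels, scores, and the mean-absolute-difference disagreement measure are placeholders, not the authors' actual procedure:

```python
# Hedged sketch (hypothetical data): bootstrap confidence intervals for
# pairwise examiner disagreement, computed separately per candidate, to see
# whether disagreement varies with performance level.
import numpy as np

rng = np.random.default_rng(0)
# ratings[candidate] -> scores awarded by the N=10 examiners (hypothetical)
ratings = {
    "weaker candidate":    np.array([4., 7., 5., 9., 6., 8., 5., 7., 6., 9.]),
    "excellent candidate": np.array([17., 18., 17., 18., 18., 17., 18., 17., 18., 18.]),
}

def pairwise_disagreement(scores):
    """Mean absolute score difference over all examiner pairs."""
    diffs = np.abs(scores[:, None] - scores[None, :])
    return diffs[np.triu_indices(len(scores), k=1)].mean()

for name, scores in ratings.items():
    boot = [pairwise_disagreement(rng.choice(scores, size=len(scores), replace=True))
            for _ in range(2000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"{name}: disagreement 95% CI [{lo:.2f}, {hi:.2f}]")
```

Non-overlapping intervals across candidates would point, as in the study, to disagreement concentrating on weaker performances.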
Affiliation(s)
- Stefan K Schauber
- Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway
- Centre for Educational Measurement (CEMO), Faculty of Educational Sciences, University of Oslo, Oslo, Norway
- Anne O Olsen
- Department of Community Medicine and Global Health, Institute of Health and Society, University of Oslo, Oslo, Norway
- Erik L Werner
- Department of General Practice, Institute of Health and Society, University of Oslo, Oslo, Norway
- Morten Magelssen
- Centre for Medical Ethics, Institute of Health and Society, University of Oslo, Oslo, Norway
2
Sharp S, Snowden A, Stables I, Paterson R. Ensuring robust OSCE assessments: A reflective account from a Scottish school of nursing. Nurse Educ Pract 2024;78:104021. PMID: 38917560; DOI: 10.1016/j.nepr.2024.104021.
Abstract
AIM This paper reflects on the experience of one Scottish university in conducting a face-to-face Objective Structured Clinical Examination (OSCE) for large cohorts of student nurses, outlining the challenges experienced and the learning gained. Borton's model of reflection frames this work due to its simplicity, ease of application and cyclical nature. BACKGROUND The theoretical framework for the OSCE is critical thinking, enabling students to apply those skills authentically. OSCEs are designed to transfer classroom knowledge to clinical practice and offer an authentic work-based assessment. DESIGN Validity and robustness are key considerations in any assessment, and in OSCEs the number of stations that students encounter is important and debated. We initially used a case-study-based OSCE approach over four stations and, following reflection, changed to one long station with four phases. RESULTS In OSCE examinations, interrater reliability is a necessity, and students expect equity of approach. We identified that despite clear marking criteria, marks were polarised, with students achieving high or low marks and little middle ground. Review of examination papers highlighted that although students' overall performance was good, some had failed in at least one station, suggesting a four-station approach may skew results. On reflection, we hypothesised that a one-station, case-study-based, phased approach enabled the examiner to build up a more holistic picture of student knowledge and skills. It also gave the student the opportunity to develop a rapport with the examiner and standardised patient, thereby putting them more at ease. We argue that this approach is holistic, authentic and student centred. CONCLUSIONS Our experience suggests that a single-station, four-phase OSCE is preferable, enabling students to integrate all aspects of the assessment and providing a holistic view of clinical skills and knowledge.
Affiliation(s)
- Sandra Sharp
- Edinburgh Napier University, School of Health and Social Care, 11 Sighthill Court, Edinburgh EH11 4BN, UK
- Austyn Snowden
- Edinburgh Napier University, School of Health and Social Care, 11 Sighthill Court, Edinburgh EH11 4BN, UK
- Ian Stables
- Edinburgh Napier University, School of Health and Social Care, 11 Sighthill Court, Edinburgh EH11 4BN, UK
- Ruth Paterson
- Edinburgh Napier University, School of Health and Social Care, 11 Sighthill Court, Edinburgh EH11 4BN, UK
3
Wong WYA, Thistlethwaite J, Moni K, Roberts C. Using cultural historical activity theory to reflect on the sociocultural complexities in OSCE examiners' judgements. Adv Health Sci Educ Theory Pract 2023;28:27-46. PMID: 35943605; PMCID: PMC9992227; DOI: 10.1007/s10459-022-10139-1.
Abstract
Examiners' judgements play a critical role in competency-based assessments such as objective structured clinical examinations (OSCEs). The standardised nature of OSCEs and their alignment with regulatory accountability assure their wide use as high-stakes assessments in medical education. Research into examiner behaviours has predominantly explored the desirable psychometric characteristics of OSCEs, or investigated examiners' judgements from a cognitive rather than a sociocultural perspective. This study applies cultural historical activity theory (CHAT) to address this gap by exploring examiners' judgements in a high-stakes OSCE. Based on the idea that OSCE examiners' judgements are socially constructed and mediated by their clinical roles, the objective was to explore the sociocultural factors that influence examiners' judgements of student competence, and to use the findings to inform examiner training and enhance assessment practice. Seventeen semi-structured interviews were conducted with examiners who had assessed medical students' competence to progress to the next stage of training in a large-scale OSCE at one Australian university. The initial thematic analysis provided a basis for applying CHAT iteratively to explore the sociocultural factors and, specifically, the contradictions created by interactions between different elements such as examiners and rules, thus highlighting the factors influencing examiners' judgements. The findings indicated four key factors that influenced examiners' judgements: examiners' contrasting beliefs about the purpose of the OSCE; their varying perceptions of the marking criteria; divergent expectations of student competence; and idiosyncratic judgement practices. These factors were interrelated with the activity systems of the medical school's assessment practices and the examiners' clinical work contexts. Contradictions were identified through the guiding principles of multi-voicedness and historicity. Applying CHAT as an analytical framework facilitated the exploration of the sociocultural factors that may influence the consistency of examiners' judgements. Reflecting on these factors at organisational and system levels generated insights for creating fit-for-purpose examiner training to enhance assessment practice.
Affiliation(s)
- Wai Yee Amy Wong
- School of Education and Faculty of Medicine, The University of Queensland, Brisbane, QLD, 4072, Australia
- School of Nursing and Midwifery, Queen's University Belfast, Belfast, BT9 7BL, UK
- Jill Thistlethwaite
- Faculty of Health, The University of Technology Sydney, Sydney, NSW, 2007, Australia
- Karen Moni
- School of Education, The University of Queensland, Brisbane, QLD, 4072, Australia
- Chris Roberts
- Sydney Medical School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, 2006, Australia
4
Ibrahim MS, Naing NN, Abd Aziz A, Makhtar M, Mohamed Yusoff H, Esa NK, A Rahman NI, Thwe Aung MM, Oo SS, Ismail S, Ramli RA. Medical Experts' Agreement on Risk Assessment Based on All Possible Combinations of the COVID-19 Predictors-A Novel Approach for Public Health Screening and Surveillance. Int J Environ Res Public Health 2022;19:16601. PMID: 36554487; PMCID: PMC9779080; DOI: 10.3390/ijerph192416601.
Abstract
During the initial phase of the coronavirus disease 2019 (COVID-19) pandemic, there was a critical need to create a valid and reliable screening and surveillance tool for university staff and students. To that end, 11 medical experts participated in this cross-sectional study, assigning one of three risk categories (low, medium or high) to each of the 1536 possible combinations of 11 key COVID-19 predictors. The experts' independent judgements on each combination were recorded via a novel dashboard-based rating method that presented combinations of these predictors in a dynamic display within Microsoft Excel. The validated instrument also incorporated an innovative algorithm that deduced ratings for related combinations, making the rating task more efficient. The study found an ordinal-weighted agreement coefficient of 0.81 (0.79 to 0.82, p-value < 0.001), reaching the 'substantial' class by inferential benchmarking. Meanwhile, on average, the novel algorithm eliminated 76.0% of rating tasks by deducing risk categories from experts' ratings of prior combinations. This study therefore reports a valid, complete, practical and efficient method for COVID-19 health screening via reliable combinatorial expert judgement. The new approach to risk assessment may also prove applicable in wider fields of practice whenever high-stakes decision-making relies on expert agreement over combinations of important criteria.
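The deduction algorithm itself is not reproduced in the abstract; the following Python sketch only illustrates the general monotonicity idea (a combination with strictly more risk factors cannot be lower risk), using a toy four-predictor space and hypothetical judgements rather than the study's instrument:

```python
# Illustrative sketch: deducing risk categories from expert ratings by
# monotone dominance, shrinking the number of combinations an expert must
# rate by hand. Toy example; the study used 11 predictors and 3 categories.
from itertools import product

PREDICTORS = 4
combos = list(product([0, 1], repeat=PREDICTORS))   # 1 = predictor present

def dominates(a, b):
    """a dominates b when every predictor in a is at least as severe."""
    return all(x >= y for x, y in zip(a, b))

ratings = {}                                        # combo -> category

def rate(combo, category):
    """Record an expert rating, then deduce comparable combinations."""
    ratings[combo] = category
    for other in combos:
        if other in ratings:
            continue
        if category == "high" and dominates(other, combo):
            ratings[other] = "high"                 # worse profile, same or higher risk
        elif category == "low" and dominates(combo, other):
            ratings[other] = "low"                  # milder profile, same or lower risk

rate((0, 0, 0, 1), "low")    # hypothetical judgement
rate((1, 1, 0, 1), "high")   # hypothetical judgement
print(f"{len(ratings)} of {len(combos)} combinations settled after 2 direct ratings")
```

This mirrors, in spirit, how the paper's algorithm could eliminate a large share of rating tasks (76.0% on average in the study).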
Affiliation(s)
- Mohd Salami Ibrahim
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Nyi Nyi Naing
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Aniza Abd Aziz
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Mokhairi Makhtar
- Faculty of Informatics and Computation, Gong Badak Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20300, Terengganu, Malaysia
- Harmy Mohamed Yusoff
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Nor Kamaruzaman Esa
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Nor Iza A Rahman
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Myat Moe Thwe Aung
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- San San Oo
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Samhani Ismail
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
- Ras Azira Ramli
- Faculty of Medicine, Medical Campus, Universiti Sultan Zainal Abidin, Kuala Terengganu 20400, Terengganu, Malaysia
5
Yeates P, Maluf A, Kinston R, Cope N, McCray G, Cullen K, O'Neill V, Cole A, Goodfellow R, Vallender R, Chung CW, McKinley RK, Fuller R, Wong G. Enhancing authenticity, diagnosticity and equivalence (AD-Equiv) in multicentre OSCE exams in health professionals education: protocol for a complex intervention study. BMJ Open 2022;12:e064387. PMID: 36600366; PMCID: PMC9730346; DOI: 10.1136/bmjopen-2022-064387.
Abstract
INTRODUCTION Objective structured clinical exams (OSCEs) are a cornerstone of assessing the competence of trainee healthcare professionals, but have been criticised for (1) lacking authenticity, (2) variability in examiners' judgements, which can challenge assessment equivalence, and (3) limited diagnosticity of trainees' focal strengths and weaknesses. In response, this study aims to investigate whether (1) sharing integrated-task OSCE stations across institutions can increase perceived authenticity, while (2) enhancing assessment equivalence by enabling comparison of the standard of examiners' judgements between institutions using a novel methodology (video-based examiner score comparison and adjustment (VESCA)), and (3) exploring the potential to develop more diagnostic signals from data on students' performances. METHODS AND ANALYSIS The study will use a complex intervention design, developing, implementing and sharing an integrated-task (research) OSCE across four UK medical schools. It will use VESCA to compare examiner scoring differences between groups of examiners and different sites, while studying how, why and for whom the shared OSCE and VESCA operate across participating schools. Quantitative analysis will use many-facet Rasch modelling to compare the influence of different examiner groups and sites on students' scores, while the operation of the two interventions (shared integrated-task OSCEs; VESCA) will be studied through the theory-driven method of realist evaluation. Further exploratory analyses will examine diagnostic performance signals within the data. ETHICS AND DISSEMINATION The study will be additional to usual course requirements and all participation will be voluntary. We will uphold the principles of informed consent, the right to withdraw, and confidentiality with pseudonymity and strict data security. The study has received ethical approval from Keele University Research Ethics Committee. Findings will be academically published and will contribute to good practice guidance on (1) the use of VESCA and (2) the sharing and use of integrated-task OSCE stations.
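VESCA links examiner groups through common videos; the full procedure is in the protocol and its companion papers. As a rough conceptual sketch only (assumed data layout and a simple mean-offset adjustment, far cruder than the actual methodology):

```python
# Conceptual sketch of score linking via common videos: examiner groups at
# each site score the same reference videos; the offset of a group's video
# scores from the grand mean estimates its relative stringency, which is
# then subtracted from that group's live scores. Hypothetical numbers.
import statistics

video_scores = {                      # reference videos scored by both sites
    "site_A": [19.0, 17.5, 21.0, 16.0],
    "site_B": [17.0, 15.5, 19.5, 14.5],
}

grand_mean = statistics.mean(s for group in video_scores.values() for s in group)
offset = {site: statistics.mean(scores) - grand_mean
          for site, scores in video_scores.items()}

def adjust(live_score, site):
    """Correct a live candidate score for the site's estimated stringency."""
    return live_score - offset[site]

print(adjust(18.0, "site_A"))   # adjusted down: site A scored the videos leniently
print(adjust(18.0, "site_B"))   # adjusted up: site B scored the videos stringently
```

The study itself plans many-facet Rasch modelling rather than simple mean offsets, which additionally accounts for station difficulty and candidate ability.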
Affiliation(s)
- Peter Yeates
- School of Medicine, Keele University, Keele, Staffordshire, UK
- Adriano Maluf
- School of Medicine, Keele University, Keele, Staffordshire, UK
- Ruth Kinston
- School of Medicine, Keele University, Keele, Staffordshire, UK
- Natalie Cope
- School of Medicine, Keele University, Keele, Staffordshire, UK
- Gareth McCray
- School of Medicine, Keele University, Keele, Staffordshire, UK
- Kathy Cullen
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Vikki O'Neill
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Aidan Cole
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Ching-Wa Chung
- School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, Scotland, UK
- Richard Fuller
- School of Medicine, University of Liverpool Faculty of Health and Life Sciences, Liverpool, UK
- Geoff Wong
- Nuffield Department of Primary Care Health Sciences, University of Oxford Division of Public Health and Primary Health Care, Oxford, Oxfordshire, UK
6
McGown PJ, Brown CA, Sebastian A, Le R, Amin A, Greenland A, Sam AH. Is the assumption of equal distances between global assessment categories used in borderline regression valid? BMC Med Educ 2022;22:708. PMID: 36199083; PMCID: PMC9536020; DOI: 10.1186/s12909-022-03753-5.
Abstract
BACKGROUND Standard setting for clinical examinations typically uses the borderline regression method to set the pass mark. An assumption made in using this method is that there are equal intervals between global ratings (GRs) (e.g. Fail, Borderline Pass, Clear Pass, Good and Excellent). To the best of our knowledge, however, this assumption has never been tested in the medical literature. We examine whether the assumption of equal intervals between GRs is met, and the potential implications for student outcomes. METHODS Clinical finals examiners were recruited across two institutions to place the typical 'Borderline Pass', 'Clear Pass' and 'Good' candidate on a continuous slider scale between a typical 'Fail' candidate at point 0 and a typical 'Excellent' candidate at point 1. Results were analysed using one-sample t-tests comparing each interval with an equal interval size of 0.25. Secondary data analysis was performed on summative assessment scores for 94 clinical stations and 1191 medical student examination outcomes in the final 2 years of study at a single centre. RESULTS On a scale from 0.00 (Fail) to 1.00 (Excellent), mean examiner GRs for 'Borderline Pass', 'Clear Pass' and 'Good' were 0.33, 0.55 and 0.77 respectively. All four intervals between GRs (Fail-Borderline Pass, Borderline Pass-Clear Pass, Clear Pass-Good, Good-Excellent) differed statistically significantly from the expected value of 0.25 (all p-values < 0.0125). An ordinal linear regression using the mean examiner GR locations was performed for each of the 94 stations to determine pass marks out of 24. This increased the pass marks of all 94 stations compared with the original GR locations (mean increase 0.21), causing one additional fail by overall exam pass mark (out of 1191 students) and 92 additional station fails (out of 11,346 stations). CONCLUSIONS Although the assumption of equal intervals between GRs across the performance spectrum is not met, and an adjusted regression equation increases station pass marks, the effect on overall exam pass/fail outcomes is modest.
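To make the mechanism concrete, here is a small Python sketch of borderline regression under the two spacing assumptions; the station scores are invented and the regression is deliberately minimal, not the authors' analysis:

```python
# Minimal sketch (hypothetical scores): borderline regression fits station
# score against the numeric location of each global rating and reads off the
# pass mark at 'Borderline Pass'. Swapping equal 0.25 spacing for the
# empirical GR locations reported above shifts the resulting cut-score.
import numpy as np

scores = np.array([8., 11., 14., 17., 21.])   # hypothetical station scores /24
grades = np.array([0, 1, 2, 3, 4])            # Fail..Excellent as indices

def pass_mark(gr_locations, borderline_idx=1):
    x = gr_locations[grades]                  # numeric location of each GR
    slope, intercept = np.polyfit(x, scores, 1)
    return intercept + slope * gr_locations[borderline_idx]

equal     = np.array([0.00, 0.25, 0.50, 0.75, 1.00])
empirical = np.array([0.00, 0.33, 0.55, 0.77, 1.00])  # study's mean GR locations

print(f"pass mark, equal spacing:     {pass_mark(equal):.2f}")
print(f"pass mark, empirical spacing: {pass_mark(empirical):.2f}")
```

With these toy numbers the empirical spacing yields a slightly higher cut-score, matching the direction of the paper's finding (mean increase of 0.21 marks per station).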
Affiliation(s)
- Patrick J McGown
- Imperial College School of Medicine, Imperial College London, London, UK
- Celia A Brown
- Warwick Medical School, University of Warwick, Warwick, UK
- Ann Sebastian
- Imperial College School of Medicine, Imperial College London, London, UK
- Ricardo Le
- Warwick Medical School, University of Warwick, Warwick, UK
- Anjali Amin
- Imperial College School of Medicine, Imperial College London, London, UK
- Andrew Greenland
- Imperial College School of Medicine, Imperial College London, London, UK
- Amir H Sam
- Imperial College School of Medicine, Imperial College London, London, UK
7
Homer M. Pass/fail decisions and standards: the impact of differential examiner stringency on OSCE outcomes. Adv Health Sci Educ Theory Pract 2022;27:457-473. PMID: 35230590; PMCID: PMC9117341; DOI: 10.1007/s10459-022-10096-9.
Abstract
Variation in examiner stringency is a recognised problem in many standardised summative assessments of performance such as the OSCE. The stated strength of the OSCE is that such error might largely balance out over the exam as a whole. This study uses linear mixed models to estimate the impact of different factors (examiner, station, candidate and exam) on station-level total domain score and, separately, on a single global grade. The exam data come from 442 separate administrations of an 18-station OSCE for international medical graduates who want to work in the National Health Service in the UK. We find that variation due to examiner is approximately twice as large for domain scores as for grades (16% vs. 8%), with smaller residual variance in the former (67% vs. 76%). Combined estimates of exam-level (relative) reliability across all data are 0.75 and 0.69 for domain scores and grades respectively. The correlation between two separate estimates of stringency for individual examiners (one for grades and one for domain scores) is relatively high (r=0.76), implying that examiners are generally quite consistent in their stringency between these two assessments of performance. Cluster analysis indicates that examiners fall into two broad groups, characterised as hawks or doves on both measures. At the exam level, correcting for examiner stringency produces systematically lower cut-scores under borderline regression standard setting than using the raw marks. In turn, such a correction would produce higher pass rates, although meaningful direct comparisons are challenging to make. As in other studies, this work shows that OSCEs and other standardised performance assessments are subject to substantial variation in examiner stringency, and require sufficient domain sampling to ensure that the quality of pass/fail decision-making is at least adequate. More, perhaps qualitative, work is needed to better understand how examiners might score similarly (or differently) when awarding station-level domain scores and global grades. The potential systematic bias of borderline regression evidenced for the first time here, with sources of error producing cut-scores higher than they should be, also needs further investigation.
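The quoted variance percentages suggest a crossed random-effects decomposition. A hedged sketch in Python with statsmodels, assuming a long-format file with one row per examiner-station-candidate score; the file name, column names, and model specification are a plausible reading of the abstract, not the paper's exact model:

```python
# Rough sketch (assumed data layout): a linear mixed model partitioning
# station-level score variance into examiner, station and candidate
# components. statsmodels fits crossed random effects as variance
# components over a single constant grouping column.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("osce_scores.csv")   # assumed columns: score, examiner, station, candidate
df["all"] = 1                         # one dummy group spanning the data set

model = smf.mixedlm(
    "score ~ 1", df, groups="all", re_formula="0",
    vc_formula={
        "examiner":  "0 + C(examiner)",
        "station":   "0 + C(station)",
        "candidate": "0 + C(candidate)",
    },
)
fit = model.fit()
print(fit.summary())                  # variance component per facet + residual
```

Dividing each estimated component by the total gives variance shares comparable to the 16%/8% examiner figures quoted above.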
Affiliation(s)
- Matt Homer
- School of Medicine, Leeds Institute of Medical Education, University of Leeds, Leeds LS2 9JT, UK
8
Yeates P, Moult A, Cope N, McCray G, Fuller R, McKinley R. Determining influence, interaction and causality of contrast and sequence effects in objective structured clinical exams. Med Educ 2022;56:292-302. PMID: 34893998; PMCID: PMC9304241; DOI: 10.1111/medu.14713.
Abstract
INTRODUCTION Differential rater function over time (DRIFT) and contrast effects (examiners' scores biased away from the standard of preceding performances) both challenge the fairness of scoring in objective structured clinical exams (OSCEs). This is important because, under some circumstances, these effects could alter whether some candidates pass or fail assessments. Benefitting from experimental control, this study investigated the causality, operation and interaction of both effects simultaneously for the first time in an OSCE setting. METHODS We performed a secondary analysis of data from an OSCE in which examiners scored embedded videos of student performances interspersed between live students. Embedded video position varied between examiners (early vs. late) whilst the standard of preceding performances naturally varied (previous high or low). We examined linear relationships suggestive of DRIFT and contrast effects in all within-OSCE data before comparing the influence and interaction of the 'early' versus 'late' and 'previous high' versus 'previous low' conditions on embedded video scores. RESULTS The linear-relationship data did not support the presence of DRIFT or contrast effects. Embedded videos were scored higher when positioned early (19.9 [19.4-20.5]) than late (18.6 [18.1-19.1], p < 0.001), but scores did not differ between the previous-high and previous-low conditions. The interaction term was non-significant. CONCLUSIONS In this instance, the small DRIFT effect we observed on embedded videos can be causally attributed to examiner behaviour. Contrast effects appear less ubiquitous than some prior research suggests. Possible mediators of these findings include the OSCE context, the detail of task specification, examiners' cognitive load and the distribution of learners' ability. As the operation of these effects appears to vary across contexts, further research is needed to determine the prevalence and mechanisms of contrast and DRIFT effects, so that assessments may be designed in ways likely to avoid their occurrence. Quality assurance should monitor for these contextually variable effects in order to ensure OSCE equivalence.
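The 2x2 comparison described above (position x preceding standard, plus their interaction) maps naturally onto a two-way model. A brief sketch, again with assumed file and column names rather than the study's actual data or analysis code:

```python
# Illustrative sketch (assumed data layout): testing position (early/late),
# preceding-performance standard (high/low) and their interaction on
# embedded-video scores, mirroring the comparisons reported above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("embedded_video_scores.csv")  # assumed columns: score, position, prev_standard

fit = smf.ols("score ~ C(position) * C(prev_standard)", data=df).fit()
print(fit.summary())   # position main effect ~ DRIFT; prev_standard ~ contrast
```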
Affiliation(s)
- Peter Yeates
- School of Medicine, Keele University, Keele, UK
- Fairfield General Hospital, Pennine Acute Hospitals NHS Trust, Bury, UK