1
Schauber SK, Olsen AO, Werner EL, Magelssen M. Inconsistencies in rater-based assessments mainly affect borderline candidates: but using simple heuristics might improve pass-fail decisions. Adv Health Sci Educ Theory Pract 2024; 29:1749-1767. [PMID: 38649529 PMCID: PMC11549209 DOI: 10.1007/s10459-024-10328-0] [Received: 09/20/2023] [Accepted: 03/24/2024] [Indexed: 04/25/2024]
Abstract
INTRODUCTION Research in various areas indicates that expert judgment can be highly inconsistent, yet expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments, where disagreement between examiners can have far-reaching consequences. The literature suggests that inconsistencies in ratings depend on the level of performance a candidate shows, but this possibility has not yet been addressed deliberately or with appropriate statistical methods. By adopting the theoretical lens of ecological rationality, we evaluate whether easily implementable strategies can enhance decision making in real-world assessment contexts. METHODS We address two objectives. First, we investigate the dependence of rater consistency on performance levels. We recorded videos of mock exams, had examiners (N=10) evaluate four students' performances, and compared inconsistencies in performance ratings between examiner pairs using a bootstrapping procedure. Our second objective is to provide an approach that aids decision making by implementing simple heuristics. RESULTS Discrepancies were largely a function of the level of performance the candidates showed: lower performances were rated more inconsistently than excellent performances. Furthermore, our analyses indicated that the use of simple heuristics might improve decisions in examiner pairs. DISCUSSION Inconsistencies in performance judgments continue to be a matter of concern, and we provide empirical evidence that they are related to candidate performance. We discuss implications for research and the advantages of adopting the perspective of ecological rationality, and we point to directions both for further research and for the development of assessment practices.
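The methods sentence above compresses the whole analysis; as a rough illustration of the kind of bootstrap comparison it describes (not the authors' code), the sketch below resamples a 10-examiner panel's ratings for each of four performances and contrasts the pairwise discrepancies. All data and names are invented assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented ratings[i, j]: score examiner j (of 10) gave candidate i (of 4),
# with spread increasing as performance drops (the paper's reported pattern).
ratings = np.array([
    [8.5, 8.0, 8.7, 8.2, 8.4, 8.6, 8.1, 8.3, 8.5, 8.0],  # excellent
    [7.6, 7.0, 8.1, 6.8, 7.4, 7.9, 6.5, 7.2, 7.7, 7.0],  # good
    [6.5, 4.9, 7.2, 5.4, 6.8, 4.5, 5.9, 7.0, 5.1, 6.2],  # borderline
    [4.8, 2.5, 5.9, 3.1, 5.2, 2.9, 4.4, 5.6, 3.4, 4.9],  # weak
])

def pairwise_discrepancy(scores):
    """Mean absolute score difference over all examiner pairs."""
    diffs = np.abs(scores[:, None] - scores[None, :])
    iu = np.triu_indices(len(scores), k=1)
    return diffs[iu].mean()

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for one candidate's pairwise discrepancy."""
    stats = np.array([
        pairwise_discrepancy(rng.choice(scores, size=len(scores), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return pairwise_discrepancy(scores), lo, hi

for label, scores in zip(["excellent", "good", "borderline", "weak"], ratings):
    point, lo, hi = bootstrap_ci(scores)
    print(f"{label:10s} discrepancy={point:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

Under this toy setup, wider bootstrap intervals for the weaker performances mirror the paper's finding that inconsistency concentrates around borderline candidates.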
Affiliation(s)
- Stefan K Schauber
- Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway.
- Centre for Educational Measurement (CEMO), Faculty of Educational Sciences, University of Oslo, Oslo, Norway.
- Anne O Olsen
- Department of Community Medicine and Global Health, Institute of Health and Society, University of Oslo, Oslo, Norway
- Erik L Werner
- Department of General Practice, Institute of Health and Society, University of Oslo, Oslo, Norway
- Morten Magelssen
- Centre for Medical Ethics, Institute of Health and Society, University of Oslo, Oslo, Norway
2
Wood TJ, Daniels VJ, Pugh D, Touchie C, Halman S, Humphrey-Murto S. Implicit versus explicit first impressions in performance-based assessment: will raters overcome their first impressions when learner performance changes? Adv Health Sci Educ Theory Pract 2024; 29:1155-1168. [PMID: 38010576 DOI: 10.1007/s10459-023-10302-2] [Received: 08/25/2023] [Accepted: 11/12/2023] [Indexed: 11/29/2023]
Abstract
First impressions can influence rater-based judgments, but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts where first impressions are made explicit, but these findings may not generalize to a workplace context with implicitly formed first impressions. The study had two aims: first, to assess whether first impressions affect raters' judgments when workplace performance changes; second, to assess whether explicitly stating these impressions affects subsequent ratings compared to implicitly formed first impressions. Physician raters viewed six videos where learner performance either changed (Strong to Weak or Weak to Strong) or remained consistent. Raters were assigned to one of two groups. Group one (n = 23, Explicit) made a first impression global rating (FIGR), then scored learners using the Mini-CEX. Group two (n = 22, Implicit) scored learners at the end of the video solely with the Mini-CEX. For the Explicit group, in the Strong to Weak condition, the FIGR (M = 5.94) was higher than the Mini-CEX global rating (GR) (M = 3.02, p < .001). In the Weak to Strong condition, the FIGR (M = 2.44) was lower than the Mini-CEX GR (M = 3.96, p < .001). There was no difference between the FIGR and the Mini-CEX GR in the consistent condition (M = 6.61 and M = 6.65, respectively, p = .84). There were no statistically significant differences in any of the conditions when comparing the two groups' Mini-CEX GRs. Therefore, raters adjusted their judgments based on the learners' performances. Furthermore, raters who made their first impressions explicit showed rater bias similar to that of raters who followed a more naturalistic process.
Affiliation(s)
- Timothy J Wood
- Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G-5Z3, Canada.
- Vijay J Daniels
- Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada
- Debra Pugh
- Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G-5Z3, Canada
- Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Medical Council of Canada, Ottawa, Canada
- Claire Touchie
- Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G-5Z3, Canada
- Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Samantha Halman
- Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G-5Z3, Canada
- Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Susan Humphrey-Murto
- Faculty of Medicine, University of Ottawa, 850 Peter Morand Crescent, Ottawa, ON, K1G-5Z3, Canada
- Department of Medicine, The Ottawa Hospital, Ottawa, Canada
3
Urbančič J, Battelino S, Bošnjak R, Felbabić T, Steiner N, Vouk M, Vrabec M, Vozel D. A Multidisciplinary Skull Base Board for Tumour and Non-Tumour Diseases: Initial Experiences. J Pers Med 2024; 14:82. [PMID: 38248783 PMCID: PMC10817258 DOI: 10.3390/jpm14010082] [Received: 12/09/2023] [Revised: 01/05/2024] [Accepted: 01/09/2024] [Indexed: 01/23/2024]
Abstract
The skull base is an area where various cancerous and non-cancerous diseases occur, and it represents the intersection of several medical fields; the key is integrated treatment by specialists from multiple disciplines. We prospectively analysed patients with skull base disease presented between August 2022 and August 2023 to the Multidisciplinary Skull Base Board (MDT-SB), which takes place once a month in hybrid form (in person and remotely). Thirty-nine patients (median age 58.2 years) were included, of whom twelve (30.8%) had a benign tumour, twelve (30.8%) had a malignant tumour, five (12.8%) had an infection, and ten (25.6%) had other diseases. For each patient, at least two otorhinolaryngologists, a neurosurgeon, and a neuroradiologist were involved; an infectious disease specialist, a paediatrician, an oculoplastic surgeon, a maxillofacial surgeon, and a pathologist were additionally involved in 10%, 8%, 8%, 3%, and 3% of cases, respectively. In fifteen patients (38%), the MDT-SB suggested surgical treatment; in fourteen (36%), radiological follow-up; in five (13%), non-surgical treatment; in two (5%), conservative treatment; in two (5%), combined surgical and conservative treatment; and in one (3%), a biopsy. Non-cancerous and cancerous diseases of the skull base in adults and children should be presented to an MDT-SB that consists of at least an otolaryngologist, a neurosurgeon, and a neuroradiologist.
Affiliation(s)
- Jure Urbančič
- Department of Otorhinolaryngology, Faculty of Medicine, University of Ljubljana, Vrazov Trg 2, 1000 Ljubljana, Slovenia
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Saba Battelino
- Department of Otorhinolaryngology, Faculty of Medicine, University of Ljubljana, Vrazov Trg 2, 1000 Ljubljana, Slovenia
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Roman Bošnjak
- Department of Neurosurgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Department of Surgery, Faculty of Medicine, University of Ljubljana, Vrazov Trg 2, 1000 Ljubljana, Slovenia
- Tomislav Felbabić
- Department of Neurosurgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Nejc Steiner
- Department of Otorhinolaryngology, Faculty of Medicine, University of Ljubljana, Vrazov Trg 2, 1000 Ljubljana, Slovenia
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Matej Vouk
- Department of Radiology, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
- Matej Vrabec
- Medilab Diagnostic Imaging, Vodovodna 100, 1000 Ljubljana, Slovenia
- Department of Diagnostic and Interventional Radiology, General Hospital Slovenj Gradec, Gosposvetska Cesta 1, 2380 Slovenj Gradec, Slovenia
- Domen Vozel
- Department of Otorhinolaryngology, Faculty of Medicine, University of Ljubljana, Vrazov Trg 2, 1000 Ljubljana, Slovenia
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloška 2, 1000 Ljubljana, Slovenia
4
Park HJ, Kim SH, Choi JY, Cha D. Human-machine cooperation meta-model for clinical diagnosis by adaptation to human expert's diagnostic characteristics. Sci Rep 2023; 13:16204. [PMID: 37758800 PMCID: PMC10533492 DOI: 10.1038/s41598-023-43291-8] [Received: 06/15/2023] [Accepted: 09/21/2023] [Indexed: 09/29/2023]
Abstract
Artificial intelligence (AI) using deep learning approaches the capabilities of human experts in medical image diagnosis. However, due to liability issues in medical decisions, AI is often relegated to an assistant role. Based on this responsibility constraint, the effective use of AI to assist human intelligence in real-world clinics remains a challenge. Given the significant inter-individual variations in clinical decisions among physicians based on their expertise, AI needs to adapt to individual experts, complementing weaknesses and enhancing strengths. For this adaptation, AI should not only acquire domain knowledge but also understand the specific human experts it assists. This study introduces a meta-model for human-machine cooperation that first evaluates each expert's class-specific diagnostic tendencies using conditional probability, based on which the meta-model adjusts the AI's predictions. This meta-model was applied to ear disease diagnosis using otoendoscopy, highlighting improved performance when incorporating individual diagnostic characteristics, even with limited evaluation data. The highest accuracy was achieved by combining each expert's conditional probabilities with machine classification probability, using optimal weights specific to each individual's overall classification accuracy. This tailored model aims to mitigate potential misjudgments due to psychological effects caused by machine suggestions and to capitalize on the unique expertise of individual clinicians.
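As a hedged sketch of the approach the abstract outlines (not the authors' implementation), one can estimate P(true class | expert's stated diagnosis) from a small evaluation set and blend that profile with the classifier's output probabilities. The accuracy-based weight below is an illustrative assumption, not the paper's exact formula.

```python
import numpy as np

def expert_conditional_probs(expert_labels, true_labels, n_classes, alpha=1.0):
    """Estimate P(true=c | expert said d) with Laplace smoothing alpha."""
    counts = np.full((n_classes, n_classes), alpha)
    for d, c in zip(expert_labels, true_labels):
        counts[d, c] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def combined_prediction(machine_probs, expert_label, cond_probs, expert_accuracy):
    """Blend the machine's probabilities with the expert's conditional profile.
    Assumption: the blending weight tracks the expert's overall accuracy."""
    w = expert_accuracy
    blended = w * cond_probs[expert_label] + (1 - w) * machine_probs
    return blended.argmax(), blended

# Toy example: 3 disease classes, a short evaluation history for one expert.
hist_expert = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]
hist_truth  = [0, 1, 1, 2, 1, 0, 2, 1, 1, 0]
cond = expert_conditional_probs(hist_expert, hist_truth, n_classes=3)
acc = np.mean(np.array(hist_expert) == np.array(hist_truth))

machine = np.array([0.5, 0.3, 0.2])   # classifier output for a new case
label, probs = combined_prediction(machine, expert_label=1,
                                   cond_probs=cond, expert_accuracy=acc)
print(label, probs.round(3))
```

The design intuition, per the abstract, is that an expert who systematically confuses two classes contributes a conditional profile that redistributes probability mass accordingly, so the combined prediction can correct for that individual tendency.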
Affiliation(s)
- Hae-Jeong Park
- Department of Nuclear Medicine, Department of Psychiatry, Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, South Korea.
- Department of Cognitive Science, Yonsei University, Seoul, Republic of Korea.
- Center for Systems and Translational Brain Sciences, Institute of Human Complexity and Systems Science, Yonsei University, 50-1, Yonsei-ro, Sinchon-dong, Seodaemun-gu, Seoul, 03722, Republic of Korea.
- Sung Huhn Kim
- Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, South Korea
- Jae Young Choi
- Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, South Korea
- Dongchul Cha
- Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, South Korea.
- Center for Innovative Medicine, Healthcare Lab, NAVER Corporation, 95, Jeongjail-ro, Bundang-gu, Seongnam-si, Gyeonggi-do, 13561, Republic of Korea.
- Healthcare Lab, Naver Cloud Corporation, Seongnam-si, Republic of Korea.
5
Klusmann D, Knorr M, Hampe W. Exploring the relationships between first impressions and MMI ratings: a pilot study. Adv Health Sci Educ Theory Pract 2023; 28:519-536. [PMID: 36053344 PMCID: PMC10169880 DOI: 10.1007/s10459-022-10151-5] [Received: 06/11/2022] [Accepted: 06/28/2022] [Indexed: 05/11/2023]
Abstract
The phenomenon of first impression is well researched in social psychology, but less so in the study of OSCEs and the multiple mini interview (MMI). To explore its bearing on the MMI method, we included a rating of first impression in the MMI for student selection conducted in 2012 at the University Medical Center Hamburg-Eppendorf, Germany (196 applicants, 26 pairs of raters) and analyzed how it related to MMI performance ratings made by (a) the same rater and (b) a different rater. First impression was assessed immediately after an applicant entered the test room. Each MMI task took 5 minutes and was rated subsequently. Internal consistency was α = .71 for first impression and α = .69 for MMI performance. First impression and MMI performance correlated at r = .49. Both measures weakly predicted performance in two OSCEs for communication skills assessed 18 months later. MMI performance did not increment prediction above the contribution of first impression, and vice versa. Prediction was independent of whether the rater who rated first impression also rated MMI performance. The correlation between first impression and MMI performance is in line with the results of corresponding social psychological studies, which show that judgements based on minimal information moderately predict behavioral measures. It also accords with the notion that raters often blend their specific assessment task, as outlined in the MMI instructions, with the self-imposed question of whether a candidate would fit the role of a medical doctor.
Affiliation(s)
- Dietrich Klusmann
- Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany.
- Mirjana Knorr
- Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
- Wolfgang Hampe
- Institute of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf (UKE), N41, Martinistr. 52, 20246 Hamburg, Germany
6
Yeates P, McCray G, Moult A, Cope N, Fuller R, McKinley R. Determining the influence of different linking patterns on the stability of students' score adjustments produced using Video-based Examiner Score Comparison and Adjustment (VESCA). BMC Med Educ 2022; 22:41. [PMID: 35039023 PMCID: PMC8764767 DOI: 10.1186/s12909-022-03115-1] [Received: 01/22/2021] [Accepted: 01/05/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Ensuring equivalence of examiners' judgements across different groups of examiners is a priority for large-scale performance assessments in clinical education, both to enhance fairness and to reassure the public. This study extends insight into an innovation called Video-based Examiner Score Comparison and Adjustment (VESCA), which uses video scoring to link otherwise unlinked groups of examiners. This linkage enables comparison of the influence of different examiner groups within a common frame of reference and provision of adjusted "fair" scores to students. Whilst this innovation promises substantial benefit to quality assurance of distributed Objective Structured Clinical Exams (OSCEs), questions remain about how the resulting score adjustments might be influenced by the specific parameters used to operationalise VESCA. Research questions: how similar are estimates of students' score adjustments when the model is run with either (1) fewer comparison videos per participating examiner or (2) reduced numbers of participating examiners? METHODS Using secondary analysis of recent research which used VESCA to compare scoring tendencies of different examiner groups, we made numerous copies of the original data and then selectively deleted video scores to reduce either (1) the number of linking videos per examiner (4 versus several permutations of 3, 2, or 1 videos) or (2) examiner participation rates (all participating examiners (76%) versus several permutations of 70%, 60%, or 50% participation). After analysing all resulting datasets with Many Facet Rasch Modelling (MFRM), we calculated students' score adjustments for each dataset and compared these with score adjustments in the original data using Spearman's correlations. RESULTS Students' score adjustments derived from 3 videos per examiner correlated highly with score adjustments derived from 4 linking videos (median rho = 0.93, IQR 0.90-0.95, p < 0.001), with 2 videos (median rho = 0.85, IQR 0.81-0.87, p < 0.001) and 1 video (median rho = 0.52, IQR 0.46-0.64, p < 0.001) producing progressively smaller correlations. Score adjustments were similar for 76% examiner participation versus 70% (median rho = 0.97, IQR 0.95-0.98, p < 0.001) and 60% (median rho = 0.95, IQR 0.94-0.98, p < 0.001) participation, but were lower and more variable for 50% participation (median rho = 0.78, IQR 0.65-0.83, some non-significant). CONCLUSIONS Whilst VESCA showed some sensitivity to the examined parameters, modest reductions in examiner participation rates or video numbers produced highly similar results. Employing VESCA in distributed or national exams could enhance quality assurance and exam fairness.
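A toy re-creation of the sensitivity-analysis loop may help: thin the linking-video data, recompute adjustments, and correlate them with the full-data adjustments. The actual study refit a Many Facet Rasch Model at each step; the simple mean-based severity estimate below is only a stand-in for that model, and all data are simulated.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_examiners, n_videos = 20, 4
severity = rng.normal(0, 0.5, n_examiners)            # examiner stringency
video_scores = (6 + severity[:, None]                 # every examiner scores
                + rng.normal(0, 0.3, (n_examiners, n_videos)))  # 4 linking videos

def adjustments(scores):
    """Stand-in for the MFRM step: estimate each examiner's severity from the
    linking videos; the adjustment is minus that estimated severity."""
    return -(scores.mean(axis=1) - scores.mean())

full = adjustments(video_scores)                      # 4-video "gold" adjustments
for k in (3, 2, 1):
    rhos = []
    for _ in range(200):                              # permutations of kept videos
        keep = rng.choice(n_videos, size=k, replace=False)
        rho, _ = spearmanr(full, adjustments(video_scores[:, keep]))
        rhos.append(rho)
    print(f"{k} linking video(s): median rho = {np.median(rhos):.2f}")
```

Even in this crude simulation, correlations typically degrade gracefully from 3 to 2 videos and fall off more sharply with a single linking video, echoing the pattern the study reports.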
Affiliation(s)
- Peter Yeates
- School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire, ST5 5BG, UK.
- Fairfield General Hospital, Northern Care Alliance NHS Foundation Trust, Rochdale Old Road, Bury, BL9 7TD, Lancashire, UK.
- Gareth McCray
- School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire, ST5 5BG, UK
- Alice Moult
- School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire, ST5 5BG, UK
- Natalie Cope
- School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire, ST5 5BG, UK
- Richard Fuller
- Christie Education, Christie Hospitals NHS Foundation Trust, Wilmslow Rd, Manchester, M20 4BX, UK
- Robert McKinley
- School of Medicine, David Weatherall Building, Keele University, Keele, Staffordshire, ST5 5BG, UK
7
Tavares W, Hodwitz K, Rowland P, Ng S, Kuper A, Friesen F, Shwetz K, Brydges R. Implicit and inferred: on the philosophical positions informing assessment science. Adv Health Sci Educ Theory Pract 2021; 26:1597-1623. [PMID: 34370126 DOI: 10.1007/s10459-021-10063-w] [Received: 01/04/2021] [Accepted: 07/25/2021] [Indexed: 06/13/2023]
Abstract
Assessment practices have been increasingly informed by a range of philosophical positions. While generally beneficial, the addition of options can lead to misalignment in the philosophical assumptions associated with different features of assessment (e.g., the nature of constructs and competence, ways of assessing, validation approaches). Such incompatibility can threaten the quality and defensibility of researchers' claims, especially when left implicit. We investigated how authors state and use their philosophical positions when designing and reporting on performance-based assessments (PBA) of intrinsic roles, as well as the (in)compatibility of assumptions across assessment features. Using a representative sample of studies examining PBA of intrinsic roles, we used qualitative content analysis to extract data on how authors enacted their philosophical positions across three key assessment features: (1) construct conceptualizations, (2) assessment activities, and (3) validation methods. We also examined patterns in philosophical positioning across features and studies. In reviewing 32 papers from established peer-reviewed journals, we found that (a) authors rarely reported their philosophical positions, meaning underlying assumptions could only be inferred; (b) authors approached features of assessment in variable ways that could be informed by or associated with different philosophical assumptions; and (c) we experienced uncertainty in determining the (in)compatibility of philosophical assumptions across features. Authors' philosophical positions were often vague or absent in the selected contemporary assessment literature. Leaving such details implicit may lead to misinterpretation by knowledge users wishing to implement, build on, or evaluate the work. As such, assessing the quality and defensibility of claims may come to depend more on who is interpreting than on what is being interpreted.
Affiliation(s)
- Walter Tavares
- The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Institute for Health Policy, Management and Evaluation, University of Toronto/University Health Network, Toronto, Ontario, Canada.
- Kathryn Hodwitz
- Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
- Paula Rowland
- The Wilson Centre, Temerty Faculty of Medicine, Department of Occupational Therapy and Occupational Science, University of Toronto/University Health Network, Toronto, Ontario, Canada
- Stella Ng
- The Wilson Centre, Temerty Faculty of Medicine, Department of Speech-Language Pathology, University of Toronto, Centre for Faculty Development, Unity Health Toronto, Toronto, Ontario, Canada
- Ayelet Kuper
- The Wilson Centre, University Health Network/University of Toronto, Division of General Internal Medicine, Sunnybrook Health Sciences Centre, Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Farah Friesen
- Centre for Faculty Development, Temerty Faculty of Medicine, University of Toronto at Unity Health Toronto, Toronto, Ontario, Canada
- Katherine Shwetz
- Department of English, University of Toronto, Toronto, Ontario, Canada
- Ryan Brydges
- The Wilson Centre, Temerty Faculty of Medicine, Department of Medicine, Unity Health Toronto, University of Toronto, Toronto, Ontario, Canada
8
Tepeš I, Košak Soklič T, Urbančič J. The agreement of the endoscopic Modified Lund-Kennedy scoring in a clinical research group: An observational study. Eur Ann Otorhinolaryngol Head Neck Dis 2021; 139:185-188. [PMID: 34654664 DOI: 10.1016/j.anorl.2021.08.014] [Received: 06/08/2020] [Revised: 08/16/2021] [Accepted: 08/26/2021] [Indexed: 11/25/2022]
Abstract
OBJECTIVES The main objective was to demonstrate the robustness of the modified Lund-Kennedy staging system and its use in a clinical research group. Secondary objectives were to evaluate the raters' homogeneity, identify outliers with unacceptable agreement, and define factors underlying questionable agreement within the group of raters. MATERIAL AND METHODS Anonymized endoscopic photos of patients with chronic rhinosinusitis were assessed by independent raters from a clinical research group. The level of agreement between raters was calculated using intraclass correlation and the weighted kappa coefficient. Clusters of similarity were identified using an inter-item correlation matrix. The weighted kappa coefficient was calculated for the most homogeneous group and for outliers. Age, sex, consultancy years, and combined clinical and research work (assessed by 5 senior peers) were also statistically compared between raters. RESULTS Intraclass correlation coefficients were 0.75 for single measures and 0.95 for average measures. The single-measures value for the most homogeneous raters was 0.97 (weighted kappa 0.88, P<0.001). One outlier with a lower research work score had unacceptable agreement with the 2 most homogeneous raters (single-measures coefficients 0.59, weighted kappa 0.15, P=0.32, and 0.57, weighted kappa 0.197, P=0.32, respectively). Pooled groups were similar in age (P=0.3), sex (P=0.1), and consultancy years (P=0.2) but differed significantly in peer-assessed clinical and research work score (P<0.001). CONCLUSION Even with excellent overall agreement, careful examination of the correlation matrix revealed an obvious outlier with less-than-ideal performance. The method may be helpful when studies using an endoscopic staging system are designed to involve researchers from different backgrounds. Among the most common factors explored, education and clinical experience play a paramount role.
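For readers unfamiliar with the two agreement statistics named above, here is a minimal self-contained sketch (not the study's code) of a quadratically weighted kappa for one rater pair and a two-way random-effects, average-measures ICC for a panel; the toy data are invented.

```python
import numpy as np

def weighted_kappa(r1, r2, n_cats):
    """Quadratically weighted Cohen's kappa for two raters' ordinal scores."""
    obs = np.zeros((n_cats, n_cats))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance agreement
    i, j = np.indices((n_cats, n_cats))
    w = ((i - j) / (n_cats - 1)) ** 2                  # quadratic penalty
    return 1 - (w * obs).sum() / (w * exp).sum()

def icc2k(X):
    """ICC(2,k): two-way random effects, average measures. X: subjects x raters."""
    n, k = X.shape
    grand = X.mean()
    msr = k * ((X.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # subjects
    msc = n * ((X.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # raters
    sse = ((X - X.mean(axis=1, keepdims=True)
              - X.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# Invented endoscopic scores (0-2 per photo) from 4 raters on 8 photos.
scores = np.array([[0, 0, 1, 0], [1, 1, 1, 2], [2, 2, 2, 2], [0, 1, 0, 0],
                   [1, 1, 2, 1], [2, 1, 2, 2], [0, 0, 0, 1], [1, 2, 1, 1]])
print("ICC(2,k):", round(icc2k(scores.astype(float)), 2))
print("weighted kappa (raters 0 vs 1):",
      round(weighted_kappa(scores[:, 0], scores[:, 1], n_cats=3), 2))
```

Comparing each rater's pairwise kappa against the panel's best-agreeing pair, as the study did, is what surfaces an outlier that a single pooled ICC can hide.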
Affiliation(s)
- I Tepeš
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloska 2, SI-1000 Ljubljana, Slovenia
- T Košak Soklič
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloska 2, SI-1000 Ljubljana, Slovenia; Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia
- J Urbančič
- Department of Otorhinolaryngology and Cervicofacial Surgery, University Medical Centre Ljubljana, Zaloska 2, SI-1000 Ljubljana, Slovenia; Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1000 Ljubljana, Slovenia.
9
Coertjens L, Lesterhuis M, De Winter BY, Goossens M, De Maeyer S, Michels NRM. Improving Self-Reflection Assessment Practices: Comparative Judgment as an Alternative to Rubrics. Teach Learn Med 2021; 33:525-535. [PMID: 33571014 DOI: 10.1080/10401334.2021.1877709] [Received: 08/29/2020] [Revised: 12/05/2020] [Accepted: 01/17/2021] [Indexed: 06/12/2023]
Abstract
CONSTRUCT The authors aimed to investigate the utility of the comparative judgment method for assessing students' written self-reflections. BACKGROUND Medical practitioners' reflective skills are increasingly considered important and are therefore included in the medical education curriculum. However, assessing students' reflective skills using rubrics does not appear to guarantee adequate inter-rater reliabilities. Recently, comparative judgment was introduced as a new method for evaluating performance assessments. This study investigates the merits and limitations of the comparative judgment method for assessing students' written self-reflections. More specifically, it examines the reliability in relation to the time spent assessing, the correlation between the scores obtained using the two methods (rubrics and comparative judgment), and raters' perceptions of the comparative judgment method. APPROACH Twenty-two self-reflections that had previously been scored using a rubric were assessed by a group of eight raters using comparative judgment. Two hundred comparisons were completed and a rank order was calculated. Raters' impressions were investigated using a focus group. FINDINGS Using comparative judgment, each self-reflection needed to be compared seven times with another self-reflection to reach a scale separation reliability of .55. The inter-rater reliability of rating using rubrics (ICC(1,k)) was .56. The time investment required for these reliability levels was around 24 minutes for both methods. Kendall's tau rank correlation indicated a strong correlation between the scores obtained via the two methods. Raters reported that making comparisons led them to evaluate the quality of self-reflections in a more nuanced way. Time investment was, however, considered heavy, especially for the first comparisons. Although raters appreciated not having to assign a grade to each self-reflection, the fact that the method does not automatically yield a grade or feedback was considered a downside. CONCLUSIONS First evidence was provided for the comparative judgment method as an alternative to rubrics for assessing students' written self-reflections. Before comparative judgment can be implemented for summative assessment, more research is needed on the time investment required to ensure that no contradictory feedback is given back to students. Moreover, as the comparative judgment method requires an additional standard-setting exercise to obtain grades, more research is warranted on the merits and limitations of this method when a pass/fail approach is used.
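To make the mechanics concrete, here is a hedged sketch of how a rank order can be derived from pairwise "which self-reflection is better?" decisions via a Bradley-Terry model. Comparative-judgment tools typically fit a close relative of this model; the data and the simple gradient-ascent fitting routine below are illustrative only.

```python
import numpy as np

def bradley_terry(n_items, comparisons, n_iter=200, lr=0.1):
    """comparisons: list of (winner, loser) index pairs. Returns log-quality
    estimates theta; higher theta means judged better more often."""
    theta = np.zeros(n_items)
    for _ in range(n_iter):                  # gradient ascent on the
        grad = np.zeros(n_items)             # Bradley-Terry log-likelihood
        for w, l in comparisons:
            p_w = 1 / (1 + np.exp(theta[l] - theta[w]))  # P(winner beats loser)
            grad[w] += 1 - p_w
            grad[l] -= 1 - p_w
        theta += lr * grad
        theta -= theta.mean()                # fix the scale's origin
    return theta

# Toy data: 6 essays; lower-indexed essays tend to win their comparisons.
rng = np.random.default_rng(2)
pairs = []
for _ in range(60):
    a, b = rng.choice(6, size=2, replace=False)
    winner, loser = (a, b) if rng.random() < 0.5 + 0.08 * (b - a) else (b, a)
    pairs.append((winner, loser))

theta = bradley_terry(6, pairs)
print("rank order (best first):", np.argsort(-theta))
```

The appeal for assessment, as the abstract notes, is that raters only make holistic better/worse decisions; the scale (and any subsequent pass/fail cut) is derived from the fitted parameters rather than from absolute grades.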
Affiliation(s)
- Liesje Coertjens
- Psychological Sciences Research Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
- Department of Educational Sciences, Faculty of Social Sciences, University of Antwerp, Antwerp, Belgium
- Marije Lesterhuis
- Department of Educational Sciences, Faculty of Social Sciences, University of Antwerp, Antwerp, Belgium
- Benedicte Y De Winter
- Skills Lab at the Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Maarten Goossens
- Department of Educational Sciences, Faculty of Social Sciences, University of Antwerp, Antwerp, Belgium
- Sven De Maeyer
- Department of Educational Sciences, Faculty of Social Sciences, University of Antwerp, Antwerp, Belgium
- Nele R M Michels
- Skills Lab at the Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
10
Moult A, McKinley RK, Yeates P. Understanding patient involvement in judging students' communication skills in OSCEs. Med Teach 2021; 43:1070-1078. [PMID: 34496725 DOI: 10.1080/0142159x.2021.1915467] [Indexed: 06/13/2023]
Abstract
INTRODUCTION Communication skills are assessed by medically enculturated examiners using consensus frameworks which were developed with limited patient involvement. Assessments consequently risk rewarding performance which incompletely serves patients' authentic communication needs. Whilst regulators require patient involvement in assessment, little is known about how this can be achieved. We aimed to explore patients' perceptions of students' communication skills, examiner feedback, and potential roles for patients in assessment. METHODS Using constructivist grounded theory, we performed semi-structured interviews with cognitive stimulation, in which patients watched videos of student performances in communication-focused OSCE stations and read the corresponding examiner feedback. Data were analysed using grounded theory methods. RESULTS A disconnect occurred between participants' and examiners' views of students' communication skills. Whilst patients frequently commented on students' use of medical terminology, examiners omitted to mention this in feedback. Patients' judgements of students' performances varied widely, reflecting different preferences and beliefs. Participants viewed this variability as an opportunity for students to learn from diverse lived experiences, and they perceived a variety of roles through which patients could enhance assessment authenticity. DISCUSSION Integrating patients into communication skills assessments could help to highlight deficiencies in students' communication which medically enculturated examiners may miss. Overcoming the challenges inherent to this is likely to enhance graduates' preparedness for practice.
Affiliation(s)
- Alice Moult
- School of Medicine, Keele University, Keele, UK
11
Yau SY, Babovič M, Liu GRJ, Gugel A, Monrouxe LV. Differing viewpoints around healthcare professions' education research priorities: A Q-methodology approach. Adv Health Sci Educ Theory Pract 2021; 26:975-999. [PMID: 33570670 DOI: 10.1007/s10459-021-10030-5] [Received: 08/27/2020] [Accepted: 01/13/2021] [Indexed: 06/12/2023]
Abstract
Recently, due to scarce resources and the need to provide an evidence base for healthcare professions' education (HPE), HPE research centres internationally have turned to identifying priorities for their research efforts. Engaging a range of stakeholders in research priority setting exercises has been posited as one way to reduce researcher bias and increase social accountability. However, assigning individuals to single a priori stakeholder groups is complex, and previous research has overlooked cross-category membership and agreement between individuals across groups. Further, analyses have pitched stakeholder groups against one another in an attempt to understand who prioritises what, and have often failed to grasp the rationales underlying priorities. A deeper understanding of who prioritises what research areas and why is required to consider the applicability of results across contexts and to deepen social accountability and transferability. A web-based Q-methodological approach with n=91 participants (who) from ten pre-classified stakeholder groups was employed, with post-sort interviews (why). Sixty-seven Q-set items (Chinese/English languages) were developed from previous research (what). Participants were mainly from Taiwan, although international researchers were included. Q-sorting was undertaken in groups or individually, followed by post-sort interviews. Eighty-six participants' Q-sorts were included in the final analysis. Intercorrelations among Q-sorts were factor-analysed (centroid method) and rotated analytically (varimax method). Interviews were thematically analysed. Six Viewpoints with eigenvalues exceeding 1 were identified (range = 3.55-10.34; 42% total variance; 35/67 topics), mapping high/low priorities for research foci: workplace teaching and learning; patient dignity and healthcare safety; professionalism and healthcare professionals' development; medical ethics and moral development; healthcare professionals' retention and success; and preparing for clinical practice. Eighteen rationales for prioritisation were identified; impact, organisational culture, and a deficit of educators/practitioners were the most frequently cited. Each Viewpoint, held by multiple stakeholders, comprised a unique set of topic groupings, target study participants, beneficiaries, and rationales. The two most prolific Viewpoints represent how different stakeholder groups highlight key complementary perspectives on healthcare professions' education in the workplace (efficacy of teaching/learning practices, application of knowledge/values). By illuminating the detail around each Viewpoint, and presenting an holistic description of the who-what-why in research priority setting, others wishing to undertake such an exercise can more easily identify how stakeholder Viewpoints and their epistemic beliefs can help shape healthcare professions' research agendas more generally.
Affiliation(s)
- Sze-Yuen Yau
- Chang Gung Medical Education Research Centre (CG-MERC), Linkou, Taiwan, Republic of China
- Mojca Babovič
- Chang Gung Medical Education Research Centre (CG-MERC), Linkou, Taiwan, Republic of China
- Garrett Ren-Jie Liu
- Chang Gung Medical Education Research Centre (CG-MERC), Linkou, Taiwan, Republic of China
- Arthur Gugel
- Chang Gung Medical Education Research Centre (CG-MERC), Linkou, Taiwan, Republic of China
- Lynn V Monrouxe
- The Faculty of Medicine and Health, The University of Sydney, Level 7, Susan Wakil Health Building D18, NSW, 2006, Australia.
12
Sleiman J, Savage DJ, Switzer B, Colbert CY, Chevalier C, Neuendorf K, Harris D. Teaching residents how to break bad news: piloting a resident-led curriculum and feedback task force as a proof-of-concept study. BMJ Simul Technol Enhanc Learn 2021; 7:568-574. [DOI: 10.1136/bmjstel-2021-000897] [Accepted: 06/12/2021] [Indexed: 11/04/2022]
Abstract
BACKGROUND Breaking bad news (BBN) is a critically important skill set for residents. Limited formal supervision and unpredictable timing of bad news delivery serve as barriers to the exchange of meaningful feedback. PURPOSE OF STUDY The goal of this educational innovation was to improve internal medicine residents' communication skills during challenging BBN encounters. A formal BBN training programme and an innovative on-demand task force were part of this two-phase project. STUDY DESIGN Internal medicine residents at a large academic medical centre participated in an interactive workshop focused on BBN. Workshop survey results served as a needs assessment for the development of a novel resident-led BBN task force. The task force was created to provide observations at the bedside and feedback after BBN encounters. Training of task force members incorporated video triggers and a feedback checklist. Inter-rater reliability was analysed prior to field testing, which provided data on real-world implementation challenges. RESULTS 148 residents were trained during the 2-hour communication skills workshop. Based on survey results, 73% (108 of 148) of the residents indicated enhanced confidence in BBN after participation. Field testing of the task force on a hospital ward revealed potential workflow barriers for residents requesting observations and prompted troubleshooting. Solutions were implemented based on field testing results. CONCLUSIONS A trainee-led BBN task force and communication skills workshop is offered as an innovative model for improving residents' interpersonal and communication skills in BBN. We believe the model is both sustainable and reproducible. Lessons learnt are offered to aid implementation in other settings.
13
Edgar L, Jones MD, Harsy B, Passiment M, Hauer KE. Better Decision-Making: Shared Mental Models and the Clinical Competency Committee. J Grad Med Educ 2021; 13:51-58. [PMID: 33936533 PMCID: PMC8078083 DOI: 10.4300/jgme-d-20-00850.1] [Received: 07/31/2020] [Revised: 11/13/2020] [Accepted: 12/01/2020] [Indexed: 11/06/2022]
Abstract
BACKGROUND Shared mental models (SMMs) help groups make better decisions. Clinical competency committees (CCCs) can benefit from the development and use of SMMs in their decision-making as a way to optimize the quality and consistency of their decisions. OBJECTIVE We reviewed the use of SMMs for decision making in graduate medical education, particularly their use in CCCs. METHODS In May 2020, the authors conducted a narrative review of the literature related to SMMs, covering SMMs in teams, team functioning, CCCs, and graduate medical education. RESULTS The literature identified the general use of SMMs, SMMs in graduate medical education, and strategies for building SMMs into the work of the CCC. Through the use of clear communication and guidelines, and a shared understanding of goals and expectations, CCCs can make better decisions. SMMs can be applied to Milestones, resident performance, assessment, and feedback. CONCLUSIONS To ensure fair and robust decision-making, the CCC must develop and maintain SMMs through excellent communication and a shared understanding of expectations among members.
Affiliation(s)
- Laura Edgar
- Laura Edgar, EdD, CAE, is Vice President, Milestones Development, Accreditation Council for Graduate Medical Education (ACGME)
- M. Douglas Jones
- M. Douglas Jones Jr, MD, is Professor of Pediatrics, University of Colorado School of Medicine
- Braden Harsy
- Braden Harsy, MA, is Milestones Administrator, ACGME
- Morgan Passiment
- Morgan Passiment, MS, is Director, Institutional Outreach and Collaboration, ACGME
- Karen E. Hauer
- Karen E. Hauer, MD, PhD, is Associate Dean, Competency Assessment and Professional Standards, and Professor of Medicine, University of California, San Francisco
14
Fainstad TL, McClintock AH, Yarris LM. Bias in assessment: name, reframe, and check in. Clin Teach 2021; 18:449-453. [PMID: 33787001 DOI: 10.1111/tct.13351] [Received: 01/19/2021] [Revised: 02/19/2021] [Accepted: 03/11/2021] [Indexed: 11/28/2022]
Abstract
Cognitive bias permeates almost every learner assessment in medical education. Assessment bias has the potential to affect a learner's education, future career, and sense of self-worth. Decades of data show that there is little educators can do to overcome bias in learner assessments. Using in-group favouritism as an example, we offer an evidence-based, three-step approach to understanding and moving forward with cognitive bias in assessment: (1) Name: a simple admission of the presence of inherent bias in assessment; (2) Reframe: a rephrasing of assessment language to shed light on the assessor's subjectivity; and (3) Check in: a chance to ensure learner understanding and open lines of bidirectional communication. This process is theory-informed and based on decades of educational, sociological, and psychological literature; we offer it as a logical first step towards a much-needed paradigm shift in addressing bias in learner assessment.
Affiliation(s)
- Tyra L Fainstad
- Department of Medicine, Division of General Internal Medicine, University of Colorado, Aurora, CO, USA
- Adelaide H McClintock
- Department of Medicine, Division of General Internal Medicine, University of Washington, Seattle, WA, USA
15
Boursicot K, Kemp S, Wilkinson T, Findyartini A, Canning C, Cilliers F, Fuller R. Performance assessment: Consensus statement and recommendations from the 2020 Ottawa Conference. Med Teach 2021; 43:58-67. [PMID: 33054524 DOI: 10.1080/0142159x.2020.1830052] [Indexed: 06/11/2023]
Abstract
INTRODUCTION In 2011, the Consensus Statement on Performance Assessment was published in Medical Teacher. That paper was commissioned by AMEE (Association for Medical Education in Europe) as part of the series of Consensus Statements following the 2010 Ottawa Conference. In 2019, it was recommended that a working group be reconvened to review and consider developments in performance assessment since the 2011 publication. METHODS Following a review of the original recommendations in the 2011 paper and shifts in the field across the past 10 years, the group identified areas of consensus and yet-to-be-resolved issues for performance assessment. RESULTS AND DISCUSSION This paper addresses developments in performance assessment since 2011, reiterates relevant aspects of the 2011 paper, and summarises contemporary best-practice recommendations for OSCEs and workplace-based assessments (WBAs), the fit-for-purpose methods for performance assessment in the health professions.
Affiliation(s)
- Katharine Boursicot
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
- Sandra Kemp
- Curtin Medical School, Curtin University, Perth, Australia
- Tim Wilkinson
- Dean's Department, University of Otago, Christchurch, New Zealand
- Ardi Findyartini
- Department of Medical Education, Universitas Indonesia, Jakarta, Indonesia
- Claire Canning
- Department of Assessment and Progression, Duke-National University of Singapore, Singapore, Singapore
- Francois Cilliers
- Department of Health Sciences Education, University of Cape Town, Cape Town, South Africa
16
Wilby KJ, Paravattil B. Cognitive load theory: Implications for assessment in pharmacy education. Res Social Adm Pharm 2020; 17:1645-1649. [PMID: 33358136 DOI: 10.1016/j.sapharm.2020.12.009] [Received: 12/15/2019] [Revised: 11/09/2020] [Accepted: 12/15/2020] [Indexed: 11/28/2022]
Abstract
The concept of mental workload is well studied from a learner's perspective but has yet to be better understood from the perspective of an assessor. Mental workload is largely associated with cognitive load theory, which describes three different types of load. Intrinsic load deals with the complexity of the task, extraneous load describes distractors from the task at hand, and germane load concerns the development of schemas in working memory for future recall. Studies from medical education show that all three types of load are relevant when considering rater-based assessment (e.g., Objective Structured Clinical Examinations (OSCEs) or experiential training). Assessments with high intrinsic and extraneous load may interfere with assessors' attention and working memory and result in poorer-quality assessment. Reducing these loads within assessment tasks should therefore be a priority for pharmacy educators. This commentary aims to provide a theoretical overview of mental workload in assessment, outline research findings from the medical education context, and propose strategies for reducing mental workload in rater-based assessments relevant to pharmacy education. Suggestions for future research are also addressed.
Affiliation(s)
- Kyle John Wilby
- School of Pharmacy, University of Otago, PO Box 56, Dunedin, 9054, New Zealand.
17
van Enk A, Ten Cate O. "Languaging" tacit judgment in formal postgraduate assessment: the documentation of ad hoc and summative entrustment decisions. Perspect Med Educ 2020; 9:373-378. [PMID: 32930984 PMCID: PMC7718349 DOI: 10.1007/s40037-020-00616-x] [Indexed: 05/26/2023]
Abstract
While subjective judgment is recognized by the health professions education literature as important to assessment, it remains difficult to carve out a formally recognized role in assessment practices for personal experiences, gestalts, and gut feelings. Assessment tends to rely on documentary artefacts (like the forms, standards, and policies brought in under competency-based medical education, for example) to support accountability and fairness. But judgment is often tacit in nature and can be more challenging to surface in explicit (and particularly written) form. What is needed is a nuanced approach to the incorporation of judgment in assessment, such that it is neither in danger of being suppressed by an overly rigorous insistence on documentation nor uncritically sanctioned by the defense that it resides in a black box and that we must simply trust the expertise of assessors. The concept of entrustment represents an attempt to effect such a balance within current competency frameworks by surfacing judgments about the degree of supervision learners need to care safely for patients. While there is relatively little published data about its implementation as yet, one readily manifest variation in the uptake of entrustment relates to the distinction between ad hoc and summative forms. The ways in which these forms are languaged, together with their intended purposes and guidelines for their use, point to directions for more focused empirical inquiry that can inform current and future uptake of entrustment in competency-based medical education and the responsible and meaningful inclusion of judgment in assessment more generally.
Affiliation(s)
- Anneke van Enk
- Centre for Health Education Scholarship, University of British Columbia, Vancouver, Canada.
- Olle Ten Cate
- Centre for Research and Development of Education, University Medical Centre Utrecht, Utrecht, The Netherlands
18
Schuwirth LWT, van der Vleuten CPM. A history of assessment in medical education. Adv Health Sci Educ Theory Pract 2020; 25:1045-1056. [PMID: 33113056 DOI: 10.1007/s10459-020-10003-0] [Received: 07/28/2020] [Accepted: 10/19/2020] [Indexed: 06/11/2023]
Abstract
The way the quality of assessment has been perceived and assured has changed considerably over the past five decades. Originally, assessment was mainly seen as a measurement problem with the aim of telling people apart: the competent from the not competent. Logically, reproducibility or reliability and construct validity were seen as necessary and sufficient for assessment quality, and the role of human judgement was minimised. Later, assessment moved back into the authentic workplace with various workplace-based assessment (WBA) methods. Although originally approached from the same measurement framework, WBA and other assessments gradually became assessment processes that included or embraced human judgement, grounded in good support and assessment expertise. Currently, assessment is treated as a whole-system problem in which competence is evaluated from an integrated rather than a reductionist perspective. Current research therefore focuses on how to support and improve human judgement, how to triangulate assessment information meaningfully, and how to construct fairness, credibility, and defensibility from a systems perspective. But, given the rapid changes in society, education, and healthcare, yet another evolution in our thinking about good assessment likely lurks around the corner.
Affiliation(s)
- Lambert W T Schuwirth
- FHMRI: Prideaux Research in Health Professions Education, College of Medicine and Public Health, Flinders University, Sturt Road, Bedford Park, South Australia, 5042, GPO Box 2100, Adelaide, SA, 5001, Australia.
- Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands.
- Cees P M van der Vleuten
- FHMRI: Prideaux Research in Health Professions Education, College of Medicine and Public Health, Flinders University, Sturt Road, Bedford Park, South Australia, 5042, GPO Box 2100, Adelaide, SA, 5001, Australia
- Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands
19
Andler C, Daya S, Kowalek K, Boscardin C, van Schaik SM. E-ASSESS: Creating an EPA Assessment Tool for Structured Simulated Emergency Scenarios. J Grad Med Educ 2020; 12:153-158. [PMID: 32322347 PMCID: PMC7161329 DOI: 10.4300/jgme-d-19-00533.1] [Received: 07/30/2019] [Revised: 12/02/2019] [Accepted: 01/31/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND The entrustable professional activity (EPA) assessment framework allows supervisors to assign entrustment levels to physician trainees for specific activities. Limited opportunity for direct observation of trainees hampers entrustment decisions, in particular for infrequently performed activities. Simulation allows for direct observation, so tools to assess performance of EPAs in simulation could potentially provide additional data to complement clinical assessments. OBJECTIVE We developed and collected validity evidence for a simulation-based tool grounded in the EPA framework. METHODS We developed E-ASSESS (EPA Assessment for Structured Simulated Emergency ScenarioS) to assess performance in 2 EPAs among pediatric residents participating in simulation-based team training in 2017-2018. We collected validity data, applying Messick's unitary view. Three raters used E-ASSESS to assign entrustment levels based on performance in simulation. We compared those ratings to entrustment levels assigned by clinical supervisors (different from the study raters) for the same residents on a separate tool designed for clinical practice. We calculated intraclass correlations (ICCs) for each tool and Pearson correlation coefficients to compare ratings between tools. RESULTS Twenty-eight residents participated in the study. The ICC among the 3 raters for entrustment ratings on E-ASSESS ranged from 0.65 to 0.77, while the ICCs among raters of the clinical tool were 0.59 and 0.57. We found no significant correlations between E-ASSESS ratings and clinical practice ratings for either EPA (r = -0.35 and 0.38, P > .05). CONCLUSIONS Assessment following an EPA framework in the simulation context may be useful for providing data points to inform entrustment decisions as part of resident assessment.
20
Wagner-Menghin M, de Bruin ABH, van Merriënboer JJG. Communication skills supervisors' monitoring of history-taking performance: an observational study on how doctors and non-doctors use cues to prepare feedback. BMC Med Educ 2020; 20:36. [PMID: 32028941 PMCID: PMC7006145 DOI: 10.1186/s12909-019-1920-4] [Received: 05/09/2019] [Accepted: 12/30/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Medical students need feedback to improve their patient-interviewing skills because self-monitoring is often inaccurate. Effective feedback should reveal any discrepancies between desired and observed performance (cognitive feedback) and indicate metacognitive cues which are diagnostic of performance (metacognitive feedback). We adapted a cue-utilization model to study supervisors' cue usage when preparing feedback and compared doctors' and non-doctors' cue usage. METHOD Twenty-one supervisors watched a video of a patient interview, chose scenes for feedback, and explained their selection. We applied content analysis to categorize and count cue-use frequency per communication pattern (structuring/facilitating) and scene performance rating (positive/negative) for both doctors and non-doctors. RESULTS Both groups used cognitive cues more often than metacognitive cues to explain their scene selection. Both groups also used metacognitive cues such as subjective feelings and mentalizing cues, but mainly the doctors mentioned 'missing information' as a cue. Compared to non-doctors, the doctors described more scenes showing negative performance and fewer scenes showing positive narrative-facilitating performance. CONCLUSIONS Both groups are well able to communicate their observations and provide cognitive feedback on undergraduates' interviewing skills. To improve their feedback, supervisors should be trained to also recognize metacognitive cues, such as subjective feelings and mentalizing cues, and to learn how to convert both into metacognitive feedback.
Affiliation(s)
- Anique B. H. de Bruin
- Maastricht University, School of Health Professions Education, P.O. Box 616, 6200 MD Maastricht, The Netherlands
- Jeroen J. G. van Merriënboer
- Maastricht University, School of Health Professions Education, P.O. Box 616, 6200 MD Maastricht, The Netherlands
21
Mortaz Hejri S, Jalili M, Masoomi R, Shirazi M, Nedjat S, Norcini J. The utility of mini-Clinical Evaluation Exercise in undergraduate and postgraduate medical education: A BEME review: BEME Guide No. 59. Medical Teacher 2020; 42:125-142. [PMID: 31524016; DOI: 10.1080/0142159x.2019.1652732]
Abstract
Background: This BEME review aims to explore, analyze, and synthesize the evidence concerning the utility of the mini-CEX for assessing undergraduate and postgraduate medical trainees, specifically as it relates to reliability, validity, educational impact, acceptability, and cost. Methods: This registered BEME review applied a systematic search strategy in seven databases to identify studies on the validity, reliability, educational impact, acceptability, or cost of the mini-CEX. Data extraction and quality assessment were carried out by two authors; discrepancies were resolved by a third reviewer. Descriptive synthesis was mainly used to address the review questions, and a meta-analysis was performed for Cronbach's alpha. Results: Fifty-eight papers were included. Only two studies evaluated all five utility criteria. Forty-seven (81%) of the included studies met seven or more of the quality criteria. Cronbach's alpha ranged from 0.58 to 0.97 (weighted mean = 0.90). Reported G coefficients, standard errors of measurement, and confidence intervals were diverse and varied with the number of encounters and the nested or crossed design of the study. The calculated number of encounters needed for a desirable G coefficient also varied greatly. Content coverage was reported as satisfactory in several studies. The mini-CEX discriminated between various levels of competency. Factor analyses revealed a single dimension, and the six competencies correlated highly and significantly with overall competence. Moderate to high correlations between mini-CEX scores and other clinical exams were reported, and the mini-CEX improved students' performance in other examinations. By providing a framework for structured observation and feedback, the mini-CEX exerts a favorable educational impact. Included studies revealed that feedback was provided in most encounters but that its quality was questionable. Completion rates were generally above 50%, and feasibility and high satisfaction were reported. Conclusion: The mini-CEX has reasonable validity, reliability, and educational impact. Acceptability and feasibility should be interpreted in light of the required number of encounters.
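Two quantities in this review translate directly into code: Cronbach's alpha computed from item-level mini-CEX scores, and a pooled mean alpha across studies. The sketch below assumes simple sample-size weighting for the pooled mean, since the review's exact pooling method is not given here; all numbers are invented.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_encounters, k_items) array of mini-CEX item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented item scores: 50 encounters x 6 competencies on a 9-point scale.
rng = np.random.default_rng(1)
ability = rng.normal(6, 1.5, size=(50, 1))
scores = np.clip(ability + rng.normal(0, 1, size=(50, 6)), 1, 9)
print("alpha:", round(cronbach_alpha(scores), 2))

# Sample-size-weighted mean alpha across (invented) studies.
alphas = np.array([0.58, 0.78, 0.90, 0.95, 0.97])
ns = np.array([40, 120, 300, 500, 250])
print("weighted mean alpha:", round(np.average(alphas, weights=ns), 2))
```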
Affiliation(s)
- Sara Mortaz Hejri
- Department of Medical Education, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mohammad Jalili
- Department of Medical Education, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Department of Emergency Medicine, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Rasoul Masoomi
- Department of Medical Education, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Mandana Shirazi
- Department of Medical Education, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Department of Clinical Science and Education, Södersjukhuset, Karolinska Institutet, Stockholm, Sweden
- Saharnaz Nedjat
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- John Norcini
- Foundation for Advancement of International Medical Education and Research (FAIMER), Philadelphia, PA, USA
22
How Do Thresholds of Principle and Preference Influence Surgeon Assessments of Learner Performance? Annals of Surgery 2019; 268:385-390. [PMID: 28463897; DOI: 10.1097/sla.0000000000002284]
Abstract
OBJECTIVE The present study asks whether intraoperative principles are shared among faculty in a single residency program and explores how surgeons' individual thresholds between principles and preferences might influence assessment. BACKGROUND Surgical education continues to face significant challenges in the implementation of intraoperative assessment. Competency-based medical education assumes the possibility of a shared standard of competence, but intersurgeon variation is prevalent and, at times, valued in surgical education. Such procedural variation may pose problems for assessment. METHODS An entire surgical division (n = 11) was recruited to participate in video-guided interviews. Each surgeon assessed intraoperative performance in 8 video clips from a single laparoscopic radical left nephrectomy performed by a senior learner (>PGY5). Interviews were audio recorded, transcribed, and analyzed using the constant comparative method of grounded theory. RESULTS Surgeons' responses revealed 5 shared generic principles: choosing the right plane, knowing what comes next, recognizing normal and abnormal, making safe progress, and handling tools and tissues appropriately. The surgeons, however, disagreed both on whether a particular performance upheld a principle and on how the performance could improve. This variation subsequently shaped their reported assessment of the learner's performance. CONCLUSIONS The findings of the present study provide the first empirical evidence to suggest that surgeons' attitudes toward their own procedural variations may be an important influence on the subjectivity of intraoperative assessment in surgical education. Assessment based on intraoperative entrustment may harness such subjectivity for the purpose of implementing competency-based surgical education.
23
Yeates P, Cope N, Luksaite E, Hassell A, Dikomitis L. Exploring differences in individual and group judgements in standard setting. Medical Education 2019; 53:941-952. [PMID: 31264741; DOI: 10.1111/medu.13915]
Abstract
CONTEXT Standard setting is critically important to assessment decisions in medical education. Recent research has demonstrated variations between medical schools in the standards set for shared items. Despite the centrality of judgement to criterion-referenced standard setting methods, little is known about the individual or group processes that underpin them. This study aimed to explore the operation and interaction of these processes in order to illuminate potential sources of variability. METHODS Using qualitative research, we purposively sampled across UK medical schools that set a low, medium or high standard on nationally shared items, collecting data by observation of graduation-level standard-setting meetings and semi-structured interviews with standard-setting judges. Data were analysed using thematic analysis based on the principles of grounded theory. RESULTS Standard setting occurred through the complex interaction of institutional context, judges' individual perspectives and group interactions. Schools' procedures, panel members and atmosphere produced unique contexts. Individual judges formed varied understandings of the clinical and technical features of each question, relating these to their differing (sometimes contradictory) conceptions of minimally competent students, by balancing information and making suppositions. Conceptions of minimal competence variously comprised: limited attendance; limited knowledge; poor knowledge application; emotional responses to questions; 'test-savviness', or a strategic focus on safety. Judges experienced tensions trying to situate these abstract conceptions in reality, revealing uncertainty. Groups constructively revised scores through debate, sharing information and often constructing detailed clinical representations of cases. Groups frequently displayed conformity, illustrating a belief that outlying judges were likely to be incorrect. Less frequently, judges resisted change, using emphatic language, bargaining or, rarely, 'polarisation' to influence colleagues. CONCLUSIONS Despite careful conduct through well-established procedures, standard setting is judgementally complex and involves uncertainty. Understanding whether or how these varied processes produce the previously observed variations in outcomes may offer routes to enhance equivalence of criterion-referenced standards.
Affiliation(s)
- Peter Yeates
- Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
- Department of Acute Medicine, Fairfield General Hospital, Pennine Acute Hospitals NHS Trust, Bury, UK
- Natalie Cope
- Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
- Eva Luksaite
- Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
- Andrew Hassell
- Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
- The Haywood Hospital, Midlands Partnership NHS Foundation Trust, Stafford, UK
- Lisa Dikomitis
- Medical School Education Research Group (MERG), Keele University School of Medicine, Keele, UK
- Research Institute for Primary Care and Health Sciences, Keele University, Keele, UK
24
Gingerich A, Schokking E, Yeates P. Comparatively salient: examining the influence of preceding performances on assessors' focus and interpretations in written assessment comments. Advances in Health Sciences Education: Theory and Practice 2018; 23:937-959. [PMID: 29980956; DOI: 10.1007/s10459-018-9841-2]
Abstract
Recent literature places more emphasis on assessment comments than on scores alone; both, however, are variable, as both emanate from assessment judgements. One established source of variability is "contrast effects": scores are shifted away from the depicted level of competence in a preceding encounter. The shift could arise from an effect on the range-frequency of assessors' internal scales or from changes in the salience of performance aspects within assessment judgements. As these suggest different potential interventions, we investigated assessors' cognition, using the insight provided by "clusters of consensus" to determine whether contrast effects induced any change in the salience of performance aspects. A dataset from a previous experiment contained scores and comments for 3 encounters: 2 with significant contrast effects and 1 without. Clusters of consensus were identified using F-sort and latent partition analysis both when contrast effects were significant and when they were not. The proportion of assessors making similar comments differed significantly only when contrast effects were significant, with assessors commenting more frequently on aspects that were dissimilar to the standard of competence demonstrated in the preceding performance. Rather than simply influencing the range-frequency of assessors' scales, preceding performances may affect the salience of performance aspects through comparative distinctiveness: when juxtaposed with the context, some aspects are more distinct and selectively draw attention. Research is needed to determine whether changes in salience indicate biased or improved assessment information. The potential to augment existing benchmarking procedures in assessor training, by cueing assessors' attention through observation of reference performances immediately prior to assessment, should also be explored.
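The central quantitative claim, that the proportion of assessors making similar comments differed only under contrast effects, can be illustrated with a two-proportion z-test. The actual study identified clusters via F-sort and latent partition analysis, which this sketch does not reproduce; the counts below are invented.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Invented counts: assessors commenting on one performance aspect after a
# contrasting vs. a similar preceding performance (20 assessors per condition).
commented = np.array([15, 6])
n_assessors = np.array([20, 20])

# Test whether the two proportions (15/20 vs. 6/20) differ.
stat, pval = proportions_ztest(commented, n_assessors)
print(f"z = {stat:.2f}, p = {pval:.3f}")
```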
Affiliation(s)
- Andrea Gingerich
- Northern Medical Program, University of Northern British Columbia, 3333 University Way, Prince George, BC, V2N 4Z9, Canada
- Edward Schokking
- Northern Medical Program, University of Northern British Columbia, 3333 University Way, Prince George, BC, V2N 4Z9, Canada
- Peter Yeates
- Keele University School of Medicine, Keele, Staffordshire, UK
- Pennine Acute Hospitals NHS Trust, Bury, Lancashire, UK
25
Cleaton N, Yeates P, McCray G. Exploring the relationship between examiners' memories for performances, domain separation and score variability. Medical Teacher 2018; 40:1159-1165. [PMID: 29703091; DOI: 10.1080/0142159x.2018.1463088]
Abstract
Background: OSCE examiners' scores are variable and may discriminate domains of performance poorly. Examiners must hold their observations of OSCE performances in "episodic memory" until performances end. We investigated whether examiners vary in their recollection of performances, and whether this relates to their score variability or their ability to separate disparate performance domains. Methods: Secondary analysis was performed on data where examiners had (1) scored videos of OSCE performances showing disparate student ability in different domains and (2) completed a measure of recollection for an OSCE performance. We calculated measures of "overall-score variance" (the degree to which individual examiners' overall scores varied from the group mean) and "domain separation" (the degree to which examiners separated different performance domains), and related these variables to the measure of examiners' recollection. Results: Examiners varied considerably in their recollection accuracy (recognition beyond chance: -5% to +75% for different examiners). Examiners' recollection accuracy was weakly inversely related to their overall score accuracy (R = -0.17, p < 0.001) and related to their ability to separate domains of performance (R = 0.25, p < 0.001). Conclusions: Examiners vary substantially in their memories for students' performances, which may offer a useful point of difference for studying the processing and integration phases of judgement. Findings could have implications for the utility of feedback.
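The paper's two derived measures can be sketched in code. The operationalizations below are plausible readings of the definitions given in the abstract, not the authors' exact formulas: overall-score variance as an examiner's mean squared deviation from the group-mean overall score, and domain separation as the average spread of an examiner's domain scores within a performance. All data are invented.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
E, P, D = 30, 8, 4  # examiners, performances, domains

# Invented domain scores (0-10): a true profile per performance plus examiner noise.
truth = rng.uniform(3, 9, size=(1, P, D))
scores = np.clip(truth + rng.normal(0, 1.2, size=(E, P, D)), 0, 10)

overall = scores.mean(axis=2)              # (E, P) overall scores
group_mean = overall.mean(axis=0)          # (P,) group mean per performance
overall_score_variance = ((overall - group_mean) ** 2).mean(axis=1)  # per examiner
domain_separation = scores.std(axis=2).mean(axis=1)                  # per examiner

# Invented recollection accuracy (recognition beyond chance) per examiner.
recollection = rng.uniform(-0.05, 0.75, size=E)

for name, measure in [("overall-score variance", overall_score_variance),
                      ("domain separation", domain_separation)]:
    r, p = pearsonr(recollection, measure)
    print(f"recollection vs {name}: r = {r:.2f}, p = {p:.3f}")
```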
26
27
Eva KW. Cognitive Influences on Complex Performance Assessment: Lessons from the Interplay between Medicine and Psychology. Journal of Applied Research in Memory and Cognition 2018. [DOI: 10.1016/j.jarmac.2018.03.008]
28
Sebok-Syer SS, Chahine S, Watling CJ, Goldszmidt M, Cristancho S, Lingard L. Considering the interdependence of clinical performance: implications for assessment and entrustment. Medical Education 2018; 52:970-980. [PMID: 29676054; PMCID: PMC6120474; DOI: 10.1111/medu.13588]
Abstract
INTRODUCTION Our ability to assess independent trainee performance is a key element of competency-based medical education (CBME). In workplace-based clinical settings, however, the performance of a trainee can be deeply entangled with others on the team. This presents a fundamental challenge, given the need to assess and entrust trainees based on the evolution of their independent clinical performance. The purpose of this study, therefore, was to understand what faculty members and senior postgraduate trainees believe constitutes independent performance in a variety of clinical specialty contexts. METHODS Following constructivist grounded theory, and using both purposive and theoretical sampling, we conducted individual interviews with 11 clinical teaching faculty members and 10 senior trainees (postgraduate year 4/5) across 12 postgraduate specialties. Constant comparative inductive analysis was conducted. Return of findings was also carried out using one-to-one sessions with key informants and public presentations. RESULTS Although some independent performances were described, participants spoke mostly about the exceptions to and disclaimers about these, elaborating their sense of the interdependence of trainee performances. Our analysis of these interdependence patterns identified multiple configurations of coupling, with the dominant being coupling of trainee and supervisor performance. We consider how the concept of coupling could advance workplace-based assessment efforts by supporting models that account for the collective dimensions of clinical performance. CONCLUSION These findings call into question the assumption of independent performance, and offer an important step toward measuring coupled performance. An understanding of coupling can help both to better distinguish independent and interdependent performances, and to consider revising workplace-based assessment approaches for CBME.
Affiliation(s)
- Stefanie S Sebok-Syer
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Saad Chahine
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Christopher J Watling
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Mark Goldszmidt
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Sayra Cristancho
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Lorelei Lingard
- Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
29
Kogan JR, Hatala R, Hauer KE, Holmboe E. Guidelines: The do's, don'ts and don't knows of direct observation of clinical skills in medical education. Perspectives on Medical Education 2017; 6:286-305. [PMID: 28956293; PMCID: PMC5630537; DOI: 10.1007/s40037-017-0376-7]
Abstract
INTRODUCTION Direct observation of clinical skills is a key assessment strategy in competency-based medical education. The guidelines presented in this paper synthesize the literature on direct observation of clinical skills. The goal is to provide a practical list of Do's, Don'ts and Don't Knows about direct observation for supervisors who teach learners in the clinical setting and for educational leaders who are responsible for clinical training programs. METHODS We built consensus through an iterative approach in which each author, based on their medical education and research knowledge and expertise, independently developed a list of Do's, Don'ts, and Don't Knows about direct observation of clinical skills. Lists were compiled, discussed and revised. We then sought and compiled evidence to support each guideline and to determine its strength. RESULTS A final set of 33 Do's, Don'ts and Don't Knows is presented along with a summary of evidence for each guideline. Guidelines focus on two groups: individual supervisors and the educational leaders responsible for clinical training programs. Guidelines address recommendations for how to focus direct observation, select an assessment tool, promote high-quality assessments, conduct rater training, and create a learning culture conducive to direct observation. CONCLUSIONS High-frequency, high-quality direct observation of clinical skills can be challenging. These guidelines offer important evidence-based Do's and Don'ts that can help improve the frequency and quality of direct observation. Improving direct observation requires focus not just on individual supervisors and their learners, but also on the organizations and cultures in which they work and train. Additional research to address the Don't Knows can help educators realize the full potential of direct observation in competency-based education.
Affiliation(s)
- Jennifer R Kogan
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Rose Hatala
- University of British Columbia, Vancouver, British Columbia, Canada
- Karen E Hauer
- University of California San Francisco, San Francisco, CA, USA
- Eric Holmboe
- Accreditation Council for Graduate Medical Education, Chicago, IL, USA