1. Chang C, Laird-Fick HS, Mitchell JD, Parker C, Solomon D. Assessing the impact of clerkships on the growth of clinical knowledge. Ann Med 2025; 57:2443812. [PMID: 39731632] [DOI: 10.1080/07853890.2024.2443812]
Abstract
PURPOSE This study quantified the impact of clinical clerkships on medical students' disciplinary knowledge using the Comprehensive Clinical Science Examination (CCSE) as a formative assessment tool. METHODS This study involved 155 third-year medical students in the College of Human Medicine at Michigan State University who matriculated in 2016. Disciplinary scores on their individual Comprehensive Clinical Science Examination reports were extracted by digitizing the bar charts using image processing techniques. Segmented regression analysis was used to quantify the differences in disciplinary knowledge before, during, and after clerkships in five disciplines: surgery, internal medicine, psychiatry, pediatrics, and obstetrics and gynecology (ob/gyn). RESULTS A comparison of the regression intercepts before and during their clerkships revealed that, on average, the participants improved the most in ob/gyn (β = 11.193, p < .0001), followed by psychiatry (β = 10.005, p < .001), pediatrics (β = 6.238, p < .0001), internal medicine (β = 1.638, p = .30), and improved the least in surgery (β = -2.332, p = .10). The regression intercepts of knowledge during their clerkships and after them, on the other hand, suggested that students' average scores improved the most in psychiatry (β = 7.649, p = .008), followed by ob/gyn (β = 4.175, p = .06), surgery (β = 4.106, p = .007), and pediatrics (β = 1.732, p = .32). CONCLUSIONS These findings highlight how clerkships influence the acquisition of disciplinary knowledge, offering valuable insights for curriculum design and assessment. This approach can be adapted to evaluate the effectiveness of other curricular activities, such as tutoring or intersessions. The results have significant implications for educators revising clerkship content and for students preparing for the United States Medical Licensing Examination Step 2.
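The segmented-regression comparison described in this abstract can be pictured with a small sketch. The code below is a hypothetical illustration rather than the authors' analysis: the data frame, column names, and scores are invented, and the coefficients on the phase dummies play the role of the intercept shifts reported as β above.

```python
# Minimal sketch of segmented regression comparing intercepts (and slopes)
# before, during, and after a clerkship. All data and column names are
# hypothetical placeholders, not the study's data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "student_id": [1] * 6 + [2] * 6 + [3] * 6,
    "months":     [0, 3, 6, 9, 12, 15] * 3,                       # time of each CCSE attempt
    "phase":      (["before"] * 2 + ["during"] * 2 + ["after"] * 2) * 3,
    "score":      [52, 55, 63, 66, 70, 71,
                   58, 60, 69, 72, 74, 76,
                   47, 50, 58, 61, 66, 68],                       # discipline scores
})

# The phase dummies give the intercept shifts (analogous to the reported betas);
# the interaction with time lets each segment have its own slope.
model = smf.ols(
    "score ~ C(phase, Treatment(reference='before')) * months", data=df
).fit()
print(model.params)
```

A fuller analysis would also account for repeated measures within students (for example, with mixed effects or clustered standard errors), which this plain OLS sketch ignores.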
Affiliation(s)
- Chi Chang
- Office of Medical Education Research and Development, and Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI, USA
- Heather S Laird-Fick
- Department of Medicine, College of Human Medicine, Michigan State University, East Lansing, MI, USA
- John D Mitchell
- Department of Anesthesiology, Pain Management, and Perioperative Medicine, Henry Ford Health System, Detroit, MI, USA
- Carol Parker
- Office of Medical Education Research and Development, College of Human Medicine, Michigan State University, East Lansing, MI, USA
- David Solomon
- Department of Medicine, Office of Medical Education Research and Development, College of Human Medicine, Michigan State University, East Lansing, MI, USA
2. Abdolrahimi Raeni R, de Beaufort AJ, Pranger AD. Factors influencing the learning experience in pharmaceutical internships: A qualitative interview study. Eur J Pharmacol 2025; 998:177530. [PMID: 40127774] [DOI: 10.1016/j.ejphar.2025.177530]
Abstract
INTRODUCTION Experience-based learning (EBL) is the chosen educational strategy in the Master of Pharmacy curriculum at Leiden University. This strategy contributes to the development of the competence profile of a pharmacist by offering multiple internships. However, it is conceivable that what students learn during their internships may vary. This variability is an important issue that has not yet been investigated in the literature. Therefore, the aim of this study is to explore factors influencing learning experiences during pharmaceutical internships. METHODS We performed a descriptive qualitative study. We conducted semi-structured interviews (n = 25) with Master of Pharmacy students (n = 12), pharmaceutical internship supervisors (n = 8), and curriculum stakeholders (n = 5). The interviews were transcribed verbatim and coded in ATLAS.ti®, followed by thematic analysis. RESULTS We identified five themes influencing the learning experiences at pharmaceutical internships: (1) learning goals and experience, (2) commitment, (3) diversity of the internships, (4) importance of the assessment, and (5) role of the curriculum. The perspectives of all three participant groups on the themes were aligned, except for a discrepancy between the curriculum stakeholders and the students regarding the safe learning environment. CONCLUSION EBL reveals factors (e.g., the student's dedication, the supervisor's involvement, uniformity in assessment, and the role of the curriculum) that influence learning experiences at pharmaceutical internships. The EBL strategy leads to variability between students' learning experiences that may lead to differences in competency development and may ultimately affect their readiness to work as a pharmacist. The factors that emerged from our research help to optimize students' learning experiences during pharmaceutical internships.
Affiliation(s)
- R Abdolrahimi Raeni
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, the Netherlands; Center for Innovation in Medical Education, Leiden University Medical Center, Hippocratespad 23, 2333 ZD, Leiden, the Netherlands.
- A J de Beaufort
- Center for Innovation in Medical Education, Leiden University Medical Center, Hippocratespad 23, 2333 ZD, Leiden, the Netherlands.
- A D Pranger
- Department of Clinical Pharmacy and Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, the Netherlands.
3. Rouse M, Newman JR, Waller C, Fink J. R.I.M.E. and reason: multi-station OSCE enhancement to neutralize grade inflation. Med Educ Online 2024; 29:2339040. [PMID: 38603644] [PMCID: PMC11011230] [DOI: 10.1080/10872981.2024.2339040]
Abstract
To offset grade inflation, many clerkships combine faculty evaluations with objective assessments, including the National Board of Medical Examiners Subject Examination (NBME-SE) or Objective Structured Clinical Examination (OSCE); however, standardized methods are not established. Following a curriculum transition removing faculty clinical evaluations from summative grading, final clerkship designations of fail (F), pass (P), and pass-with-distinction (PD) were determined by combined NBME-SE and OSCE performance, with overall PD for the clerkship requiring meeting this threshold in both. At the time, 90% of students achieved PD on the Internal Medicine (IM) OSCE, resulting in overall clerkship grades primarily determined by the NBME-SE. The clerkship sought to enhance the OSCE to provide a more thorough objective clinical skills assessment, offset grade inflation, and reduce the NBME-SE's role as the primary determinant of the final clerkship grade. The single-station 43-point OSCE was enhanced to a three-station 75-point OSCE using the Reporter-Interpreter-Manager-Educator (RIME) framework to align patient encounters with targeted assessments of progressive skills and competencies related to the clerkship rotation. Student performances were evaluated pre- and post-OSCE enhancement. Student surveys provided feedback about the clinical realism and difficulty of the OSCE. Pre-intervention OSCE scores were more tightly clustered (SD = 5.65%) around a high average performance, with scores being highly negatively skewed. Post-intervention OSCE scores were more dispersed (SD = 6.88%) around a lower average, with scores being far less skewed, resulting in an approximately normal distribution. This lowered the total number of students achieving PD on the OSCE and PD in the clerkship, thus reducing the relative weight of the NBME-SE in the overall clerkship grade. Student response was positive, indicating the examination was fair and reflective of their clinical experiences. Through structured development, OSCE assessment can provide a realistic and objective measurement of clinical performance as part of the summative evaluation of students.
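For readers who want to see the kind of dispersion and skewness comparison summarized above, here is a small illustrative sketch; the score arrays are invented placeholders, not the study's data.

```python
# Compare spread and skewness of OSCE percentage scores before and after an
# enhancement. Both arrays are hypothetical placeholders.
import numpy as np
from scipy.stats import skew

pre_scores = np.array([92, 95, 90, 97, 88, 94, 96, 93])   # tightly clustered, high, skewed
post_scores = np.array([78, 85, 70, 88, 74, 81, 90, 66])  # more dispersed, lower

for label, s in [("pre-intervention", pre_scores), ("post-intervention", post_scores)]:
    print(f"{label}: mean = {s.mean():.1f}%, SD = {s.std(ddof=1):.2f}, skew = {skew(s):.2f}")
```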
Affiliation(s)
- Michael Rouse
- Internal Medicine, The University of Kansas School of Medicine, Kansas City, USA
- Jessica R. Newman
- Internal Medicine, The University of Kansas School of Medicine, Kansas City, USA
- Charles Waller
- Evaluation Analyst in the Office of Medical Education, The University of Kansas School of Medicine, Kansas City, MO, USA
- Jennifer Fink
- Internal Medicine, The University of Kansas School of Medicine, Kansas City, USA
4. Shuford A, Carney PA, Ketterer B, Jones RL, Phillipi CA, Kraakevik J, Hasan R, Moulton B, Smeraglio A. An Analysis of Workplace-Based Assessments for Core Entrustable Professional Activities for Entering Residency: Does Type of Clinical Assessor Influence Level of Supervision Ratings? Acad Med 2024; 99:904-911. [PMID: 38498305] [DOI: 10.1097/acm.0000000000005691]
Abstract
PURPOSE The authors describe use of the workplace-based assessment (WBA) coactivity scale according to entrustable professional activities (EPAs) and assessor type to examine how diverse assessors rate medical students using WBAs. METHOD A WBA data collection system was launched at Oregon Health and Science University to visualize learner competency in various clinical settings to foster EPA assessment. WBA data from January 14 to June 18, 2021, for medical students (all years) were analyzed. The outcome variable was level of supervisor involvement in each EPA, and the independent variable was assessor type. RESULTS A total of 7,809 WBAs were included. Most fourth-, third-, and second-year students were assessed by residents or fellows (755 [49.5%], 1,686 [48.5%], and 918 [49.9%], respectively) and first-year students by attending physicians (803 [83.0%]; P < .001). Attendings were least likely to use the highest rating of 4 ("I was available just in case"; 2,148 [56.7%] vs 2,368 [67.7%] for residents; P < .001). Learners more commonly sought WBAs from attendings for EPA 2 (prioritize differential diagnosis), EPA 5 (document clinical encounter), EPA 6 (provide oral presentation), EPA 7 (form clinical questions and retrieve evidence-based medicine), and EPA 12 (perform general procedures of a physician). Residents and fellows were more likely to assess students on EPA 3 (recommend and interpret diagnostic and screening tests), EPA 4 (enter and discuss orders and prescriptions), EPA 8 (give and receive patient handover for transitions in care), EPA 9 (collaborate as member of interprofessional team), EPA 10 (recognize and manage patient in need of urgent care), and EPA 11 (obtain informed consent). CONCLUSIONS Learners preferentially sought resident versus attending supervisors for different EPA assessments. Future research should investigate why learners seek different assessors more frequently for various EPAs and if assessor type variability in WBA levels holds true across institutions.
5. Lewis SK, Nolan NS, Zickuhr L. Frontline assessors' opinions about grading committees in a medicine clerkship. BMC Med Educ 2024; 24:620. [PMID: 38840190] [PMCID: PMC11151467] [DOI: 10.1186/s12909-024-05604-x]
Abstract
BACKGROUND Collective decision-making by grading committees has been proposed as a strategy to improve the fairness and consistency of grading and summative assessment compared to individual evaluations. In the 2020-2021 academic year, Washington University School of Medicine in St. Louis (WUSM) instituted grading committees in the assessment of third-year medical students on core clerkships, including the Internal Medicine clerkship. We explored how frontline assessors perceive the role of grading committees in the Internal Medicine core clerkship at WUSM and sought to identify challenges that could be addressed in assessor development initiatives. METHODS We conducted four semi-structured focus group interviews with resident (n = 6) and faculty (n = 17) volunteers from inpatient and outpatient Internal Medicine clerkship rotations. Transcripts were analyzed using thematic analysis. RESULTS Participants felt that the transition to a grading committee had benefits and drawbacks for both assessors and students. Grading committees were thought to improve grading fairness and reduce pressure on assessors. However, some participants perceived a loss of responsibility in students' grading. Furthermore, assessors recognized persistent challenges in communicating students' performance via assessment forms and misunderstandings about the new grading process. Interviewees identified a need for more training in formal assessment; however, there was no universally preferred training modality. CONCLUSIONS Frontline assessors view the switch from individual graders to a grading committee as beneficial due to a perceived reduction of bias and improvement in grading fairness; however, they report ongoing challenges in the utilization of assessment tools and incomplete understanding of the grading and assessment process.
Affiliation(s)
- Sophia K Lewis
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
- Nathanial S Nolan
- Division of Infectious Disease, VA St Louis Health Care System, St. Louis, MO, USA
- Division of Infectious Disease, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Lisa Zickuhr
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Division of Rheumatology, Department of Medicine, Washington University School of Medicine, St. Louis, USA
6. Ten Cate O, Khursigara-Slattery N, Cruess RL, Hamstra SJ, Steinert Y, Sternszus R. Medical competence as a multilayered construct. Med Educ 2024; 58:93-104. [PMID: 37455291] [DOI: 10.1111/medu.15162]
Abstract
BACKGROUND The conceptualisation of medical competence is central to its use in competency-based medical education. Calls for 'fixed standards' with 'flexible pathways', recommended in recent reports, require competence to be well defined. Making competence explicit and measurable has, however, been difficult, in part due to a tension between the need for standardisation and the acknowledgment that medical professionals must also be valued as unique individuals. To address these conflicting demands, a multilayered conceptualisation of competence is proposed, with implications for the definition of standards and approaches to assessment. THE MODEL Three layers are elaborated. The first is a core layer of canonical knowledge and skill, 'that which every professional should possess', independent of the context of practice. The second layer is context-dependent knowledge, skill, and attitude, visible through practice in health care. The third layer of personalised competence includes personal skills, interests, habits and convictions, integrated with one's personality. This layer, discussed with reference to Vygotsky's concept of Perezhivanie, cognitive load theory, self-determination theory and Maslow's 'self-actualisation', may be regarded as the art of medicine. We propose that fully matured professional competence requires all three layers, but that the assessment of each layer is different. IMPLICATIONS The assessment of canonical knowledge and skills (Layer 1) can be approached with classical psychometric conditions, that is, similar tests, circumstances and criteria for all. Context-dependent medical competence (Layer 2) must be assessed differently, because conditions of assessment across candidates cannot be standardised. Here, multiple sources of information must be merged and intersubjective expert agreement should ground decisions about progression and level of clinical autonomy of trainees. Competence as the art of medicine (Layer 3) cannot be standardised and should not be assessed with the purpose of permission to practice. The pursuit of personal excellence at this level, however, can be recognised and rewarded.
Affiliation(s)
- Olle Ten Cate
- University Medical Center Utrecht, Utrecht, The Netherlands
- Richard L Cruess
- Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada
- Stanley J Hamstra
- Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Holland Bone and Joint Program, Sunnybrook Research Institute, Toronto, Ontario, Canada
- Yvonne Steinert
- Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada
- Robert Sternszus
- Department of Pediatrics, Institute of Health Sciences Education, McGill University, Montreal, Quebec, Canada
7. Huynh A, Nguyen A, Beyer RS, Harris MH, Hatter MJ, Brown NJ, de Virgilio C, Nahmias J. Fixing a Broken Clerkship Assessment Process: Reflections on Objectivity and Equity Following the USMLE Step 1 Change to Pass/Fail. Acad Med 2023; 98:769-774. [PMID: 36780667] [DOI: 10.1097/acm.0000000000005168]
Abstract
Clerkship grading is a core feature of evaluation for medical students' skills as physicians and is considered by most residency program directors to be an indicator of future performance and success. With the transition of the U.S. Medical Licensing Examination Step 1 score to pass/fail, there will likely be even greater reliance on clerkship grades, which raises several important issues that need to be urgently addressed. This article details the current landscape of clerkship grading and the systemic discrepancies in assessment and allocation of honors. The authors examine not only objectivity and fairness in clerkship grading but also the reliability of clerkship grading in predicting residency performance and the potential benefits and drawbacks of adopting a pass/fail clinical clerkship grading system. To promote a fairer and more equitable residency selection process, there must be standardization of grading systems with consideration of explicit grading criteria, grading committees, and/or structured education of evaluators and assessors regarding implicit bias. In addition, greater adherence to and enforcement of transparency in grade distributions in the Medical Student Performance Evaluation are needed. These changes have the potential to level the playing field, foster equitable comparisons, and ultimately add more fairness to the residency selection process.
Affiliation(s)
- Ashley Huynh
- A. Huynh is a first-year medical student, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0002-4413-6829
- Andrew Nguyen
- A. Nguyen is a first-year medical student, University of Florida College of Medicine, Gainesville, Florida; ORCID: https://orcid.org/0000-0002-8131-150X
- Ryan S Beyer
- R.S. Beyer is a second-year medical student, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0002-0283-3749
- Mark H Harris
- M.H. Harris is a second-year medical student, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0002-1598-225X
- Matthew J Hatter
- M.J. Hatter is a second-year medical student, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0003-2922-6196
- Nolan J Brown
- N.J. Brown is a fourth-year medical student, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0002-6025-346X
- Christian de Virgilio
- C. de Virgilio is professor of surgery, Harbor-UCLA Medical Center, Torrance, California
- Jeffry Nahmias
- J. Nahmias is professor of trauma, burns, surgical critical care, and acute care surgery, University of California, Irvine, School of Medicine, Irvine, California; ORCID: https://orcid.org/0000-0003-0094-571X
8. Schafer KR, Sood L, King CJ, Alexandraki I, Aronowitz P, Cohen M, Chretien K, Pahwa A, Shen E, Williams D, Hauer KE. The Grade Debate: Evidence, Knowledge Gaps, and Perspectives on Clerkship Assessment Across the UME to GME Continuum. Am J Med 2023; 136:394-398. [PMID: 36632923] [DOI: 10.1016/j.amjmed.2023.01.001]
Affiliation(s)
- Katherine R Schafer
- Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC.
- Lonika Sood
- Elson S. Floyd College of Medicine, Washington State University, Spokane
- Christopher J King
- Division of Hospital Medicine, Department of Medicine, University of Colorado School of Medicine, Aurora
- Margot Cohen
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Amit Pahwa
- Johns Hopkins University School of Medicine, Baltimore, Md
- E Shen
- Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC
- Donna Williams
- Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC
9. Dunne D, Gielissen K, Slade M, Park YS, Green M. WBAs in UME-How Many Are Needed? A Reliability Analysis of 5 AAMC Core EPAs Implemented in the Internal Medicine Clerkship. J Gen Intern Med 2022; 37:2684-2690. [PMID: 34561828] [PMCID: PMC9411433] [DOI: 10.1007/s11606-021-07151-3]
Abstract
BACKGROUND Reliable assessments of clinical skills are important for undergraduate medical education, trustworthy handoffs to graduate medical programs, and safe, effective patient care. Entrustable professional activities (EPAs) for entering residency have been developed; research is needed to assess the reliability of such assessments in authentic clinical workspaces. DESIGN A student-driven mobile assessment platform was developed and used by clinical supervisors to record ad hoc entrustment decisions using the modified Ottawa scale on 5 core EPAs in an 8-week internal medicine (IM) clerkship. After a 12-month period, generalizability (G) theory analysis was performed to estimate the reliability of entrustment scores and determine the proportion of variance attributable to the student and the other facets, including particular EPA, evaluator type (attending versus resident), and case complexity. Decision (D) theory analysis determined the expected reliability based on the number of hypothetical observations. A g-coefficient of 0.7 was used as a generally agreed upon minimum reliability threshold. KEY RESULTS A total of 1,368 ratings over the 5 EPAs were completed on 94 students. Variance attributed to person (true variance) was high for all EPAs; EPA 5 had the lowest person variance (9.8% across cases and four blocks). Across cases, reliability ranged from 0.02 to 0.60. Applying this to the decision study, the estimated number of observations needed to reach a reliability index of 0.7 ranged between 9 and 11 for all EPAs except EPA 5, which was sensitive to case complexity. CONCLUSIONS Workplace-based clinical skills of IM clerkship students were assessed and logged using a convenient mobile platform. Our analysis suggests that 9-11 observations are needed for these EPA workplace-based assessments (WBAs) to achieve a reliability index of 0.7. Note writing was very sensitive to case complexity. Further reliability analyses of core EPAs are needed before US medical schools consider wider adoption into summative entrustment processes and GME handoffs.
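The decision-study projection described above follows directly from the G-study variance components. The sketch below shows the underlying calculation for a simplified one-facet (persons-by-observations) design; the variance components are made-up placeholders, not the paper's estimates, and this is not the authors' code.

```python
# Project the Phi (dependability) coefficient as the number of observations per
# student grows, and find the smallest number reaching the 0.7 threshold.
def phi(var_person: float, var_error: float, n_obs: int) -> float:
    # Absolute error variance shrinks as observations are averaged.
    return var_person / (var_person + var_error / n_obs)

var_person = 0.19  # hypothetical "true" student variance
var_error = 0.81   # hypothetical combined rater/case/error variance per observation

for n in range(1, 31):
    p = phi(var_person, var_error, n)
    if p >= 0.70:
        print(f"{n} observations project to Phi = {p:.2f} (>= 0.70)")
        break
```

With these placeholder components the threshold lands at 10 observations, in line with the 9-11 range reported above; different components shift that number, which is exactly the point of the D-study.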
Affiliation(s)
- Dana Dunne
- Department of Internal Medicine, Section of Infectious Diseases, Yale School of Medicine, 15 York Street LMP 1074, New Haven, CT, 06511, USA.
- Katherine Gielissen
- Department of Internal Medicine, Section of General Medicine, Yale School of Medicine, New Haven, CT, 06511, USA
- Martin Slade
- Occupational Medicine, Yale School of Medicine, New Haven, CT, 06511, USA
- Michael Green
- Department of Internal Medicine, Section of General Medicine, Yale School of Medicine, New Haven, CT, 06511, USA
10. Jones JM, Berman AB, Tan EX, Mohanty S, Rose MA, Shea JA, Kogan JR. Amplifying the Student Voice: Medical Student Perceptions of AΩA. J Gen Intern Med 2022:10.1007/s11606-022-07544-y. [PMID: 35764758] [DOI: 10.1007/s11606-022-07544-y]
Abstract
BACKGROUND Recent literature has suggested racial disparities in Alpha Omega Alpha Honor Medical Society (AΩA) selection and raised concerns about its effects on the learning environment. Internal reviews at multiple institutions have led to changes in selection practices or suspension of student chapters; in October 2020, the national AΩA organization provided guidance to address these concerns. OBJECTIVE This study aimed to better understand student opinions of AΩA. DESIGN An anonymous survey using both multiple response option and free response questions. PARTICIPANTS Medical students at the Perelman School of Medicine at the University of Pennsylvania. MAIN MEASURES Descriptive statistics and logistic regressions were used to examine predictors of student opinion towards AΩA. Free responses were analyzed by two independent coders to identify key themes. KEY RESULTS In total, 70% of the student body (n = 547) completed the survey. Sixty-three percent had a negative opinion of AΩA, and 57% felt AΩA should not exist at the student level. Thirteen percent believed AΩA membership appropriately reflects the student body; 8% thought selection processes were fair. On multivariate analysis, negative predictors of a student's preference to continue AΩA at the student level included belief that AΩA membership does not currently mirror class composition (OR: 0.45, [95% CI: 0.23-0.89]) and that AΩA selection processes were unfair (OR: 0.20 [0.08-0.47]). Self-perception as not competitive for AΩA selection was also a negative predictor (OR: 0.44 [0.22-0.88]). Major qualitative themes included equity, impact on the learning environment, transparency, and positive aspects of AΩA. CONCLUSIONS This single-institution survey demonstrated significant student concerns regarding AΩA selection fairness and effects on the learning environment. Many critiques extended beyond AΩA itself, instead focusing on the perceived magnification of existing disparities in the learning environment. As the national conversation about AΩA continues, engaging student voices in the discussion is critical.
Affiliation(s)
- Jeremy M Jones
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
- Alexandra B Berman
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Anesthesiology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, USA
- Erik X Tan
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of General Internal Medicine, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sarthak Mohanty
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Michelle A Rose
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Judy A Shea
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of General Internal Medicine, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jennifer R Kogan
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of General Internal Medicine, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA
11. Ryan MS, Khamishon R, Richards A, Perera R, Garber A, Santen SA. A Question of Scale? Generalizability of the Ottawa and Chen Scales to Render Entrustment Decisions for the Core EPAs in the Workplace. Acad Med 2022; 97:552-561. [PMID: 34074896] [DOI: 10.1097/acm.0000000000004189]
Abstract
PURPOSE Assessments of the Core Entrustable Professional Activities (Core EPAs) are based on observations of supervisors throughout a medical student's progression toward entrustment. The purpose of this study was to compare generalizability of scores from 2 entrustment scales: the Ottawa Surgical Competency Operating Room Evaluation (Ottawa) scale and an undergraduate medical education supervisory scale proposed by Chen and colleagues (Chen). A secondary aim was to determine the impact of frequent assessors on generalizability of the data. METHOD For academic year 2019-2020, the Virginia Commonwealth University School of Medicine modified a previously described workplace-based assessment (WBA) system developed to provide feedback for the Core EPAs across clerkships. The WBA scored students' performance using both Ottawa and Chen scales. Generalizability (G) and decision (D) studies were performed using an unbalanced random-effects model to determine the reliability of each scale. Secondary G- and D-studies explored whether faculty who rated more than 5 students demonstrated better reliability. The Phi-coefficient was used to estimate reliability; a cutoff of at least 0.70 was used to conduct D-studies. RESULTS Using the Ottawa scale, variability attributable to the student ranged from 0.8% to 6.5%. For the Chen scale, student variability ranged from 1.8% to 7.1%. This indicates the majority of variation was due to the rater (42.8%-61.3%) and other unexplained factors. Between 28 and 127 assessments were required to obtain a Phi-coefficient of 0.70. For 2 EPAs, using faculty who frequently assessed the EPA improved generalizability, requiring only 5 and 13 assessments for the Chen scale. CONCLUSIONS Both scales performed poorly in terms of learner-attributed variance, with some improvement in 2 EPAs when considering only frequent assessors using the Chen scale. Based on these findings in conjunction with prior evidence, the authors provide a root cause analysis highlighting challenges with WBAs for Core EPAs.
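As a rough illustration of where figures like "variance attributable to the student" come from, the sketch below estimates crossed student and rater variance components from simulated ratings using a mixed model. It is a simplified stand-in, not the authors' G-study code, and every variable name and value in it is hypothetical.

```python
# Estimate crossed student and rater variance components (a G-study-style
# decomposition) from simulated workplace-based assessment ratings.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
students = np.repeat(np.arange(30), 6)             # 30 students, 6 ratings each
raters = rng.integers(0, 12, size=students.size)   # 12 raters assigned haphazardly
score = (3.0
         + rng.normal(0, 0.2, 30)[students]        # small student ("true") variance
         + rng.normal(0, 0.6, 12)[raters]          # larger rater variance
         + rng.normal(0, 0.5, students.size))      # residual error
df = pd.DataFrame({"student": students, "rater": raters, "score": score})

# Crossed random effects expressed as variance components over a single group.
result = smf.mixedlm(
    "score ~ 1", data=df, groups=np.ones(len(df)),
    vc_formula={"student": "0 + C(student)", "rater": "0 + C(rater)"},
).fit()

total = result.vcomp.sum() + result.scale
for name, v in zip(result.model.exog_vc.names, result.vcomp):
    print(f"{name}: {100 * v / total:.1f}% of total variance")
print(f"residual: {100 * result.scale / total:.1f}% of total variance")
```

Dividing the student component by the total gives the student-attributable percentage quoted above; projecting the Phi coefficient for increasing numbers of assessments is then the decision-study step.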
Affiliation(s)
- Michael S Ryan
- M.S. Ryan is associate professor and assistant dean for clinical medical education, Department of Pediatrics, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0003-3266-9289
- Rebecca Khamishon
- R. Khamishon is a fourth-year medical student, Virginia Commonwealth University, Richmond, Virginia
- Alicia Richards
- A. Richards is a graduate student, Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia
- Robert Perera
- R. Perera is associate professor, Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia
- Adam Garber
- A. Garber is associate professor, Department of Internal Medicine, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0002-7296-2896
- Sally A Santen
- S.A. Santen is professor and senior associate dean of assessment, evaluation, and scholarship, Department of Emergency Medicine, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0002-8327-8002
12. Santen SA, Ryan M, Helou MA, Richards A, Perera RA, Haley K, Bradner M, Rigby FB, Park YS. Building reliable and generalizable clerkship competency assessments: Impact of 'hawk-dove' correction. Med Teach 2021; 43:1374-1380. [PMID: 34534035] [DOI: 10.1080/0142159x.2021.1948519]
Abstract
PURPOSE Systematic differences among raters' approaches to student assessment may result in leniency or stringency of assessment scores. This study examines the generalizability of medical student workplace-based competency assessments, including the impact of rater-adjusted scores for leniency and stringency. METHODS Data were collected from summative clerkship assessments completed for 204 students during the 2017-2018 clerkship year at a single institution. Generalizability theory was used to explore variance attributed to different facets (rater, learner, item, and competency domain) through three unbalanced random-effects models by clerkship, including models applying assessor stringency-leniency adjustments. RESULTS In the original assessments, only 4-8% of the variance was attributed to the student, with the remainder being rater variance and error. Aggregating items to create a composite score increased variability attributable to the student (5-13% of variance). Applying a stringency-leniency ('hawk-dove') correction substantially increased the variance attributed to the student (14.8-17.8%) and the reliability. Controlling for assessor leniency/stringency reduced measurement error, decreasing the number of assessments required for generalizability from 16-50 to 11-14. CONCLUSIONS Similar to prior research, most of the variance in competency assessment scores was attributable to raters, with only a small proportion attributed to the student. Making stringency-leniency corrections using rater-adjusted scores improved the psychometric characteristics of assessment scores.
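A crude way to picture the stringency-leniency ("hawk-dove") idea is to shift each rating by its rater's average deviation from the grand mean. The toy sketch below does exactly that on invented data; the paper's correction is model-based, so treat this only as an intuition aid, not the authors' method.

```python
# Toy stringency-leniency adjustment: subtract each rater's mean deviation from
# the grand mean. Data, rater labels, and column names are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "student": ["A", "A", "B", "B", "C", "C"],
    "rater":   ["hawk", "dove", "hawk", "neutral", "dove", "neutral"],
    "score":   [2.6, 4.1, 2.2, 3.4, 4.4, 3.1],
})

grand_mean = df["score"].mean()
rater_effect = df.groupby("rater")["score"].transform("mean") - grand_mean
df["adjusted"] = df["score"] - rater_effect  # hawks' ratings shift up, doves' down
print(df)
```

This naive version assumes each rater sees broadly comparable students; rater-adjusted scores estimated within a model, as in the study, handle non-random assignment more defensibly.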
Affiliation(s)
- Sally A Santen
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Michael Ryan
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Marieka A Helou
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Alicia Richards
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Robert A Perera
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Kellen Haley
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Melissa Bradner
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Fidelma B Rigby
- Virginia Commonwealth University School of Medicine, Richmond, VA, USA
- Yoon Soo Park
- College of Medicine, University of Illinois at Chicago, Chicago, IL, USA
- The Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
13. Ryan MS, Richards A, Perera R, Park YS, Stringer JK, Waterhouse E, Dubinsky B, Khamishon R, Santen SA. Generalizability of the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) Scale to Assess Medical Student Performance on Core EPAs in the Workplace: Findings From One Institution. Acad Med 2021; 96:1197-1204. [PMID: 33464735] [DOI: 10.1097/acm.0000000000003921]
Abstract
PURPOSE Assessment of the Core Entrustable Professional Activities for Entering Residency (Core EPAs) requires direct observation of learners in the workplace to support entrustment decisions. The purpose of this study was to examine the internal structure validity evidence of the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) scale when used to assess medical student performance in the Core EPAs across clinical clerkships. METHOD During the 2018-2019 academic year, the Virginia Commonwealth University School of Medicine implemented a mobile-friendly, student-initiated workplace-based assessment (WBA) system to provide formative feedback for the Core EPAs across all clinical clerkships. Students were required to request a specified number of Core EPA assessments in each clerkship. A modified O-SCORE scale (1 = "I had to do" to 4 = "I needed to be in room just in case") was used to rate learner performance. Generalizability theory was applied to assess the generalizability (or reliability) of the assessments. Decision studies were then conducted to determine the number of assessments needed to achieve a reasonable reliability. RESULTS A total of 10,680 WBAs were completed on 220 medical students. The majority of ratings were completed on EPA 1 (history and physical) (n = 3,129; 29%) and EPA 6 (oral presentation) (n = 2,830; 26%). Mean scores were similar (3.5-3.6 out of 4) across EPAs. Variance due to the student ranged from 3.5% to 8%, with the majority of the variation due to the rater (29.6%-50.3%) and other unexplained factors. A range of 25 to 63 assessments were required to achieve reasonable reliability (Phi > 0.70). CONCLUSIONS The O-SCORE demonstrated modest reliability when used across clerkships. These findings highlight specific challenges for implementing WBAs for the Core EPAs including the process for requesting WBAs, rater training, and application of the O-SCORE scale in medical student assessment.
Affiliation(s)
- Michael S Ryan
- M.S. Ryan is associate professor and assistant dean for clinical medical education, Department of Pediatrics, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0003-3266-9289
- Alicia Richards
- A. Richards is a graduate student, Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia
- Robert Perera
- R. Perera is associate professor, Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia
- Yoon Soo Park
- Y.S. Park is associate professor and associate head, Department of Medical Education, University of Illinois College of Medicine, Chicago, Illinois
- J K Stringer
- J.K. Stringer is assessment manager, Office of Integrated Medical Education, Rush Medical College, Chicago, Illinois
- Elizabeth Waterhouse
- E. Waterhouse is professor, Department of Neurology, Virginia Commonwealth University, Richmond, Virginia
- Brieanne Dubinsky
- B. Dubinsky is business analyst, Office of Academic Information Systems, Virginia Commonwealth University, Richmond, Virginia
- Rebecca Khamishon
- R. Khamishon is a third-year medical student, Virginia Commonwealth University, Richmond, Virginia
- Sally A Santen
- S.A. Santen is professor and senior associate dean of assessment, evaluation, and scholarship, Department of Emergency Medicine, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0002-8327-8002
14. Cohen A, Kind T, DeWolfe C. A Qualitative Exploration of the Intern Experience in Assessing Medical Student Performance. Acad Pediatr 2021; 21:728-734. [PMID: 33127592] [DOI: 10.1016/j.acap.2020.10.014]
Abstract
BACKGROUND Interns play a key role in medical student education, often observing behaviors that others do not. Their role in assessment, however, is less clear. Despite accreditation standards pertaining to residents' assessment skills, they receive little guidance or formal training in it. In order to better prepare residents for their role in medical student assessment, we need to understand their current experience. OBJECTIVE We aimed to describe the first-year resident experience assessing students' performance and providing input to faculty for student clinical performance assessments and grades in the inpatient setting. METHODS Pediatric interns at Children's National Hospital (CN) from February 2018 to February 2019 were invited to participate in semistructured interviews about their experience assessing students. Constant comparative methodology was used to develop themes. Ten interviews were conducted, at which point thematic saturation was reached. RESULTS We identified 4 major themes: 1) Interns feel as though they assess students in meaningful, unique ways. 2) Interns encounter multiple barriers and facilitators to assessing students. 3) Interns voice varying levels of comfort and motivation assessing different areas of student work. 4) Interns see their role in assessment limited to formative rather than summative assessment. CONCLUSIONS These findings depict the intern experience with assessment of medical students at a large pediatric residency program and can help inform ways to develop and utilize the assessment skills of interns.
Affiliation(s)
- Adam Cohen
- Baylor College of Medicine, Texas Children's Hospital (A Cohen), Houston, Tex.
- Terry Kind
- George Washington University, Children's National Hospital (T Kind and C DeWolfe), Washington, DC
- Craig DeWolfe
- George Washington University, Children's National Hospital (T Kind and C DeWolfe), Washington, DC
15. Hernandez CA, Daroowalla F, LaRochelle JS, Ismail N, Tartaglia KM, Fagan MJ, Kisielewski M, Walsh K. Determining Grades in the Internal Medicine Clerkship: Results of a National Survey of Clerkship Directors. Acad Med 2021; 96:249-255. [PMID: 33149085] [DOI: 10.1097/acm.0000000000003815]
Abstract
PURPOSE Trust in and comparability of assessments are essential in clerkships in undergraduate medical education for many reasons, including ensuring competency in clinical skills and application of knowledge important for the transition to residency and throughout students' careers. The authors examined how assessments are used to determine internal medicine (IM) core clerkship grades across U.S. medical schools. METHODS A multisection web-based survey of core IM clerkship directors at 134 U.S. medical schools with membership in the Clerkship Directors in Internal Medicine was conducted in October through November 2018. The survey included a section on assessment practices to characterize current grading scales used, who determines students' final clerkship grades, the nature/type of summative assessments, and how assessments are weighted. Respondents were asked about perceptions of the influence of the National Board of Medical Examiners (NBME) Medicine Subject Examination (MSE) on students' priorities during the clerkship. RESULTS The response rate was 82.1% (110/134). There was considerable variability in the summative assessments and their weighting in determining final grades. The NBME MSE (91.8%), clinical performance (90.9%), professionalism (70.9%), and written notes (60.0%) were the most commonly used assessments. Clinical performance assessments and the NBME MSE accounted for the largest percentage of the total grade (on average 52.8% and 23.5%, respectively). Eighty-seven percent of respondents were concerned that students' focus on the NBME MSE performance detracted from patient care learning. CONCLUSIONS There was considerable variability in what IM clerkships assessed and how those assessments were translated into grades. The NBME MSE was a major contributor to the final grade despite concerns about the impact on patient care learning. These findings underscore the difficulty in comparing learners across institutions and serve to advance discussions for how to improve accuracy and comparability of grading in the clinical environment.
Affiliation(s)
- Caridad A Hernandez
- C.A. Hernandez is professor of medicine, Departments of Internal Medicine and Medical Education, University of Central Florida College of Medicine, Orlando, Florida
- Feroza Daroowalla
- F. Daroowalla is associate professor of medicine, Department of Medical Education, and Internal Medicine Clerkship Director, University of Central Florida College of Medicine, Orlando, Florida
- Jeffrey S LaRochelle
- J.S. LaRochelle is professor of medicine, Department of Medical Education, and assistant dean of medical education, University of Central Florida College of Medicine, Orlando, Florida
- Nadia Ismail
- N. Ismail is associate professor of medicine, Department of Medicine, and associate dean, curriculum, Baylor College of Medicine, Houston, Texas
- Kimberly M Tartaglia
- K.M. Tartaglia is associate professor of clinical medicine and pediatrics, Division of Hospital Medicine, The Ohio State University, Columbus, Ohio
- Mark J Fagan
- M.J. Fagan is professor of medicine emeritus, Department of Medicine, Alpert Medical School of Brown University, Providence, Rhode Island
- Michael Kisielewski
- M. Kisielewski is Surveys and Research Manager, Alliance for Academic Internal Medicine, Alexandria, Virginia
- Katherine Walsh
- K. Walsh is associate professor of clinical internal medicine, Division of Hematology and Internal Medicine Inpatient Clerkship Director, The Ohio State University, Columbus, Ohio
16. Ingram MA, Pearman JL, Estrada CA, Zinski A, Williams WL. Are We Measuring What Matters? How Student and Clerkship Characteristics Influence Clinical Grading. Acad Med 2021; 96:241-248. [PMID: 32701555] [DOI: 10.1097/acm.0000000000003616]
Abstract
PURPOSE Given the growing emphasis placed on clerkship performance for residency selection, clinical evaluation and its grading implications are critically important; therefore, the authors conducted this study to determine which evaluation components best predict a clinical honors recommendation across 3 core clerkships. METHOD Student evaluation data were collected during academic years 2015-2017 from the third-year internal medicine (IM), pediatrics, and surgery clerkships at the University of Alabama at Birmingham School of Medicine. The authors used factor analysis to examine 12 evaluation components (12 items), and they applied multilevel logistic regression to correlate evaluation components with a clinical honors recommendation. RESULTS Of 3,947 completed evaluations, 1,508 (38%) recommended clinical honors. The top item that predicted a clinical honors recommendation was clinical reasoning skills for IM (odds ratio [OR] 2.8; 95% confidence interval [CI], 1.9 to 4.2; P < .001), presentation skills for surgery (OR 2.6; 95% CI, 1.6 to 4.2; P < .001), and knowledge application for pediatrics (OR 4.8; 95% CI, 2.8 to 8.2; P < .001). Students who spent more time with their evaluators were more likely to receive clinical honors (P < .001), and residents were more likely than faculty to recommend clinical honors (P < .001). Of the top 5 evaluation items associated with clinical honors, 4 composed a single factor for all clerkships: clinical reasoning, knowledge application, record keeping, and presentation skills. CONCLUSIONS The 4 characteristics that best predicted a clinical honors recommendation in all disciplines (clinical reasoning, knowledge application, record keeping, and presentation skills) correspond with traditional definitions of clinical competence. Structural components, such as contact time with evaluators, also correlated with a clinical honors recommendation. These findings provide empiric insight into the determination of clinical honors and the need for heightened attention to structural components of clerkships and increased scrutiny of evaluation rubrics.
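To make the modeling step concrete, here is a rough stand-in for the honors-prediction analysis: a logistic model fitted with generalized estimating equations (GEE) to respect clustering of evaluations within evaluators. The study itself used multilevel logistic regression on real evaluation data; everything below (variables, coefficients, data) is simulated and hypothetical.

```python
# Simulated example: predict a clinical-honors recommendation from evaluation
# components, with clustering by evaluator handled via GEE.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "evaluator": rng.integers(0, 40, n),          # 40 evaluators
    "clinical_reasoning": rng.integers(1, 6, n),  # 1-5 rating
    "presentation": rng.integers(1, 6, n),        # 1-5 rating
    "contact_hours": rng.integers(1, 30, n),      # time spent with the evaluator
})
logit = (-6 + 0.9 * df["clinical_reasoning"] + 0.4 * df["presentation"]
         + 0.05 * df["contact_hours"])
df["honors"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.gee(
    "honors ~ clinical_reasoning + presentation + contact_hours",
    groups="evaluator", data=df,
    family=sm.families.Binomial(), cov_struct=sm.cov_struct.Exchangeable(),
).fit()
print(np.exp(model.params))  # odds ratios; values here reflect only the simulation
```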
Affiliation(s)
- Mary A Ingram
- M.A. Ingram is pediatrics intern, Children's of Alabama, University of Alabama at Birmingham, Birmingham, Alabama
- Joseph L Pearman
- J.L. Pearman is internal medicine intern, University of California, Davis, Sacramento, California; ORCID: http://orcid.org/0000-0001-5780-3689
- Carlos A Estrada
- C.A. Estrada is staff physician, Birmingham Veterans Affairs Medical Center, and professor of medicine, Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama; ORCID: http://orcid.org/0000-0001-6262-7421
- Anne Zinski
- A. Zinski is assistant professor, Department of Medical Education, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama; ORCID: http://orcid.org/0000-0003-0414-248X
- Winter L Williams
- W.L. Williams is clerkship codirector and assistant professor of medicine, Department of Medicine, University of Alabama at Birmingham, and staff physician at the Birmingham Veterans Affairs Medical Center, Birmingham, Alabama; ORCID: http://orcid.org/0000-0002-4015-9409
17. Ryan MS, Lee B, Richards A, Perera RA, Haley K, Rigby FB, Park YS, Santen SA. Evaluating the Reliability and Validity Evidence of the RIME (Reporter-Interpreter-Manager-Educator) Framework for Summative Assessments Across Clerkships. Acad Med 2021; 96:256-262. [PMID: 33116058] [DOI: 10.1097/acm.0000000000003811]
Abstract
PURPOSE The ability of medical schools to accurately and reliably assess medical student clinical performance is paramount. The RIME (reporter-interpreter-manager-educator) schema was originally developed as a synthetic and intuitive assessment framework for internal medicine clerkships. Validity evidence of this framework has not been rigorously evaluated outside of internal medicine. This study examined factors contributing to variability in RIME assessment scores using generalizability theory and decision studies across multiple clerkships, thereby contributing to its internal structure validity evidence. METHOD Data were collected from RIME-based summative clerkship assessments during 2018-2019 at Virginia Commonwealth University. Generalizability theory was used to explore variance attributed to different facets through a series of unbalanced random-effects models by clerkship. For all analyses, decision (D-) studies were conducted to estimate the effects of increasing the number of assessments. RESULTS From 231 students, 6,915 observations were analyzed. Interpreter was the most common RIME designation (44.5%-46.8%) across all clerkships. Variability attributable to students ranged from 16.7% in neurology to 25.4% in surgery. D-studies showed the number of assessments needed to achieve an acceptable reliability (0.7) ranged from 7 in pediatrics and surgery to 11 in internal medicine and 12 in neurology. However, depending on the clerkship, each student received between 3 and 8 assessments. CONCLUSIONS This study conducted generalizability and decision studies to examine the internal structure validity evidence of RIME clinical performance assessments across clinical clerkships. A substantial proportion of the variance in RIME assessment scores was attributable to the rater, with less attributed to the student. However, the proportion of variance attributed to the student was greater than what has been demonstrated in other generalizability studies of summative clinical assessments. Overall, these findings support the use of RIME as a framework for assessment across clerkships and demonstrate the number of assessments required to obtain sufficient reliability.
Affiliation(s)
- Michael S Ryan
- M.S. Ryan is assistant dean for clinical medical education and associate professor of pediatrics, Virginia Commonwealth University School of Medicine, Richmond, Virginia; ORCID: https://orcid.org/0000-0003-3266-9289
- Bennett Lee
- B. Lee is associate professor of internal medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia
- Alicia Richards
- A. Richards is a doctoral student in the department of biostatistics, Virginia Commonwealth University School of Medicine, Richmond, Virginia
- Robert A Perera
- R.A. Perera is associate professor of biostatistics, Virginia Commonwealth University School of Medicine, Richmond, Virginia
- Kellen Haley
- K. Haley is a resident in neurology at the University of Michigan School of Medicine, Ann Arbor, Michigan. At the time of initial drafting of this manuscript, Dr. Haley was a fourth-year medical student at Virginia Commonwealth University School of Medicine, Richmond, Virginia
- Fidelma B Rigby
- F.B. Rigby is associate professor and clerkship director of obstetrics and gynecology, Virginia Commonwealth University School of Medicine, Richmond, Virginia
- Yoon Soo Park
- Y.S. Park is associate professor and associate head, department of medical education, and director of research, office of educational affairs, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335
- Sally A Santen
- S.A. Santen is senior associate dean for evaluation, assessment and scholarship, and professor of emergency medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia; ORCID: https://orcid.org/0000-0002-8327-8002
18. Ryan MS, Brooks EM, Safdar K, Santen SA. Clerkship Grading and the U.S. Economy: What Medical Education Can Learn From America's Economic History. Acad Med 2021; 96:186-192. [PMID: 33492834] [PMCID: PMC8325378] [DOI: 10.1097/acm.0000000000003566]
Abstract
Clerkship grades (like money) are a social construct that functions as the currency through which value exchanges in medical education are negotiated between the system's various stakeholders. They provide a widely recognizable and efficient medium through which learner development can be assessed, tracked, compared, and demonstrated, and they are commonly used to make decisions regarding progression, distinction, and selection for residency. However, substantial literature has demonstrated how grades imprecisely and unreliably reflect the value of learners. In this article, the authors suggest that challenges with clerkship grades are fundamentally tied to their role as currency in the medical education system. Associations are drawn between clerkship grades and the history of the U.S. economy; 2 major concepts are highlighted: regulation and stock prices. The authors describe the history of these economic concepts and how they relate to challenges in clerkship grading. Using lessons learned from the history of the U.S. economy, the authors then propose a 2-step solution to improve upon grading for future generations of medical students: (1) transition from grades to a federally regulated competency-based assessment model and (2) development of a departmental competency letter that incorporates competency-based assessments rather than letter grades and meets the needs of program directors.
Collapse
Affiliation(s)
- Michael S Ryan
- M.S. Ryan is associate professor and assistant dean for clinical medical education, Department of Pediatrics, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0003-3266-9289
| | - E Marshall Brooks
- E.M. Brooks is assistant professor, Department of Family Medicine and Population Health, Virginia Commonwealth University, Richmond, Virginia
| | - Komal Safdar
- K. Safdar is a fourth-year medical student, Virginia Commonwealth University, Richmond, Virginia; ORCID: https://orcid.org/0000-0003-1024-2153
| | - Sally A Santen
- S.A. Santen is professor and senior associate dean, assessment, evaluation and scholarship, Department of Emergency Medicine, Virginia Commonwealth University, Richmond, Virginia; ORCID: http://orcid.org/0000-0002-8327-8002
| |
Collapse
|
19
|
Andersen SAW, Park YS, Sørensen MS, Konge L. Reliable Assessment of Surgical Technical Skills Is Dependent on Context: An Exploration of Different Variables Using Generalizability Theory. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2020; 95:1929-1936. [PMID: 32590473 DOI: 10.1097/acm.0000000000003550] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
PURPOSE Reliable assessment of surgical skills is vital for competency-based medical training. Several factors influence not only the reliability of judgments but also the number of observations needed for making judgments of competency that are both consistent and reproducible. The aim of this study was to explore the role of various conditions, using generalizability theory to examine their effects on reliability in data from large-scale, simulation-based assessments of surgical technical skills. METHOD Assessment data from large-scale, simulation-based temporal bone surgical training research studies in 2012-2018 were pooled, yielding 3,574 assessments of 1,723 performances in total. The authors conducted generalizability analyses using an unbalanced random-effects design, and they performed decision studies to explore the effect of the different variables on projections of reliability. RESULTS Overall, 5 observations were needed to achieve a generalizability coefficient > 0.8. Several variables modified the projections of reliability: increased learner experience necessitated more observations (5 for medical students, 7 for residents, and 8 for experienced surgeons), the more complex cadaveric dissection required fewer observations than virtual reality simulation (2 vs 5 observations), and higher-fidelity simulation graphics reduced the number of observations needed from 7 to 4. The training structure (either massed or distributed practice) and simulator-integrated tutoring had little effect on reliability. Finally, more observations were needed during initial training, when the learning curve was steepest (6 observations), compared with the plateau phase (4 observations). CONCLUSIONS Reliability in surgical skills assessment seems less stable than it is often reported to be. Training context and conditions influence reliability. The findings from this study highlight that medical educators should exercise caution when using a specific simulation-based assessment in other contexts.
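The decision-study logic summarized in this abstract can be illustrated with a short calculation: given a person (true-score) variance component and a pooled residual variance component, the projected generalizability coefficient for an average over n observations is sigma^2_person / (sigma^2_person + sigma^2_residual / n). The Python sketch below uses illustrative variance components (not values reported in the study) to show how the number of observations needed to exceed a coefficient of 0.8 follows from that projection.

```python
# Minimal decision-study (D-study) sketch in the spirit of the generalizability
# analyses described above. The variance components are ILLUSTRATIVE placeholders,
# not values reported in the study.

def g_coefficient(var_person: float, var_residual: float, n_obs: int) -> float:
    """Projected generalizability coefficient when averaging over n_obs observations:
    E(rho^2) = var_person / (var_person + var_residual / n_obs)."""
    return var_person / (var_person + var_residual / n_obs)

def observations_needed(var_person: float, var_residual: float,
                        target: float = 0.8, max_obs: int = 50) -> int:
    """Smallest number of observations whose projected coefficient exceeds the target."""
    for n in range(1, max_obs + 1):
        if g_coefficient(var_person, var_residual, n) > target:
            return n
    raise ValueError("target not reached within max_obs observations")

if __name__ == "__main__":
    # Hypothetical components: person (true-score) variance and pooled residual variance.
    sigma2_person, sigma2_residual = 1.0, 1.1
    for n in (1, 2, 5, 8):
        print(n, round(g_coefficient(sigma2_person, sigma2_residual, n), 2))
    print("observations needed:", observations_needed(sigma2_person, sigma2_residual))
```

With these illustrative components the projection crosses 0.8 at five observations, mirroring the headline figure in the abstract; changing the residual component shows how learner experience or simulation fidelity could shift the number of observations required.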
Collapse
Affiliation(s)
- Steven Arild Wuyts Andersen
- S.A.W. Andersen is postdoc, Copenhagen Academy for Medical Education and Simulation (CAMES), Center for HR & Education, the Capital Region of Denmark, and otorhinolaryngology resident, Department of Otorhinolaryngology-Head & Neck Surgery, Rigshospitalet, Copenhagen, Denmark; ORCID: http://orcid.org/0000-0002-3491-9790
| | - Yoon Soo Park
- Y.S. Park is associate professor, Department of Medical Education, University of Illinois College of Medicine at Chicago, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335
| | - Mads Sølvsten Sørensen
- M.S. Sørensen is professor of otorhinolaryngology, Department of Otorhinolaryngology-Head & Neck Surgery, Rigshospitalet, Copenhagen, Denmark, and head of the Visible Ear Simulator project
| | - Lars Konge
- L. Konge is professor of medical education, University of Copenhagen, Denmark, and head of research, Copenhagen Academy for Medical Education and Simulation (CAMES), Center for HR & Education, the Capital Region of Denmark
| |
Collapse
|
20
|
George P, Santen S, Hammoud M, Skochelak S. Stepping Back: Re-evaluating the Use of the Numeric Score in USMLE Examinations. MEDICAL SCIENCE EDUCATOR 2020; 30:565-567. [PMID: 34457702 PMCID: PMC8368936 DOI: 10.1007/s40670-019-00906-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
There are increasing concerns from medical educators about students' over-emphasis on preparing for a high-stakes licensing examination during medical school, especially the US Medical Licensing Examination (USMLE) Step 1. Residency program directors' use of the numeric score (otherwise known as the three-digit score) on Step 1 to screen and select applicants drives these concerns. Since the USMLE was not designed as a residency selection tool, the use of numeric scores for this purpose is often referred to as a secondary and unintended use of the USMLE score. Educators and students are concerned about the USMLE's potentially negative influence on curricular innovation and the role of high-stakes examinations in student and trainee well-being. Some have suggested changing the score reporting of the examinations from a numeric score to pass/fail. This commentary first reviews the primary and secondary uses of USMLE scores. We then focus on the advantages and disadvantages of the currently reported numeric score, using Messick's conceptualization of construct validity as our framework. Finally, we propose a path forward to design a comprehensive, more holistic review of residency candidates.
Collapse
Affiliation(s)
- Paul George
- Warren Alpert Medical School of Brown University, 222 Richmond Street, Providence, RI 02912 USA
| | - Sally Santen
- Virginia Commonwealth University School of Medicine, 1201 East Marshal Street, Box 980565, Richmond, VA 23298 USA
| | - Maya Hammoud
- University of Michigan Medical School, 1540 E Hospital Dr, SPC 4276, Ann Arbor, MI 48109-4276 USA
| | - Susan Skochelak
- American Medical Association, 330 N. Wabash-43rd Floor, Chicago, IL 60611-5885 USA
| |
Collapse
|
21
|
Morgan HK, Mejicano GC, Skochelak S, Lomis K, Hawkins R, Tunkel AR, Nelson EA, Henderson D, Shelgikar AV, Santen SA. A Responsible Educational Handover: Improving Communication to Improve Learning. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2020; 95:194-199. [PMID: 31464734 DOI: 10.1097/acm.0000000000002915] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
An important tenet of competency-based medical education is that the educational continuum should be seamless. The transition from undergraduate medical education (UME) to graduate medical education (GME) is far from seamless, however. Current practices around this transition drive students to focus on appearing to be competitively prepared for residency. A communication at the completion of UME, an educational handover, would encourage students to focus on actually preparing for the care of patients. In April 2018, the American Medical Association's Accelerating Change in Medical Education consortium meeting included a debate and discussion on providing learner performance measures as part of a responsible educational handover from UME to GME. In this Perspective, the authors describe the resulting 5 recommendations for developing such a handover: (1) The purpose of the educational handover should be to provide medical school performance data to guide continued improvement in learner ability and performance, (2) the process used to create an educational handover should be philosophically and practically aligned with the learner's continuous quality improvement, (3) the educational handover should be learner driven with a focus on individualized learning plans that are coproduced by the learner and a coach or advisor, (4) the transfer of information within an educational handover should be done in a standardized format, and (5) together, medical schools and residency programs must invest in adequate infrastructure to support learner improvement. These recommendations are shared to encourage implementation of the educational handover and to generate a potential research agenda that can inform policy and best practices.
Collapse
Affiliation(s)
- Helen K Morgan
- H.K. Morgan is clinical associate professor of obstetrics and gynecology and learning health sciences, University of Michigan Medical School, Ann Arbor, Michigan. G.C. Mejicano is senior associate dean for education and professor of medicine, Oregon Health & Science University, Portland, Oregon. S. Skochelak is group vice president, Medical Education, American Medical Association, Chicago, Illinois. K. Lomis is vice president of undergraduate medical education innovations, American Medical Association, Chicago, Illinois. R. Hawkins is president and CEO, American Board of Medical Specialties, Chicago, Illinois. A.R. Tunkel is senior associate dean for medical education and professor of medicine and medical science, Warren Alpert Medical School of Brown University, Providence, Rhode Island. E.A. Nelson is associate dean of undergraduate medical education and distinguished teaching professor, Dell Medical School, University of Texas at Austin, Austin, Texas. D. Henderson is associate dean for student affairs, associate dean for multicultural and community affairs, and associate professor of family medicine, University of Connecticut School of Medicine, Farmington, Connecticut. A.V. Shelgikar is clinical associate professor of neurology, University of Michigan Medical School, Ann Arbor, Michigan. S.A. Santen is senior associate dean of assessment, evaluation, and scholarship and professor of emergency medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia
| | | | | | | | | | | | | | | | | | | |
Collapse
|
22
|
Frank AK, O'Sullivan P, Mills LM, Muller-Juge V, Hauer KE. Clerkship Grading Committees: the Impact of Group Decision-Making for Clerkship Grading. J Gen Intern Med 2019; 34:669-676. [PMID: 30993615 PMCID: PMC6502934 DOI: 10.1007/s11606-019-04879-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
BACKGROUND Faculty and students debate the fairness and accuracy of medical student clerkship grades. Group decision-making is a potential strategy to improve grading. OBJECTIVE To explore how one school's grading committee members integrate assessment data to inform grade decisions and to identify the committees' benefits and challenges. DESIGN This qualitative study used semi-structured interviews with grading committee chairs and members conducted between November 2017 and March 2018. PARTICIPANTS Participants included the eight core clerkship directors, who chaired their grading committees. We randomly selected other committee members to invite, for a maximum of three interviews per clerkship. APPROACH Interviews were recorded, transcribed, and analyzed using inductive content analysis. KEY RESULTS We interviewed 17 committee members. Within and across specialties, committee members had distinct approaches to prioritizing and synthesizing assessment data. Participants expressed concerns about the quality of assessments, necessitating careful scrutiny of language, assessor identity, and other contextual factors. Committee members were concerned about how unconscious bias might impact assessors, but they felt minimally impacted at the committee level. When committee members knew students personally, they felt tension about how to use the information appropriately. Participants described high agreement within their committees; debate was more common when site directors reviewed students' files from other sites prior to meeting. Participants reported multiple committee benefits including faculty development and fulfillment, as well as improved grading consistency, fairness, and transparency. Groupthink and a passive approach to bias emerged as the two main threats to optimal group decision-making. CONCLUSIONS Grading committee members view their practices as advantageous over individual grading, but they feel limited in their ability to address grading fairness and accuracy. Recommendations and support may help committees broaden their scope to address these aspirations.
Collapse
Affiliation(s)
- Annabel K Frank
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Patricia O'Sullivan
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Lynnea M Mills
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Virginie Muller-Juge
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Karen E Hauer
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
| |
Collapse
|
23
|
Lang VJ, Berman NB, Bronander K, Harrell H, Hingle S, Holthouser A, Leizman D, Packer CD, Park YS, Vu TR, Yudkowsky R, Monteiro S, Bordage G. Validity Evidence for a Brief Online Key Features Examination in the Internal Medicine Clerkship. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2019; 94:259-266. [PMID: 30379661 DOI: 10.1097/acm.0000000000002506] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
PURPOSE Medical educators use key features examinations (KFEs) to assess clinical decision making in many countries, but not in U.S. medical schools. The authors developed an online KFE to assess third-year medical students' decision-making abilities during internal medicine (IM) clerkships in the United States. They used Messick's unified validity framework to gather validity evidence regarding response process, internal structure, and relationship to other variables. METHOD From February 2012 through January 2013, 759 students (at eight U.S. medical schools) had 75 minutes to complete one of four KFE forms during their IM clerkship. They also completed a survey regarding their experiences. The authors performed item analyses and generalizability studies, comparing KFE scores with prior clinical experience and National Board of Medical Examiners Subject Examination (NBME-SE) scores. RESULTS Five hundred fifteen (67.9%) students consented to participate. Across KFE forms, mean scores ranged from 54.6% to 60.3% (standard deviation 8.4-9.6%), and Phi-coefficients ranged from 0.36 to 0.52. Adding five cases to the most reliable form would increase the Phi-coefficient to 0.59. Removing the least discriminating case from the two most reliable forms would increase the alpha coefficient to, respectively, 0.58 and 0.57. The main source of variance came from the interaction of students (nested in schools) and cases. Correlation between KFE and NBME-SE scores ranged from 0.24 to 0.47 (P < .01). CONCLUSIONS These results provide strong evidence for response-process and relationship-to-other-variables validity and moderate internal structure validity for using a KFE to complement other assessments in U.S. IM clerkships.
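The abstract's projection that lengthening a form raises its reliability can be illustrated with a Spearman-Brown style calculation, in which projected reliability equals k*rho / (1 + (k - 1)*rho) for a lengthening factor k. The Python sketch below assumes a hypothetical form length of 15 cases (the abstract does not report the actual number of cases per form) and treats all error variance as case-related, so it illustrates the projection logic rather than reproducing the study's decision study.

```python
# Hedged sketch of projecting reliability after adding cases to a key features form.
# The original form length (n_cases = 15) is a HYPOTHETICAL assumption for illustration.

def spearman_brown(reliability: float, lengthening_factor: float) -> float:
    """Projected reliability after lengthening a test by the given factor."""
    k = lengthening_factor
    return (k * reliability) / (1.0 + (k - 1.0) * reliability)

if __name__ == "__main__":
    phi_observed = 0.52          # most reliable form, as reported in the abstract
    n_cases = 15                 # assumed current form length (hypothetical)
    n_added = 5                  # cases added, as in the abstract's projection
    k = (n_cases + n_added) / n_cases
    # Under this assumption the projection lands near 0.59, the figure quoted above.
    print(round(spearman_brown(phi_observed, k), 2))
```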
Collapse
Affiliation(s)
- Valerie J Lang
- V.J. Lang is associate professor of medicine, director of the medicine subinternship, and senior associate division chief, Hospital Medicine, University of Rochester School of Medicine and Dentistry, Rochester, New York. N.B. Berman is professor of pediatrics and of medical education, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire. K. Bronander is professor of medicine and medical director of simulation, University of Nevada, Reno School of Medicine, Reno, Nevada. H. Harrell is professor of medicine and codirector of the medicine clerkship, University of Florida, Gainesville, Florida. S. Hingle is professor of medicine, director of the Year 3 curriculum, and director of faculty development, Southern Illinois University School of Medicine, Springfield, Illinois. A. Holthouser is professor of medicine and pediatrics and senior associate dean for medical education, University of Louisville, Louisville, Kentucky. D. Leizman is associate professor of medicine and clerkship director for internal medicine, Case Western Reserve University, University Hospital, Cleveland Medical Center, Cleveland, Ohio. C.D. Packer is professor of medicine, Case Western Reserve University, and clerkship director for internal medicine, Louis Stokes Cleveland Veterans Affairs Medical Center, Cleveland, Ohio. Y.S. Park is associate professor, Department of Medical Education, University of Illinois at Chicago, Chicago, Illinois. T.R. Vu is associate professor of clinical medicine, Indiana University School of Medicine, Indianapolis, Indiana. R. Yudkowsky is professor, Department of Medical Education, University of Illinois at Chicago, Chicago, Illinois. S. Monteiro is assistant professor, Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada. G. Bordage is professor, Department of Medical Education, University of Illinois at Chicago, Chicago, Illinois
| | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
24
|
Bernard AW, Feinn R, Ceccolini G, Brown R, Rosenberg I, Trymbulak W, VanCott C. The Reliability of 2-Station Clerkship Objective Structured Clinical Examinations in Isolation and in Aggregate. JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT 2019; 6:2382120519863443. [PMID: 31384670 PMCID: PMC6647213 DOI: 10.1177/2382120519863443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Accepted: 06/24/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND Most medical schools in the United States report having a 5- to 10-station objective structured clinical examination (OSCE) at the end of the core clerkship phase of the curriculum to assess clinical skills. We set out to investigate an alternative OSCE structure in which each clerkship has a 2-station OSCE. This study sought to determine the reliability of clerkship OSCEs in isolation, to inform composite clerkship grading, as well as their reliability in aggregate, as a potential alternative to an end-of-third-year examination. DESIGN Clerkship OSCE data from the 2017-2018 academic year were analyzed: the generalizability coefficient (ρ2) and index of dependability (φ) were calculated for clerkships in isolation and in aggregate using variance components analysis. RESULTS In all, 93 students completed all examinations. The average generalizability coefficient for the individual clerkships was .47. Most often, the largest variance component was the interaction between the student and the station, indicating inconsistency in students' performance between the 2 stations. Aggregate clerkship OSCE analysis demonstrated good reliability for consistency (ρ2 = .80). About one-third (33.8%) of the variance can be attributed to students, 8.2% to the student-by-clerkship interaction, and 42.6% to the student-by-block interaction, indicating that students' relative performances varied by block. CONCLUSIONS Two-station clerkship OSCEs have poor to fair reliability, and this should inform the weighting of the composite clerkship grade. Aggregating data results in good reliability. The largest source of variance in the aggregate was the student-by-block interaction, suggesting that testing over several blocks may have advantages over a single-day examination.
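For a person-by-station design like the 2-station clerkship OSCEs described here, the generalizability coefficient (ρ2) treats only the person-by-station interaction (plus residual) variance as error, while the index of dependability (φ) also counts station main effects as error. The Python sketch below computes both from variance components; the component values are illustrative placeholders, not the estimates reported in the study.

```python
# Minimal sketch of rho^2 and phi for a person x station design.
# Variance components are ILLUSTRATIVE, not the study's estimates.

def rho_squared(var_person: float, var_interaction: float, n_stations: int) -> float:
    """Relative coefficient: only person-by-station (plus residual) variance counts as error."""
    return var_person / (var_person + var_interaction / n_stations)

def phi(var_person: float, var_station: float, var_interaction: float, n_stations: int) -> float:
    """Absolute coefficient (index of dependability): station main effects also count as error."""
    absolute_error = (var_station + var_interaction) / n_stations
    return var_person / (var_person + absolute_error)

if __name__ == "__main__":
    # Hypothetical components for a single 2-station clerkship OSCE.
    v_person, v_station, v_person_x_station = 0.9, 0.3, 2.0
    print(round(rho_squared(v_person, v_person_x_station, n_stations=2), 2))  # ~0.47 here
    print(round(phi(v_person, v_station, v_person_x_station, n_stations=2), 2))
```

With only two stations, a large person-by-station component keeps both coefficients low, which is consistent with the abstract's conclusion that single-clerkship OSCEs should carry limited weight in a composite grade.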
Collapse
Affiliation(s)
- Aaron W Bernard
- Department of Medical Sciences, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Richard Feinn
- Department of Medical Sciences, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Gabbriel Ceccolini
- Standardized Patient and Assessment Center, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Robert Brown
- Department of Medicine, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Ilene Rosenberg
- Department of Medical Sciences, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Walter Trymbulak
- Department of Obstetrics and Gynecology, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| | - Christine VanCott
- Department of Surgery, Frank H. Netter MD School of Medicine, Quinnipiac University, Hamden, CT, USA
| |
Collapse
|
25
|
McAneny BL, Crigger EJ. Toward More Effective Self-Regulation in Medicine. THE AMERICAN JOURNAL OF BIOETHICS : AJOB 2019; 19:7-10. [PMID: 30676894 DOI: 10.1080/15265161.2018.1554411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Affiliation(s)
- Barbara L McAneny
- American Medical Association and New Mexico Oncology Hematology Consultants, Ltd
| | | |
Collapse
|
26
|
Farooqui F, Saeed N, Aaraj S, Sami MA, Amir M. A Comparison Between Written Assessment Methods: Multiple-choice and Short Answer Questions in End-of-clerkship Examinations for Final Year Medical Students. Cureus 2018; 10:e3773. [PMID: 30820392 PMCID: PMC6389017 DOI: 10.7759/cureus.3773] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2018] [Accepted: 12/24/2018] [Indexed: 11/12/2022] Open
Abstract
Introduction Assessment, both clinical and written, is an important aspect of a modern academic curriculum. Written assessment includes both multiple-choice questions (MCQs) and short answer questions (SAQs), and debate continues as to which is more reliable. It is important to assess the correlation between these two written formats in the clinical subjects, which differ from the basic science subjects, yet data on this correlation are lacking. Therefore, we conducted this study to examine the correlation between MCQ and SAQ scores in end-of-clerkship examinations for final-year medical students. Materials and methods The end-of-clerkship written assessment results of the four disciplines of medicine, surgery, gynecology, and pediatrics were included. This was a retrospective correlational analytical study conducted at Shifa Tameer-e-Millat University, Islamabad, from 2013 to 2017. Data were analyzed using IBM SPSS Statistics for Windows, version 23.0 (IBM Corp., Armonk, NY); means, standard deviations, Pearson coefficients, and p values were calculated for both MCQs and SAQs. Results A total of 481 students were included. The mean percentage scores of MCQs and SAQs were most similar in medicine and most disparate in obstetrics and gynecology. Standard deviations were wider for SAQs than for MCQs. Pearson correlations were 0.49, 0.47, 0.23, and 0.38 for medicine, surgery, gynecology, and pediatrics, respectively. Conclusion While we found mild to moderate, statistically significant correlations between MCQ and SAQ scores for final-year medical students, further investigations are required to explore the correlation and enhance the validity of our written assessments.
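As a small illustration of the analysis described above, the Pearson correlation between MCQ and SAQ percentage scores can be computed directly from paired student results. The sketch below uses pandas and scipy with toy data; the column names and values are hypothetical, since the study's data and SPSS workflow are not reproduced here.

```python
# Hedged sketch of an MCQ-vs-SAQ correlation analysis using pandas and scipy.
# The scores below are TOY data, not results from the study.

import pandas as pd
from scipy.stats import pearsonr

scores = pd.DataFrame({
    "mcq": [72, 65, 80, 58, 69, 74, 61, 77],   # percent scores, illustrative only
    "saq": [68, 55, 82, 49, 71, 70, 52, 79],
})

r, p_value = pearsonr(scores["mcq"], scores["saq"])
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
# Standard deviations per format, analogous to the descriptive statistics reported above.
print("MCQ SD:", round(scores["mcq"].std(), 2), "SAQ SD:", round(scores["saq"].std(), 2))
```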
Collapse
Affiliation(s)
| | - Nadia Saeed
- Internal Medicine, Shifa Tameer-e-Millat University, Islamabad, PAK
| | - Sahira Aaraj
- Pediatrics, Shifa Tameer-e-Millat University, Islamabad, PAK
| | - Muneeza A Sami
- Medical Education and Simulation, Shifa Tameer-e-Millat University, Islamabad, PAK
| | - Muhammad Amir
- Surgery, Shifa Tameer-e-Millat University, Islamabad, PAK
| |
Collapse
|