1
Hoffman KR, Swanson D, Lane S, Nickson C, Brand P, Ryan AT. The reliability of the College of Intensive Care Medicine of Australia and New Zealand "Hot Case" examination. BMC Medical Education 2024; 24:527. [PMID: 38734603 PMCID: PMC11088756 DOI: 10.1186/s12909-024-05516-w]
Abstract
BACKGROUND High-stakes examinations used to credential trainees for independent specialist practice should be evaluated periodically to ensure defensible decisions are made. This study aims to quantify the reliability coefficient of the College of Intensive Care Medicine of Australia and New Zealand (CICM) Hot Case and to evaluate the contributions to variance from candidates, cases and examiners. METHODS This retrospective, de-identified analysis of CICM examination data used descriptive statistics and generalisability theory to evaluate the reliability of the Hot Case examination component. Decision studies were used to project generalisability coefficients for alternative examination designs. RESULTS Examination results from 2019 to 2022 included 592 Hot Cases, totalling 1184 individual examiner scores. The mean examiner Hot Case score was 5.17 (standard deviation 1.65). The correlation between candidates' two Hot Case scores was low (0.30). The overall reliability coefficient for the Hot Case component, consisting of two cases observed by two separate pairs of examiners, was 0.42. Sources of variance included candidate proficiency (25%), case difficulty and case specificity (63.4%), examiner stringency (3.5%) and other error (8.2%). To achieve a reliability coefficient of > 0.8, a candidate would need to perform 11 Hot Cases, each observed by two examiners. CONCLUSION The reliability coefficient for the Hot Case component of the CICM second part examination is below the generally accepted value for a high-stakes examination. Modifications to case selection and the introduction of a clear scoring rubric may help mitigate the effects of variation in case difficulty. Increasing the number of cases, and with it the overall assessment time, appears to be the best way to increase overall reliability. Further research is required to assess the combined reliability of the Hot Case and viva components.
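The reported figures are internally consistent and can be reproduced with a small decision-study calculation. Below is a minimal sketch; the function name and the nesting assumption are ours (examiner pairs are treated as nested within cases, so examiner and residual variance average over every case-examiner observation), while the variance shares come from the abstract above:

```python
def projected_phi(n_cases: int, n_examiners: int,
                  v_candidate: float = 25.0, v_case: float = 63.4,
                  v_examiner: float = 3.5, v_error: float = 8.2) -> float:
    """Project the dependability coefficient for n_cases Hot Cases,
    each scored by n_examiners examiners, from the variance shares
    reported in the abstract (candidate, case, examiner, residual)."""
    # Case-related variance averages over cases; examiner and residual
    # variance average over all case-examiner observations.
    error = (v_case / n_cases
             + (v_examiner + v_error) / (n_cases * n_examiners))
    return v_candidate / (v_candidate + error)

print(f"{projected_phi(2, 2):.2f}")   # 0.42 -- the reported two-case design
print(f"{projected_phi(11, 2):.2f}")  # 0.80 -- the projected 11-case design
```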
Affiliation(s)
- Kenneth R Hoffman: Intensive Care Unit, The Alfred Hospital, Melbourne, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health, Monash University, Melbourne, Australia
- David Swanson: Department of Medical Education, Melbourne Medical School, University of Melbourne, Melbourne, Australia
- Stuart Lane: Sydney Medical School, The University of Sydney, Sydney, Australia
- Chris Nickson: Intensive Care Unit, The Alfred Hospital, Melbourne, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health, Monash University, Melbourne, Australia
- Paul Brand: College of Intensive Care Medicine of Australia and New Zealand, Melbourne, Australia
- Anna T Ryan: Department of Medical Education, Melbourne Medical School, University of Melbourne, Melbourne, Australia
2
Staudenmann D, Waldner N, Lörwald A, Huwendiek S. Medical specialty certification exams studied according to the Ottawa Quality Criteria: a systematic review. BMC Medical Education 2023; 23:619. [PMID: 37649019 PMCID: PMC10466740 DOI: 10.1186/s12909-023-04600-x]
Abstract
BACKGROUND Medical specialty certification exams are high-stakes summative assessments used to determine which doctors have the necessary skills, knowledge, and attitudes to treat patients independently. Such exams are crucial for patient safety, candidates' career progression and accountability to the public, yet vary significantly among medical specialties and countries. It is therefore of paramount importance that the quality of specialty certification exams is studied in the scientific literature. METHODS In this systematic literature review we used the PICOS framework and searched seven databases for papers concerning medical specialty certification exams published in English between 2000 and 2020, using a diverse set of search term variations. Papers were screened by two researchers independently and scored for their methodological quality and relevance to this review. Finally, they were categorized by country, medical specialty and the following seven Ottawa Criteria of good assessment: validity, reliability, equivalence, feasibility, acceptability, catalytic effect and educational effect. RESULTS After removal of duplicates, 2852 papers were screened for inclusion, of which 66 met all relevant criteria. More than 43 different exams and more than 28 different specialties from 18 jurisdictions were studied. Around 77% of all eligible papers were based in English-speaking countries, with 55% of publications centered on the UK and USA alone. General Practice was the most frequently studied specialty among certification exams, with the UK General Practice exam having been particularly broadly analyzed. Papers received an average of 4.2/6 points on the quality score. Eligible studies analyzed 2.1/7 Ottawa Criteria on average, the most frequently studied criteria being reliability, validity, and acceptability. CONCLUSIONS The present systematic review shows a growing number of studies analyzing medical specialty certification exams over time, encompassing a widening range of medical specialties, countries, and Ottawa Criteria. Because it relies on multiple assessment methods and data points, programmatic assessment suggests a promising way forward in the development of medical specialty certification exams that fulfill all seven Ottawa Criteria. Further research is needed to confirm these results, particularly analyses of examinations held outside the Anglosphere, as well as studies analyzing entire certification exams or comparing multiple examination methods.
Affiliation(s)
- Noemi Waldner: University of Bern, Institute for Medical Education, Bern, Switzerland
- Andrea Lörwald: University of Bern, Institute for Medical Education, Bern, Switzerland
- Sören Huwendiek: University of Bern, Institute for Medical Education, Bern, Switzerland
3
Rivière E, Aubin E, Tremblay SL, Lortie G, Chiniara G. A new tool for assessing short debriefings after immersive simulation: validity of the SHORT scale. BMC Medical Education 2019; 19:82. [PMID: 30871505 PMCID: PMC6419351 DOI: 10.1186/s12909-019-1503-4]
Abstract
BACKGROUND Simulation is being used increasingly in healthcare education worldwide. However, it is costly in terms of both finances and human resources. As a consequence, several institutions have designed programs offering multiple short immersive simulation sessions, each followed by a short debriefing. Although debriefing is recommended, no tool exists to assess the appropriateness of short debriefings after such simulation sessions. We developed the Simulation in Healthcare retrOaction Rating Tool (SHORT) to assess short debriefings, and provide validity evidence for its use. METHODS We designed this scale based on our experience and previously published instruments, and tested it by assessing short debriefings of simulation sessions offered to emergency medicine residents at Laval University (Canada) from 2015 to 2016. Its reliability and validity were analysed using the Standards for Educational and Psychological Testing. Generalizability theory was used to test internal structure evidence for validity. RESULTS Two raters independently assessed 22 filmed short debriefings. Mean debriefing length was 10:35 (min 7:21; max 14:32). The calculated generalizability (reliability) coefficients are φ = 0.80 and φ-λ3 = 0.82. The generalizability coefficient for a single rater assessing three debriefings is φ = 0.84. CONCLUSIONS The G study shows a high generalizability coefficient (φ ≥ 0.80), demonstrating high reliability. The response process evidence for validity indicates that no errors were associated with using the instrument. Further studies should be done to demonstrate the validity of the English version of the instrument and to validate its use by novice raters trained in the use of the SHORT.
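As a point of reference for the φ values above: in a one-facet generalizability design where debriefings are the object of measurement and raters the facet, the dependability coefficient takes the following form (a generic sketch in standard G-theory notation, not an equation taken from the paper):

\[
\varphi = \frac{\sigma^2_{d}}{\sigma^2_{d} + \dfrac{\sigma^2_{r} + \sigma^2_{dr,e}}{n_r}}
\]

where \(\sigma^2_{d}\) is variance from true differences among debriefings, \(\sigma^2_{r}\) is rater stringency variance, \(\sigma^2_{dr,e}\) is the rater-by-debriefing interaction confounded with residual error, and \(n_r\) is the number of raters averaged over. Averaging over more observations shrinks the error term; the D study reported above applies the same logic to project φ for a single rater observing three debriefings.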
Affiliation(s)
- Etienne Rivière: Department of Internal Medicine, Haut-Leveque Hospital, University Hospital Centre of Bordeaux, Pessac, France; Medical Faculty, Bordeaux University, Bordeaux, France; SimBA-S Simulation Centre, University and Hospital of Bordeaux, Bordeaux, France; Apprentiss Centre (simulation centre), Laval University, Quebec, Canada
- Samuel-Lessard Tremblay: Apprentiss Centre (simulation centre), Laval University, Quebec, Canada; University Institute of Cardiology and Pneumology of Quebec, Quebec, Canada
- Gilles Lortie: Apprentiss Centre (simulation centre), Laval University, Quebec, Canada; Emergency Unit, Levis Hotel-Dieu Hospital, University Hospital of Quebec, Lévis, Canada
- Gilles Chiniara: Apprentiss Centre (simulation centre), Laval University, Quebec, Canada; Department of Anesthesiology and Intensive Care, Laval University, Quebec, Canada
4
Guldbrand Nielsen D, Jensen SL, O'Neill L. Clinical assessment of transthoracic echocardiography skills: a generalizability study. BMC Medical Education 2015; 15:9. [PMID: 25638012 PMCID: PMC4334848 DOI: 10.1186/s12909-015-0294-5]
Abstract
BACKGROUND Transthoracic echocardiography (TTE) is a widely used cardiac imaging technique that all cardiologists should be able to perform competently. Traditionally, TTE competence has been assessed by unstructured observation or in test situations separated from daily clinical practice. An instrument for assessing clinical TTE technical proficiency, comprising a global rating score and a checklist score, has previously shown reliability and validity in a standardised setting. As clinical test situations typically have several sources of error giving rise to variance in scores, a more thorough examination of the generalizability of the assessment instrument is needed. METHODS Nine physicians performed a TTE scan on the same three patients. Two raters then rated all 27 TTE scans using the TTE technical assessment instrument in a fully crossed, all-random generalizability study. Estimated variance components were calculated for both the global rating and checklist scores. Finally, dependability (phi) coefficients were calculated for both outcomes in a decision study. RESULTS For global rating scores, 66.6% of score variance can be ascribed to true differences in performance; for checklist scores, this was 88.8%. The difference was primarily due to physician-rater interaction. Four random cases rated by one random rater resulted in a phi value of 0.81 for global ratings, and two random cases rated by one random rater showed a phi value of 0.92 for checklist scores. CONCLUSIONS Using the TTE checklist rather than the TTE global rating score minimised the largest source of error variance in test scores. Two cases rated by one rater using the TTE checklist are sufficiently reliable for high-stakes examinations. As global rating is less time-consuming, performing four global rating assessments in addition to the checklist assessments could be considered, to account for both the reliability and the content validity of the assessment.
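The decision study here uses a fully crossed physician × case × rater design, so the dependability coefficient divides each variance component by the number of conditions it is averaged over. A minimal sketch of that calculation follows; the variance components passed in the demo call are illustrative placeholders, not the paper's estimates:

```python
def phi_crossed(n_c: int, n_r: int, v_p: float, v_c: float, v_r: float,
                v_pc: float, v_pr: float, v_cr: float, v_pcr_e: float) -> float:
    """Dependability (phi) for a fully crossed person x case x rater
    random design: person variance divided by person variance plus
    absolute error, each error component averaged over the n_c cases
    and/or n_r raters in the measurement design."""
    abs_error = (v_c / n_c + v_r / n_r
                 + v_pc / n_c + v_pr / n_r
                 + (v_cr + v_pcr_e) / (n_c * n_r))
    return v_p / (v_p + abs_error)

# Placeholder components summing to 100 (66.6% true person variance, as
# reported for global ratings; the split of the remainder is invented).
print(f"{phi_crossed(n_c=4, n_r=1, v_p=66.6, v_c=5.0, v_r=2.0,
                     v_pc=15.0, v_pr=6.4, v_cr=1.0, v_pcr_e=4.0):.2f}")
```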
Affiliation(s)
- Lotte O'Neill: Center for Medical Education, Aarhus University, Aarhus, Denmark
5
An Item Analysis of Written Multiple-Choice Questions: Kashan University of Medical Sciences. Nurs Midwifery Stud 2012. [DOI: 10.5812/nms.8738]
6
Dijkstra J, Galbraith R, Hodges BD, McAvoy PA, McCrorie P, Southgate LJ, Van der Vleuten CPM, Wass V, Schuwirth LWT. Expert validation of fit-for-purpose guidelines for designing programmes of assessment. BMC Medical Education 2012; 12:20. [PMID: 22510502 PMCID: PMC3676146 DOI: 10.1186/1472-6920-12-20]
Abstract
BACKGROUND An assessment programme, a purposeful mix of assessment activities, is necessary to achieve a complete picture of assessee competence. High-quality assessment programmes exist; however, the design requirements for such programmes are still unclear. We developed design guidelines based on an earlier-developed framework that identified the areas to be covered. A fitness-for-purpose approach to defining quality was adopted to develop and validate the guidelines. METHODS First, ideas were generated in a brainstorming session, followed by structured interviews with nine international assessment experts. The guidelines were then fine-tuned through analysis of the interviews. Finally, validation was based on expert consensus via member checking. RESULTS In total, 72 guidelines were developed; this paper discusses the most salient of them. The guidelines are related and grouped per layer of the framework. Some guidelines are so generic that they apply to any design consideration: the principle of proportionality, the requirement that a rationale underpin each decision, and the requirement of expertise. Logically, many guidelines focus on practical aspects of assessment. Some guidelines were found to be clear and concrete; others were less straightforward and were phrased more as issues for contemplation. CONCLUSIONS The set of guidelines is comprehensive and not bound to a specific context or educational approach. Following the fitness-for-purpose principle, the guidelines are eclectic, requiring expert judgement to apply them appropriately in different contexts. Further validation studies are required to test their practicality.
Affiliation(s)
- Joost Dijkstra: Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands
- Robert Galbraith: Center for Innovation, National Board of Medical Examiners, Philadelphia, USA
- Brian D Hodges: Wilson Centre for Research in Education, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Pauline A McAvoy: Assessment Development, National Clinical Assessment Service (NCAS), London, UK
- Peter McCrorie: Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
- Lesley J Southgate: Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
- Cees PM Van der Vleuten: Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands
- Val Wass: Keele University, School of Medicine, Staffordshire, UK
- Lambert WT Schuwirth: Department of Educational Development and Research, Maastricht University, Maastricht, The Netherlands; Flinders Innovation in Clinical Education, Flinders University, Bedford Park, SA, Australia
7
Le Roux P, Podgorski C, Rosenberg T, Watson WH, McDaniel S. Developing an outcome-based assessment for family therapy training: the Rochester Objective Structured Clinical Evaluation (ROSCE). Family Process 2011; 50:544-560. [PMID: 22145725 DOI: 10.1111/j.1545-5300.2011.01375.x]
Abstract
This paper addresses a growing need for cost-effective, outcome-based assessment in family therapy training. We describe the ROSCE, a structured, evidence-informed, learner-centered approach to the assessment of clinical skills developed at the University of Rochester Medical Center. The ROSCE emphasizes direct observation of trainees demonstrating clinical competencies. The format integrates both formative and summative assessment methods. It can readily be adapted to a wide variety of educational and training settings.
Affiliation(s)
- Pieter Le Roux: Department of Psychiatry, Institute for the Family, University of Rochester, Rochester, NY 14642, USA
8
Dijkstra J, Van der Vleuten CPM, Schuwirth LWT. A new framework for designing programmes of assessment. Advances in Health Sciences Education: Theory and Practice 2010; 15:379-93. [PMID: 19821042 PMCID: PMC2940030 DOI: 10.1007/s10459-009-9205-z]
Abstract
Research on assessment in medical education has focused strongly on individual measurement instruments and their psychometric quality. Without detracting from the value of this research, such an approach is not sufficient for high-quality assessment of competence as a whole. A programmatic approach is advocated, which presupposes criteria for designing comprehensive assessment programmes and for assuring their quality. The paucity of research relevant to programmatic assessment, and especially its development, prompted us to embark on a research project to develop design principles for programmes of assessment. We conducted focus group interviews to explore the experiences and views of nine assessment experts concerning good practices and new ideas about theoretical and practical issues in programmes of assessment. The discussion was analysed, mapping all aspects relevant to design onto a framework, which was iteratively adjusted to fit the data until saturation was reached. The overarching framework for designing programmes of assessment consists of six dimensions: Goals, Programme in Action, Support, Documenting, Improving and Accounting. The model described in this paper can help to frame programmes of assessment; it provides not only a common language but also a comprehensive picture of the dimensions to be covered when formulating design principles. It helps identify areas of assessment in which ample research and development has been done and, more importantly, helps detect underserved areas. A guiding principle in the design of assessment programmes is fitness for purpose: high-quality assessment can only be defined in terms of its goals.
Affiliation(s)
- J Dijkstra: Department of Educational Development and Research, Maastricht University, The Netherlands