1
Brenner JM, Fulton TB, Kruidering M, Bird JB, Willey J, Qua K, Olvet DM. What have we learned about constructed response short-answer questions from students and faculty? A multi-institutional study. Medical Teacher 2024; 46:349-358. PMID: 37688773; DOI: 10.1080/0142159X.2023.2249209.
Abstract
PURPOSE The purpose of this study was to enrich understanding of the perceived benefits and drawbacks of constructed response short-answer questions (CR-SAQs) in preclerkship assessment, using Norcini's criteria for good assessment as a framework. METHODS This multi-institutional study surveyed students and faculty at three institutions. A survey using Likert-scale and open-ended questions was developed to evaluate faculty and student perceptions of CR-SAQs against the criteria of good assessment. Descriptive statistics and Chi-square analyses are presented, and open responses were analyzed using directed content analysis to describe the benefits and drawbacks of CR-SAQs. RESULTS A total of 260 students (19%) and 57 faculty (48%) completed the survey. Students and faculty reported that the benefits of CR-SAQs are authenticity, deeper learning (educational effect), and receiving feedback (catalytic effect). Drawbacks included feasibility, construct validity, and scoring reproducibility. Students and faculty found CR-SAQs to be both acceptable (the ability to show one's reasoning, partial credit) and unacceptable (stressful, not USMLE format). CONCLUSIONS CR-SAQs are a method of aligning innovative curricula with assessment and could enrich the assessment toolkit for medical educators.
Affiliation(s)
- Judith M Brenner
- Department of Science Education, Donald and Barbara Zucker School of Medicine, Hofstra/Northwell, Hempstead, New York, USA
- Tracy B Fulton
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California, USA
- Marieke Kruidering
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California, USA
- Jeffrey B Bird
- Department of Science Education, Donald and Barbara Zucker School of Medicine, Hofstra/Northwell, Hempstead, New York, USA
- Joanne Willey
- Department of Science Education, Donald and Barbara Zucker School of Medicine, Hofstra/Northwell, Hempstead, New York, USA
- Kelli Qua
- Center for Medical Education, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Doreen M Olvet
- Department of Science Education, Donald and Barbara Zucker School of Medicine, Hofstra/Northwell, Hempstead, New York, USA
2
Olvet DM, Bird JB, Fulton TB, Kruidering M, Papp KK, Qua K, Willey JM, Brenner JM. A Multi-institutional Study of the Feasibility and Reliability of the Implementation of Constructed Response Exam Questions. Teaching and Learning in Medicine 2023; 35:609-622. PMID: 35989668; DOI: 10.1080/10401334.2022.2111571.
Abstract
PROBLEM Some medical schools have incorporated constructed response short answer questions (CR-SAQs) into their assessment toolkits. Although CR-SAQs carry benefits for medical students and educators, the faculty perception that the amount of time required to create and score CR-SAQs is not feasible, together with concerns about reliable scoring, may impede the use of this assessment type in medical education. INTERVENTION Three US medical schools collaborated to write and score CR-SAQs based on a single vignette. Study participants included faculty question writers (N = 5) and three groups of scorers: faculty content experts (N = 7), faculty non-content experts (N = 6), and fourth-year medical students (N = 7). Structured interviews were performed with question writers and an online survey was administered to scorers to gather information about their process for creating and scoring CR-SAQs. A content analysis was performed on the qualitative data using Bowen's model of feasibility as a framework. To examine inter-rater reliability between the content expert and other scorers, a random selection of fifty student responses from each site was scored by each site's faculty content experts, faculty non-content experts, and student scorers. A holistic rubric (6-point Likert scale) was used by two schools and an analytic rubric (3- to 4-point checklist) was used by one school. Cohen's weighted kappa (κw) was used to evaluate inter-rater reliability. CONTEXT This research study was implemented at three US medical schools that are nationally dispersed and have been administering CR-SAQ summative exams as part of their programs of assessment for at least five years. The study exam question was included in an end-of-course summative exam during the first year of medical school. IMPACT Five question writers (100%) participated in the interviews and twelve scorers (60% response rate) completed the survey. Qualitative comments revealed three aspects of feasibility: practicality (time, institutional culture, teamwork), implementation (steps in the question writing and scoring process), and adaptation (feedback, rubric adjustment, continuous quality improvement). The scorers described their experience in terms of the need for outside resources, concern about lack of expertise, and value gained through scoring. Inter-rater reliability between the faculty content expert and student scorers was fair/moderate (κw = .34-.53, holistic rubrics) or substantial (κw = .67-.76, analytic rubric), but much lower between faculty content and non-content experts (κw = .18-.29, holistic rubrics; κw = .59-.66, analytic rubric). LESSONS LEARNED Our findings show that from the faculty perspective it is feasible to include CR-SAQs in summative exams, and we provide practical information for medical educators creating and scoring CR-SAQs. We also learned that CR-SAQs can be reliably scored by faculty without content expertise or senior medical students using an analytic rubric, or by senior medical students using a holistic rubric, which provides options to alleviate the faculty burden associated with grading CR-SAQs.
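For readers unfamiliar with the reliability statistic used here, the sketch below shows how Cohen's weighted kappa can be computed between two raters' rubric scores. This is a minimal illustration, not the authors' code: the rater data are invented, the choice of scikit-learn is ours, and linear weighting is an assumption (the abstract does not state the weighting scheme).

    # Minimal sketch of a weighted-kappa inter-rater check (hypothetical data)
    from sklearn.metrics import cohen_kappa_score

    # Scores from two raters on a 6-point holistic rubric (1-6)
    content_expert = [5, 3, 6, 2, 4, 4, 1, 5, 3, 6]
    student_scorer = [5, 2, 6, 3, 4, 5, 1, 4, 3, 6]

    # weights="linear" penalizes disagreements by their distance on the scale
    kappa_w = cohen_kappa_score(content_expert, student_scorer, weights="linear")
    print(f"weighted kappa = {kappa_w:.2f}")

The same call applies to point totals from an analytic checklist rubric; only the score vectors change between the two rubric formats the study compares.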
Affiliation(s)
- Doreen M Olvet
- Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Jeffrey B Bird
- Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Tracy B Fulton
- Department of Biochemistry and Biophysics, University of California San Francisco School of Medicine, San Francisco, California, USA
- Marieke Kruidering
- Department of Cellular & Molecular Pharmacology, University of California at San Francisco School of Medicine, San Francisco, California, USA
- Klara K Papp
- Center for Medical Education, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Kelli Qua
- Research and Evaluation, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Joanne M Willey
- Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Judith M Brenner
- Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
3
The Relationship between Learning Styles and Academic Performance: Consistency among Multiple Assessment Methods in Psychology and Education Students. Sustainability 2021. DOI: 10.3390/su13063341.
Abstract
Universities strive to ensure quality education focused on the diversity of the student body. According to experiential learning theory, students display different learning preferences. This study has a three-fold objective: to compare learning styles based on personal and educational variables, to analyze the association between learning styles, the level of academic performance, and consistency of performance across four assessment methods, and to examine the influence of learning dimensions in students with medium-high performance in the assessment methods. An interdisciplinary approach was designed involving 289 psychology, early childhood education and primary education students at two universities in Spain. The Learning Style Inventory was used to assess learning styles and dimensions. The assessment methods used in the developmental psychology course included the following question formats: multiple-choice, short answer, creation-elaboration and an elaboration question on the relationship between theory and practice. Univariate analyses, multivariate analyses, and binomial logistic models were computed. The results reveal psychology students to be more assimilative (theoretical and abstract), while early childhood and primary education students were evenly distributed among styles and were more divergent and convergent (practical) in absolute terms. In addition, high scores in perception (abstract conceptualization) were associated with a high level of performance on the multiple-choice tests and the elaboration question on the relationship between theory and practice. Abstract conceptualization was also associated with medium-high performance in all assessment methods, and this variable predicted consistently high performance, independent of the assessment method. This study highlights the importance of promoting abstract conceptualization. Recommendations for enhancing this learning dimension are presented.
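As a reading aid, the sketch below illustrates the kind of binomial logistic model the abstract describes, regressing a medium-high performance indicator on the four dimensions of the Learning Style Inventory. All data and variable names are hypothetical; this is not the authors' analysis.

    # Illustrative binomial logistic model (simulated data, not the study's)
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 289
    # Columns: AC, CE, AE, RO dimension scores (standardized); names assumed
    X = rng.normal(size=(n, 4))
    # Simulated outcome in which abstract conceptualization (column 0) dominates
    p = 1 / (1 + np.exp(-(0.9 * X[:, 0] - 0.1)))
    y = rng.binomial(1, p)

    model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
    print(model.params)  # a positive AC coefficient mirrors the reported pattern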
4
Dontas IA, Applebee K, Vlissingen MFV, Galligioni V, Marinou K, Ryder K, Schenkel J, Prins JB, Degryse AD, Lewis DI. Assessable learning outcomes for the EU Education and Training Framework core and Function A specific modules: Report of an ETPLAS Working Group. Lab Anim 2020; 55:215-232. PMID: 33287628; PMCID: PMC8182332; DOI: 10.1177/0023677220968589.
Abstract
Article 23(2) of the European Union Directive 2010/63/EU, which regulates welfare provisions for animals used for scientific purposes, requires that staff involved in the care and use of animals for scientific purposes be adequately educated and trained before they undertake any such work. However, the nature and extent of such training is not stipulated in the Directive. To facilitate Member States in fulfilling their education and training obligations, the European Commission developed a common Education and Training Framework, which was endorsed by the Member States Competent Authorities. An Education & Training Platform for Laboratory Animal Science (ETPLAS) Working Group was recently established to develop further guidance to the Learning Outcomes in the Framework, with the objective to clarify the levels of knowledge and understanding required by trainees, and to provide the criteria by which these Learning Outcomes should be assessed. Using the Framework document as a starting point, assessment criteria for the Learning Outcomes of the modules required for Function A persons (carrying out procedures on animals) for rats, mice and zebrafish were created with sufficient detail to enable trainees, providers and assessors to appreciate the level of knowledge, understanding and skills required to pass each module. Adoption and utilization of this document by training providers and accrediting or approving bodies will harmonize introductory education and training for those involved in the care and use of animals for scientific purposes within the European Union, promote mutual recognition of training within and between Member States and therefore free movement of personnel.
Affiliation(s)
- Ismene A Dontas
- Laboratory for Research of the Musculoskeletal System, School of Medicine, National & Kapodistrian University of Athens, Greece
- Johannes Schenkel
- German Cancer Research Centre and Institute of Physiology and Pathophysiology, University of Heidelberg, Germany
- Jan-Bas Prins
- Biological Research Facility, The Francis Crick Institute, UK; Leiden University Medical Centre, The Netherlands
- David I Lewis
- School of Biomedical Sciences, Faculty of Biological Sciences, University of Leeds, UK
5
Validation and perception of a key feature problem examination in neurology. PLoS One 2019; 14:e0224131. PMID: 31626678; PMCID: PMC6799971; DOI: 10.1371/journal.pone.0224131.
Abstract
OBJECTIVE To validate a newly-developed Key Feature Problem Examination (KFPE) in neurology, and to examine how it is perceived by students. METHODS We have developed a formative KFPE containing 12 key feature problems and 44 key feature items. The key feature problems covered four typical clinical situations. The items were presented in short- and long-menu question formats. Third- and fourth-year medical students undergoing the Neurology Course at our department participated in this study. The students' perception of the KFPE was assessed via a questionnaire. Students also had to pass a summative multiple-choice question examination (MCQE) containing 39 Type-A questions. All key feature and multiple-choice questions were classified using a modified Bloom's taxonomy. RESULTS The results from 81 KFPE participants were analyzed. The average score was 6.7/12 points. Cronbach's alpha for the 12 key-feature problems was 0.53. Item difficulty level scores were between 0.39 and 0.77, and item-total correlations between 0.05 and 0.36. Thirty-two key feature items of the KFPE were categorized as testers of comprehension, application and problem-solving, and 12 questions as testers of knowledge (MCQE: 15 comprehension and 24 knowledge, respectively). Overall correlations between the KFPE and the MCQE were intermediate. The KFPE was perceived well by the students. CONCLUSIONS Adherence to previously-established principles enables the creation of a valid KFPE in the field of Neurology.
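The two item statistics quoted here (difficulty 0.39-0.77, item-total correlations 0.05-0.36) can be computed as in the sketch below. The score matrix is invented and assumes dichotomous items; key feature items may carry partial credit, so treat this as a schematic only.

    # Sketch: per-item difficulty and corrected item-total correlation
    import numpy as np

    rng = np.random.default_rng(1)
    scores = rng.integers(0, 2, size=(81, 44)).astype(float)  # examinee x item

    difficulty = scores.mean(axis=0)  # proportion correct: 0 (hard) to 1 (easy)
    totals = scores.sum(axis=1)
    item_total = [
        # "corrected": the item is removed from the total it is correlated with
        np.corrcoef(scores[:, j], totals - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ]
    print(difficulty.round(2)[:5], np.round(item_total[:5], 2))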
6
A comparison of clinical-scenario (case cluster) versus stand-alone multiple choice questions in a problem-based learning environment in undergraduate medicine. J Taibah Univ Med Sci 2016; 12:14-26. PMID: 31435208; PMCID: PMC6694941; DOI: 10.1016/j.jtumed.2016.08.014.
Abstract
Objectives To compare stand-alone multiple choice questions (MCQs) and integrated clinical-scenario (case cluster) multiple choice questions (CS-MCQs) in a problem-based learning (PBL) environment. Methods A retrospective descriptive analysis of MCQ examinations was conducted in a course that integrates the subspecialties of anatomical pathology, chemical pathology, hematology, immunology, microbiology and pharmacology. The MCQ items were analyzed for their reliability (Kuder-Richardson-20, KR-20), level of difficulty (Pi), discrimination index (Di), item distractors and student performances. Item statistics were extracted from the Integrity online item-analysis programme, and the stand-alone and CS multiple choice questions were compared. Results KR-20 for both the CS-MCQs and stand-alone MCQs was consistently high; KR-20 and Pi were higher for the CS-MCQs, although the differences in Pi and Di were not statistically significant. A range of difficulty levels was found based on Bloom's taxonomy. The mean class scores were higher for the CS-MCQ examination. The compilation of the CS-MCQ examination was more challenging. Conclusions CS-MCQs compare favorably to stand-alone MCQs and provide opportunities for the integration of sub-specialties and assessment in keeping with PBL. They assess students' cognitive skills and are reliable and practical. Different levels of item difficulty promote multi-logical and critical thinking. Students' scores were higher on the CS-MCQ examination, which may suggest better understanding of the material and/or better question clarity. The scenarios have to flow logically. Increasing the number of scenarios ensures the examination of more course content.
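The KR-20 coefficient reported above is defined for dichotomously scored items; a minimal sketch of its computation follows, with simulated responses standing in for the examination data.

    # Sketch: Kuder-Richardson 20 reliability for a 0/1 score matrix
    import numpy as np

    def kr20(items):
        """items: examinee x item matrix of 0/1 scores."""
        k = items.shape[1]
        p = items.mean(axis=0)                     # proportion correct per item
        var_total = items.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

    rng = np.random.default_rng(2)
    ability = rng.normal(size=(200, 1))
    responses = (rng.normal(size=(200, 40)) + ability > 0).astype(int)
    print(f"KR-20 = {kr20(responses):.2f}")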
Key Words
- APS-I, Applied Para-clinical Sciences-I
- APS-II, Applied Para-clinical Sciences-II
- APS-III, Applied Para-clinical Sciences-III
- CA, continuous assessment
- CPBR, corrected point-biserial ratio
- CS, clinical scenario
- CS-MCQ, clinical scenario multiple choice question
- Clinical scenario
- Difficulty
- Discrimination
- EMQ, extended matching questions
- Integration
- KR-20, Kuder-Richardson-20
- MCQ, multiple choice questions
- MEQ, modified essay questions
- PBL
- PBL, problem based learning
- PDQ, progressive disclosure questions
- SAQ, short answer questions
7
Rush BR, Rankin DC, White BJ. The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Medical Education 2016; 16:250. PMID: 27681933; PMCID: PMC5041405; DOI: 10.1186/s12909-016-0773-3.
Abstract
BACKGROUND Failure to adhere to standard item-writing guidelines may render examination questions easier or more difficult than intended. Item complexity describes the cognitive skill level required to obtain a correct answer. Higher cognitive examination items promote critical thinking and are recommended to prepare students for clinical training. This study evaluated faculty-authored examinations to determine the impact of item-writing flaws and item complexity on the difficulty and discrimination value of examination items used to assess third-year veterinary students. METHODS The impact of item-writing flaws and item complexity (cognitive level I-V) on examination item difficulty and discrimination value was evaluated on 1925 examination items prepared by clinical faculty for third-year veterinary students. RESULTS The mean (± SE) percent correct (83.3% ± 17.5) was consistent with target values in professional education, and the mean discrimination index (0.18 ± 0.17) was slightly lower than recommended (0.20). More than one item-writing flaw was identified in 37.3% of questions. The most common item-writing flaws were awkward stem structure, implausible distractors, "longest response is correct", and responses written as a series of true-false statements. Higher cognitive skills (complexity level III-IV) were required to correctly answer 38.4% of examination items. As item complexity increased, item difficulty and discrimination values increased. The probability of writing discriminating, difficult examination items decreased when implausible distractors and "all of the above" were used, and increased if the distractors comprised a series of true/false statements. Items with four distractors were not more difficult or discriminating than items with three distractors. CONCLUSION Preparation of examination questions targeting higher cognitive levels will increase the likelihood of constructing discriminating items. Use of implausible distractors to complete a five-option multiple choice question does not strengthen the discrimination value.
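The discrimination values discussed here are conventionally computed as an upper-lower index: the difference in proportion correct between the strongest and weakest examinees by total score. The sketch below uses the common 27% split, which is an assumption; the study's exact procedure is not given in the abstract, and the data are simulated.

    # Sketch: upper-lower discrimination index D per item (hypothetical data)
    import numpy as np

    def discrimination_index(items, frac=0.27):
        order = items.sum(axis=1).argsort()   # rank examinees by total score
        n = max(1, int(frac * items.shape[0]))
        lower, upper = items[order[:n]], items[order[-n:]]
        return upper.mean(axis=0) - lower.mean(axis=0)  # D in [-1, 1] per item

    rng = np.random.default_rng(3)
    ability = rng.normal(size=(300, 1))
    responses = (rng.normal(size=(300, 25)) + ability > 0).astype(int)
    print(np.round(discrimination_index(responses)[:5], 2))

A D value near the study's mean of 0.18 means an item only weakly separates high and low scorers; the recommended 0.20 is a rule-of-thumb floor.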
Affiliation(s)
- Bonnie R. Rush
- Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, KS USA
- David C. Rankin
- Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, KS USA
- Brad J. White
- Department of Clinical Sciences, College of Veterinary Medicine, Kansas State University, Manhattan, KS USA
8
Vuma S, Sa B. Evaluation of the effectiveness of progressive disclosure questions as an assessment tool for knowledge and skills in a problem based learning setting among third year medical students at The University of The West Indies, Trinidad and Tobago. BMC Res Notes 2015; 8:673. PMID: 26567129; PMCID: PMC4643491; DOI: 10.1186/s13104-015-1603-0.
Abstract
Background At the University of the West Indies, Trinidad and Tobago, third-year undergraduate teaching is a hybrid of problem-based learning (PBL) and didactic lectures. PBL discourages students from simply acquiring basic factual knowledge and encourages them to integrate these basic facts with clinical knowledge and skills. Recently, progressive disclosure questions (PDQs), also known as modified essay questions (MEQs), were introduced as an assessment tool reported to be in keeping with the PBL philosophy. Objective To describe the effectiveness of the PDQ as an assessment tool in a course that integrates the sub-specialties of Anatomical Pathology, Chemical Pathology, Haematology, Immunology, Microbiology, Pharmacology and Public Health. Methods A descriptive analysis of examination questions in PDQs, and of the students' performance in these examinations, was performed for the academic years 2011–2012, 2012–2013, and 2013–2014 in one third-year course that integrates Anatomical Pathology, Chemical Pathology, Haematology, Immunology, Microbiology, Pharmacology and Public Health. Results The PDQs reflected real-life scenarios and were composed of questions of different levels of difficulty by Bloom's taxonomy, from basic recall through more difficult questions requiring analytical, interpretative and problem-solving skills. In 2011–2012, 2012–2013, and 2013–2014 respectively, 52.9, 52.5, and 58% of the integrated PDQ items tested simple recall of facts; by sub-specialty this ranged from 26.7 to 100%, 18.8 to 70%, and 23.1 to 100% in the three years, respectively. The rest required higher-order cognitive skills. For some sub-specialties, students' performance was better where the examination was mostly basic recall, and poorer where there were more higher-order questions. The different sub-specialties contributed different percentages of the integrated examinations, ranging from 4% in Public Health to 22.9% in Anatomical Pathology. Conclusion The PDQ asked students questions in an integrated fashion in keeping with the PBL process. More care should be taken to ensure appropriate questions are included in the examinations to assess higher-order cognitive skills. However, in an integrated course, some sub-specialties may not have content requiring higher cognitive-level questions in certain clinical cases. More care should be taken in choosing clinical cases that integrate all the sub-specialties.
Affiliation(s)
- Sehlule Vuma
- Department of Para-clinical Sciences, Faculty of Medical Sciences, The University of the West Indies, St Augustine, Trinidad and Tobago.
- Bidyadhar Sa
- Centre for Medical Sciences Education, Faculty of Medical Sciences, The University of the West Indies, St Augustine, Trinidad and Tobago.
9
Hift RJ. Should essays and other "open-ended"-type questions retain a place in written summative assessment in clinical medicine? BMC Medical Education 2014; 14:249. PMID: 25431359; PMCID: PMC4275935; DOI: 10.1186/s12909-014-0249-2.
Abstract
BACKGROUND Written assessments fall into two classes: constructed-response or open-ended questions, such as the essay and a number of variants of the short-answer question, and selected-response or closed-ended questions, typically in the form of multiple choice. It is widely believed that constructed-response written questions test higher-order cognitive processes in a manner that multiple-choice questions cannot, and consequently have higher validity. DISCUSSION An extensive review of the literature suggests that in summative assessment neither premise is evidence-based. Well-structured open-ended and multiple-choice questions appear equivalent in their ability to assess higher cognitive functions, and performance in multiple-choice assessments may correlate more highly than the open-ended format with competence demonstrated in clinical practice following graduation. Studies of construct validity suggest that both formats measure essentially the same dimension, at least in mathematics, the physical sciences, biology and medicine. The persistence of the open-ended format in summative assessment may be due to the intuitive appeal of the belief that synthesising an answer to an open-ended question must be both more cognitively taxing and more similar to actual experience than is selecting a correct response. I suggest that cognitive-constructivist learning theory would predict that a well-constructed context-rich multiple-choice item represents a complex problem-solving exercise which activates a sequence of cognitive processes which closely parallel those required in clinical practice, hence explaining the high validity of the multiple-choice format. SUMMARY The evidence does not support the proposition that the open-ended assessment format is superior to the multiple-choice format, at least in exit-level summative assessment, in terms of either its ability to test higher-order cognitive functioning or its validity. This is explicable using a theory of mental models, which might predict that the multiple-choice format will have higher validity, a statement for which some empirical support exists. Given the superior reliability and cost-effectiveness of the multiple-choice format, consideration should be given to phasing out open-ended format questions in summative assessment. Whether the same applies to non-exit-level assessment and formative assessment is a question which remains to be answered, particularly in terms of the educational effect of testing, an area which deserves intensive study.
Affiliation(s)
- Richard J Hift
- Clinical and Professional Practice Research Group, School of Clinical Medicine, University of KwaZulu-Natal, Durban, 4013 South Africa
10
Freiwald T, Salimi M, Khaljani E, Harendza S. Pattern recognition as a concept for multiple-choice questions in a national licensing exam. BMC Medical Education 2014; 14:232. PMID: 25398312; PMCID: PMC4289202; DOI: 10.1186/1472-6920-14-232.
Abstract
BACKGROUND Multiple-choice questions (MCQs) are still widely used in high-stakes medical exams. We wanted to examine whether and to what extent a national licensing exam uses the concept of pattern recognition to test applied clinical knowledge. METHODS We categorized all 4,134 German national medical licensing exam questions between October 2006 and October 2012 by discipline, year, and type. We analyzed questions from the four largest disciplines: internal medicine (n = 931), neurology (n = 305), pediatrics (n = 281), and surgery (n = 233), with respect to the following question types: knowledge questions (KQ), pattern recognition questions (PRQ), inverse PRQ (IPRQ), and pseudo PRQ (PPRQ). RESULTS A total of 51.1% of all questions were of a higher taxonomical order (PRQ and IPRQ), with a significant decrease in the percentage of these questions (p <0.001) from 2006 (61.5%) to 2012 (41.6%). The proportion of PRQs and IPRQs was significantly lower (p <0.001) in internal medicine and surgery compared to neurology and pediatrics. PRQs were mostly used in questions about diagnoses (71.7%). A significantly higher (p <0.05) percentage of PR/therapy questions was found for internal medicine compared with neurology and pediatrics. CONCLUSION The concept of pattern recognition is used with different priorities and to various extents by the different disciplines in a high-stakes exam to test applied clinical knowledge. Being aware of this concept may aid in the design and balance of MCQs in an exam with respect to testing clinical reasoning as a desired skill at the threshold of postgraduate medical education.
Affiliation(s)
- Tilo Freiwald
- Department of Nephrology, III. Medical Clinic, Goethe-University Hospital, Theodor-Stern-Kai 7, 60590 Frankfurt/Main, Germany
- Ehsan Khaljani
- Department of Urology, Vivantes Auguste-Viktoria-Clinic, Rubensstraße 125, 12157 Berlin, Germany
- Sigrid Harendza
- Department of Internal Medicine, University Hospital Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
11
Palmer E, Devitt P. The assessment of a structured online formative assessment program: a randomised controlled trial. BMC Medical Education 2014; 14:8. PMID: 24400883; PMCID: PMC3893582; DOI: 10.1186/1472-6920-14-8.
Abstract
BACKGROUND Online formative assessment continues to be an important area of research, and methods which actively engage the learner and provide useful learning outcomes are of particular interest. This study reports on the outcomes of a two-year study of medical students using formative assessment tools. METHOD The study was conducted over two consecutive years using two different strategies for engaging students. The Year 1 strategy involved voluntary use of the formative assessment tool by 129 students. In Year 2, a second cohort of 130 students was encouraged to complete the formative assessment by incorporating summative assessment elements into it. Outcomes from pre- and post-testing students around the formative assessment intervention were used as measures of learning. To compare improvement scores between the two years, a two-way Analysis of Variance (ANOVA) model was fitted to the data. RESULTS The ANOVA model showed that there was a significant difference in improvement scores between students in the two years (mean improvement percentage 19% vs. 38.5%, p < 0.0001). Students were more likely to complete formative assessment items if they had a summative component. In Year 2, the time spent using the formative assessment tool had no impact on student improvement, nor did the number of assessment items completed. CONCLUSION The online medium is a valuable learning resource, capable of providing timely formative feedback and stimulating student-centered learning. However, the production of quality content is a time-consuming task and careful consideration must be given to the strategies employed to ensure its efficacy. Course designers should consider the potential positive impact that adding summative components to formative assessment may have on student engagement and outcomes.
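The comparison of improvement scores can be reproduced schematically with a two-way ANOVA, as sketched below. The cohort factor and group means come from the abstract; the second factor and all individual-level data are hypothetical, since the abstract does not specify them.

    # Illustrative two-way ANOVA on improvement scores (simulated data)
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "improvement": np.concatenate([rng.normal(19.0, 10.0, 129),
                                       rng.normal(38.5, 10.0, 130)]),
        "cohort": ["Y1"] * 129 + ["Y2"] * 130,
        "baseline": rng.choice(["low", "high"], size=259),  # assumed 2nd factor
    })
    model = ols("improvement ~ C(cohort) * C(baseline)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))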
Affiliation(s)
- Edward Palmer
- School of Education, University of Adelaide, Adelaide, Australia
- School of Medicine, University of Adelaide, Adelaide, Australia
- Peter Devitt
- School of Education, University of Adelaide, Adelaide, Australia
12
Boulouffe C, Doucet B, Muschart X, Charlin B, Vanpee D. Assessing clinical reasoning using a script concordance test with electrocardiogram in an emergency medicine clerkship rotation. Emerg Med J 2013; 31:313-316. DOI: 10.1136/emermed-2012-201737.
Abstract
Objectives Script concordance tests (SCTs) can be used to assess clinical reasoning, especially in situations of uncertainty, by comparing the responses of examinees with those of emergency physicians. The examinee's answers are scored based on the level of agreement with responses provided by a panel of experts. Emergency physicians are frequently uncertain in the interpretation of ECGs. Thus, the aim of this study was to validate an SCT combined with an ECG. Methods An SCT-ECG was developed. The test was administered to medical students, residents and emergency physicians. Scoring was based on data from a panel of 12 emergency physicians. The statistical analyses assessed the internal reliability of the SCT (Cronbach's α) and its ability to discriminate between the different groups (ANOVA followed by Tukey's post hoc test). Results The SCT-ECG was administered to 21 medical students, 19 residents and 12 emergency physicians. The internal reliability was satisfactory (Cronbach's α = 0.80). Statistically significant differences were found between the groups (F = 21.07; p < 0.0001). Moreover, significant differences (post hoc test) were detected between students and residents (p < 0.001), students and experts (p < 0.001), and residents and experts (p = 0.017). Conclusions This SCT-ECG is a valid tool to assess clinical reasoning in a context of uncertainty due to its high internal reliability and its ability to discriminate between different levels of expertise.
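The panel-agreement scoring the abstract describes is, in the standard SCT formulation, a partial-credit rule: an answer earns credit proportional to how many panelists chose it, normalized by the modal panel choice. A minimal sketch with invented panel data:

    # Sketch: standard SCT aggregate scoring for one item (hypothetical panel)
    from collections import Counter

    def sct_item_score(examinee_answer, panel_answers):
        counts = Counter(panel_answers)
        modal = max(counts.values())
        # full credit for the modal answer, proportional credit otherwise
        return counts.get(examinee_answer, 0) / modal

    # 12 panelists rating a -2..+2 Likert item, matching the study's panel size
    panel = [1, 1, 2, 1, 0, 1, 2, 1, 1, 0, 1, 2]
    print(sct_item_score(1, panel))   # modal answer   -> 1.0
    print(sct_item_score(2, panel))   # minority pick  -> 3/7, about 0.43
    print(sct_item_score(-2, panel))  # unchosen pick  -> 0.0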
13
Duggan P, Charlin B. Summative assessment of 5th year medical students' clinical reasoning by Script Concordance Test: requirements and challenges. BMC Medical Education 2012; 12:29. PMID: 22571351; PMCID: PMC3419609; DOI: 10.1186/1472-6920-12-29.
Abstract
BACKGROUND The Script Concordance Test (SCT) has not been reported in summative assessment of students across the multiple domains of a medical curriculum. We report the steps used to build a test for summative assessment in a medical curriculum. METHODS A 51-case, 158-question, multidisciplinary paper was constructed to assess clinical reasoning in 5th year. 10-16 experts in each of 7 discipline-based reference panels answered questions on-line. A multidisciplinary group considered reference panel data and data from a volunteer group of 6th years, who sat the same test, to determine the passing score for the 5th years. RESULTS The mean (SD) scores were 63.6 (7.6) and 68.6 (4.8) for the 6th-year (n = 23, alpha = 0.78) and 5th-year (n = 132, alpha = 0.62) groups (p < 0.05), respectively. The passing score was set at 4 SD from the expert mean. Four students failed. CONCLUSIONS The SCT may be a useful method to assess clinical reasoning in medical students in multidisciplinary summative assessments. Substantial investment in training of faculty and students and in the development of questions is required.
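The standard-setting step is simple arithmetic; the sketch below makes it explicit. The panel statistics are hypothetical, and placing the cut-off below the panel mean follows the usual SCT convention, which is an assumption here.

    # Sketch: pass mark placed 4 SD from (below) the expert panel mean
    expert_mean, expert_sd = 80.0, 4.0       # hypothetical panel statistics
    pass_mark = expert_mean - 4 * expert_sd  # assumed direction: below mean
    print(f"pass mark = {pass_mark:.1f}")    # 64.0 on the panel's scale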
Affiliation(s)
- Paul Duggan
- Discipline of Obstetrics and Gynaecology, The University of Adelaide, Frome Rd, Adelaide, South Australia, 5000, Australia
- Bernard Charlin
- CPASS, Direction de la recherche, Faculté de Médecine, Université de Montréal, CP 6128, Succursale Centre-ville, Montréal, Québec, H3C 3J7, Canada