1
Guo Z, Wang D, Cai Y, Tu D. An Item Response Theory Model for Incorporating Response Times in Forced-Choice Measures. Educational and Psychological Measurement 2024; 84:450-480. [PMID: 38756463] [PMCID: PMC11095319] [DOI: 10.1177/00131644231171193] [Indexed: 05/18/2024]
Abstract
Forced-choice (FC) measures, which employ comparative rather than absolute judgments, have been widely used in many personality and attitude tests as an alternative to rating scales. They can effectively reduce several response biases, such as social desirability, response styles, and acquiescence bias. Another type of data linked with comparative judgments is response time (RT), which contains potential information concerning respondents' decision-making processes. Incorporating RT into FC measures is therefore a challenging but promising way to better reveal respondents' behaviors and preferences in personality measurement. Accordingly, this study proposes a new item response theory (IRT) model that incorporates RT into FC measures to improve personality assessment. Simulation studies show that the proposed model can effectively improve the estimation accuracy of personality traits by using the ancillary information contained in RT. An application to a real data set further reveals that the proposed model yields similar but not identical parameter estimates compared with the conventional Thurstonian IRT model, and the RT information can explain these differences.
Affiliation(s)
- Daxun Wang
- Jiangxi Normal University, Nanchang, China
- Yan Cai
- Jiangxi Normal University, Nanchang, China
- Dongbo Tu
- Jiangxi Normal University, Nanchang, China
2
Nie L, Xu P, Hu D. Multidimensional IRT for forced choice tests: A literature review. Heliyon 2024; 10:e26884. [PMID: 38449643] [PMCID: PMC10915382] [DOI: 10.1016/j.heliyon.2024.e26884] [Received: 09/21/2023] [Revised: 02/11/2024] [Accepted: 02/21/2024] [Indexed: 03/08/2024]
Abstract
The Multidimensional Forced Choice (MFC) test is frequently utilized in non-cognitive evaluations because of its effectiveness in reducing response bias commonly associated with the conventional Likert scale. Nonetheless, it is critical to recognize that the MFC test generates ipsative data, a type of measurement that has been criticized due to its limited applicability for comparing individuals. Multidimensional item response theory (MIRT) models have recently sparked renewed interest among academics and professionals. This is largely due to the development of several models that make it easier to collect normative data from forced-choice tests. The paper introduces a modeling framework made up of three key components: response format, measurement model, and decision theory. Under this paradigm, four IRT models were chosen as examples. Following that, a comprehensive study is carried out to compare and characterize the parameter estimation techniques used in MFC-IRT models. This work then examines empirical research on the concept by analyzing three distinct domains: parameter invariance testing, computerized adaptive testing (CAT), and validity investigation. Finally, it is recommended that future research initiatives follow four distinct paths: modeling, parameter invariance testing, forced-choice CAT, and validity studies.
Affiliation(s)
- Lei Nie
- School of Public Administration, East China Normal University, China
- Peiyi Xu
- Department of Educational Psychology, Faculty of Education, East China Normal University, China
- Di Hu
- School of Education and Social Policy, Northwestern University, USA
3
Zheng C, Liu J, Li Y, Xu P, Zhang B, Wei R, Zhang W, Liu B, Huang J. A 2PLM-RANK multidimensional forced-choice model and its fast estimation algorithm. Behav Res Methods 2024:10.3758/s13428-023-02315-x. [PMID: 38409459] [DOI: 10.3758/s13428-023-02315-x] [Accepted: 12/06/2023] [Indexed: 02/28/2024]
Abstract
High-stakes non-cognitive tests frequently employ forced-choice (FC) scales to deter faking. Many scoring models have been devised to mitigate the resulting score ipsativity. Among them, the multi-unidimensional pairwise preference (MUPP) framework is highly flexible and commonly used. However, the original MUPP model was developed for an unfolding response process and can handle only paired comparisons. The present study proposes the 2PLM-RANK as a generalization of the MUPP model that accommodates dominance responses in the RANK format. In addition, an improved stochastic EM (iStEM) algorithm is devised for more stable and efficient parameter estimation. Simulation results generally supported the efficiency and utility of the new algorithm in estimating the 2PLM-RANK when applied to both triplets and tetrads across various conditions. An empirical illustration with responses to a 24-dimensional personality test further supported the practicality of the proposed model. To further aid application of the new model, a user-friendly R package is also provided.
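The RANK-format response process summarized above can be illustrated generically: treat each statement's 2PL endorsement probability as its "worth" and decompose the probability of a full ranking Plackett-Luce style. This is a hedged sketch of the general idea, not the paper's exact 2PLM-RANK formulation; all function names and parameter values are illustrative.

```python
import math

def p_accept(theta, a, b):
    """2PL probability that a statement is endorsed at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rank_probability(ranking, thetas, a_params, b_params):
    """Plackett-Luce-style probability of a full ranking of statements,
    using each statement's 2PL endorsement probability as its 'worth'.
    `ranking` lists statement indices from most to least preferred;
    statement i measures the trait whose level is thetas[i]."""
    worths = [p_accept(thetas[i], a_params[i], b_params[i])
              for i in range(len(thetas))]
    prob = 1.0
    remaining = [worths[i] for i in ranking]
    while remaining:
        prob *= remaining[0] / sum(remaining)  # best of the remaining set
        remaining = remaining[1:]
    return prob
```

Because the rankings of a block exhaust all permutations, these probabilities sum to one, which is what makes the format scorable with a likelihood.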
Affiliation(s)
- Chanjin Zheng
- Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China.
- Juan Liu
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Yaling Li
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Peiyi Xu
- Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Bo Zhang
- School of Labor and Employment Relations and Department of Psychology, University of Illinois Urbana-Champaign, Champaign, USA
- Ran Wei
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Wenqing Zhang
- Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Boyang Liu
- Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Jing Huang
- Educational Psychology and Research Methodology, Purdue University, West Lafayette, IN, USA
4
Wang Q, Zheng Y, Liu K, Cai Y, Peng S, Tu D. Item selection methods in multidimensional computerized adaptive testing for forced-choice items using Thurstonian IRT model. Behav Res Methods 2024; 56:600-614. [PMID: 36750522] [DOI: 10.3758/s13428-022-02037-6] [Accepted: 11/24/2022] [Indexed: 02/09/2023]
Abstract
Multidimensional computerized adaptive testing for forced-choice items (MFC-CAT) combines the benefits of multidimensional forced-choice (MFC) items and computerized adaptive testing (CAT) in that it eliminates response biases and reduces administration time. Previous studies that explored designs of MFC-CAT only discussed item selection methods based on the Fisher information (FI), which is known to perform unstably at early stages of CAT. This study proposes a set of new item selection methods based on Kullback-Leibler (KL) information for MFC-CAT (namely MFC-KI, MFC-KB, and MFC-KLP) under the Thurstonian IRT (TIRT) model. Three simulation studies, including one based on real data, were conducted to compare the performance of the proposed KL-based item selection methods against the existing FI-based methods in three- and five-dimensional MFC-CAT scenarios with various test lengths and inter-trait correlations. Results demonstrate that the proposed KL-based item selection methods are feasible for MFC-CAT and generate acceptable trait estimation accuracy and uniformity of item pool usage. Among the three proposed methods, MFC-KB and MFC-KLP outperformed the existing FI-based item selection methods and resulted in the most accurate trait estimation and relatively even utilization of the item pool.
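A minimal sketch of KL-based item selection for a dichotomous 2PL item, in the spirit of (but not identical to) the MFC-KI index described above: the KL divergence between the response distributions implied by the current trait estimate and nearby trait values is averaged over a grid, and the candidate item maximizing that index is administered next. Parameter values, the grid width, and the item-pool representation are illustrative assumptions.

```python
import math

def irf(theta, a, b):
    """2PL item response function (probability of endorsement)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kl_divergence(theta0, theta1, a, b):
    """KL divergence between the Bernoulli response distributions implied
    by trait values theta0 (current estimate) and theta1 (alternative)."""
    p0, p1 = irf(theta0, a, b), irf(theta1, a, b)
    return (p0 * math.log(p0 / p1)
            + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p1)))

def kl_index(theta_hat, a, b, delta=1.0, n_grid=21):
    """Average KL divergence over alternatives in [theta_hat - delta,
    theta_hat + delta]; larger values mean the item better separates the
    current estimate from nearby trait values."""
    step = 2.0 * delta / (n_grid - 1)
    grid = [theta_hat - delta + k * step for k in range(n_grid)]
    return sum(kl_divergence(theta_hat, t, a, b) for t in grid) / n_grid

def select_item(theta_hat, pool):
    """Pick the (a, b) item from the pool with the largest KL index."""
    return max(pool, key=lambda item: kl_index(theta_hat, *item))
```

An item located near the current estimate (b close to theta_hat) dominates a far-off item of equal discrimination, which is exactly the behavior that makes KL indices usable before the estimate has stabilized.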
Affiliation(s)
- Qin Wang
- Jiangxi Normal University, Nanchang, China
- Yi Zheng
- Arizona State University, Tempe, AZ, USA
- Kai Liu
- Jiangxi Normal University, Nanchang, China
- Yan Cai
- Jiangxi Normal University, Nanchang, China
- Siwei Peng
- Jiangxi Normal University, Nanchang, China
- Dongbo Tu
- Jiangxi Normal University, Nanchang, China
5
Qiu X, de la Torre J. A dual process item response theory model for polytomous multidimensional forced-choice items. The British Journal of Mathematical and Statistical Psychology 2023; 76:491-512. [PMID: 36967236] [DOI: 10.1111/bmsp.12303] [Received: 08/04/2021] [Accepted: 07/03/2023] [Indexed: 06/18/2023]
Abstract
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.
Affiliation(s)
- Xuelan Qiu
- Institute for Learning Sciences & Teacher Education, Australian Catholic University, Brisbane, Queensland, Australia
6
Rasheed S, Robie C. Faking resistance of a quasi-ipsative RIASEC occupational interest measure. International Journal of Selection and Assessment 2023. [DOI: 10.1111/ijsa.12427] [Indexed: 04/03/2023]
Affiliation(s)
- Sabah Rasheed
- Lazaridis School of Business and Economics, Wilfrid Laurier University, Waterloo, Ontario, Canada
- Chet Robie
- Lazaridis School of Business and Economics, Wilfrid Laurier University, Waterloo, Ontario, Canada
7
Joo SH, Lee P, Stark S. Modeling Multidimensional Forced Choice Measures with the Zinnes and Griggs Pairwise Preference Item Response Theory Model. Multivariate Behavioral Research 2023; 58:241-261. [PMID: 34370564] [DOI: 10.1080/00273171.2021.1960142] [Indexed: 05/26/2023]
Abstract
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP. We derived the information function to evaluate the psychometric properties of MFC measures and developed a model parameter estimation algorithm using Markov chain Monte Carlo (MCMC). To evaluate the efficacy of the proposed model, we conducted a simulation study under various experimental conditions such as sample sizes, number of items, and ranges of discrimination and location parameters. The results showed that the model parameters were accurately estimated when the sample size was as low as 500. The empirical results also showed that the scores from the ZG-MUPP model were comparable to those from the MUPP model and the Thurstonian IRT (TIRT) model. Practical implications and limitations are further discussed.
8
Huang HY. Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. Educational and Psychological Measurement 2023; 83:146-180. [PMID: 36601255] [PMCID: PMC9806518] [DOI: 10.1177/00131644211069906] [Indexed: 05/13/2023]
Abstract
The forced-choice (FC) item formats used for noncognitive tests typically present a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference, in order to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs) can provide information regarding the mastery status of test takers on latent discrete variables and are more commonly used for cognitive tests employed in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. By conducting a series of simulations and calibrating the model parameters with Bayesian estimation, the study shows that, in general, the model parameters can be recovered satisfactorily with the use of long tests and large samples. More attributes improve the precision of the second-order latent trait estimation in a long test but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired-comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. Finally, an empirical analysis related to work-motivation measures is presented to demonstrate the applications and implications of the new model.
Affiliation(s)
- Hung-Yu Huang
- University of Taipei, Taiwan
- Hung-Yu Huang, Distinguished Professor, Department of Psychology and Counseling, University of Taipei, No. 1, Ai-Guo West Road, Taipei 10048, Taiwan
9
Frick S, Brown A, Wetzel E. Investigating the Normativity of Trait Estimates from Multidimensional Forced-Choice Data. Multivariate Behavioral Research 2023; 58:1-29. [PMID: 34464217] [DOI: 10.1080/00273171.2021.1938960] [Indexed: 06/13/2023]
Abstract
The Thurstonian item response model (Thurstonian IRT model) allows deriving normative trait estimates from multidimensional forced-choice (MFC) data. In the MFC format, persons must rank-order items that measure different attributes according to how well the items describe them. This study evaluated the normativity of Thurstonian IRT trait estimates both in a simulation and empirically. The simulation investigated normativity and compared Thurstonian IRT trait estimates to estimates from classical partially ipsative scoring, from dichotomous true-false (TF) data, and from rating scale data. The results showed that, with blocks of opposite-keyed items, Thurstonian IRT trait estimates were normative, in contrast to classical partially ipsative estimates. Unbalanced numbers of items per trait, few opposite-keyed items, positively correlated traits, or assessing fewer traits did not markedly decrease measurement precision. Measurement precision was, however, lower than that of rating scale data. The empirical study investigated whether relative MFC responses provide a better differentiation of behaviors within persons than absolute TF responses. However, criterion validity was equal and construct validity (with constructs measured by rating scales) lower in MFC. Thus, Thurstonian IRT modeling of MFC data overcomes the drawbacks of classical scoring, but gains in validity may depend on eliminating common method biases from the comparison.
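The pairwise-comparison kernel of the Thurstonian IRT model discussed above can be written compactly: the probability of preferring statement i over statement k is a normal ogive of the latent utility difference. A minimal sketch in the standard parameterization (loadings λ, traits η, threshold γ, uniquenesses ψ²); the parameter values in the usage note are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tirt_pair_probability(eta_i, eta_k, lam_i, lam_k, psi2_i, psi2_k, gamma):
    """Probability that statement i is preferred to statement k: the mean
    latent-utility difference  -gamma + lam_i*eta_i - lam_k*eta_k  is
    evaluated against comparison noise whose variance is the sum of the
    two statements' uniquenesses, psi2_i + psi2_k."""
    t = -gamma + lam_i * eta_i - lam_k * eta_k
    return normal_cdf(t / math.sqrt(psi2_i + psi2_k))
```

With equal loadings and a zero threshold, a person standing equally on both traits is indifferent (probability 0.5); raising the trait measured by statement i pushes the probability toward 1.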
Affiliation(s)
- Susanne Frick
- Department of Psychology, School of Social Sciences, University of Mannheim
- Anna Brown
- Department of Psychology, University of Kent
- Eunike Wetzel
- Department of Psychology, Otto-von-Guericke University Magdeburg
- Department of Psychology, University of Koblenz-Landau
10
Bürkner PC. On the Information Obtainable from Comparative Judgments. Psychometrika 2022; 87:1439-1472. [PMID: 35133553] [PMCID: PMC9636126] [DOI: 10.1007/s11336-022-09843-z] [Received: 04/02/2021] [Revised: 11/04/2021] [Indexed: 06/14/2023]
Abstract
Personality tests employing comparative judgments have been proposed as an alternative to Likert-type rating scales. One of the main advantages of a comparative format is that it can reduce faking of responses in high-stakes situations. However, previous research has shown that it is highly difficult to obtain trait score estimates that are both faking resistant and sufficiently accurate for individual-level diagnostic decisions. With the goal of contributing to a solution, I study the information obtainable from comparative judgments analyzed by means of Thurstonian IRT models. First, I extend the mathematical theory of ordinal comparative judgments and corresponding models. Second, I provide optimal test designs for Thurstonian IRT models that maximize the accuracy of people's trait score estimates from both frequentist and Bayesian statistical perspectives. Third, I derive analytic upper bounds for the accuracy of these trait estimates achievable through ordinal Thurstonian IRT models. Fourth, I perform numerical experiments that complement results obtained in earlier simulation studies. The combined analytical and numerical results suggest that it is indeed possible to design personality tests using comparative judgments that yield trait score estimates sufficiently accurate for individual-level diagnostic decisions, while reducing faking in high-stakes situations. Recommendations for the practical application of comparative judgments for the measurement of personality, specifically in high-stakes situations, are given.
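For a single binary comparison with a normal-ogive response function, the Fisher information about the trait takes the familiar form I(θ) = [P′(θ)]² / [P(θ)(1 − P(θ))]. The sketch below computes this quantity for one comparison; it is a generic illustration of where test information comes from, not Bürkner's optimal-design or upper-bound derivations, and the default parameter values are assumptions.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pair_information(theta, lam=1.0, gamma=0.0, psi2=1.0):
    """Fisher information about a single trait from one binary comparison
    with response function P(theta) = Phi((lam*theta - gamma)/sd):
    I(theta) = P'(theta)^2 / (P * (1 - P))."""
    sd = math.sqrt(psi2)
    z = (lam * theta - gamma) / sd
    p = normal_cdf(z)
    dp = (lam / sd) * normal_pdf(z)  # derivative of P w.r.t. theta
    return dp * dp / (p * (1.0 - p))
```

Information peaks where the comparison is maximally uncertain (P = 0.5) and decays in the tails, which is why a fixed test cannot be equally informative for all trait levels.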
11
A Big Five-Based Multimethod Social and Emotional Skills Assessment: The Mosaic™ by ACT® Social Emotional Learning Assessment. J Intell 2022; 10:jintelligence10040072. [PMID: 36278594] [PMCID: PMC9590031] [DOI: 10.3390/jintelligence10040072] [Received: 03/25/2022] [Revised: 08/29/2022] [Accepted: 09/07/2022] [Indexed: 12/01/2022]
Abstract
A focus on implementing social and emotional (SE) learning into curricula continues to gain popularity in K-12 educational contexts at the policy and practitioner levels. As it continues to be elevated in educational discourse, it becomes increasingly clear that it is important to have reliable, validated measures of students’ SE skills. Here we argue that framework and design are additional important considerations for the development and selection of SE skill assessments. We report the reliability and validity evidence for The Mosaic™ by ACT® Social Emotional Learning Assessment, an assessment designed to measure SE skills in middle and high school students that makes use of a research-based framework (the Big Five) and a multi-method approach (three item types including Likert, forced choice, and situational judgment tests). Here, we provide the results from data collected from more than 33,000 students who completed the assessment and for whom we have data on various outcome measures. We examined the validity evidence for the individual item types and the aggregate scores based on those three. Our findings support the contribution of multi-method assessment and an aggregate score. We discuss the ways the field can benefit from this or similarly designed assessments and discuss how the assessment results can be used by practitioners to promote programs aimed at stimulating students’ personal growth.
12
Bunji K, Okada K. Linear Ballistic Accumulator Item Response Theory Model for Multidimensional Multiple-Alternative Forced-Choice Measurement of Personality. Multivariate Behavioral Research 2022; 57:658-678. [PMID: 33750245] [DOI: 10.1080/00273171.2021.1896351] [Indexed: 06/12/2023]
Abstract
There has been a growing interest in psychological measurements that use the multiple-alternative forced-choice (MAFC) response format for its resistance to response biases. Although several models have been proposed for the data obtained from such measurements, none have succeeded in incorporating the response time information. Given that many psychological measurements are currently performed via computers, it would be beneficial to develop a joint model involving an MAFC item response and response time. The present study proposes the first model that combines a cognitive process model that underlies the observed response time with the forced-choice item response model. Specifically, the proposed model is based on the linear ballistic accumulator model of response time, which is substantially extended by reformulating its parameters so as to incorporate the MAFC item responses. The model parameters are estimated by the Markov chain Monte Carlo (MCMC) algorithm. A simulation study confirmed that the proposed approach could appropriately recover the parameters. Two empirical applications are reported to demonstrate the use of the proposed model and compare it with existing models. The results showed that the proposed model could be a useful tool for jointly modeling the MAFC item responses and response times.
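The linear ballistic accumulator mechanism underlying the proposed model can be illustrated with a minimal simulation: each response alternative accumulates evidence linearly from a uniform start point at a normally drawn drift rate, and the first accumulator to reach threshold determines both the choice and the response time. Parameter values are hypothetical defaults, and this is not the authors' exact parameterization or estimation procedure.

```python
import random

def lba_trial(drifts, b=1.0, A=0.5, s=0.25, t0=0.2, rng=random):
    """Simulate one LBA trial. `drifts` holds one mean drift rate per
    response alternative. Returns (chosen index, response time)."""
    finish = []
    for i, v in enumerate(drifts):
        start = rng.uniform(0.0, A)   # uniform start point in [0, A]
        rate = rng.gauss(v, s)        # trial-specific drift rate
        while rate <= 0.0:            # resample non-positive rates
            rate = rng.gauss(v, s)
        finish.append(((b - start) / rate, i))  # time to reach threshold b
    t, choice = min(finish)           # first accumulator to finish wins
    return choice, t + t0             # add non-decision time t0

def simulate(drifts, n=2000, seed=1):
    """Run n independent trials with a fixed seed; return choices and RTs."""
    rng = random.Random(seed)
    trials = [lba_trial(drifts, rng=rng) for _ in range(n)]
    return [c for c, _ in trials], [t for _, t in trials]
```

An alternative matching the respondent's trait standing would receive a higher drift rate, making it both more likely to be chosen and faster when chosen, which is the joint choice-RT structure the model exploits.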
Affiliation(s)
- Kyosuke Bunji
- Graduate School of Education, The University of Tokyo
- Japan Society for the Promotion of Science
- Kensuke Okada
- Graduate School of Education, The University of Tokyo
13
Hilliard A, Kazim E, Bitsakis T, Leutner F. Scoring a forced-choice image-based assessment of personality: A comparison of machine learning, regression, and summative approaches. Acta Psychol (Amst) 2022; 228:103659. [PMID: 35780596] [DOI: 10.1016/j.actpsy.2022.103659] [Received: 03/09/2022] [Revised: 06/22/2022] [Accepted: 06/22/2022] [Indexed: 11/01/2022]
Abstract
Recent years have seen rapid advancements in the way that personality is measured, resulting in a number of innovative predictive measures being proposed, including using features extracted from videos and social media profiles. In the context of selection, game- and image-based assessments of personality are emerging, which can overcome issues like social desirability bias, lack of engagement, and low response rates that are associated with traditional self-report measures. Forced-choice formats, where respondents are asked to rank responses, can also mitigate issues such as acquiescence and social desirability bias. Previously, we reported on the development of a gamified forced-choice image-based assessment of the Big Five personality traits created for use in selection, using Lasso regression for the scoring algorithms. In this study, we compare the machine-learning-based Lasso approach to ordinary least squares regression, as well as the summative approach that is typical of forced-choice formats. We find that the Lasso approach performs best in terms of generalisability and convergent validity, although the other methods have greater discriminant validity. We recommend the use of predictive Lasso regression models for scoring forced-choice image-based measures of personality over the other approaches. Potential further studies are suggested.
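The Lasso scoring idea can be sketched with a minimal coordinate-descent implementation: the soft-thresholding update shrinks coefficients of uninformative response features to exactly zero, which is what distinguishes it from ordinary least squares and from summative keyed-response scoring. This is a toy illustration with synthetic data, not the authors' pipeline; it assumes roughly standardized features with no all-zero column.

```python
def lasso_fit(X, y, alpha=0.1, n_iter=200):
    """Minimal coordinate-descent Lasso minimizing
    (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.
    X is a list of feature rows; features are assumed roughly standardized."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # prediction using every feature except j (partial residual)
                pred_minus_j = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_minus_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # soft-thresholding update: small correlations are zeroed out
            if rho > alpha:
                w[j] = (rho - alpha) / z
            elif rho < -alpha:
                w[j] = (rho + alpha) / z
            else:
                w[j] = 0.0
    return w
```

On a tiny orthogonal design where the criterion depends only on the first feature, the second coefficient lands at exactly zero while a summative or OLS score would still carry its noise.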
Affiliation(s)
- Airlie Hilliard
- Institute of Management Studies, Goldsmiths, University of London, New Cross, London SE14 6NW, UK; Holistic AI, London, UK.
- Emre Kazim
- Holistic AI, London, UK; Department of Computer Science, University College London, Gower St, London WC1E 6EA, UK
- Franziska Leutner
- Institute of Management Studies, Goldsmiths, University of London, New Cross, London SE14 6NW, UK; HireVue, London, UK
14
Frick S. Modeling Faking in the Multidimensional Forced-Choice Format: The Faking Mixture Model. Psychometrika 2022; 87:773-794. [PMID: 34927219] [PMCID: PMC9166892] [DOI: 10.1007/s11336-021-09818-6] [Received: 09/23/2020] [Revised: 08/31/2021] [Accepted: 10/15/2021] [Indexed: 06/14/2023]
Abstract
The multidimensional forced-choice (MFC) format has been proposed to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. The aim of this paper is to propose a mixture item response theory model for faking in the MFC format, termed the Faking Mixture model, that allows estimating the fakability of MFC blocks. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that can reduce faking. The Faking Mixture model can be used to reduce fakability during test construction.
Affiliation(s)
- Susanne Frick
- Department of Psychology, School of Social Sciences, Mannheim, Germany.
15
Qiu XL, de la Torre J, Ro S, Wang WC. Computerized Adaptive Testing for Ipsative Tests with Multidimensional Pairwise-Comparison Items: Algorithm Development and Applications. Applied Psychological Measurement 2022; 46:255-272. [PMID: 35601264] [PMCID: PMC9118927] [DOI: 10.1177/01466216221084209] [Indexed: 06/03/2023]
Abstract
A computerized adaptive testing (CAT) solution for tests with multidimensional pairwise-comparison (MPC) items, aiming to measure career interest, value, and personality, is rare. This paper proposes new item selection and exposure control methods for CAT with dichotomous and polytomous MPC items and presents simulation study results. The results show that the procedures are effective in selecting items and controlling within-person statement exposure with no loss of efficiency. Implications are discussed in two applications of the proposed CAT procedures: a work attitude test with dichotomous MPC items and a career interest assessment with polytomous MPC items.
16
Bayesian paired comparison with the bpcs package. Behav Res Methods 2021; 54:2025-2045. [PMID: 34846675] [PMCID: PMC9374650] [DOI: 10.3758/s13428-021-01714-2] [Accepted: 09/16/2021] [Indexed: 11/08/2022]
Abstract
This article introduces the bpcs R package (Bayesian Paired Comparison in Stan) and the statistical models implemented in the package. This package aims to facilitate the use of Bayesian models for paired comparison data in behavioral research. Bayesian analysis of paired comparison data allows parameter estimation even in conditions where the maximum likelihood estimate does not exist, allows easy extension of paired comparison models, provides straightforward interpretation of the results with credible intervals, has better control of type I error, provides more robust evidence towards the null hypothesis, allows propagation of uncertainties, includes prior information, and performs well when handling models with many parameters and latent variables. The bpcs package provides a consistent interface for R users and several functions to evaluate the posterior distribution of all parameters, to estimate the posterior distribution of any contest between items, and to obtain the posterior distribution of the ranks. Three reanalyses of recent studies that used the frequentist Bradley-Terry model are presented. These reanalyses are conducted with the Bayesian models of the bpcs package, and all code used to fit the models and generate the figures and tables is available in the online appendix.
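The Bradley-Terry model underlying bpcs, and the advantage of Bayesian estimation when the maximum likelihood estimate does not exist (e.g., an item that wins every contest), can be sketched without the package itself. The grid approximation below is a hypothetical Python illustration, not the bpcs API; the prior standard deviation and grid settings are assumptions.

```python
import math

def bt_prob(d):
    """Bradley-Terry probability that item A beats item B, given the
    ability difference d = lambda_A - lambda_B on the logit scale."""
    return 1.0 / (1.0 + math.exp(-d))

def posterior_mean_diff(wins_a, wins_b, prior_sd=2.0, half_width=10.0, n_grid=2001):
    """Grid approximation to the posterior mean of the ability difference
    under a Normal(0, prior_sd) prior. The mean stays finite even when A
    wins every contest, a case where the MLE diverges to infinity."""
    step = 2.0 * half_width / (n_grid - 1)
    grid = [-half_width + k * step for k in range(n_grid)]
    weights = []
    for d in grid:
        log_post = (-0.5 * (d / prior_sd) ** 2          # Normal(0, sd) prior
                    + wins_a * math.log(bt_prob(d))      # A's wins
                    + wins_b * math.log(1.0 - bt_prob(d)))  # B's wins
        weights.append(math.exp(log_post))
    total = sum(weights)
    return sum(d * w for d, w in zip(grid, weights)) / total
```

With a 10-0 record the posterior mean is large but finite (the prior regularizes the separation), and with an even record it sits at zero, which is the behavior the abstract contrasts with maximum likelihood.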
17
Chen CW, Wang WC, Mok MMC, Scherer R. A Lognormal Ipsative Model for Multidimensional Compositional Items. Front Psychol 2021; 12:573252. [PMID: 34712161] [PMCID: PMC8545823] [DOI: 10.3389/fpsyg.2021.573252] [Received: 06/16/2020] [Accepted: 09/14/2021] [Indexed: 11/13/2022]
Abstract
Compositional items – a form of forced-choice items – require respondents to allocate a fixed total number of points to a set of statements. To describe the responses to these items, the Thurstonian item response theory (IRT) model was developed. Despite its prominence, the model requires that the statements composing the items yield a factor loading matrix of full rank. When this requirement is not met, the model cannot be identified, and the latent trait estimates are seriously biased. Besides, the estimation of the Thurstonian IRT model often results in convergence problems. To address these issues, this study developed a new version of the Thurstonian IRT model for analyzing compositional items – the lognormal ipsative model (LIM) – that is sufficient for tests using items with all statements positively phrased and with equal factor loadings. We developed an online value test following Schwartz's values theory using compositional items and collected response data from N = 512 participants aged 13 to 51 years. The results showed that our LIM had an acceptable fit to the data and that the reliabilities exceeded 0.85. A simulation study showed good parameter recovery, high convergence rates, and sufficient estimation precision across various conditions of between-trait covariance matrices, test lengths, and sample sizes. Overall, our results indicate that the proposed model can overcome the problems of the Thurstonian IRT model when all statements are positively phrased and factor loadings are similar.
Collapse
Affiliation(s)
- Chia-Wen Chen
- Centre for Educational Measurement, University of Oslo, Oslo, Norway
| | - Wen-Chung Wang
- Assessment Research Centre, The Education University of Hong Kong, Tai Po, Hong Kong, SAR China
| | - Magdalena Mo Ching Mok
- Assessment Research Centre, The Education University of Hong Kong, Tai Po, Hong Kong, SAR China; Graduate Institute of Educational Information and Measurement, National Taichung University of Education, Taichung, Taiwan
| | - Ronny Scherer
- Centre for Educational Measurement, University of Oslo, Oslo, Norway
| |
Collapse
|
18
|
Martínez A, Salgado JF. A Meta-Analysis of the Faking Resistance of Forced-Choice Personality Inventories. Front Psychol 2021; 12:732241. [PMID: 34659043 PMCID: PMC8511514 DOI: 10.3389/fpsyg.2021.732241] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 09/01/2021] [Indexed: 11/13/2022] Open
Abstract
This study presents a comprehensive meta-analysis on the faking resistance of forced-choice (FC) inventories. The results showed that (1) FC inventories show resistance to faking behavior; (2) the magnitude of faking is higher in experimental contexts than in real-life selection processes, suggesting that the effects of faking may be, in part, a laboratory phenomenon; and (3) quasi-ipsative FC inventories are more resistant to faking than the other FC formats. Smaller effect sizes were found for conscientiousness when the quasi-ipsative format was used (δ = 0.49 vs. δ = 1.27 for ipsative formats). Also, the effect sizes were smaller for the applicant samples than for the experimental samples. Finally, the contributions and practical implications of these findings are discussed.
Collapse
Affiliation(s)
- Alexandra Martínez
- Department of Political Science and Sociology, Faculty of Labor Relations, University of Santiago de Compostela, Santiago de Compostela, Spain
| | | |
Collapse
|
19
|
Walton KE, Radunzel J, Moore R, Burrus J, Anguiano-Carrasco C, Murano D. Adjectives vs. Statements in Forced Choice and Likert Item Types: Which Is More Resistant to Impression Management in Personality Assessment? J Pers Assess 2021; 103:842-853. [PMID: 33533652 DOI: 10.1080/00223891.2021.1878523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
Our objective was to compare individuals' ability to intentionally make a positive impression when responding to a Five-Factor Model personality measure under adjective vs. statement and forced choice vs. Likert conditions. Participants were 1,798 high school students who were randomly assigned to either a condition receiving normal instructions or instructions to make a positive impression. We compared the groups' scores and validity estimates under the various conditions. Although impression management occurred on all item types, participants could more easily manipulate their responses to Likert items vs. forced choice items, and statements vs. adjectives. Item type made little difference in terms of convergent and discriminant validity and criterion-related validity for all outcomes but one, ACT scores, which suggests cognitive ability plays a role in impression management ability.
Collapse
|
20
|
Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behav Res Methods 2020; 52:761-772. [PMID: 31342469 DOI: 10.3758/s13428-019-01274-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length. Research has also explored the advantages of computerized adaptive testing (CAT) with MFC pair tests having as many as 25 dimensions, but there have been no published studies on CAT with MFC triplets or tetrads. Thus, in this research we aimed to address that issue. We used recently developed item information functions for an MFC ranking model to compare the benefits of CAT with MFC pair, triplet, and tetrad tests. A simulation study showed that CAT substantially outperformed nonadaptive testing for latent trait estimation across MFC formats. More importantly, CAT with MFC pairs provided estimation accuracy similar to or better than that from tests of equivalent numbers of nonadaptive MFC triplets. On the basis of these findings, implications and recommendations are further discussed for constructing MFC measures to use in psychological contexts.
Collapse
|
21
|
Lee P, Joo SH, Stark S. Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model. ORGANIZATIONAL RESEARCH METHODS 2020. [DOI: 10.1177/1094428120959822] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Although modern item response theory (IRT) methods of test construction and scoring have overcome ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee’s consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large sample size and large magnitude of DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example to demonstrate the viability of the best performing method with real examinees and showed how a DIF and a DTF effect size measure can be used to assess the practical significance of MFC DIF findings.
Collapse
|
22
|
Lee H, Smith WZ. A Bayesian Random Block Item Response Theory Model for Forced-Choice Formats. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2020; 80:578-603. [PMID: 32425220 PMCID: PMC7221495 DOI: 10.1177/0013164419871659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats where an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporated a random block effect into the response function and used a Markov chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well for the estimation of item and trait parameters and for screening examinees with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance: item blocks needed to include negatively keyed items. The empirical application showed the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate into the BRB IRT model to explain response probabilities. Recommendations for the adoption of forced-choice formats were provided along with a discussion of using negatively keyed items.
Collapse
Affiliation(s)
- HyeSun Lee
- California State University Channel Islands, Camarillo, CA, USA
| | - Weldon Z. Smith
- California State University Channel Islands, Camarillo, CA, USA
| |
Collapse
|
23
|
Lee H, Smith WZ. Fit Indices for Measurement Invariance Tests in the Thurstonian IRT Model. APPLIED PSYCHOLOGICAL MEASUREMENT 2020; 44:282-295. [PMID: 32536730 PMCID: PMC7262996 DOI: 10.1177/0146621619893785] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
This study examined whether cutoffs in fit indices suggested for traditional formats with maximum likelihood estimators can be utilized to assess model fit and to test measurement invariance when a multiple group confirmatory factor analysis was employed for the Thurstonian item response theory (IRT) model. Regarding the performance of the evaluation criteria, detection of measurement non-invariance and Type I error rates were examined. The impact of measurement non-invariance on estimated scores in the Thurstonian IRT model was also examined through accuracy and efficiency in score estimation. The fit indices used for the evaluation of model fit performed well. Among six cutoffs for changes in model fit indices, only ΔCFI > .01 and ΔNCI > .02 detected metric non-invariance when the magnitude of non-invariance was medium, and none of the cutoffs performed well in detecting scalar non-invariance. Based on the generated sampling distributions of fit index differences, this study suggested ΔCFI > .001 and ΔNCI > .004 for scalar non-invariance and ΔCFI > .007 for metric non-invariance. Considering Type I error rate control and detection rates of measurement non-invariance, ΔCFI was recommended for measurement non-invariance tests for forced-choice format data. Challenges in measurement non-invariance tests in the Thurstonian IRT model were discussed along with directions for future research to enhance the utility of forced-choice formats in test development for cross-cultural and international settings.
Collapse
Affiliation(s)
- HyeSun Lee
- California State University Channel Islands, Camarillo, USA
| | | |
Collapse
|
24
|
Ng V, Lee P, Ho MHR, Kuykendall L, Stark S, Tay L. The Development and Validation of a Multidimensional Forced-Choice Format Character Measure: Testing the Thurstonian IRT Approach. J Pers Assess 2020; 103:224-237. [DOI: 10.1080/00223891.2020.1739056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Affiliation(s)
- Vincent Ng
- Department of Psychology, University of Houston, Houston, Texas
| | - Philseok Lee
- Department of Psychology, George Mason University, Fairfax, Virginia
| | - Moon-Ho Ringo Ho
- School of Humanities and Social Sciences, Nanyang Technological University, Singapore
| | - Lauren Kuykendall
- Department of Psychology, George Mason University, Fairfax, Virginia
| | - Stephen Stark
- Department of Psychology, University of South Florida, Tampa, Florida
| | - Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
| |
Collapse
|
25
|
Glendon AI, Prendergast S. Rank-ordering anti-speeding messages. ACCIDENT; ANALYSIS AND PREVENTION 2019; 132:105254. [PMID: 31470279 DOI: 10.1016/j.aap.2019.07.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 06/20/2019] [Accepted: 07/28/2019] [Indexed: 06/10/2023]
Abstract
PURPOSE Further explore the utility of protection motivation theory (PMT) in developing effective roadside anti-speeding messages. METHOD Via an electronic link, 81 participants holding a current Australian driver's license rated all possible pairs of 18 PMT-derived anti-speeding messages in terms of their perceived effectiveness in reducing speed for themselves, and for drivers in general. RESULTS While some messages revealed third-person effects (perceived as being more relevant to drivers-in-general than to self-as-driver), others showed reverse third-person effects (perceived as being more relevant to self-as-driver than to drivers-in-general). Compared with messages based on coping appraisal components, those derived from threat appraisal PMT components (perceived severity, counter-rewards, vulnerability) were rated as being more effective, both for participants themselves as driver, and for drivers-in-general. Compared with females, males reported threat appraisal messages as being more effective for reducing speed in themselves (reverse third-person effect). Aggregate scores for the 18 messages derived from this ipsative methodology correlated modestly with those from a normative study using similarly-worded items. DISCUSSION As jurisdictions globally recognize speeding as a major road safety issue, effective anti-speeding campaigns are essential. Findings added to current knowledge of PMT's efficacy as a basis for generating effective anti-speeding messages and indicated areas for future research and application.
Collapse
Affiliation(s)
- A Ian Glendon
- School of Applied Psychology, Griffith University, Gold Coast Campus, Queensland, 4222, Australia; Centre for Work, Organisation and Wellbeing, Griffith University, Nathan Campus, Queensland, 4111, Australia; Cities Research Institute, Griffith University, Gold Coast Campus, Queensland, 4222, Australia.
| | - Samantha Prendergast
- School of Applied Psychology, Griffith University, Gold Coast Campus, Queensland, 4222, Australia
| |
Collapse
|
26
|
Kreitchmann RS, Abad FJ, Ponsoda V, Nieto MD, Morillo D. Controlling for Response Biases in Self-Report Scales: Forced-Choice vs. Psychometric Modeling of Likert Items. Front Psychol 2019; 10:2309. [PMID: 31681103 PMCID: PMC6803422 DOI: 10.3389/fpsyg.2019.02309] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Accepted: 09/27/2019] [Indexed: 11/23/2022] Open
Abstract
One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that it has traditionally been made through Likert scales, which are susceptible to response biases such as social desirability (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and the validity of the assessments. Also, response-style-induced errors of measurement can affect the reliability estimates and overestimate convergent validity by correlating higher with other Likert-scale-based measures. Conversely, it can attenuate the predictive power over non-Likert-based indicators, given that the scores contain more errors. This study compares the validity of the Big Five personality scores obtained: (1) ignoring the SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items. The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages. The results from the empirical reliability and the convergent validity analysis indicate that when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores and the University Access Examination grades was higher than the one with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced Choice Questionnaire (FCQ) scores was similar to the results found in meta-analytic studies, correlating higher with Conscientiousness. Nonetheless, the FCQ scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed, and some notes are provided for the treatment of SDR and ACQ in future studies.
Collapse
Affiliation(s)
- Rodrigo Schames Kreitchmann
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Francisco J Abad
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Vicente Ponsoda
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Maria Dolores Nieto
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | | |
Collapse
|
27
|
Bürkner PC, Schulte N, Holling H. On the Statistical and Practical Limitations of Thurstonian IRT Models. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2019; 79:827-854. [PMID: 31488915 PMCID: PMC6713979 DOI: 10.1177/0013164419832063] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
Collapse
|
28
|
Seeking more solitude: Conceptualization, assessment, and implications of aloneliness. PERSONALITY AND INDIVIDUAL DIFFERENCES 2019. [DOI: 10.1016/j.paid.2019.05.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
|
29
|
Chen C, Wang W, Chiu MM, Ro S. Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items. JOURNAL OF EDUCATIONAL MEASUREMENT 2019. [DOI: 10.1111/jedm.12252] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
30
|
Lee P, Joo SH, Stark S, Chernyshenko OS. GGUM-RANK Statement and Person Parameter Estimation With Multidimensional Forced Choice Triplets. APPLIED PSYCHOLOGICAL MEASUREMENT 2019; 43:226-240. [PMID: 31019358 PMCID: PMC6463341 DOI: 10.1177/0146621618768294] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement by focusing on statement and person parameter recovery for the GGUM-RANK (generalized graded unfolding-RANK) IRT model. A Markov chain Monte Carlo (MCMC) algorithm was developed for estimating GGUM-RANK statement and person parameters directly from MFC rank responses. Simulation studies examined how the psychometric properties of the statements composing MFC items, test length, and sample size influenced statement and person parameter estimation, and explored the benefits of measurement using MFC triplets relative to pairs. To demonstrate this methodology, an empirical validity study was then conducted using an MFC triplet personality measure. The results and implications of these studies for future research and practice are discussed.
Collapse
|
31
|
Walton KE, Cherkasova L, Roberts RD. On the Validity of Forced Choice Scores Derived From the Thurstonian Item Response Theory Model. Assessment 2019; 27:706-718. [PMID: 31007043 DOI: 10.1177/1073191119843585] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
Forced choice (FC) measures may be a desirable alternative to single stimulus (SS) Likert items, which are easier to fake and can have associated response biases. However, classical methods of scoring FC measures lead to ipsative data, which have a number of psychometric problems. A Thurstonian item response theory (TIRT) model has been introduced as a way to overcome these issues, but few empirical validity studies have been conducted to ensure its effectiveness. This was the goal of the current three studies, which used FC measures of domains from popular personality frameworks including the Big Five and HEXACO, and both statement and adjective item stems. We computed TIRT and ipsative scores and compared their validity estimates. Convergent and discriminant validity of the scores were evaluated by correlating them with SS scores, and test-criterion validity evidence was evaluated by examining their relationships with meaningful outcomes. In all three studies, there was evidence for the convergent and test-criterion validity of the TIRT scores, though at times this was on par with the validity of the ipsative scores. The discriminant validity of the TIRT scores was problematic and was often worse than the ipsative scores.
Collapse
Affiliation(s)
| | | | - Richard D Roberts
- Research and Assessment Design (RAD): Science Solution, Philadelphia, PA, USA
| |
Collapse
|
32
|
Joo SH, Lee P, Stark S. Development of Information Functions and Indices for the GGUM-RANK Multidimensional Forced Choice IRT Model. JOURNAL OF EDUCATIONAL MEASUREMENT 2018. [DOI: 10.1111/jedm.12183] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
33
|
Persich MR, Bair JL, Steinemann B, Nelson S, Fetterman AK, Robinson MD. Hello darkness my old friend: preferences for darkness vary by neuroticism and co-occur with negative affect. Cogn Emot 2018; 33:885-900. [PMID: 30058438 DOI: 10.1080/02699931.2018.1504746] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Metaphors frequently link negative affect with darkness, and associations of this type have been established in several experimental paradigms. Given the ubiquity and strength of these associations, people who prefer dark to light may be more prone to negative emotional experiences and symptoms. A five-study investigation (total N = 605) couches these ideas in a new theoretical framework and then examines them. Across studies, 1 in 4 people preferred the perceptual concept of dark over the perceptual concept of light. These dark-preferring people scored higher in neuroticism (Studies 1 and 2) and experienced greater depressive feelings in daily life (Study 3). Moreover, dark preferences shared a robust relationship with depressive symptoms (Study 4) as well as generalised anxiety symptoms (Study 5). The results provide novel insights into negative affectivity and extend conceptual metaphor theory in a way that is capable of making individual difference predictions.
Collapse
Affiliation(s)
| | - Jessica L Bair
- b Psychology , University of Minnesota , Minneapolis , MN , USA
| | | | | | - Adam K Fetterman
- c Psychology , University of Texas at El Paso , El Paso , TX , USA
| | | |
Collapse
|
34
|
Lee P, Lee S, Stark S. Examining validity evidence for multidimensional forced choice measures with different scoring approaches. PERSONALITY AND INDIVIDUAL DIFFERENCES 2018. [DOI: 10.1016/j.paid.2017.11.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
|
35
|
Wang WC, Qiu XL, Chen CW, Ro S, Jin KY. Item Response Theory Models for Ipsative Tests With Multidimensional Pairwise Comparison Items. APPLIED PSYCHOLOGICAL MEASUREMENT 2017; 41:600-613. [PMID: 29881107 PMCID: PMC5978479 DOI: 10.1177/0146621617703183] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
There is re-emerging interest in adopting forced-choice items to address the issue of response bias in Likert-type items for noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have been aimed at developing item response theory models for MPC items owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows for comparing psychological differentiation between and within individuals. The simulation results showed a good parameter recovery for the new model with existing computer programs. This article provides an empirical example of an ipsative test on work style and behaviors.
Collapse
Affiliation(s)
| | - Xue-Lan Qiu
- The Education University of Hong Kong, Hong Kong
| | | | | | - Kuan-Yu Jin
- The Education University of Hong Kong, Hong Kong
| |
Collapse
|
36
|
Xiao Y, Liu H, Li H. Integration of the Forced-Choice Questionnaire and the Likert Scale: A Simulation Study. Front Psychol 2017; 8:806. [PMID: 28572781 PMCID: PMC5435816 DOI: 10.3389/fpsyg.2017.00806] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2016] [Accepted: 05/02/2017] [Indexed: 11/23/2022] Open
Abstract
The Thurstonian item response theory (IRT) model allows estimating respondents' latent trait scores directly from their responses to forced-choice questionnaires, solving part of the problems associated with traditional scoring methods for this questionnaire format. However, forced-choice designs still have limitations: the model may encounter underidentification and non-convergence, and the test may show low reliability in simple test designs (e.g., designs measuring only a small number of traits or with short test length). To overcome these weaknesses, the present study applied the Thurstonian IRT model and the Graded Response Model to a different test format comprising both forced-choice blocks and Likert-type items, with the Likert items chosen to have low social desirability. A Monte Carlo simulation study was used to investigate how the mixed response format performs under various conditions. Four factors were considered: the number of traits, test length, the percentage of Likert items, and the proportion of pairs composed of items keyed in opposite directions. Results reveal that the mixed response format can be superior to the forced-choice format, especially in simple designs where the latter performs poorly, and the number of Likert items needed is small. One caveat is that researchers need to choose Likert items cautiously, as Likert items may introduce other response biases into the test. Discussion and suggestions are given for constructing personality tests that resist faking as much as possible while retaining acceptable reliability.
Collapse
Affiliation(s)
- Yue Xiao
- School of Psychology, Beijing Normal University, Beijing, China
| | - Hongyun Liu
- School of Psychology, Beijing Normal University, Beijing, China; Beijing Key Laboratory of Applied Experimental Psychology, School of Psychology, Beijing Normal University, Beijing, China
| | - Hui Li
- School of Psychology, Beijing Normal University, Beijing, China
| |
Collapse
|
37
|
Morillo D, Leenen I, Abad FJ, Hontangas P, de la Torre J, Ponsoda V. A Dominance Variant Under the Multi-Unidimensional Pairwise-Preference Framework: Model Formulation and Markov Chain Monte Carlo Estimation. APPLIED PSYCHOLOGICAL MEASUREMENT 2016; 40:500-516. [PMID: 29881066 PMCID: PMC5978637 DOI: 10.1177/0146621616662226] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Forced-choice questionnaires have been proposed as a way to control some response biases associated with traditional questionnaire formats (e.g., Likert-type scales). Whereas classical scoring methods have issues of ipsativity, item response theory (IRT) methods have been claimed to accurately account for the latent trait structure of these instruments. In this article, the authors propose the multi-unidimensional pairwise preference two-parameter logistic (MUPP-2PL) model, a variant within Stark, Chernyshenko, and Drasgow's MUPP framework for items that are assumed to fit a dominance model. They also introduce a Markov chain Monte Carlo (MCMC) procedure for estimating the model's parameters. The authors present the results of a simulation study, which shows adequate parameter recovery in all studied conditions. A comparison of the newly proposed model with Brown and Maydeu-Olivares's Thurstonian IRT model led us to the conclusion that both models are theoretically very similar and that the Bayesian estimation procedure of the MUPP-2PL may provide a slightly better recovery of the latent space correlations and a more reliable assessment of the latent trait estimation errors. An application of the model to a real data set shows convergence between the two estimation procedures. However, there is also evidence that the MCMC may be advantageous regarding the item parameters and the latent trait correlations.
Collapse
Affiliation(s)
- Daniel Morillo
- Faculty of Psychology, Universidad Autónoma de Madrid, Spain
| | - Iwin Leenen
- Instituto Nacional para la Evaluación de la Educación, Mexico City, Mexico
| | | | | | | | - Vicente Ponsoda
- Faculty of Psychology, Universidad Autónoma de Madrid, Spain
| |
Collapse
|