1. Guo Z, Wang D, Cai Y, Tu D. An Item Response Theory Model for Incorporating Response Times in Forced-Choice Measures. Educational and Psychological Measurement 2024; 84:450-480. [PMID: 38756463] [PMCID: PMC11095319] [DOI: 10.1177/00131644231171193]
Abstract
Forced-choice (FC) measures are widely used in personality and attitude tests as an alternative to rating scales because they rely on comparative rather than absolute judgments, which effectively reduces several response biases such as social desirability, response styles, and acquiescence. Another type of data linked with comparative judgments is response time (RT), which carries potential information about respondents' decision-making processes. Incorporating RT into FC measures is challenging but promising for revealing respondents' behaviors and preferences in personality measurement. Accordingly, this study proposes a new item response theory (IRT) model that incorporates RT into FC measures to improve personality assessment. Simulation studies show that the proposed model effectively improves the estimation accuracy of personality traits by exploiting the ancillary information contained in RT. An application to a real data set further shows that the proposed model yields similar but not identical parameter estimates compared with the conventional Thurstonian IRT model; the RT information explains these differences.
Affiliation(s)
- Daxun Wang: Jiangxi Normal University, Nanchang, China
- Yan Cai: Jiangxi Normal University, Nanchang, China
- Dongbo Tu: Jiangxi Normal University, Nanchang, China
2. Tu N, Kumar LS, Joo S, Stark S. Linking Methods for Multidimensional Forced Choice Tests Using the Multi-Unidimensional Pairwise Preference Model. Applied Psychological Measurement 2024; 48:104-124. [PMID: 38585303] [PMCID: PMC10993864] [DOI: 10.1177/01466216241238741]
Abstract
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios. Results indicated that the ICC method outperformed the M/M method, which was better than the M/S method, with the TCC method being the least effective. However, as the number of items "per dimension" and the percentage of anchor items increased, the differences between the ICC, M/M, and M/S methods decreased. Study implications and practical recommendations for MUPP linking, as well as limitations, are discussed.
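As background for the unidimensional methods being extended here, the classic moment-based linking coefficients are easy to sketch. The snippet below is an illustrative sketch only (function and variable names are mine, not from the paper): it computes mean/sigma and mean/mean coefficients for the scale transformation theta* = A*theta + B from anchor-item estimates on two forms, assuming a 2PL-style parameterization in which difficulties transform as b* = A*b + B and discriminations as a* = a/A.

```python
import numpy as np

def mean_sigma(b_new, b_old):
    # A from the ratio of anchor-difficulty SDs, B from the difficulty means
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

def mean_mean(a_new, a_old, b_new, b_old):
    # A from the ratio of mean anchor discriminations, B from mean difficulties
    A = np.mean(a_new) / np.mean(a_old)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

# anchor items whose old-form parameters are an exact linear transformation
a_new = np.array([1.0, 1.5, 0.8])
b_new = np.array([-0.5, 0.2, 1.0])
a_old = a_new / 1.2        # a* = a / A with A = 1.2
b_old = 1.2 * b_new + 0.3  # b* = A*b + B with B = 0.3
```

With noiseless anchors like these, both methods recover the same (A, B); with estimation error they diverge, which is what the paper's comparisons exploit.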
Affiliation(s)
- Naidan Tu: University of South Florida, FL, USA
3. Valone ALY, Meade AW. Can Forced-Choice Response Format Reduce Faking of Socially Aversive Personality Traits? Journal of Personality Assessment 2024:1-13. [PMID: 38501713] [DOI: 10.1080/00223891.2024.2326893]
Abstract
Self-report assessments are the standard for personality measurement, but motivated respondents can manipulate or fake their responses to typical Likert-scale self-reports. Although progress has been made in research seeking to reduce faking, most of it has focused on normative personality traits such as those measured by the five-factor model. The measurement of socially aversive personality (e.g., the Dark Triad) is less well researched. The negative aspects of socially aversive traits increase respondents' opportunity and motivation to fake typical single-stimulus self-report assessments, underscoring the need for faking-resistant response formats. One possible way to reduce faking, explored in basic personality research, is the forced-choice response format. This study applied that method to socially aversive traits and illustrated best practices for creating new multidimensional forced-choice and single-stimulus measures of socially aversive personality traits. Results indicated that participants were able to artificially alter their scores when asked to respond like an ideal job applicant and, counter to expectations, the forced-choice format did not decrease faking. Our results indicate that even when best practices are followed, the forced-choice format is not a panacea for respondent faking.
Affiliation(s)
- Amanda L. Y. Valone: Personnel Decisions Research Institutes, LLC, Arlington, Virginia, USA
- Adam W. Meade: Department of Psychology, North Carolina State University, Raleigh, North Carolina, USA
4. Nie L, Xu P, Hu D. Multidimensional IRT for forced choice tests: A literature review. Heliyon 2024; 10:e26884. [PMID: 38449643] [PMCID: PMC10915382] [DOI: 10.1016/j.heliyon.2024.e26884]
Abstract
The Multidimensional Forced Choice (MFC) test is frequently utilized in non-cognitive evaluations because of its effectiveness in reducing the response bias commonly associated with conventional Likert scales. Nonetheless, the MFC test generates ipsative data, a type of measurement that has been criticized for its limited applicability to comparisons between individuals. Multidimensional item response theory (MIRT) models have recently sparked renewed interest among academics and professionals, largely due to the development of several models that make it easier to obtain normative data from forced-choice tests. The paper introduces a modeling framework made up of three key components: response format, measurement model, and decision theory. Under this framework, four IRT models were chosen as examples. A comprehensive comparison is then carried out to characterize the parameter estimation techniques used in MFC-IRT models. The work next examines empirical research in three distinct domains: parameter invariance testing, computerized adaptive testing (CAT), and validity investigation. Finally, it is recommended that future research follow four distinct paths: modeling, parameter invariance testing, forced-choice CAT, and validity studies.
Affiliation(s)
- Lei Nie: School of Public Administration, East China Normal University, China
- Peiyi Xu: Department of Educational Psychology, Faculty of Education, East China Normal University, China
- Di Hu: School of Education and Social Policy, Northwestern University, USA
5. Zheng C, Liu J, Li Y, Xu P, Zhang B, Wei R, Zhang W, Liu B, Huang J. A 2PLM-RANK multidimensional forced-choice model and its fast estimation algorithm. Behavior Research Methods 2024. [PMID: 38409459] [DOI: 10.3758/s13428-023-02315-x]
Abstract
High-stakes non-cognitive tests frequently employ forced-choice (FC) scales to deter faking. Many scoring models have been devised to mitigate the resulting score ipsativity. Among them, the multi-unidimensional pairwise preference (MUPP) framework is highly flexible and widely used. However, the original MUPP model was developed for an unfolding response process and can only handle paired comparisons. The present study proposes the 2PLM-RANK as a generalization of the MUPP model that accommodates a dominance response process and the RANK response format. In addition, an improved stochastic EM (iStEM) algorithm is devised for more stable and efficient parameter estimation. Simulation results generally supported the efficiency and utility of the new algorithm in estimating the 2PLM-RANK when applied to both triplets and tetrads across various conditions. An empirical illustration with responses to a 24-dimensional personality test further supported the practicality of the proposed model. To aid application of the new model, a user-friendly R package is also provided.
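For intuition about RANK-format scoring, the probability of a full rank order over a block can be built from sequential choices among the remaining statements. The sketch below is a generic exploded-logit illustration under hypothetical 2PL-style utilities, not the paper's 2PLM-RANK implementation; all names and parameter values are mine.

```python
import itertools
import numpy as np

def statement_utility(theta, a, b):
    # hypothetical 2PL-style latent utility for endorsing one statement
    return a * (theta - b)

def rank_probability(order, utils):
    # sequential (exploded logit) model: at each stage, choose among the
    # remaining statements with probability proportional to exp(utility)
    remaining = list(range(len(utils)))
    prob = 1.0
    for chosen in order:
        exp_u = np.exp([utils[i] for i in remaining])
        prob *= np.exp(utils[chosen]) / exp_u.sum()
        remaining.remove(chosen)
    return prob

# utilities of one triplet for a respondent with traits (0.5, -0.2, 1.0)
utils = [statement_utility(t, a, b)
         for t, a, b in [(0.5, 1.2, 0.0), (-0.2, 0.8, 0.3), (1.0, 1.5, -0.5)]]
```

The probabilities over all 3! = 6 rank orders of a triplet sum to one, and the modal order ranks statements by decreasing utility.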
Affiliation(s)
- Chanjin Zheng: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Juan Liu: Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Yaling Li: Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Peiyi Xu: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China; Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Bo Zhang: School of Labor and Employment Relations and Department of Psychology, University of Illinois Urbana-Champaign, Champaign, USA
- Ran Wei: Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Wenqing Zhang: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China; Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Boyang Liu: Beijing Insight Online Management Consulting Co., Ltd., Beijing, China
- Jing Huang: Educational Psychology and Research Methodology, Purdue University, West Lafayette, IN, USA
6. Wang Q, Zheng Y, Liu K, Cai Y, Peng S, Tu D. Item selection methods in multidimensional computerized adaptive testing for forced-choice items using Thurstonian IRT model. Behavior Research Methods 2024; 56:600-614. [PMID: 36750522] [DOI: 10.3758/s13428-022-02037-6]
Abstract
Multidimensional computerized adaptive testing for forced-choice items (MFC-CAT) combines the benefits of multidimensional forced-choice (MFC) items and computerized adaptive testing (CAT): it reduces response biases and shortens administration time. Previous studies of MFC-CAT designs considered only item selection methods based on Fisher information (FI), which is known to perform unstably in the early stages of CAT. This study proposes a set of new item selection methods for MFC-CAT based on Kullback-Leibler (KL) information (namely, MFC-KI, MFC-KB, and MFC-KLP) under the Thurstonian IRT (TIRT) model. Three simulation studies, including one based on real data, compared the proposed KL-based methods against the existing FI-based methods in three- and five-dimensional MFC-CAT scenarios with various test lengths and inter-trait correlations. Results demonstrate that the proposed KL-based item selection methods are feasible for MFC-CAT and yield acceptable trait estimation accuracy and uniformity of item pool usage. Among the three proposed methods, MFC-KB and MFC-KLP outperformed the existing FI-based methods, producing the most accurate trait estimates and relatively even utilization of the item pool.
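To illustrate the general idea behind KL-based item selection (the MFC-KI/KB/KLP criteria above operate on Thurstonian IRT blocks; the unidimensional 2PL sketch below is my own simplification), an item's KL index integrates the divergence between its response distributions at the provisional trait estimate and at nearby trait values (Chang & Ying, 1996):

```python
import numpy as np

def p_2pl(theta, a, b):
    # 2PL probability of a keyed response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def kl_item(theta_hat, theta, a, b):
    # KL divergence between the item's Bernoulli response distributions
    # at theta_hat and at a candidate theta
    p0, p1 = p_2pl(theta_hat, a, b), p_2pl(theta, a, b)
    return p0 * np.log(p0 / p1) + (1 - p0) * np.log((1 - p0) / (1 - p1))

def kl_index(theta_hat, a, b, delta=1.0, n_points=201):
    # average the pointwise KL over a window around theta_hat and scale
    # by the window width (a simple quadrature of the KL index integral)
    grid = np.linspace(theta_hat - delta, theta_hat + delta, n_points)
    return kl_item(theta_hat, grid, a, b).mean() * (2 * delta)
```

The index is zero at the provisional estimate itself and, at a matched difficulty, grows with item discrimination, which is why KL selection favors globally informative items early in a CAT.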
Affiliation(s)
- Qin Wang: Jiangxi Normal University, Nanchang, China
- Yi Zheng: Arizona State University, Tempe, AZ, USA
- Kai Liu: Jiangxi Normal University, Nanchang, China
- Yan Cai: Jiangxi Normal University, Nanchang, China
- Siwei Peng: Jiangxi Normal University, Nanchang, China
- Dongbo Tu: Jiangxi Normal University, Nanchang, China
7. Frick S. Estimating and Using Block Information in the Thurstonian IRT Model. Psychometrika 2023; 88:1556-1589. [PMID: 37640828] [PMCID: PMC10656335] [DOI: 10.1007/s11336-023-09931-8]
Abstract
Multidimensional forced-choice (MFC) tests are increasing in popularity but their construction is complex. The Thurstonian item response model (Thurstonian IRT model) is most often used to score MFC tests that contain dominance items. Currently, in a frequentist framework, information about the latent traits in the Thurstonian IRT model is computed for binary outcomes of pairwise comparisons, but this approach neglects stochastic dependencies. In this manuscript, it is shown how to estimate Fisher information on the block level. A simulation study showed that the observed and expected standard errors based on the block information were similarly accurate. When local dependencies for block sizes [Formula: see text] were neglected, the standard errors were underestimated, except with the maximum a posteriori estimator. It is shown how the multidimensional block information can be summarized for test construction. A simulation study and an empirical application showed small differences between the block information summaries depending on the outcome considered. Thus, block information can aid the construction of reliable MFC tests.
Affiliation(s)
- Susanne Frick: University of Mannheim, Mannheim, Germany; TU Dortmund University, Dortmund, Germany
8. Qiu X, de la Torre J. A dual process item response theory model for polytomous multidimensional forced-choice items. British Journal of Mathematical and Statistical Psychology 2023; 76:491-512. [PMID: 36967236] [DOI: 10.1111/bmsp.12303]
Abstract
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.
Affiliation(s)
- Xuelan Qiu: Institute for Learning Sciences & Teacher Education, Australian Catholic University, Brisbane, Queensland, Australia
9. Zhang B, Luo J, Li J. Moving beyond Likert and Traditional Forced-Choice Scales: A Comprehensive Investigation of the Graded Forced-Choice Format. Multivariate Behavioral Research 2023:1-27. [PMID: 37652572] [DOI: 10.1080/00273171.2023.2235682]
Abstract
The graded forced-choice (FC) format has recently emerged as an alternative that may preserve the advantages of dichotomous FC measures while overcoming their issues. The current study presents the first large-scale evaluation of three types of FC measures (FC2, FC4, and FC5, with 2, 4, and 5 response options, respectively) against their Likert (LK) counterparts (LK2, LK4, and LK5) on (1) psychometric properties, (2) respondent reactions, and (3) susceptibility to response styles. Results showed that, compared to LK measures with the same number of response options, the three FC scales provided better support for the hypothesized factor structure, were perceived as more faking-resistant and more cognitively demanding, and were less susceptible to response styles. FC4/5 and LK4/5 demonstrated similarly good reliability, whereas LK2 provided more reliable scores than FC2. Across the three FC measures, FC4 and FC5 displayed comparable psychometric performance and respondent reactions. FC4 exhibited a moderate presence of extreme response style, while FC5 showed only a weak presence of both extreme and middle response styles. Based on these findings, the study recommends graded FC over dichotomous FC and LK formats, particularly FC5 when extreme response style is a concern.
Affiliation(s)
- Bo Zhang: School of Labor and Employment Relations, University of Illinois Urbana-Champaign; Department of Psychology, University of Illinois Urbana-Champaign
- Jing Luo: Feinberg School of Medicine, Northwestern University
- Jian Li: Faculty of Psychology, Beijing Normal University
10. Kreitchmann RS, Sorrel MA, Abad FJ. On Bank Assembly and Block Selection in Multidimensional Forced-Choice Adaptive Assessments. Educational and Psychological Measurement 2023; 83:294-321. [PMID: 36866066] [PMCID: PMC9972126] [DOI: 10.1177/00131644221087986]
Abstract
Multidimensional forced-choice (FC) questionnaires have consistently been found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that such blocks may be less robust to faking, thus impairing assessment validity. Accordingly, this article presents a simulation study investigating whether normative scores can be retrieved using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, the simulation addressed the effects of (a) bank assembly (a randomly assembled bank, an optimally assembled bank, and blocks assembled on the fly considering every possible pair of items) and (b) block selection rules (the T-rule and the Bayesian D- and A-rules) on estimation accuracy, ipsativity, and overlap rates. Different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were also studied, with a nonadaptive questionnaire included as a baseline in each condition. In general, very good trait estimates were retrieved despite using only positively keyed items. The best trait accuracy and lowest ipsativity were obtained with the Bayesian A-rule and questionnaires assembled on the fly, whereas the T-rule under the same method led to the worst results. This points to the importance of considering both aspects when designing an FC CAT.
11. Joo SH, Lee P, Stark S. Modeling Multidimensional Forced Choice Measures with the Zinnes and Griggs Pairwise Preference Item Response Theory Model. Multivariate Behavioral Research 2023; 58:241-261. [PMID: 34370564] [DOI: 10.1080/00273171.2021.1960142]
Abstract
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP. We derived the information function to evaluate the psychometric properties of MFC measures and developed a model parameter estimation algorithm using Markov chain Monte Carlo (MCMC). To evaluate the efficacy of the proposed model, we conducted a simulation study under various experimental conditions such as sample sizes, number of items, and ranges of discrimination and location parameters. The results showed that the model parameters were accurately estimated when the sample size was as low as 500. The empirical results also showed that the scores from the ZG-MUPP model were comparable to those from the MUPP model and the Thurstonian IRT (TIRT) model. Practical implications and limitations are further discussed.
12. Huang HY. Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. Educational and Psychological Measurement 2023; 83:146-180. [PMID: 36601255] [PMCID: PMC9806518] [DOI: 10.1177/00131644211069906]
Abstract
The forced-choice (FC) item formats used for noncognitive tests typically present a set of response options that measure different traits and instruct respondents to make comparative judgments among these options, thereby controlling the response biases commonly observed in normative tests. Diagnostic classification models (DCMs) provide information on test takers' mastery status on discrete latent variables and are more commonly used for cognitive tests in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demand of simultaneously controlling response biases and providing diagnostic classification information. A series of simulations with Bayesian estimation shows that, in general, the model parameters can be recovered satisfactorily with long tests and large samples. More attributes improve the precision of the second-order latent trait estimates in long tests but decrease classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired-comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. Finally, an empirical analysis of work-motivation measures demonstrates the applications and implications of the new model.
Affiliation(s)
- Hung-Yu Huang: Department of Psychology and Counseling, University of Taipei, No. 1, Ai-Guo West Road, Taipei 10048, Taiwan
13. Frick S, Brown A, Wetzel E. Investigating the Normativity of Trait Estimates from Multidimensional Forced-Choice Data. Multivariate Behavioral Research 2023; 58:1-29. [PMID: 34464217] [DOI: 10.1080/00273171.2021.1938960]
Abstract
The Thurstonian item response model (Thurstonian IRT model) allows normative trait estimates to be derived from multidimensional forced-choice (MFC) data. In the MFC format, persons rank-order items measuring different attributes according to how well the items describe them. This study evaluated the normativity of Thurstonian IRT trait estimates both in a simulation and empirically. The simulation compared Thurstonian IRT trait estimates to those obtained from classical partially ipsative scoring, from dichotomous true-false (TF) data, and from rating scale data. The results showed that, with blocks of opposite-keyed items, Thurstonian IRT trait estimates were normative, in contrast to classical partially ipsative estimates. Unbalanced numbers of items per trait, few opposite-keyed items, positively correlated traits, and fewer assessed traits did not markedly decrease measurement precision, although precision remained lower than for rating scale data. The empirical study investigated whether relative MFC responses differentiate behaviors within persons better than absolute TF responses. However, criterion validity was equal and construct validity (with constructs measured by rating scales) lower in MFC. Thus, Thurstonian IRT modeling of MFC data overcomes the drawbacks of classical scoring, but gains in validity may depend on eliminating common method biases from the comparison.
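For reference, the core of the Thurstonian IRT model for a single pairwise comparison within a block is a probit of the difference between two latent item utilities (Brown & Maydeu-Olivares, 2011). The sketch below is an illustrative re-statement with my own variable names, not code from this study:

```python
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pairwise_prob(theta_a, theta_b, lam_i, lam_k, gamma, psi2_i, psi2_k):
    # P(item i, loading lam_i on trait a, is preferred over item k, loading
    # lam_k on trait b), given threshold gamma and uniquenesses psi2_i, psi2_k
    z = (-gamma + lam_i * theta_a - lam_k * theta_b) / sqrt(psi2_i + psi2_k)
    return norm_cdf(z)
```

With a zero threshold, equal loadings, and equal trait levels, the comparison is a coin flip, which is the intuition behind why purely positively keyed blocks carry mostly relative (within-person) information.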
Affiliation(s)
- Susanne Frick: Department of Psychology, School of Social Sciences, University of Mannheim
- Anna Brown: Department of Psychology, University of Kent
- Eunike Wetzel: Department of Psychology, Otto-von-Guericke University Magdeburg; Department of Psychology, University of Koblenz-Landau
14. Frick S. Modeling Faking in the Multidimensional Forced-Choice Format: The Faking Mixture Model. Psychometrika 2022; 87:773-794. [PMID: 34927219] [PMCID: PMC9166892] [DOI: 10.1007/s11336-021-09818-6]
Abstract
The multidimensional forced-choice (MFC) format has been proposed as a way to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. This paper proposes a mixture item response theory model for faking in the MFC format, termed the Faking Mixture model, that allows the fakability of MFC blocks to be estimated. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that reduces faking. The Faking Mixture model can be used to reduce fakability during test construction.
Affiliation(s)
- Susanne Frick: Department of Psychology, School of Social Sciences, Mannheim, Germany
15. Lee P, Joo SH, Zhou S, Son M. Investigating the impact of negatively keyed statements on multidimensional forced-choice personality measures: A comparison of partially ipsative and IRT scoring methods. Personality and Individual Differences 2022. [DOI: 10.1016/j.paid.2022.111555]
16. Qiu XL, de la Torre J, Ro S, Wang WC. Computerized Adaptive Testing for Ipsative Tests with Multidimensional Pairwise-Comparison Items: Algorithm Development and Applications. Applied Psychological Measurement 2022; 46:255-272. [PMID: 35601264] [PMCID: PMC9118927] [DOI: 10.1177/01466216221084209]
Abstract
Computerized adaptive testing (CAT) solutions for tests with multidimensional pairwise-comparison (MPC) items, which aim to measure career interest, values, and personality, are rare. This paper proposes new item selection and exposure control methods for CAT with dichotomous and polytomous MPC items and presents simulation study results. The results show that the procedures are effective in selecting items and controlling within-person statement exposure with no loss of efficiency. Implications are discussed for two applications of the proposed CAT procedures: a work attitude test with dichotomous MPC items and a career interest assessment with polytomous MPC items.
17. Bayesian paired comparison with the bpcs package. Behavior Research Methods 2021; 54:2025-2045. [PMID: 34846675] [PMCID: PMC9374650] [DOI: 10.3758/s13428-021-01714-2]
Abstract
This article introduces the bpcs R package (Bayesian Paired Comparison in Stan) and the statistical models implemented in it. The package aims to facilitate the use of Bayesian models for paired comparison data in behavioral research. Bayesian analysis of paired comparison data permits parameter estimation even when the maximum likelihood estimate does not exist, allows easy extension of paired comparison models, provides straightforward interpretation of results via credible intervals, offers better control of Type I error and more robust evidence toward the null hypothesis, propagates uncertainty, incorporates prior information, and performs well for models with many parameters and latent variables. The bpcs package provides a consistent interface for R users and several functions for evaluating the posterior distribution of all parameters, estimating the posterior distribution of any contest between items, and obtaining the posterior distribution of the ranks. Three reanalyses of recent studies that used the frequentist Bradley-Terry model are presented. These reanalyses use the Bayesian models of the bpcs package, and all code used to fit the models and generate the figures and tables is available in the online appendix.
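bpcs itself is an R package built on Stan; as a language-agnostic illustration of the underlying Bradley-Terry model (this sketch is mine and is far simpler than the package's Bayesian machinery), log-worth parameters can be given a normal prior and estimated by maximizing the log-posterior with gradient ascent:

```python
import numpy as np

def fit_bradley_terry(contests, n_items, iters=2000, lr=0.05, prior_sd=1.0):
    # MAP estimate of Bradley-Terry log-worths from (winner, loser) pairs;
    # the normal prior also identifies the otherwise location-invariant model
    lam = np.zeros(n_items)
    for _ in range(iters):
        grad = -lam / prior_sd ** 2          # gradient of the log prior
        for winner, loser in contests:
            # P(winner beats loser) under Bradley-Terry
            p = 1.0 / (1.0 + np.exp(-(lam[winner] - lam[loser])))
            grad[winner] += 1.0 - p          # gradient of the log-likelihood
            grad[loser] -= 1.0 - p
        lam += lr * grad
    return lam

# item 0 beats item 1 in 8 of 10 contests; item 1 beats item 2 in 8 of 10
contests = [(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 8 + [(2, 1)] * 2
worth = fit_bradley_terry(contests, n_items=3)
```

The recovered log-worths order the items consistently with their win records, which is the quantity the package's posterior summaries (contest probabilities, rank distributions) are built on.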
18. A genetic algorithm for optimal assembly of pairwise forced-choice questionnaires. Behavior Research Methods 2021; 54:1476-1492. [PMID: 34505277] [PMCID: PMC9170671] [DOI: 10.3758/s13428-021-01677-4]
Abstract
The use of multidimensional forced-choice questionnaires has been proposed as a means of improving validity in the assessment of non-cognitive attributes in high-stakes scenarios. However, the reduced precision of trait estimates in this questionnaire format is an important drawback. Accordingly, this article presents an optimization procedure for assembling pairwise forced-choice questionnaires while maximizing posterior marginal reliabilities. This procedure is performed through the adaptation of a known genetic algorithm (GA) for combinatorial problems. In a simulation study, the efficiency of the proposed procedure was compared with a quasi-brute-force (BF) search. For this purpose, five-dimensional item pools were simulated to emulate the real problem of generating a forced-choice personality questionnaire under the five-factor model. Three factors were manipulated: (1) the length of the questionnaire, (2) the size of the item pool relative to the questionnaire’s length, and (3) the true correlations between traits. The recovery of the person parameters for each assembled questionnaire was evaluated through the squared correlation between estimated and true parameters, the root mean square error between the estimated and true parameters, the average difference between the estimated and true inter-trait correlations, and the average standard error for each trait level. The proposed GA offered more accurate trait estimates than the BF search within a reasonable computation time in every simulation condition. Such improvements were especially important when measuring correlated traits and when the item pools were relatively larger. A user-friendly online implementation of the algorithm was made available to users.
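The GA machinery involved — a population of candidate questionnaires evolved by selection, crossover, and mutation — can be sketched on a toy problem. In the sketch below the fitness function is a stand-in invented purely for illustration (it rewards covering many distinct items); the article's actual objective, posterior marginal reliability, requires a full Thurstonian IRT model and is not reproduced here.

```python
import random

random.seed(1)

# Toy genetic algorithm for assembling a pairwise forced-choice questionnaire:
# evolve a population of candidate questionnaires (lists of item pairs) toward
# higher fitness. Here fitness = number of distinct items used, a stand-in for
# the posterior marginal reliability maximized in the article.

def fitness(questionnaire):
    items = [i for pair in questionnaire for i in pair]
    return len(set(items))

def assemble(pool, n_blocks, pop_size=30, generations=200, mut_rate=0.2):
    def random_q():
        return [tuple(random.sample(pool, 2)) for _ in range(n_blocks)]
    pop = [random_q() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]              # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_blocks)       # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut_rate:            # mutation: redraw one block
                child[random.randrange(n_blocks)] = tuple(random.sample(pool, 2))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = assemble(pool=list(range(20)), n_blocks=8)
print(fitness(best))  # approaches 16, the maximum for 8 pairs
```

Because the top half of each generation survives unchanged, the best fitness never decreases; the crossover and mutation operators are the parts the article adapts to the combinatorial structure of forced-choice assembly.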
Collapse
|
19
|
Abstract
Forced-choice (FC) assessments of noncognitive psychological constructs (e.g., personality, behavioral tendencies) are popular in high-stakes organizational testing scenarios (e.g., informing hiring decisions) due to their enhanced resistance to response distortions (e.g., faking good, impression management). The measurement precision of FC assessment scores used to inform personnel decisions is of paramount importance in practice. Different types of reliability estimates are reported for FC assessment scores in current publications, but consensus on best practices appears to be lacking. To provide understanding and structure around the reporting of FC reliability, this study systematically examined different types of reliability estimation methods for Thurstonian IRT-based FC assessment scores: their theoretical differences were discussed, and their numerical differences were illustrated through a series of simulations and empirical studies. In doing so, this study provides a practical guide for appraising different reliability estimation methods for IRT-based FC assessment scores.
Collapse
|
20
|
Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behav Res Methods 2020; 52:761-772. [PMID: 31342469 DOI: 10.3758/s13428-019-01274-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length. Research has also explored the advantages of computerized adaptive testing (CAT) with MFC pair tests having as many as 25 dimensions, but there have been no published studies on CAT with MFC triplets or tetrads. Thus, in this research we aimed to address that issue. We used recently developed item information functions for an MFC ranking model to compare the benefits of CAT with MFC pair, triplet, and tetrad tests. A simulation study showed that CAT substantially outperformed nonadaptive testing for latent trait estimation across MFC formats. More importantly, CAT with MFC pairs provided estimation accuracy similar to or better than that from tests of equivalent numbers of nonadaptive MFC triplets. On the basis of these findings, implications and recommendations are further discussed for constructing MFC measures to use in psychological contexts.
Collapse
|
21
|
Lee P, Joo SH, Stark S. Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model. ORGANIZATIONAL RESEARCH METHODS 2020. [DOI: 10.1177/1094428120959822] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Although modern item response theory (IRT) methods of test construction and scoring have overcome ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee’s consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large sample size and large magnitude of DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example to demonstrate the viability of the best performing method with real examinees and showed how a DIF and a DTF effect size measure can be used to assess the practical significance of MFC DIF findings.
Collapse
|
22
|
Lee H, Smith WZ. A Bayesian Random Block Item Response Theory Model for Forced-Choice Formats. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2020; 80:578-603. [PMID: 32425220 PMCID: PMC7221495 DOI: 10.1177/0013164419871659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Based on the framework of testlet models, the current study suggests the Bayesian random block item response theory (BRB IRT) model to fit forced-choice formats where an item block is composed of three or more items. To account for local dependence among items within a block, the BRB IRT model incorporated a random block effect into the response function and used a Markov Chain Monte Carlo procedure for simultaneous estimation of item and trait parameters. The simulation results demonstrated that the BRB IRT model performed well for the estimation of item and trait parameters and for screening examinees with relatively low scores on target traits. As found in the literature, the composition of item blocks was crucial for model performance; negatively keyed items were required for item blocks. The empirical application showed the performance of the BRB IRT model was equivalent to that of the Thurstonian IRT model. The potential advantage of the BRB IRT model as a base for more complex measurement models was also demonstrated by incorporating gender as a covariate into the BRB IRT model to explain response probabilities. Recommendations for the adoption of forced-choice formats were provided along with a discussion of using negatively keyed items.
Collapse
Affiliation(s)
- HyeSun Lee
- California State University Channel Islands, Camarillo, CA, USA
| | - Weldon Z. Smith
- California State University Channel Islands, Camarillo, CA, USA
| |
Collapse
|
23
|
Ng V, Lee P, Ho MHR, Kuykendall L, Stark S, Tay L. The Development and Validation of a Multidimensional Forced-Choice Format Character Measure: Testing the Thurstonian IRT Approach. J Pers Assess 2020; 103:224-237. [DOI: 10.1080/00223891.2020.1739056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Affiliation(s)
- Vincent Ng
- Department of Psychology, University of Houston, Houston, Texas
| | - Philseok Lee
- Department of Psychology, George Mason University, Fairfax, Virginia
| | - Moon-Ho Ringo Ho
- School of Humanities and Social Sciences, Nanyang Technological University, Singapore
| | - Lauren Kuykendall
- Department of Psychology, George Mason University, Fairfax, Virginia
| | - Stephen Stark
- Department of Psychology, University of South Florida, Tampa, Florida
| | - Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana
| |
Collapse
|
24
|
Pelt DHM, Van der Linden D, Dunkel CS, Born MP. The Motivation and Opportunity for Socially Desirable Responding Does Not Alter the General Factor of Personality. Assessment 2019; 28:1376-1396. [PMID: 31619053 PMCID: PMC8167912 DOI: 10.1177/1073191119880960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Socially desirable responding may affect the factor structure of personality questionnaires and may be one of the reasons for the common variance among personality traits. In this study, we test this hypothesis by investigating the influence of the motivational test-taking context (development vs. selection) and the opportunity to distort responses (forced-choice [FC] vs. Likert response format) on personality questionnaire scores. Data from real selection and assessment candidates (total N = 3,980) matched on gender, age, and educational level were used. Mean score differences were found between the selection and development groups, with smaller differences for the FC version. Yet, exploratory structural equation models showed that the overall factor structures as well as the general factor were highly similar across the four groups. Thus, although socially desirable responding may affect mean scores on personality traits, it does not appear to affect factor structures. This study further suggests that the common variance in personality questionnaires is consistent and appears to be little influenced by motivational pressures for response distortion.
Collapse
Affiliation(s)
- Dirk H M Pelt
- Erasmus University Rotterdam, Rotterdam, Netherlands; Ixly, Utrecht, Netherlands
| | | | | | - Marise Ph Born
- Erasmus University Rotterdam, Rotterdam, Netherlands; North-West University, Vanderbijlpark, South Africa
| |
Collapse
|
25
|
Kreitchmann RS, Abad FJ, Ponsoda V, Nieto MD, Morillo D. Controlling for Response Biases in Self-Report Scales: Forced-Choice vs. Psychometric Modeling of Likert Items. Front Psychol 2019; 10:2309. [PMID: 31681103 PMCID: PMC6803422 DOI: 10.3389/fpsyg.2019.02309] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Accepted: 09/27/2019] [Indexed: 11/23/2022] Open
Abstract
One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that it has traditionally been carried out with Likert scales, which are susceptible to response biases such as socially desirable (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and the validity of the assessments. Response-style-induced errors of measurement can also affect reliability estimates and inflate convergent validity through higher correlations with other Likert-scale-based measures. Conversely, they can attenuate predictive power for non-Likert-based indicators, given that the scores contain more error. This study compares the validity of the Big Five personality scores obtained: (1) ignoring SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items. The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages. The results from the empirical reliability and convergent validity analyses indicate that, when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores and the University Access Examination grades was higher than that with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced Choice Questionnaire (FCQ) scores was similar to results found in meta-analytic studies, correlating higher with Conscientiousness. Nonetheless, the FCQ scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed, and some notes are provided for the treatment of SDR and ACQ in future studies.
Collapse
Affiliation(s)
- Rodrigo Schames Kreitchmann
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Francisco J Abad
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Vicente Ponsoda
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Maria Dolores Nieto
- Department of Social Psychology and Methodology, Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | | |
Collapse
|
26
|
Bürkner PC, Schulte N, Holling H. On the Statistical and Practical Limitations of Thurstonian IRT Models. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2019; 79:827-854. [PMID: 31488915 PMCID: PMC6713979 DOI: 10.1177/0013164419832063] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
Collapse
|
27
|
Morillo D, Abad FJ, Kreitchmann RS, Leenen I, Hontangas P, Ponsoda V. The Journey from Likert to Forced-Choice Questionnaires: Evidence of the Invariance of Item Parameters. REVISTA DE PSICOLOGÍA DEL TRABAJO Y DE LAS ORGANIZACIONES 2019. [DOI: 10.5093/jwop2019a11] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
28
|
Lee P, Lee S, Stark S. Examining validity evidence for multidimensional forced choice measures with different scoring approaches. PERSONALITY AND INDIVIDUAL DIFFERENCES 2018. [DOI: 10.1016/j.paid.2017.11.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
|