1
Classe F, Kern C. Detecting Differential Item Functioning in Multidimensional Graded Response Models With Recursive Partitioning. Appl Psychol Meas 2024;48:83-103. PMID: 38585304; PMCID: PMC10993862; DOI: 10.1177/01466216241238743.
Abstract
Differential item functioning (DIF) is a common challenge when examining latent traits in large-scale surveys. In recent work, methods from the field of machine learning, such as model-based recursive partitioning, have been proposed to identify subgroups with DIF when little theoretical guidance is available and many potential subgroups exist. On this basis, we propose and compare recursive partitioning techniques for detecting DIF, focusing on measurement models with multiple latent variables and ordinal response data. We implement tree-based approaches for identifying subgroups that contribute to DIF in multidimensional latent variable modeling and propose a robust yet scalable extension inspired by random forests. The proposed techniques are applied and compared in simulations. We show that the proposed methods detect DIF efficiently and allow the extraction of decision rules that lead to subgroups with well-fitting models.
Affiliation(s)
- Christoph Kern
- Ludwig-Maximilians-University of Munich, München, Germany
2
Cho AE, Xiao J, Wang C, Xu G. Regularized Variational Estimation for Exploratory Item Factor Analysis. Psychometrika 2024;89:347-375. PMID: 35831697; DOI: 10.1007/s11336-022-09874-6.
Abstract
Item factor analysis (IFA), also known as Multidimensional Item Response Theory (MIRT), is a general framework for specifying the functional relationship between respondents' multiple latent traits and their responses to assessment items. The key element in MIRT is the relationship between the items and the latent traits, the so-called item factor loading structure. The correct specification of this loading structure is crucial for accurate calibration of item parameters and recovery of individual latent traits. This paper proposes a regularized Gaussian Variational Expectation Maximization (GVEM) algorithm to efficiently infer the item factor loading structure directly from data. The main idea is to impose an adaptive L1-type penalty on the variational lower bound of the likelihood to shrink certain loadings to 0. The new algorithm takes advantage of the computational efficiency of the GVEM algorithm and is suitable for high-dimensional MIRT applications. Simulation studies show that the proposed method accurately recovers the loading structure and is computationally efficient. The new method is also illustrated using the National Education Longitudinal Study of 1988 (NELS:88) mathematics and science assessment data.
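An adaptive L1-type penalty of this kind is typically operationalized as an adaptive-lasso soft-thresholding step that zeroes out weak loadings while leaving strong ones nearly intact. A minimal sketch, assuming a generic proximal update with weights from initial loading estimates (the weight form and constants are illustrative, not the paper's exact algorithm):

```python
import math

def adaptive_soft_threshold(loading, lam, init_loading, gamma=1.0, eps=1e-8):
    """Adaptive-lasso proximal step: shrink a loading toward zero,
    penalizing loadings with small initial estimates more heavily."""
    weight = 1.0 / (abs(init_loading) + eps) ** gamma
    shrunk = max(abs(loading) - lam * weight, 0.0)
    return math.copysign(shrunk, loading)

# A weak loading is zeroed out exactly; a strong one survives nearly intact.
sparse = adaptive_soft_threshold(0.05, lam=0.02, init_loading=0.05)
strong = adaptive_soft_threshold(0.90, lam=0.02, init_loading=0.90)
```

Exact zeros are what make the recovered loading structure interpretable, which is the point of penalizing the variational lower bound rather than post-hoc rounding small estimates.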
Affiliation(s)
- April E Cho
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
- Jiaying Xiao
- College of Education, University of Washington, 312E Miller Hall, 2012 Skagit Ln, Seattle, WA, 98105, USA
- Chun Wang
- College of Education, University of Washington, 312E Miller Hall, 2012 Skagit Ln, Seattle, WA, 98105, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
3
Cui C, Wang C, Xu G. Variational Estimation for Multidimensional Generalized Partial Credit Model. Psychometrika 2024 (online ahead of print). PMID: 38429494; DOI: 10.1007/s11336-024-09955-8.
Abstract
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.
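The multidimensional generalized partial credit model estimated here builds category probabilities from cumulative sums of the compensatory kernel a'θ − b_v. A small sketch of that response function (parameter values are made up for illustration):

```python
import math

def mgpcm_probs(theta, a, steps):
    """Category probabilities for a multidimensional GPCM item.
    theta, a: latent trait and discrimination vectors; steps: step difficulties."""
    score = sum(ai * ti for ai, ti in zip(a, theta))  # compensatory a'theta
    z = [0.0]
    for b in steps:
        z.append(z[-1] + score - b)  # cumulative category kernel
    denom = sum(math.exp(zk) for zk in z)
    return [math.exp(zk) / denom for zk in z]

# At theta = 0 with zero step difficulties, all three categories are equally likely.
probs = mgpcm_probs(theta=[0.0, 0.0], a=[1.2, 0.8], steps=[0.0, 0.0])
```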
Affiliation(s)
- Chengyu Cui
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
- Chun Wang
- College of Education, University of Washington, 312 E Miller Hall, 2012 Skagit Lane, Seattle, WA, 98105, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
4
Ma C, Ouyang J, Wang C, Xu G. A Note on Improving Variational Estimation for Multidimensional Item Response Theory. Psychometrika 2024;89:172-204. PMID: 37979074; DOI: 10.1007/s11336-023-09939-0.
Abstract
Survey instruments and assessments are frequently used in many domains of social science. When the constructs that these assessments try to measure become multifaceted, multidimensional item response theory (MIRT) provides a unified framework and convenient statistical tool for item analysis, calibration, and scoring. However, the computational challenge of estimating MIRT models prohibits their wide use because many of the extant methods can hardly provide results in a realistic time frame when the number of dimensions, sample size, and test length are large. Instead, variational estimation methods, such as the Gaussian variational expectation-maximization (GVEM) algorithm, have recently been proposed to solve the estimation challenge by providing a fast and accurate solution. However, results have shown that variational estimation methods may produce some bias in discrimination parameters during confirmatory model estimation, and this note proposes an importance-weighted version of GVEM (i.e., IW-GVEM) to correct for such bias under MIRT models. We also use the adaptive moment estimation method to update the learning rate for gradient descent automatically. Our simulations show that IW-GVEM can effectively correct bias with a modest increase in computation time compared with GVEM. The proposed method may also shed light on improving variational estimation for other psychometric models.
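Adaptive moment estimation (Adam) is the optimizer the note references for setting the learning rate automatically. A generic single Adam update with the usual defaults (textbook formulas, not code from the paper):

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates scale the step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)  # bias correction for the running mean
    v_hat = v / (1 - b2 ** t)  # bias correction for the running variance
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# On the first step the bias corrections make the move size approximately lr.
p, m, v = 1.0, 0.0, 0.0
p, m, v = adam_step(p, grad=1.0, m=m, v=v, t=1)
```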
Affiliation(s)
- Chenchen Ma
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
- Jing Ouyang
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
- Chun Wang
- College of Education, University of Washington, 312 E Miller Hall, 2012 Skagit Lane, Seattle, WA, 98105, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, 456 West Hall, 1085 South University, Ann Arbor, MI, 48109, USA
5
Wu J, Tao Z, Gao N, Shen J, Chen ZL, Zhou H, Zheng S. The Use of Multidimensional Nomial Logistic Model and Structural Equation Model in the Validation of the 14-Item Health-Literacy Scale in Chinese Patients Living with Type 2 Diabetes. Risk Manag Healthc Policy 2023;16:1567-1579. PMID: 37602365; PMCID: PMC10439802; DOI: 10.2147/rmhp.s419879.
Abstract
Objective To evaluate the psychometric properties of the 14-item health literacy scale (HL-14) in patients living with type 2 diabetes mellitus (T2DM) in a clinical setting. Methods A cross-sectional study using item response theory and structural equation modeling (SEM) to test item difficulty and the three-dimensional HL configuration was adopted. Chinese patients living with T2DM admitted to the endocrinology department of Huadong Hospital were evaluated with the HL-14, covering communication, functional, and critical health literacy, from August to December 2021. Results The multidimensional random coefficients multinomial logit model indicated that the difficulty settings of the scale are appropriate for the study population, and no differential item functioning by sex was observed. SEM demonstrated that the three-dimensional configuration of the scale fits well in the study population (χ2/df = 2.698, Comparative Fit Index = 0.965, Root Mean Square Error of Approximation = 0.076, Standardized Root Mean Square Residual = 0.042). Conclusion The HL-14 is a reliable and valid measure that performs equitably across sex in evaluating health literacy in Chinese patients living with T2DM. Moreover, the scale may help fill the gaps in multidimensional health literacy assessment and rapid screening of health literacy for clinical practice.
Affiliation(s)
- JianBo Wu
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- ZhuJun Tao
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- NingZhou Gao
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- Jie Shen
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- Zhi-Long Chen
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- HaiFeng Zhou
- Department of Pharmacy, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
- SongBai Zheng
- Department of Geriatrics, Huadong Hospital, Fudan University, Shanghai, People’s Republic of China
6
Jin KY, Paulhus DL, Shih CL. A New Approach to Desirable Responding: Multidimensional Item Response Model of Overclaiming Data. Appl Psychol Meas 2023;47:221-236. PMID: 37113521; PMCID: PMC10126390; DOI: 10.1177/01466216231151704.
Abstract
A variety of approaches have been presented for assessing desirable responding in self-report measures. Among them, the overclaiming technique asks respondents to rate their familiarity with a large set of real and nonexistent items (foils). The application of signal detection formulas to the endorsement rates of real items and foils yields indices of (a) knowledge accuracy and (b) knowledge bias. This overclaiming technique reflects both cognitive ability and personality. Here, we develop an alternative measurement model based on multidimensional item response theory (MIRT). We report three studies demonstrating this new model's capacity to analyze overclaiming data. First, a simulation study illustrates that MIRT and signal detection theory yield comparable indices of accuracy and bias, although MIRT provides important additional information. Two empirical examples, one based on mathematical terms and one based on Chinese idioms, are then elaborated. Together, they demonstrate the utility of this new approach for group comparisons and item selection. The implications of this research are illustrated and discussed.
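The signal detection indices referenced here are conventionally d' (accuracy) and criterion c (bias) computed from endorsement rates of real items versus foils; since the exact formulas are not given in the abstract, this sketch assumes the standard equal-variance Gaussian ones:

```python
from statistics import NormalDist

def overclaiming_indices(hit_rate, foil_rate):
    """Standard SDT indices from endorsement rates of real items (hits)
    and nonexistent foils (false alarms)."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(foil_rate)        # knowledge accuracy
    bias = -0.5 * (z(hit_rate) + z(foil_rate))  # response bias (overclaiming)
    return d_prime, bias

# Symmetric rates imply zero bias and positive accuracy.
acc, bias = overclaiming_indices(hit_rate=0.84, foil_rate=0.16)
```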
Affiliation(s)
- Kuan-Yu Jin
- Hong Kong Examinations and Assessment Authority, Hong Kong, Hong Kong
- Ching-Lin Shih
- University of British Columbia, Vancouver, BC, Canada
- National Sun Yat-Sen University, Kaohsiung, Taiwan
7
Lin Y, Brown A, Williams P. Multidimensional Forced-Choice CAT With Dominance Items: An Empirical Comparison With Optimal Static Testing Under Different Desirability Matching. Educ Psychol Meas 2023;83:322-350. PMID: 36866068; PMCID: PMC9972128; DOI: 10.1177/00131644221077637.
Abstract
Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT with dominance items is limited, heavily dominated by simulations, and lacking in empirical deployment. This empirical study trialed an FC CAT with dominance items described by the Thurstonian item response theory model with research participants. The study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria for score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.
Affiliation(s)
- Yin Lin
- University of Kent, Canterbury, UK
- SHL, Thames Ditton, Surrey, UK
8
Dong F, Moore TM, Westfall M, Kohler C, Calkins ME. Development of empirically derived brief program evaluation measures in Pennsylvania first-episode psychosis coordinated specialty care programs. Early Interv Psychiatry 2023;17:96-106. PMID: 35343055; DOI: 10.1111/eip.13298.
Abstract
AIM The Pennsylvania first-episode psychosis program evaluation (PA-FEP-PE) core assessment battery was developed as a standard and comprehensive clinical assessment and data collection tool for Pennsylvania coordinated specialty care (CSC) programs. To reduce administration time and maximize clinical utility while maintaining acceptable levels of precision, we aimed to generate a short form using item response theory (IRT)-based computerized adaptive test (CAT) simulation and to analyse the implementation and acceptability of the short form among providers from Pennsylvania CSC programs. METHODS FEP participants (n = 759; age 14-36) from nine coordinated specialty care programs completed 156 items drawn from the PA-FEP-PE battery. Multidimensional IRT-based CAT simulations were used to select the best PA-FEP-PE items for abbreviated forms. RESULTS A 67-item PA-FEP-PE short form was developed to capture six factors: (1) positive affect and surgency (with negative loadings on Anxious-Misery items); (2) psychiatric services satisfaction; (3) antipsychotic side effect severity; (4) family turmoil and associated traumas; (5) trauma load; and (6) psychosis. The total number of items was reduced by more than 50% in the shortened forms. The short form demonstrated good psychometric properties and was well accepted by providers during implementation. CONCLUSIONS The empirical derivation and implementation of abbreviated measures of key domains and constructs in FEP care have streamlined and facilitated PA-FEP program evaluation. Our work supports the potential application of IRT-based methods to empirically reduce core assessment battery measures in large-scale data collection efforts such as the Early Psychosis Intervention Network.
Affiliation(s)
- Fanghong Dong
- School of Nursing, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Tyler M Moore
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Lifespan Brain Institute, Penn Medicine and Children's Hospital of Philadelphia (CHOP), Philadelphia, Pennsylvania, USA
- Megan Westfall
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christian Kohler
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Monica E Calkins
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Lifespan Brain Institute, Penn Medicine and Children's Hospital of Philadelphia (CHOP), Philadelphia, Pennsylvania, USA
9
Abstract
In traditional test models, test items are independent, and test-takers slowly and thoughtfully respond to each test item. However, some test items have a common stimulus (dependent test items in a testlet), and sometimes test-takers lack motivation, knowledge, or time (speededness), so they perform rapid guessing (RG). Ignoring the dependence in responses to testlet items can negatively bias standard errors of measurement, and ignoring RG by fitting a simpler item response theory (IRT) model can bias the results. Because computer-based testing captures response times on testlet responses, we propose a mixture testlet IRT model with item responses and response time to model RG behaviors in computer-based testlet items. Two simulation studies with Markov chain Monte Carlo estimation using the JAGS program showed (a) good recovery of the item and person parameters in this new model and (b) the harmful consequences of ignoring RG (biased parameter estimates: overestimated item difficulties, underestimated time intensities, underestimated respondent latent speed parameters, and overestimated precision of respondent latent estimates). The application of IRT models with and without RG to data from a computer-based language test showed parameter differences resembling those in the simulations.
Affiliation(s)
- Kuan-Yu Jin
- Assessment Technology and Research Division, Hong Kong Examinations and Assessment Authority, Wan Chai, Hong Kong
- Chia-Ling Hsu
- Assessment Technology and Research Division, Hong Kong Examinations and Assessment Authority, Wan Chai, Hong Kong
10
Svicher A, Palazzeschi L, Gori A, Di Fabio A. The Gratitude Resentment and Appreciation Test-Revised Short (GRAT-RS): A Multidimensional Item Response Theory Analysis in Italian Workers. Int J Environ Res Public Health 2022;19:16786. PMID: 36554667; PMCID: PMC9779112; DOI: 10.3390/ijerph192416786.
Abstract
Gratitude is a promising resource from a healthy organizational perspective. It is related to many positive outcomes at work. The Gratitude Resentment and Appreciation Test-Revised Short (GRAT-RS) is the most widely used self-report questionnaire to detect gratitude. The present study examined the Italian version of the GRAT-RS by implementing multidimensional item response theory (MIRT) analyses to explore its psychometric properties. The participants were 537 Italian workers. Confirmatory factor analyses (CFA) of the GRAT-RS and MIRT analyses using the Graded Response Model were run. The MIRT discrimination and difficulty parameters were calculated. A test information function (TIF) and a measure of reliability associated with TIF scores were also computed. CFA highlighted that a bifactor model showed the best fit. Hence, MIRT analyses were carried out by implementing a bifactor model. The MIRT bifactor structure showed a good data fit, with discrimination parameters ranging from good to excellent and adequate reliability. The good psychometric properties of the GRAT-RS were confirmed, highlighting the questionnaire as a reliable tool to measure gratitude in Italian workers.
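Under the Graded Response Model used in these MIRT analyses, category probabilities are differences of adjacent cumulative boundary curves. A unidimensional sketch with made-up parameter values:

```python
import math

def grm_probs(theta, a, thresholds):
    """Graded response model: P(X = k) = P*(k) - P*(k+1), where P*(k)
    is the cumulative boundary curve for responding in category k or above."""
    boundary = [1.0]
    boundary += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    boundary.append(0.0)
    return [boundary[k] - boundary[k + 1] for k in range(len(thresholds) + 1)]

# Three categories with symmetric thresholds around theta = 0.
probs = grm_probs(theta=0.0, a=1.0, thresholds=[-1.0, 1.0])
```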
Affiliation(s)
- Andrea Svicher
- Department of Education, Languages, Intercultures, Literatures and Psychology (Psychology Section), University of Florence, 50135 Florence, Italy
- Letizia Palazzeschi
- Department of Education, Languages, Intercultures, Literatures and Psychology (Psychology Section), University of Florence, 50135 Florence, Italy
- Alessio Gori
- Department of Health Sciences (Psychology Section), University of Florence, 50135 Florence, Italy
- Annamaria Di Fabio
- Department of Education, Languages, Intercultures, Literatures and Psychology (Psychology Section), University of Florence, 50135 Florence, Italy
11
Kornely MJK, Kateri M. Asymptotic Posterior Normality of Multivariate Latent Traits in an IRT Model. Psychometrika 2022;87:1146-1172. PMID: 35149979; PMCID: PMC9433366; DOI: 10.1007/s11336-021-09838-2.
Abstract
The asymptotic posterior normality (APN) of the latent variable vector in an item response theory (IRT) model is a crucial argument in IRT modeling approaches. In case of a single latent trait and under general assumptions, Chang and Stout (Psychometrika, 58(1):37-52, 1993) proved the APN for a broad class of latent trait models for binary items. Under the same setup, they also showed the consistency of the latent trait's maximum likelihood estimator (MLE). Since then, several modeling approaches have been developed that consider multivariate latent traits and assume their APN, a conjecture which has not been proved so far. We fill this theoretical gap by extending the results of Chang and Stout for multivariate latent traits. Further, we discuss the existence and consistency of MLEs, maximum a-posteriori and expected a-posteriori estimators for the latent traits under the same broad class of latent trait models.
Affiliation(s)
- Mia J K Kornely
- Institute of Statistics, RWTH Aachen University, Aachen, Germany
- Maria Kateri
- Institute of Statistics, RWTH Aachen University, Aachen, Germany
12
Kim KY. Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design. Appl Psychol Meas 2022;46:479-493. PMID: 35991829; PMCID: PMC9382090; DOI: 10.1177/01466216221108995.
Abstract
Applying item response theory (IRT) true score equating to multidimensional IRT models is not straightforward due to the one-to-many relationship between a true score and latent variables. Under the common-item nonequivalent groups design, the purpose of the current study was to introduce two IRT true score equating procedures that adopted different dimension reduction strategies for the bifactor model. The first procedure, which was referred to as the integration procedure, linked the latent variable scales for the bifactor model and integrated out the specific factors from the item response function of the bifactor model. Then, IRT true score equating was applied to the marginalized bifactor model. The second procedure, which was referred to as the PIRT-based procedure, projected the specific dimensions onto the general dimension to obtain a locally dependent unidimensional IRT (UIRT) model and linked the scales of the UIRT model, followed by the application of IRT true score equating to the locally dependent UIRT model. Equating results obtained with the two equating procedures along with those obtained with the unidimensional three-parameter logistic (3PL) model were compared using both simulated and real data. In general, the integration and PIRT-based procedures provided equating results that were not practically different. Furthermore, the equating results produced by the two bifactor-based procedures became more accurate than the results returned by the 3PL model as tests became more multidimensional.
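IRT true score equating, as applied here after dimension reduction, maps a score through the inverse of one form's test characteristic curve (TCC) and then through the other form's TCC. A minimal unidimensional 3PL sketch of those two steps (item parameters are invented for illustration):

```python
import math

def true_score(theta, items):
    """Test characteristic curve: expected number-correct under the 3PL."""
    return sum(c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))
               for a, b, c in items)

def theta_for_score(score, items, lo=-6.0, hi=6.0, tol=1e-10):
    """Invert the monotone TCC by bisection to recover the matching theta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Find the ability whose expected score on this two-item form is 1.0;
# equating would then evaluate the other form's TCC at that theta.
form_x = [(1.0, 0.0, 0.2), (1.2, 0.5, 0.2)]
theta = theta_for_score(1.0, form_x)
```

The bifactor procedures in the paper differ only in how the TCC is obtained: by integrating out the specific factors, or by projecting them onto the general dimension first.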
Affiliation(s)
- Kyung Yong Kim
- Department of Educational Research Methodology, University of North Carolina at Greensboro, Greensboro, NC, USA
13
Strachan T, Cho UH, Ackerman T, Chen SH, de la Torre J, Ip EH. Evaluation of the Linear Composite Conjecture for Unidimensional IRT Scale for Multidimensional Responses. Appl Psychol Meas 2022;46:347-360. PMID: 35812816; PMCID: PMC9265490; DOI: 10.1177/01466216221084218.
Abstract
The linear composite direction represents, theoretically, where the unidimensional scale would lie within a multidimensional latent space. Using compensatory multidimensional IRT, the linear composite can be derived from the structure of the items and the latent distribution. The purpose of this study was to evaluate the validity of the linear composite conjecture and examine how well a fitted unidimensional IRT model approximates the linear composite direction in a multidimensional latent space. Simulation experiment results overall show that the fitted unidimensional IRT model sufficiently approximates the linear composite direction when the correlation between the bivariate latent variables is positive. When the correlation is negative, instability occurs when the fitted unidimensional IRT model is used to approximate the linear composite direction. A real-data experiment was also conducted using 20 items from a multiple-choice mathematics test from American College Testing.
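One common way to operationalize such a composite is the reference composite: the direction of the principal eigenvector of A'A, where A stacks the items' discrimination vectors. A small power-iteration sketch for two dimensions (an illustrative construction, not the study's code):

```python
def reference_composite(A, iters=500):
    """Principal eigenvector of A'A via power iteration (two dimensions here)."""
    # Form the symmetric 2x2 matrix M = A'A from the discrimination vectors.
    m00 = sum(a[0] * a[0] for a in A)
    m01 = sum(a[0] * a[1] for a in A)
    m11 = sum(a[1] * a[1] for a in A)
    v = [1.0, 0.0]
    for _ in range(iters):
        w = [m00 * v[0] + m01 * v[1], m01 * v[0] + m11 * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    return v

# Items discriminating equally on both dimensions yield a 45-degree composite.
direction = reference_composite([[1.0, 1.0], [0.8, 0.8], [1.5, 1.5]])
```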
Affiliation(s)
- Tyler Strachan
- Educational Research Methodology, University of North Carolina at Greensboro, Greensboro, NC, USA
- Uk Hyun Cho
- Educational Research Methodology, University of North Carolina at Greensboro, Greensboro, NC, USA
- Shyh-Huei Chen
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jimmy de la Torre
- Department of Education, The University of Hong Kong, Pokfulam, Hong Kong
- Edward H. Ip
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
14
Luo H, Wang D, Guo Z, Cai Y, Tu D. Combining Cognitive Diagnostic Computerized Adaptive Testing With Multidimensional Item Response Theory. Appl Psychol Meas 2022;46:288-302. PMID: 35601262; PMCID: PMC9118931; DOI: 10.1177/01466216221084214.
Abstract
The new generation of tests focuses not only on general ability but also on the mastery of finer-grained skills. Guided by this idea, researchers have developed a dual-purpose CD-CAT (Dual-CAT). In existing Dual-CATs, the models used for overall ability estimation are unidimensional IRT models, which cannot be applied to multidimensional tests. This article develops a multidimensional Dual-CAT to improve its applicability. To achieve this goal, the article first proposes item selection methods for the multidimensional Dual-CAT and then verifies the estimation accuracy and exposure rates of these methods through both a simulation study and a real item bank study. The results show that the established multidimensional Dual-CAT is effective and that the newly proposed methods outperform the traditional ones. Finally, future directions for the Dual-CAT are discussed.
Affiliation(s)
- Hao Luo
- School of Psychology, Jiangxi Normal University, Nanchang, China
- Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Daxun Wang
- School of Psychology, Jiangxi Normal University, Nanchang, China
- Zhiming Guo
- Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Yan Cai
- School of Psychology, Jiangxi Normal University, Nanchang, China
- Dongbo Tu
- School of Psychology, Jiangxi Normal University, Nanchang, China
15
Ferrando PJ, Navarro-González D. A Multidimensional Item Response Theory Model for Continuous and Graded Responses With Error in Persons and Items. Educ Psychol Meas 2021;81:1029-1053. PMID: 34552274; PMCID: PMC8451022; DOI: 10.1177/0013164421998412.
Abstract
Item response theory "dual" models (DMs), in which both items and individuals are viewed as sources of differential measurement error, have so far been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, based on the concept of the multidimensional location index, is first proposed and discussed. Then, the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. The simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals are expected to be of particular interest for multidimensional questionnaires in which the number of items per scale would not be enough to arrive at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.
Collapse
|
16
|
Soland J, Kuhfeld M. Do Response Styles Affect Estimates of Growth on Social-Emotional Constructs? Evidence from Four Years of Longitudinal Survey Scores. Multivariate Behav Res 2021; 56:853-873. [PMID: 32633574 DOI: 10.1080/00273171.2020.1778440] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Survey respondents employ different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest. For example, respondents may be more likely to use the extremes of the response scale independent of their true score. Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data. While some initial studies examine the effect of response styles on survey scores in longitudinal analyses, the issue of how response styles affect estimates of growth is underexamined. In this study, we conducted empirical and simulation analyses in which we scored surveys using item response theory (IRT) models that do and do not account for response styles, and then used those different scores in growth models and compared results. Generally, we found that response styles can affect estimates of growth parameters including the slope, but that the effects vary by psychological construct, response style, and IRT model used.
Collapse
Affiliation(s)
- James Soland
- Curry School of Education and Human Development, University of Virginia
- NWEA
| | | |
Collapse
|
17
|
Abstract
Reading subskills are generally regarded as continuous variables, whereas most models used in previous reading diagnoses assume that the latent variables are dichotomous. Considering that the multidimensional item response theory (MIRT) model has continuous latent variables and can be used for diagnostic purposes, this study compared the performance of MIRT with two representatives of the models traditionally used in reading diagnoses: the reduced reparametrized unified model (R-RUM) and the generalized deterministic inputs, noisy "and" gate (G-DINA) model. The comparison was carried out with both empirical and simulated data. First, model-data fit indices were used to evaluate whether MIRT was more appropriate than R-RUM and G-DINA with real data. Then, with the simulated data, relations between the estimated scores from MIRT, R-RUM, and G-DINA and the true scores were compared to examine whether the true abilities were well represented; correct classification rates under different research conditions were calculated to examine person parameter recovery; and the frequency distributions of subskill mastery probability were compared to show how far the estimated subskill mastery probabilities deviated from the true values. MIRT obtained better model-data fit, yielded estimated scores that represented the true abilities more faithfully, had an advantage in correct classification rates, and showed less deviation from the true values in the frequency distributions of subskill mastery probabilities, meaning that it can produce more accurate diagnostic information about the reading abilities of test-takers. Because more accurate diagnostic information has greater guiding value for remedial teaching and learning, and because score interpretation in reading diagnoses is more reasonable under the MIRT model, this study recommends MIRT as a new methodology for future reading diagnostic analyses.
Collapse
Affiliation(s)
- Hui Liu
- Faculty of Linguistic Sciences, Beijing Language and Culture University, Beijing, China.,Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
| | - Yufang Bian
- Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
| |
Collapse
|
18
|
Abstract
The use of multidimensional forced-choice questionnaires has been proposed as a means of improving validity in the assessment of non-cognitive attributes in high-stakes scenarios. However, the reduced precision of trait estimates in this questionnaire format is an important drawback. Accordingly, this article presents an optimization procedure for assembling pairwise forced-choice questionnaires while maximizing posterior marginal reliabilities. This procedure is performed through the adaptation of a known genetic algorithm (GA) for combinatorial problems. In a simulation study, the efficiency of the proposed procedure was compared with a quasi-brute-force (BF) search. For this purpose, five-dimensional item pools were simulated to emulate the real problem of generating a forced-choice personality questionnaire under the five-factor model. Three factors were manipulated: (1) the length of the questionnaire, (2) the relative item pool size with respect to the questionnaire’s length, and (3) the true correlations between traits. The recovery of the person parameters for each assembled questionnaire was evaluated through the squared correlation between estimated and true parameters, the root mean square error between the estimated and true parameters, the average difference between the estimated and true inter-trait correlations, and the average standard error for each trait level. The proposed GA offered more accurate trait estimates than the BF search within a reasonable computation time in every simulation condition. Such improvements were especially important when measuring correlated traits and when the relative item pool sizes were higher. A user-friendly online implementation of the algorithm was made available to the users.
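The abstract does not specify the GA's operators, so the following is only a generic sketch of a genetic algorithm for subset-type assembly problems, with a toy additive fitness standing in for the posterior marginal reliability objective described above:

```python
import random

def genetic_search(pool_size, subset_size, fitness, generations=200, pop=30, seed=0):
    """Generic GA for subset selection: elitist survival of the best half,
    set-union crossover, and single-point mutation, all on index subsets."""
    rng = random.Random(seed)

    def random_sol():
        return sorted(rng.sample(range(pool_size), subset_size))

    def mutate(sol):
        out = set(sol)
        out.discard(rng.choice(sol))          # drop one item ...
        while len(out) < subset_size:
            out.add(rng.randrange(pool_size)) # ... and refill at random
        return sorted(out)

    def crossover(p1, p2):
        merged = list(set(p1) | set(p2))      # child drawn from both parents
        return sorted(rng.sample(merged, subset_size))

    population = [random_sol() for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop // 2]
        children = [mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
                    for _ in range(pop - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

# Toy fitness: prefer items with large hypothetical "information" values.
info = [0.1 * i for i in range(20)]
best = genetic_search(20, 5, fitness=lambda s: sum(info[i] for i in s))
```

In the article the fitness would instead be the questionnaire's posterior marginal reliability, evaluated per candidate assembly.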
Collapse
|
19
|
Abstract
An increased use of models for measuring response styles is apparent in recent years with the multidimensional nominal response model (MNRM) as one prominent example. Inclusion of latent constructs representing extreme (ERS) or midpoint response style (MRS) often improves model fit according to information criteria. However, a test of absolute model fit is often not reported even though it could comprise an important piece of validity evidence. Limited information test statistics are candidates for this task, including the full (M2), ordinal (M2*), and mixed (C2) statistics, which differ in whether additional collapsing of univariate or bivariate contingency tables is conducted. Such collapsing makes sense when item categories are ordinal, which may not hold under the MNRM. More generally, limited information test statistics have gone unevaluated under nominal data and non-ordinal latent trait models. We present a simulation study evaluating the performance of M2, M2*, and C2 with the MNRM. Manipulated conditions included sample size, presence and type of response style, and strength of item slopes on substantive and style dimensions. We found that M2 sometimes had inflated Type I error rates, M2* always had little power, and C2 lacked power under some conditions. M2 and C2 may provide complementary and valuable information regarding model fit.
Collapse
|
20
|
Cho AE, Wang C, Zhang X, Xu G. Gaussian variational estimation for multidimensional item response theory. Br J Math Stat Psychol 2021; 74 Suppl 1:52-85. [PMID: 33064318 DOI: 10.1111/bmsp.12219] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Revised: 07/30/2020] [Indexed: 06/11/2023]
Abstract
Multidimensional item response theory (MIRT) is widely used in assessment and evaluation of educational and psychological tests. It models the individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that involve either direct numerical approximations to the integrals or Monte Carlo simulations. However, these methods are known to be computationally demanding in high dimensions and rely on sampling data points from a posterior distribution. We propose a new Gaussian variational expectation-maximization (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. In addition, the proposed algorithm can be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies are conducted to demonstrate the computational efficiency and estimation precision of the new GVEM algorithm compared to the popular alternative Metropolis-Hastings Robbins-Monro algorithm. In addition, theoretical results are presented to establish the consistency of the estimator from the new GVEM algorithm.
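The intractable integral that GVEM targets can be written down directly. Below is a naive Monte Carlo sketch of the marginal likelihood for one respondent under a two-dimensional 2PL-type MIRT (illustrative parameters; this is the brute-force baseline the variational lower bound replaces, not the GVEM algorithm itself):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mc_marginal_loglik(responses, a, b, draws=5000, seed=1):
    """Monte Carlo approximation of the MIRT marginal log-likelihood for one
    respondent: average P(y | theta) over theta ~ N(0, I).

    responses: list of 0/1; a: list of slope vectors; b: list of intercepts.
    """
    rng = random.Random(seed)
    dim = len(a[0])
    total = 0.0
    for _ in range(draws):
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        lik = 1.0
        for y, aj, bj in zip(responses, a, b):
            p = sigmoid(sum(ad * td for ad, td in zip(aj, theta)) + bj)
            lik *= p if y == 1 else 1.0 - p
        total += lik
    return math.log(total / draws)

ll = mc_marginal_loglik([1, 0, 1],
                        a=[[1.0, 0.5], [0.8, 0.2], [0.3, 1.1]],
                        b=[0.0, -0.5, 0.4])
```

The per-respondent cost grows with the number of draws needed for a stable average, which is exactly what makes such sampling-based approaches expensive in high dimensions.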
Collapse
Affiliation(s)
- April E Cho
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Chun Wang
- College of Education, University of Washington, Seattle, Washington, USA
| | - Xue Zhang
- China Institute of Rural Education Development, Northeast Normal University, Changchun, China
| | - Gongjun Xu
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
| |
Collapse
|
21
|
Gil-Llario MD, Castro-Calvo J, Fernández-García O, Elipe-Miravet M, Ballester-Arnal R. Estimating sexual knowledge of people with mild intellectual disability through a valid and reliable assessment scale: The ISK-ID. J Appl Res Intellect Disabil 2021; 35:988-1000. [PMID: 34132002 DOI: 10.1111/jar.12909] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Revised: 04/03/2021] [Accepted: 04/28/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Despite the relevance of assessing sexual knowledge in people with Intellectual Disability, there is a lack of appropriate assessment tools to measure this domain. The current study tests the psychometric properties of the new 'Inventory of Sexual Knowledge of people with Intellectual Disability' (ISK-ID). METHOD 345 individuals with mild intellectual disability completed the ISK-ID before and after the implementation of a sexual education program. Psychometric properties of the ISK-ID were analysed according to Multidimensional Item Response Theory (MIRT). RESULTS Its underlying factorial structure, along with parameters derived from the MIRT (item discrimination, difficulty, and participant's ability), support the use of the ISK-ID as a measure of sexual knowledge. Moreover, the ISK-ID was able to detect changes in the level of sexual knowledge resulting from educational interventions (i.e., responsiveness). CONCLUSIONS The ISK-ID is an appropriate assessment tool to measure sexual knowledge in men and women with mild intellectual disability.
Collapse
Affiliation(s)
- Mª Dolores Gil-Llario
- Department of Developmental and Educational Psychology, University of Valencia, Valencia, Spain
| | - Jesus Castro-Calvo
- Department of Personality, Assessment, and Psychological Treatments, Faculty of Psychology, University of Valencia, Valencia, Spain
| | - Olga Fernández-García
- Department of Developmental and Educational Psychology, University of Valencia, Valencia, Spain
| | - Marcel Elipe-Miravet
- Department of Basic and Clinical Psychology and Psychobiology, Jaume I University, Castello de la Plana, Spain
| | - Rafael Ballester-Arnal
- Department of Basic and Clinical Psychology and Psychobiology, Jaume I University, Castello de la Plana, Spain
| |
Collapse
|
22
|
Zhan P, Jiao H, Man K, Wang WC, He K. Variable Speed Across Dimensions of Ability in the Joint Model for Responses and Response Times. Front Psychol 2021; 12:469196. [PMID: 33854454 PMCID: PMC8039373 DOI: 10.3389/fpsyg.2021.469196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Accepted: 03/01/2021] [Indexed: 11/19/2022] Open
Abstract
Working speed, as a latent variable, reflects a respondent’s efficiency in applying a specific skill or piece of knowledge to solve a problem. In this study, the common assumption of many response time models that respondents work at a constant speed across all test items is relaxed. Respondents are more likely to work at different speed levels across items, especially when those items measure different dimensions of ability in a multidimensional test. Multiple speed factors are used to model the speed process by allowing speed to vary across different domains of ability. A joint model for multidimensional abilities and multifactor speed is proposed. Real response time data are analyzed with an exploratory factor analysis as an example to uncover the complex structure of working speed. The feasibility of the proposed model is examined using simulation data. An empirical example with responses and response times is presented to illustrate the proposed model’s applicability and rationality.
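A common building block in such joint models is the lognormal response-time model. Here is a sketch in which the speed parameter is looked up per ability domain, mirroring the multifactor-speed idea (parameter values and the domain lookup are illustrative, not the authors' exact parameterization):

```python
import math

def lognormal_rt_density(t, tau, beta, alpha):
    """Lognormal response-time model: log T ~ N(beta - tau, 1/alpha^2),
    where tau is the respondent's speed on the item's domain, beta the
    item's time intensity, and alpha the item's time discrimination."""
    z = alpha * (math.log(t) - (beta - tau))
    return (alpha / (t * math.sqrt(2.0 * math.pi))) * math.exp(-0.5 * z * z)

# Multifactor speed: one speed level per ability domain.
speeds = {"algebra": 0.4, "geometry": -0.2}
item = {"domain": "geometry", "beta": 3.0, "alpha": 1.5}

# Density of observing a 25-second response time on this geometry item.
dens = lognormal_rt_density(25.0, speeds[item["domain"]], item["beta"], item["alpha"])
```

Higher speed tau shifts the whole log-time distribution downward, so the same item is expected to take less time for a faster respondent on that domain.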
Collapse
Affiliation(s)
- Peida Zhan
- Zhejiang Normal University, Jinhua, China
| | - Hong Jiao
- University of Maryland, College Park, MD, United States
| | - Kaiwen Man
- University of Alabama, Tuscaloosa, AL, United States
| | - Wen-Chung Wang
- The Education University of Hong Kong, Tai Po, Hong Kong
| | - Keren He
- Zhejiang Normal University, Jinhua, China
| |
Collapse
|
23
|
Zhou S, Huggins-Manley AC. The Performance of the Semigeneralized Partial Credit Model for Handling Item-Level Missingness. Educ Psychol Meas 2020; 80:1196-1215. [PMID: 33116332 PMCID: PMC7565116 DOI: 10.1177/0013164420918392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The semi-generalized partial credit model (Semi-GPCM) has been proposed as a unidimensional modeling method for handling not applicable scale responses and neutral scale responses, and it has been suggested that the model may be of use in handling missing data in scale items. The purpose of this study is to evaluate the ability of the unidimensional Semi-GPCM to aid in the recovery of person parameters from item response data in the presence of item-level missingness, and to compare the performance of the model with two other proposed methods for handling such missingness: a multidimensional modeling approach for missingness and full information maximum likelihood estimation. The results indicate that the Semi-GPCM performs acceptably in an absolute sense when less than 30% of the item data is missing but does not outperform the other two methods under any particular conditions. We conclude with a discussion about when practitioners may or may not want to use the Semi-GPCM to recover person parameters from item response data with missingness.
Collapse
|
24
|
Gibbons RD, Kupfer DJ, Frank E, Lahey BB, George-Milford BA, Biernesser CL, Porta G, Moore TL, Kim JB, Brent DA. Computerized Adaptive Tests for Rapid and Accurate Assessment of Psychopathology Dimensions in Youth. J Am Acad Child Adolesc Psychiatry 2020; 59:1264-1273. [PMID: 31465832 PMCID: PMC7042076 DOI: 10.1016/j.jaac.2019.08.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Revised: 07/24/2019] [Accepted: 08/19/2019] [Indexed: 12/26/2022]
Abstract
OBJECTIVE At least half of youths with mental disorders are unrecognized and untreated. Rapid, accurate assessment of child mental disorders could facilitate identification and referral and potentially reduce the occurrence of functional disability that stems from early-onset mental disorders. METHOD Computerized adaptive tests (CATs) based on multidimensional item response theory were developed for depression, anxiety, mania/hypomania, attention-deficit/hyperactivity disorder, conduct disorder, oppositional defiant disorder, and suicidality, based on parent and child ratings of 1,060 items each. In phase 1, CATs were developed from 801 participants. In phase 2, predictive, discriminant, and convergent validity were tested against semi-structured research interviews for diagnoses and suicidality in 497 patients and 104 healthy controls. Overall strength of association was determined by area under the receiver operating characteristic curve (AUC). RESULTS The child and parent independently completed the Kiddie-Computerized Adaptive Tests (K-CATs) in a median time of 7.56 and 5.03 minutes, respectively, with an average of 7 items per domain. The K-CATs accurately captured the presence of diagnoses (AUCs from 0.83 for generalized anxiety disorder to 0.92 for major depressive disorder) and suicidal ideation (AUC = 0.996). Strong correlations with extant measures were found (r ≥ 0.60). Test-retest reliability averaged r = 0.80. CONCLUSION These K-CATs provide a new approach to child psychopathology screening and measurement. Testing can be completed by child and parent in less than 8 minutes and yields results that are highly convergent with much more time-consuming structured clinical interviews and dimensional severity assessment and measurement. Testing of the implementation of the K-CAT is now indicated.
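The adaptive engine behind such CATs repeatedly administers the most informative remaining item at the current trait estimate, which is why only about 7 items per domain are needed. A unidimensional 2PL sketch of that selection step (the K-CATs use multidimensional IRT; the item bank below is made up):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = sigmoid(a * (theta - b))
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

# Hypothetical bank of (discrimination, difficulty) pairs.
bank = [(1.0, -2.0), (1.5, 0.0), (0.8, 2.0), (2.0, 0.1)]
first = select_next_item(0.0, bank, administered=set())  # item 3: steep, near theta
```

After each response the trait estimate is updated and the loop repeats until a stopping rule (e.g., standard error threshold) is met.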
Collapse
Affiliation(s)
| | | | - Ellen Frank
- University of Pittsburgh School of Medicine, PA
| | | | | | - Candice L. Biernesser
- UPMC Western Psychiatric Hospital and the University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA
| | | | | | - Jong Bae Kim
- Center for Health Statistics, University of Chicago, IL
| | - David A. Brent
- University of Pittsburgh School of Medicine, PA.,UPMC Western Psychiatric Hospital, Pittsburgh, PA
| |
Collapse
|
25
|
Chalmers RP. Partially and Fully Noncompensatory Response Models for Dichotomous and Polytomous Items. Appl Psychol Meas 2020; 44:415-430. [PMID: 32788814 PMCID: PMC7383690 DOI: 10.1177/0146621620909898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
This article extends Sympson's partially noncompensatory dichotomous response model to ordered response data, and introduces a set of fully noncompensatory models for dichotomous and polytomous response data. The theoretical properties of the partially and fully noncompensatory response models are contrasted, and a small set of Monte Carlo simulations is presented to evaluate their parameter recovery performance. Results indicate that the respective models fit the data similarly when correctly matched to their respective population generating model. The fully noncompensatory models, however, demonstrated lower sampling variability and smaller degrees of bias than the partially noncompensatory counterparts. Based on the theoretical properties and empirical performance, it is argued that the fully noncompensatory models should be considered in item response theory applications when investigating conjunctive response processes.
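The compensatory/conjunctive contrast underlying these models can be made concrete for a dichotomous item (illustrative slopes and intercepts, not the article's parameterization):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def compensatory_p(theta, a, d):
    """Compensatory MIRT: a single logit sums over all dimensions, so a
    high score on one trait can offset a low score on another."""
    return sigmoid(sum(ak * tk for ak, tk in zip(a, theta)) + d)

def noncompensatory_p(theta, a, d):
    """Noncompensatory (conjunctive) MIRT: a product of per-dimension
    probabilities, so success requires adequacy on every trait."""
    p = 1.0
    for ak, tk, dk in zip(a, theta, d):
        p *= sigmoid(ak * tk + dk)
    return p

high_low = [2.0, -2.0]  # strong on dimension 1, weak on dimension 2
pc = compensatory_p(high_low, a=[1.0, 1.0], d=0.0)
pn = noncompensatory_p(high_low, a=[1.0, 1.0], d=[0.0, 0.0])
```

For this mixed profile the compensatory probability sits at 0.5 because the traits cancel in the logit, while the conjunctive probability is pulled down by the weak dimension.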
Collapse
|
26
|
Strachan T, Ip E, Fu Y, Ackerman T, Chen SH, Willse J. Robustness of Projective IRT to Misspecification of the Underlying Multidimensional Model. Appl Psychol Meas 2020; 44:362-375. [PMID: 32879536 PMCID: PMC7433385 DOI: 10.1177/0146621620909894] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
As a method to derive a "purified" measure along a dimension of interest from response data that are potentially multidimensional in nature, the projective item response theory (PIRT) approach requires first fitting a multidimensional item response theory (MIRT) model to the data before projecting onto a dimension of interest. This study aims to explore how accurate the PIRT results are when the estimated MIRT model is misspecified. Specifically, we focus on using a (potentially misspecified) two-dimensional (2D)-MIRT for projection because of its advantages, including interpretability, identifiability, and computational stability, over higher dimensional models. Two large simulation studies (I and II) were conducted. Both studies examined whether the fitting of a 2D-MIRT is sufficient to recover the PIRT parameters when multiple nuisance dimensions exist in the test items, which were generated, respectively, under compensatory MIRT and bifactor models. Various factors were manipulated, including sample size, test length, latent factor correlation, and number of nuisance dimensions. The results from simulation studies I and II showed that the PIRT was overall robust to a misspecified 2D-MIRT. Smaller third and fourth simulation studies were done to evaluate recovery of the PIRT model parameters when the correctly specified higher dimensional MIRT or bifactor model was fitted with the response data. In addition, a real data set was used to illustrate the robustness of PIRT.
Collapse
Affiliation(s)
| | - Edward Ip
- Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Yanyan Fu
- Graduate Management Admission Council, Reston, VA, USA
| | | | | | - John Willse
- The University of North Carolina at Greensboro, USA
| |
Collapse
|
27
|
Falk CF, Ju U. Estimation of Response Styles Using the Multidimensional Nominal Response Model: A Tutorial and Comparison With Sum Scores. Front Psychol 2020; 11:72. [PMID: 32116902 PMCID: PMC7017717 DOI: 10.3389/fpsyg.2020.00072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Accepted: 01/10/2020] [Indexed: 11/16/2022] Open
Abstract
Recent years have seen a dramatic increase in item response models for measuring response styles on Likert-type items. These model-based approaches stand in contrast to traditional sum-score-based methods where researchers count the number of times that participants selected certain response options. The multidimensional nominal response model (MNRM) offers a flexible model-based approach that may be intuitive to those familiar with sum score approaches. This paper presents a tutorial on the model along with code for estimating it using three different software packages: flexMIRT®, mirt, and Mplus. We focus on specification and interpretation of response functions. In addition, we provide analytical details on how sum score to scale score conversion can be done with the MNRM. In the context of a real data example, three different scoring approaches are then compared. This example illustrates how sum-score-based approaches can sometimes yield scores that are confounded with substantive content. We expect that the current paper will facilitate further investigations as to whether different substantive conclusions are reached under alternative approaches to measuring response styles.
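The MNRM's response functions are softmax probabilities over category logits that load on both a substantive trait and a style trait. A minimal sketch of an extreme-response-style (ERS) specification (illustrative scoring weights; see the tutorial's software examples for estimated parameters):

```python
import math

def mnrm_probs(theta, ers, slopes, ers_scores, intercepts):
    """Multidimensional nominal response model: softmax over category
    logits loading on a substantive trait (theta) and an ERS trait."""
    logits = [s * theta + e * ers + c
              for s, e, c in zip(slopes, ers_scores, intercepts)]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [v / z for v in exps]

# 5-point Likert item: ERS scoring weights peak at the two extreme categories.
probs = mnrm_probs(theta=0.0, ers=1.0,
                   slopes=[-2, -1, 0, 1, 2],
                   ers_scores=[1, 0, -1, 0, 1],
                   intercepts=[0, 0, 0, 0, 0])
```

With a neutral trait level but high ERS, the two extreme categories become the most probable, which is the confounding that sum-score approaches can miss.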
Collapse
Affiliation(s)
- Carl F Falk
- Department of Psychology, McGill University, Montreal, QC, Canada
| | - Unhee Ju
- Riverside Insights, Itasca, IL, United States
| |
Collapse
|
28
|
Abstract
A theoretical and conceptual framework for true-score equating using a simple-structure multidimensional item response theory (SS-MIRT) model is developed. A true-score equating method, referred to as the SS-MIRT true-score equating (SMT) procedure, also is developed. SS-MIRT has several advantages over other complex multidimensional item response theory models, including improved efficiency in estimation and straightforward interpretability. The performance of the SMT procedure was examined and evaluated through four studies using different data types. In these studies, results from the SMT procedure were compared with results from four other equating methods to assess the relative benefits of SMT compared with the other procedures. In general, SMT showed more accurate equating results compared with traditional unidimensional IRT (UIRT) equating when the data were multidimensional. More accurate performance of SMT over UIRT true-score equating was consistently observed across the studies, which supports the benefits of a multidimensional approach in equating for multidimensional data. Also, SMT performed similarly to an SS-MIRT observed score method across all studies.
Collapse
Affiliation(s)
- Stella Y. Kim
- University of North Carolina at Charlotte, Charlotte, NC, USA
| | | | | |
Collapse
|
29
|
Zhang S, Chen Y, Liu Y. An improved stochastic EM algorithm for large-scale full-information item factor analysis. Br J Math Stat Psychol 2020; 73:44-71. [PMID: 30511445 DOI: 10.1111/bmsp.12153] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/11/2017] [Revised: 09/05/2018] [Indexed: 06/09/2023]
Abstract
In this paper, we explore the use of the stochastic EM algorithm (Celeux & Diebolt (1985) Computational Statistics Quarterly, 2, 73) for large-scale full-information item factor analysis. Innovations have been made on its implementation, including an adaptive-rejection-based Gibbs sampler for the stochastic E step, a proximal gradient descent algorithm for the optimization in the M step, and diagnostic procedures for determining the burn-in size and the stopping of the algorithm. These developments are based on the theoretical results of Nielsen (2000, Bernoulli, 6, 457), as well as advanced sampling and optimization techniques. The proposed algorithm is computationally efficient and virtually tuning-free, making it scalable to large-scale data with many latent traits (e.g. more than five latent traits) and easy to use for practitioners. Standard errors of parameter estimation are also obtained based on the missing-information identity (Louis, 1982, Journal of the Royal Statistical Society, Series B, 44, 226). The performance of the algorithm is evaluated through simulation studies and an application to the analysis of the IPIP-NEO personality inventory. Extensions of the proposed algorithm to other latent variable models are discussed.
Collapse
Affiliation(s)
- Siliang Zhang
- Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
| | - Yunxiao Chen
- Department of Statistics, London School of Economics and Political Science, London, UK
| | - Yang Liu
- Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, USA
| |
Collapse
|
30
|
Zhang J, Lu J, Chen F, Tao J. Exploring the Correlation Between Multiple Latent Variables and Covariates in Hierarchical Data Based on the Multilevel Multidimensional IRT Model. Front Psychol 2019; 10:2387. [PMID: 31708833 PMCID: PMC6823212 DOI: 10.3389/fpsyg.2019.02387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Accepted: 10/07/2019] [Indexed: 11/13/2022] Open
Abstract
In many large-scale tests, it is very common for students to be nested within classes or schools and for test designers to measure multidimensional latent traits (e.g., logical reasoning ability and computational ability in a mathematics test). Exploring the influences of covariates on multiple abilities is particularly important for developing and improving educational quality monitoring mechanisms. In this study, motivated by a real dataset from a large-scale English achievement test, we address how to construct an appropriate multilevel structural model for the data from among many candidate multilevel models, what the effects of gender and socioeconomic-status differences on multidimensional English abilities are at the individual level, and how teachers' satisfaction and school climate affect students' English abilities at the school level. A full Gibbs sampling algorithm within the Markov chain Monte Carlo (MCMC) framework is used for model estimation. Moreover, a unique form of the deviance information criterion (DIC) is used as a model comparison index. To verify the accuracy of the estimation algorithm, two simulation studies are considered in this paper. The simulation studies show that the Gibbs sampling algorithm works well in estimating all model parameters across a broad spectrum of scenarios, which can be used to guide the real data analysis. A brief discussion and suggestions for further research are given in the concluding remarks.
Collapse
Affiliation(s)
- Jiwei Zhang
- School of Mathematics and Statistics, Yunnan University, Kunming, China
| | - Jing Lu
- School of Mathematics and Statistics, Northeast Normal University, Changchun, China
| | - Feng Chen
- Department of East Asian Studies, The University of Arizona, Tucson, AZ, United States
| | - Jian Tao
- School of Mathematics and Statistics, Northeast Normal University, Changchun, China
| |
Collapse
|
31
|
Primi R, Santos D, De Fruyt F, John OP. Comparison of classical and modern methods for measuring and correcting for acquiescence. Br J Math Stat Psychol 2019; 72:447-465. [PMID: 31032894 DOI: 10.1111/bmsp.12168] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2018] [Revised: 02/07/2019] [Indexed: 06/09/2023]
Abstract
Likert-type self-report scales are frequently used in large-scale educational assessment of social-emotional skills. Self-report scales rely on the assumption that their items elicit information only about the trait they are supposed to measure. However, different response biases may threaten this assumption. Specifically, in children, the response style of acquiescence is an important source of systematic error. Balanced scales, including an equal number of positively and negatively keyed items, have been proposed as a solution to control for acquiescence, but the reasons why this design feature works, from the perspective of modern psychometric models, have been underexplored. Three methods for controlling for acquiescence are compared: the classical method of partialling out the mean; an item response theory method measuring differential person functioning (DPF); and multidimensional item response theory (MIRT) with a random intercept. Comparative analyses are conducted on simulated ratings and on self-ratings provided by 40,649 students (aged 11-18) on a fully balanced 30-item scale assessing conscientious self-management. Acquiescence bias was explained as DPF, and it was demonstrated that the acquiescence index is highly related to DPF, that balanced scales produce scores controlled for DPF, that MIRT factor scores are highly related to scores controlled for DPF, and that the random intercept is highly related to DPF.
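The classical partialling-out correction on a fully balanced scale is simple to state. A sketch with two hypothetical respondents on a 7-point scale who share the same trait level but differ in yea-saying (toy ratings, not the article's data):

```python
def acquiescence_correct(ratings, positive_keyed):
    """Classical acquiescence correction for a fully balanced scale:
    the per-person mean over equal numbers of positively and negatively
    keyed items estimates acquiescence; subtracting it (partialling out
    the mean) yields corrected item scores."""
    acq = sum(ratings) / len(ratings)          # acquiescence index
    corrected = [r - acq for r in ratings]
    # Trait score: corrected positives minus corrected negatives.
    trait = sum(c if pos else -c for c, pos in zip(corrected, positive_keyed))
    return acq, trait

keys = [True, True, False, False]              # 2 positive + 2 negative items
acq1, trait1 = acquiescence_correct([5, 4, 2, 1], keys)  # low yea-saying
acq2, trait2 = acquiescence_correct([6, 5, 3, 2], keys)  # same answers + 1 everywhere
```

Because yea-saying shifts every rating by the same constant on a balanced scale, the two respondents get different acquiescence indices but identical corrected trait scores.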
Collapse
Affiliation(s)
- Ricardo Primi
- Postgraduate Program in Psychology, Universidade São Francisco, Campinas, São Paulo, Brazil
- EduLab21, Ayrton Senna Institute, São Paulo, Brazil
| | - Daniel Santos
- EduLab21, Ayrton Senna Institute, São Paulo, Brazil
- Faculty of Economics, Administration and Accounting of Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil
| | - Filip De Fruyt
- EduLab21, Ayrton Senna Institute, São Paulo, Brazil
- Department of Developmental, Personality and Social Psychology, Ghent University, Belgium
| | - Oliver P John
- EduLab21, Ayrton Senna Institute, São Paulo, Brazil
- Department of Psychology and Institute of Personality and Social Psychology, University of California, Berkeley, California, USA
| |
Collapse
|
32
|
Abstract
Computer-based testing (CBT) is becoming increasingly popular in assessing test-takers' latent abilities and making inferences regarding their cognitive processes. In addition to collecting item responses, an important benefit of using CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive attributes and the working speed of test-takers, this research proposes a joint-modeling approach that integrates compensatory multidimensional latent traits and response speediness using item responses and RTs. The joint model is cast as a multilevel model in which the structural relations between working speed and accuracy are connected through their variance-covariance structures. The feasibility of this modeling approach is investigated via a Monte Carlo simulation study using a Bayesian estimation scheme. The results indicate that integrating RTs improved model parameter recovery and precision. In addition, Programme for International Student Assessment (PISA) 2015 mathematics standard unit items are analyzed to further evaluate the ability of the approach to recover model parameters.
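A data-generating sketch in the spirit of this joint model: person ability and speed are drawn from a bivariate normal (the variance-covariance structure that links accuracy and speed), responses follow a 2PL, and log response times follow a lognormal RT model. All parameter values are assumptions for this sketch, not the article's estimates.

```python
import numpy as np

# Illustrative joint data generation: correlated person-level ability (theta)
# and speed (tau); 2PL responses; lognormal response times.
rng = np.random.default_rng(1)
n_persons, n_items = 500, 12
person_cov = np.array([[1.0, -0.3],
                       [-0.3, 0.25]])       # assumed speed-accuracy covariance
theta, tau = rng.multivariate_normal([0.0, 0.0], person_cov, n_persons).T

a = rng.uniform(0.8, 2.0, n_items)          # item discriminations
b = rng.normal(0.0, 1.0, n_items)           # item difficulties
alpha = rng.uniform(1.5, 2.5, n_items)      # time discriminations
beta = rng.normal(4.0, 0.3, n_items)        # time intensities

p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
y = (rng.random((n_persons, n_items)) < p).astype(int)
log_rt = beta - tau[:, None] + rng.normal(0.0, 1.0 / alpha, (n_persons, n_items))
```

The off-diagonal entry of `person_cov` is where the structural relation between working speed and accuracy lives in this sketch.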
Collapse
Affiliation(s)
- Kaiwen Man
- University of Maryland, College Park, USA
- These authors share first authorship
| | - Jeffrey R. Harring
- University of Maryland, College Park, USA
- These authors share first authorship
| | - Hong Jiao
- University of Maryland, College Park, USA
- These authors share first authorship
| | - Peida Zhan
- Zhejiang Normal University, Jinhua, China
| |
Collapse
|
33
|
Khorramdel L, von Davier M, Pokropek A. Combining mixture distribution and multidimensional IRTree models for the measurement of extreme response styles. Br J Math Stat Psychol 2019; 72:538-559. [PMID: 31385610 DOI: 10.1111/bmsp.12179] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/29/2018] [Revised: 04/21/2019] [Indexed: 05/10/2023]
Abstract
Personality constructs, attitudes and other non-cognitive variables are often measured using rating or Likert-type scales, which does not come without problems. Especially in low-stakes assessments, respondents may produce biased responses due to response styles (RS) that reduce the validity and comparability of the measurement. Detecting and correcting RS is not always straightforward because not all respondents show RS and the ones who do may not do so to the same extent or in the same direction. The present study proposes the combination of a multidimensional IRTree model with a mixture distribution item response theory model and illustrates the application of the approach using data from the Programme for the International Assessment of Adult Competencies (PIAAC). This joint approach allows for the differentiation between different latent classes of respondents who show different RS behaviours and respondents who show RS versus respondents who give (largely) unbiased responses. We illustrate the application of the approach by examining extreme RS and show how the resulting latent classes can be further examined using external variables and process data from computer-based assessments to develop a better understanding of response behaviour and RS.
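A common way to set up such IRTree models is to recode each ordinal response into binary pseudo-items corresponding to the tree's nodes. The three-node decomposition below (midpoint choice, direction, extremity) is a standard textbook tree for extreme response style, not necessarily the exact tree used in the article.

```python
# Standard three-node IRTree recoding of a 5-point Likert response into
# binary pseudo-items: midpoint choice, direction, and extremity.
def irtree_recode(x: int):
    """Return (midpoint, direction, extreme); None means node not reached."""
    if x not in (1, 2, 3, 4, 5):
        raise ValueError("expected a 1-5 Likert response")
    midpoint = 1 if x == 3 else 0
    direction = None if x == 3 else (1 if x > 3 else 0)
    extreme = None if x == 3 else (1 if x in (1, 5) else 0)
    return midpoint, direction, extreme

# e.g. irtree_recode(5) -> (0, 1, 1): not midpoint, agree side, extreme
```

Each pseudo-item can then be modeled by its own latent dimension, which is what allows a separate extreme-response-style trait alongside the substantive trait.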
Collapse
|
34
|
Adams DJ, Bolt DM, Deng S, Smith SS, Baker TB. Using multidimensional item response theory to evaluate how response styles impact measurement. Br J Math Stat Psychol 2019; 72:466-485. [PMID: 30919943 PMCID: PMC6765459 DOI: 10.1111/bmsp.12169] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2018] [Revised: 02/21/2019] [Indexed: 05/25/2023]
Abstract
Multidimensional item response theory (MIRT) models for response style (e.g., Bolt, Lu, & Kim, 2014, Psychological Methods, 19, 528; Falk & Cai, 2016, Psychological Methods, 21, 328) provide flexibility in accommodating various response styles, but often present difficulty in isolating the effects of response style(s) from the intended substantive trait(s). In the presence of such measurement limitations, we consider several ways in which MIRT models are nevertheless useful in lending insight into how response styles may interfere with measurement for a given test instrument. Such a study can also inform whether alternative design considerations (e.g., anchoring vignettes, self-report items of heterogeneous content) that seek to control for response style effects may be helpful. We illustrate several aspects of an MIRT approach using real and simulated analyses.
Collapse
Affiliation(s)
- Daniel J Adams
- Department of Educational Psychology, University of Wisconsin, Madison, Wisconsin, USA
| | - Daniel M Bolt
- Department of Educational Psychology, University of Wisconsin, Madison, Wisconsin, USA
| | | | - Stevens S Smith
- Department of Medicine, University of Wisconsin, Madison, Wisconsin, USA
| | - Timothy B Baker
- Department of Medicine, University of Wisconsin, Madison, Wisconsin, USA
| |
Collapse
|
35
|
Abstract
Recently, large-scale testing programs have shown increasing interest in providing examinees with more accurate diagnostic information by reporting overall and domain scores simultaneously. However, few studies have focused on how to report and interpret reliable total and domain scores based on bi-factor models. In this study, the authors introduced six methods of reporting overall and domain scores as weighted composite scores of the general and specific factors in a bi-factor model, and compared their performance with Yao's MIRT (multidimensional item response theory) method using both simulated and empirical data. In the simulation study, four factors were considered: test length, number of dimensions, correlation between dimensions, and sample size. Major findings are that Bifactor-M4 and Bifactor-M6, the methods utilizing discrimination parameters of the specific dimensions to compute the weights, provided the most accurate and reliable overall and domain scores in most conditions, especially when the test was long, the correlation between dimensions was high, and the number of dimensions was large; additionally, Bifactor-M4 recovered the relationship of the true ability parameters best of all the proposed methods. By contrast, Bifactor-M2, the method with equal weights, performed poorly on overall score estimation; Bifactor-M3 and Bifactor-M5, the methods whose weights were computed using the discrimination parameters of all the dimensions, performed poorly on domain score estimation; and Bifactor-M1, the original factor method, yielded the worst estimates.
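One simple instance of such a weighting scheme can be sketched as follows. The factor scores and discriminations are hypothetical, and the scheme (weights proportional to mean discriminations, normalized to sum to one) merely illustrates the idea of a weighted composite, not any specific Bifactor-M* method from the article.

```python
import numpy as np

# Hypothetical bifactor factor scores and mean discriminations.
theta_g = 0.8                           # general-factor score
theta_s = np.array([0.2, -0.4, 0.5])    # specific-factor (domain) scores
a_g = 1.4                               # mean general discrimination
a_s = np.array([0.9, 1.1, 0.7])         # mean specific discriminations

# Weights proportional to mean discriminations, normalized to sum to one;
# the overall score is then a weighted composite of all factor scores.
w = np.concatenate([[a_g], a_s])
w = w / w.sum()
overall = w[0] * theta_g + w[1:] @ theta_s
```

Schemes differ mainly in which discrimination parameters feed the weights; equal weights (as in Bifactor-M2) would simply set `w` to a constant vector.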
Collapse
Affiliation(s)
- Yue Liu
- Beijing Normal University, China
| | - Zhen Li
- eMetric, San Antonio, TX, USA
| | | |
Collapse
|
36
|
Kim KY. A Comparison of the Separate and Concurrent Calibration Methods for the Full-Information Bifactor model. Appl Psychol Meas 2019; 43:512-526. [PMID: 31534287 PMCID: PMC6739745 DOI: 10.1177/0146621618813095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
When calibrating items using multidimensional item response theory (MIRT) models, item response theory (IRT) calibration programs typically set the probability density of latent variables to a multivariate standard normal distribution to handle three types of indeterminacies: (a) the location of the origin, (b) the unit of measurement along each coordinate axis, and (c) the orientation of the coordinate axes. However, by doing so, item parameter estimates obtained from two independent calibration runs on nonequivalent groups are on two different coordinate systems. To handle this issue and place all the item parameter estimates on a common coordinate system, a process called linking is necessary. Although various linking methods have been introduced and studied for the full MIRT model, little research has been conducted on linking methods for the bifactor model. Thus, the purpose of this study was to provide detailed descriptions of two separate calibration methods and the concurrent calibration method for the bifactor model and to compare the three linking methods through simulation. In general, the concurrent calibration method provided more accurate linking results than the two separate calibration methods, demonstrating better recovery of the item parameters, item characteristic surfaces, and expected score distribution.
Collapse
|
37
|
Mao X, Zhang J, Xin T. Application of Dimension Reduction to CAT Item Selection Under the Bifactor Model. Appl Psychol Meas 2019; 43:419-434. [PMID: 31452552 PMCID: PMC6696870 DOI: 10.1177/0146621618813086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Multidimensional computerized adaptive testing (MCAT) based on the bifactor model is suitable for tests with multidimensional bifactor measurement structures. Several item selection methods that proved to be more advantageous than the maximum Fisher information method are not practical for bifactor MCAT due to time-consuming computations resulting from high dimensionality. To make them applicable in bifactor MCAT, dimension reduction is applied to four item selection methods: the posterior-weighted Fisher D-optimality (PDO) method and three non-Fisher-information-based methods, namely posterior expected Kullback-Leibler information (PKL), continuous entropy (CE), and mutual information (MI). They were compared with the Bayesian D-optimality (BDO) method in terms of estimation precision. When both the general and group factors are the measurement objectives, BDO, PDO, CE, and MI perform equally well and better than PKL. When the group factors represent nuisance dimensions, MI and CE perform best in estimating the general factor, followed by BDO, PDO, and PKL. How the bifactor pattern and test length affect estimation accuracy is also discussed.
Collapse
Affiliation(s)
| | - Jiahui Zhang
- Michigan State University, East Lansing, MI, USA
| | - Tao Xin
- Beijing Normal University, Beijing, China
| |
Collapse
|
38
|
da Silva MA, Liu R, Huggins-Manley AC, Bazán JL. Incorporating the Q-Matrix Into Multidimensional Item Response Theory Models. Educ Psychol Meas 2019; 79:665-687. [PMID: 32655178 PMCID: PMC7328237 DOI: 10.1177/0013164418814898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Multidimensional item response theory (MIRT) models use data from individual item responses to estimate multiple latent traits of interest, making them useful in educational and psychological measurement, among other areas. When MIRT models are applied in practice, it is not uncommon to see that some items are designed to measure all latent traits while other items may only measure one or two traits. In order to facilitate a clear expression of which items measure which traits and formulate such relationships as a mathematical function in MIRT models, we applied the concept of the Q-matrix, commonly used in diagnostic classification models, to MIRT models. In this study, we introduced how to incorporate a Q-matrix into an existing MIRT model, and demonstrated the benefits of the proposed hybrid model through two simulation studies and an applied study. In addition, we showed the relative ease of modeling educational and psychological data through a Bayesian approach via the NUTS algorithm.
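The core idea can be sketched directly: the Q-matrix acts as a 0/1 mask on the slope matrix of a compensatory MIRT model, so an item's response probability depends only on the traits it is declared to measure. All parameter values below are illustrative, not estimates from the article.

```python
import numpy as np

# Q-matrix (items x traits): which traits each item is declared to measure.
Q = np.array([[1, 0],
              [0, 1],
              [1, 1]])                    # item 3 loads on both traits
A = np.array([[1.2, 0.9],
              [0.7, 1.5],
              [1.0, 1.1]])               # unconstrained slope matrix
d = np.array([0.0, -0.5, 0.3])           # item intercepts

def p_correct(theta, Q, A, d):
    """Compensatory MIRT with Q-matrix mask:
    P(y_i = 1) = logistic(sum_k q_ik * a_ik * theta_k + d_i)."""
    z = (Q * A) @ theta + d               # elementwise mask, then linear predictor
    return 1.0 / (1.0 + np.exp(-z))

probs = p_correct(np.array([0.5, -0.5]), Q, A, d)
```

Masking `A` by `Q` is exactly what turns a fully exploratory slope matrix into the constrained, confirmatory structure the abstract describes.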
Collapse
Affiliation(s)
- Marcelo A. da Silva
- University of São Paulo, São Paulo, Brazil
- Federal University of São Carlos, São Carlos, Brazil
| | - Ren Liu
- University of California, Merced, CA, USA
| | | | | |
Collapse
|
39
|
Abstract
It is commonly known that respondents exhibit different response styles when responding to Likert-type items. For example, some respondents tend to select the extreme categories (e.g., strongly disagree and strongly agree), whereas some tend to select the middle categories (e.g., disagree, neutral, and agree). Furthermore, some respondents tend to disagree with every item (e.g., strongly disagree and disagree), whereas others tend to agree with every item (e.g., agree and strongly agree). In such cases, fitting standard unfolding item response theory (IRT) models that assume no response style will yield a poor fit and biased parameter estimates. Although there have been attempts to develop dominance IRT models to accommodate the various response styles, such models are usually restricted to a specific response style and cannot be used for unfolding data. In this study, a general unfolding IRT model is proposed that can be combined with a softmax function to accommodate various response styles via scoring functions. The parameters of the new model can be estimated using Bayesian Markov chain Monte Carlo algorithms. An empirical data set is used for demonstration purposes, followed by simulation studies to assess the parameter recovery of the new model, as well as the consequences of ignoring the impact of response styles on parameter estimators by fitting standard unfolding IRT models. The results suggest that the new model exhibits good parameter recovery, and that parameter estimates are seriously biased when response styles are ignored.
Collapse
Affiliation(s)
- Chen-Wei Liu
- The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
| | - Wen-Chung Wang
- The Education University of Hong Kong, Tai Po, New Territories, Hong Kong
- Deceased
| |
Collapse
|
40
|
Rose N, Nagy G, Nagengast B, Frey A, Becker M. Modeling Multiple Item Context Effects With Generalized Linear Mixed Models. Front Psychol 2019; 10:248. [PMID: 30858809 PMCID: PMC6397884 DOI: 10.3389/fpsyg.2019.00248] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Accepted: 01/25/2019] [Indexed: 11/30/2022] Open
Abstract
Item context effects refer to the impact of features of a test on an examinee's item responses. These effects cannot be explained by the abilities measured by the test. Investigations typically focus on only a single type of item context effects, such as item position effects, or mode effects, thereby ignoring the fact that different item context effects might operate simultaneously. In this study, two different types of context effects were modeled simultaneously drawing on data from an item calibration study of a multidimensional computerized test (N = 1,632) assessing student competencies in mathematics, science, and reading. We present a generalized linear mixed model (GLMM) parameterization of the multidimensional Rasch model including item position effects (distinguishing between within-block position effects and block position effects), domain order effects, and the interactions between them. Results show that both types of context effects played a role, and that the moderating effect of domain orders was very strong. The findings have direct consequences for planning and applying mixed domain assessment designs.
Collapse
Affiliation(s)
- Norman Rose
- Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
| | - Gabriel Nagy
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
| | - Benjamin Nagengast
- Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Tübingen, Germany
| | - Andreas Frey
- Department of Educational Psychology, Measurement, Evaluation and Counseling, Institute of Psychology, Goethe-University Frankfurt, Frankfurt, Germany
- Centre for Educational Measurement, Faculty of Education, University of Oslo, Oslo, Norway
| | - Michael Becker
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- German Institute for International Educational Research, Frankfurt, Germany
| |
Collapse
|
41
|
Abstract
The validity of inferences based on test scores is threatened when examinees' lack of test-taking effort is ignored. A possible solution is to add test-taking effort indicators to the measurement model after the non-effortful responses are flagged. As a new application of the multidimensional item response theory (MIRT) model for non-ignorable missing responses, this article proposes a MIRT method to account for non-effortful responses. Two simulation studies were conducted to examine the impact of non-effortful responses on item and latent ability parameter estimates, and to evaluate the performance of the MIRT method compared with the three-parameter logistic (3PL) model as well as the effort-moderated model. Results showed that: (a) as the percentage of non-effortful responses increased, the unidimensional 3PL model yielded poorer parameter estimates; (b) the MIRT model obtained item parameter estimates as accurate as those of the effort-moderated model; and (c) the MIRT model provided the most accurate ability parameter estimates when the correlation between test-taking effort and ability was high. A real-data analysis was also conducted for illustration. Limitations and directions for future research are discussed.
Collapse
Affiliation(s)
- Yue Liu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
| | - Zhen Li
- eMetric LLC, San Antonio, TX, United States
| | - Hongyun Liu
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
| | - Fang Luo
- Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, China
| |
Collapse
|
42
|
Jordan P, Spiess M. A New Explanation and Proof of the Paradoxical Scoring Results in Multidimensional Item Response Models. Psychometrika 2018; 83:831-846. [PMID: 29030750 DOI: 10.1007/s11336-017-9588-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2017] [Revised: 07/17/2017] [Indexed: 06/07/2023]
Abstract
In multidimensional item response models, paradoxical scoring effects can arise, wherein correct answers are penalized and incorrect answers are rewarded. For the most prominent class of IRT models, the class of linearly compensatory models, a general derivation of paradoxical scoring effects based on the geometry of item discrimination vectors is given, which furthermore corrects an error in an established theorem on paradoxical results. This approach highlights the very counterintuitive way in which item discrimination parameters (and also factor loadings) have to be interpreted in terms of their influence on the latent ability estimate. It is proven that, despite the error in the original proof, the key result concerning the existence of paradoxical effects remains true-although the actual relation to the item parameters is shown to be a more complicated function than previous results suggested. The new proof enables further insights into the actual mathematical causation of the paradox and generalizes the findings within the class of linearly compensatory models.
Collapse
Affiliation(s)
- Pascal Jordan
- University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
| | - Martin Spiess
- University of Hamburg, Von-Melle-Park 5, 20146 Hamburg, Germany
| |
Collapse
|
43
|
Fujimoto KA. A general Bayesian multilevel multidimensional IRT model for locally dependent data. Br J Math Stat Psychol 2018; 71:536-560. [PMID: 29882212 DOI: 10.1111/bmsp.12133] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/11/2016] [Revised: 01/05/2018] [Indexed: 05/28/2023]
Abstract
Many item response theory (IRT) models take a multidimensional perspective to deal with sources that induce local item dependence (LID), with these models often making an orthogonal assumption about the dimensional structure of the data. One reason for this assumption is because of the indeterminacy issue in estimating the correlations among the dimensions in structures often specified to deal with sources of LID (e.g., bifactor and two-tier structures), and the assumption usually goes untested. Unfortunately, the mere fact that assessing these correlations is a challenge for some estimation methods does not mean that data seen in practice support such orthogonal structure. In this paper, a Bayesian multilevel multidimensional IRT model for locally dependent data is presented. This model can test whether item response data violate the orthogonal assumption that many IRT models make about the dimensional structure of the data when addressing sources of LID, and this test is carried out at the dimensional level while accounting for sampling clusters. Simulations show that the model presented is effective at carrying out this task. The utility of the model is also illustrated on an empirical data set.
Collapse
|
44
|
van Lier HG, Siemons L, van der Laar MAFJ, Glas CAW. Estimating Optimal Weights for Compound Scores: A Multidimensional IRT Approach. Multivariate Behav Res 2018; 53:914-924. [PMID: 30463444 DOI: 10.1080/00273171.2018.1478712] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/21/2017] [Revised: 05/09/2018] [Accepted: 05/14/2018] [Indexed: 06/09/2023]
Abstract
A method is proposed for constructing indices as linear functions of variables such that the reliability of the compound score is maximized. Reliability is defined in the framework of latent variable modeling [i.e., item response theory (IRT)], and optimal weights for the components of the index are found by maximizing the posterior variance relative to the total latent variable variance. Three methods for estimating the weights are proposed. The first is a likelihood-based approach, that is, marginal maximum likelihood (MML). The other two are Bayesian approaches based on Markov chain Monte Carlo (MCMC) computational methods: one uses an augmented Gibbs sampler specifically targeted at IRT, and the other a general-purpose Gibbs sampler such as those implemented in OpenBUGS and JAGS. Simulation studies are presented to demonstrate the procedure and to compare the three methods. Results are very similar, so practitioners may prefer the easily accessible latter method. A real-data set pertaining to the 28-joint Disease Activity Score is used to show how the methods can be applied in a complex measurement situation with multiple time points and mixed data formats.
Collapse
Affiliation(s)
| | - Liseth Siemons
- Department of Research Methodology, Measurement and Data-analysis, Universiteit Twente
| | | | - Cees A W Glas
- Department of Research Methodology, Measurement and Data-analysis, Universiteit Twente
| |
Collapse
|
45
|
Falk CF, Monroe S. On Lagrange Multiplier Tests in Multidimensional Item Response Theory: Information Matrices and Model Misspecification. Educ Psychol Meas 2018; 78:653-678. [PMID: 30147121 PMCID: PMC6096471 DOI: 10.1177/0013164417714506] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Lagrange multiplier (LM) or score tests have seen renewed interest for the purpose of diagnosing misspecification in item response theory (IRT) models. LM tests can also be used to test whether parameters differ from a fixed value. We argue that the utility of LM tests depends on both the method used to compute the test and the degree of misspecification in the initially fitted model. We demonstrate both of these points in the context of a multidimensional IRT framework. Through an extensive Monte Carlo simulation study, we examine the performance of LM tests under varying degrees of model misspecification, model size, and different information matrix approximations. A generalized LM test designed specifically for use under misspecification, which has apparently not been previously studied in an IRT framework, performed the best in our simulations. Finally, we reemphasize caution in using LM tests for model specification searches.
Collapse
Affiliation(s)
- Carl F. Falk
- Michigan State University, East Lansing, MI, USA
| | | |
Collapse
|
46
|
Abstract
The measurement of individual change has been an important topic in both education and psychology. For instance, teachers are interested in whether students have significantly improved (e.g., learned) from instruction, and counselors are interested in whether particular behaviors have been significantly changed after certain interventions. Although classical test methods have been unable to adequately resolve the problems in measuring change, recent approaches for measuring change have begun to use item response theory (IRT). However, all prior methods mainly focus on testing whether growth is significant at the group level. The present research targets a key research question: Is the "change" in latent trait estimates for each individual significant across occasions? Many researchers have addressed this research question assuming that the latent trait is unidimensional. This research generalizes their earlier work and proposes four hypothesis testing methods to evaluate individual change on multiple latent traits: a multivariate Z-test, a multivariate likelihood ratio test, a multivariate score test, and a Kullback-Leibler test. Simulation results show that these tests hold promise of detecting individual change with low Type I error and high power. A real-data example from an educational assessment illustrates the application of the proposed methods.
Collapse
Affiliation(s)
- Chun Wang
- University of Minnesota, Minneapolis, MN, USA
| | | |
Collapse
|
47
|
Walker CM, Gocer Sahin S. Using a Multidimensional IRT Framework to Better Understand Differential Item Functioning (DIF): A Tale of Three DIF Detection Procedures. Educ Psychol Meas 2017; 77:945-970. [PMID: 29795940 PMCID: PMC5965646 DOI: 10.1177/0013164416657137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The theoretical reason for the presence of differential item functioning (DIF) is that data are multidimensional and two groups of examinees differ in their underlying ability distribution for the secondary dimension(s). Therefore, the purpose of this study was to determine how much the secondary ability distributions must differ before DIF is detected. Two-dimensional binary data sets were simulated using a compensatory multidimensional item response theory (MIRT) model, incrementally varying the mean difference on the second dimension between reference and focal group examinees while systematically increasing the correlation between dimensions. Three different DIF detection procedures were used to test for DIF: (1) SIBTEST, (2) Mantel-Haenszel, and (3) logistic regression. Results indicated that even with a very small mean difference on the secondary dimension, smaller than typically considered in previous research, DIF will be detected. Additional analyses indicated that even with the smallest mean difference considered in this study, 0.25, statistically significant differences will almost always be found between reference and focal group examinees on subtest scores consisting of items measuring the secondary dimension.
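The simulation design described in this abstract can be sketched as follows (item parameters, sample sizes, and the correlation value are illustrative): focal examinees differ from reference examinees only in the mean of the secondary dimension, so any item loading on that dimension shows a group difference that DIF procedures pick up.

```python
import numpy as np

# Two-dimensional compensatory design: reference and focal groups share the
# primary-dimension distribution and differ only in the secondary-dimension
# mean. All parameter values are illustrative.
rng = np.random.default_rng(7)
n, rho, shift = 20000, 0.4, 0.25
cov = np.array([[1.0, rho],
                [rho, 1.0]])
theta_ref = rng.multivariate_normal([0.0, 0.0], cov, n)
theta_foc = rng.multivariate_normal([0.0, -shift], cov, n)  # shifted 2nd dim

a = np.array([1.0, 0.8])        # item loads on both dimensions
d = 0.0                         # item intercept

def simulate(theta):
    p = 1.0 / (1.0 + np.exp(-(theta @ a + d)))
    return (rng.random(len(theta)) < p).astype(int)

y_ref, y_foc = simulate(theta_ref), simulate(theta_foc)
# The focal group's lower secondary-dimension mean depresses its success rate
# on this item even when primary ability is matched, which is what procedures
# such as SIBTEST or Mantel-Haenszel flag as DIF.
```

With a shift of only 0.25 on the secondary dimension, the marginal success rates of the two groups already differ noticeably, consistent with the abstract's finding that even small secondary-dimension differences trigger DIF detection.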
Collapse
|
48
|
Liu CW, Wang WC. Non-ignorable missingness item response theory models for choice effects in examinee-selected items. Br J Math Stat Psychol 2017; 70:499-524. [PMID: 28390145 DOI: 10.1111/bmsp.12097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2016] [Revised: 01/19/2017] [Indexed: 06/07/2023]
Abstract
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable.
Collapse
Affiliation(s)
- Chen-Wei Liu
- Department of Psychology, The Education University of Hong Kong, Hong Kong
| | - Wen-Chung Wang
- Department of Psychology, The Education University of Hong Kong, Hong Kong
| |
Collapse
|
49
|
Abstract
This article introduces Bayesian estimation and evaluation procedures for the multidimensional nominal response model. The utility of this model is to perform a nominal factor analysis of items that consist of a finite number of unordered response categories. The key aspect of the model, in comparison with the traditional factorial model, is that there is a slope for each response category on the latent dimensions, instead of slopes associated with the items. The extended parameterization of the multidimensional nominal response model requires large samples for estimation. When the sample size is moderate or small, some of these parameters may be weakly empirically identifiable and the estimation algorithm may run into difficulties. We propose a Bayesian MCMC inferential algorithm to estimate the parameters and the number of dimensions underlying the multidimensional nominal response model. Two Bayesian approaches to model evaluation were compared: discrepancy statistics (DIC, WAIC, and LOO), which provide an indication of the relative merit of different models, and the standardized generalized discrepancy measure, which requires resampling data and is computationally more involved. A simulation study was conducted to compare these two approaches, and the results show that the standardized generalized discrepancy measure can be used to reliably estimate the dimensionality of the model whereas the discrepancy statistics are questionable. The paper also includes an example with real data in the context of learning styles, in which the model is used to conduct an exploratory factor analysis of nominal data.
Collapse
Affiliation(s)
- Javier Revuelta
- Department of Psychology, Autonoma University of Madrid, Madrid, Spain
| | - Carmen Ximénez
- Department of Psychology, Autonoma University of Madrid, Madrid, Spain
| |
Collapse
|
50
|
Bashkov BM, DeMars CE. Examining the Performance of the Metropolis-Hastings Robbins-Monro Algorithm in the Estimation of Multilevel Multidimensional IRT Models. Appl Psychol Meas 2017; 41:323-337. [PMID: 29881095 PMCID: PMC5978673 DOI: 10.1177/0146621616688923] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The purpose of this study was to examine the performance of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm in the estimation of multilevel multidimensional item response theory (ML-MIRT) models. The accuracy and efficiency of MH-RM in recovering item parameters, latent variances and covariances, as well as ability estimates within and between clusters (e.g., schools) were investigated in a simulation study, varying the number of dimensions, the intraclass correlation coefficient, the number of clusters, and cluster size, for a total of 24 conditions. Overall, MH-RM performed well in recovering the item, person, and group-level parameters of the model. Ratios of the empirical to analytical standard errors indicated that the analytical standard errors reported in flexMIRT were somewhat overestimated for the cluster-level ability estimates, a little too large for the person-level ability estimates, and essentially accurate for the other parameters. Limitations of the study, implications for educational measurement practice, and directions for future research are offered.
Collapse
|