1
Robitzsch A. To Check or Not to Check? A Comment on the Contemporary Psychometrics (ConPsy) Checklist for the Analysis of Questionnaire Items. Eur J Investig Health Psychol Educ 2023; 13:2150-2159. [PMID: 37887152] [PMCID: PMC10606083] [DOI: 10.3390/ejihpe13100151]
Abstract
In a recent paper, the first version of the contemporary psychometrics (ConPsy) checklist for assessing the quality of measurement tools was published. The checklist provides guidelines and references that help researchers assess the measurement properties of newly developed instruments, recommends appropriate statistical methods for evaluating measurement instruments, and is intended to guide researchers in instrument development and to support peer review. In this opinion article, I critically review some aspects of the checklist and question the usefulness of certain psychometric analyses in research practice.
Affiliation(s)
- Alexander Robitzsch
- IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany;
- Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
2
Grund S, Lüdtke O, Robitzsch A. Pooling methods for likelihood ratio tests in multiply imputed data sets. Psychol Methods 2023; 28:1207-1221. [PMID: 37104764] [DOI: 10.1037/met0000556]
Abstract
Likelihood ratio tests (LRTs) are a popular tool for comparing statistical models. Missing data, however, are common in empirical research, and multiple imputation (MI) is often used to deal with them. In multiply imputed data, there are multiple options for conducting LRTs, and new methods are still being proposed. In this article, we compare all available methods in multiple simulations covering applications in linear regression, generalized linear models, and structural equation modeling. In addition, we implemented these methods in an R package, and we illustrate its application in an example analysis concerned with the investigation of measurement invariance.
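As a point of reference, the classical D2 rule for pooling chi-square statistics across imputed datasets (Li, Meng, Raghunathan, & Rubin, 1991) can be sketched in a few lines. This is a generic illustration of one pooling method, not the implementation from the article's R package; the function name and test statistics are invented:

```python
from math import sqrt
from statistics import mean, variance

def pool_lrt_d2(chisq_stats, df):
    """Pool m chi-square LRT statistics from multiply imputed datasets
    using the D2 rule; returns the pooled F statistic and the
    denominator degrees of freedom of its reference distribution."""
    m = len(chisq_stats)
    d_bar = mean(chisq_stats)
    # between-imputation variability, measured on the square-root scale
    r2 = (1 + 1 / m) * variance(sqrt(d) for d in chisq_stats)
    D2 = (d_bar / df - (m + 1) / (m - 1) * r2) / (1 + r2)
    nu = df ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return D2, nu
```

The pooled D2 is referred to an F(df, nu) distribution; with little between-imputation variation it approaches the average chi-square divided by its degrees of freedom.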
Affiliation(s)
- Simon Grund
- Leibniz Institute for Science and Mathematics Education (IPN)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education (IPN)
3
Abstract
Local structural equation models (LSEM) are structural equation models that study model parameters as a function of a moderator. This article reviews and extends LSEM estimation methods and discusses the implementation in the R package sirt. In previous studies, LSEM was fitted as a sequence of models evaluated separately at each value of the moderator variable. In this article, a joint estimation approach is proposed that estimates the model simultaneously across all moderator values and also allows some model parameters to be invariant with respect to the moderator. Moreover, sufficient details on the main estimation functions in the R package sirt are provided. The practical implementation of LSEM is demonstrated using illustrative datasets and an empirical example. Finally, two simulation studies investigate the statistical properties of parameter estimation and significance testing in LSEM.
Affiliation(s)
- Alexander Robitzsch
- IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany;
- Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
4
Fuentes A, Lüdtke O, Robitzsch A. Causal Inference with Multilevel Data: A Comparison of Different Propensity Score Weighting Approaches. Multivariate Behav Res 2022; 57:916-939. [PMID: 34128730] [DOI: 10.1080/00273171.2021.1925521]
Abstract
Propensity score methods are a widely recommended approach to adjust for confounding and to recover treatment effects with non-experimental, single-level data. This article reviews propensity score weighting estimators for multilevel data in which individuals (level 1) are nested in clusters (level 2) and nonrandomly assigned to either a treatment or control condition at level 1. We address the choice of a weighting strategy (inverse probability weights, trimming, overlap weights, calibration weights) and discuss key issues related to the specification of the propensity score model (fixed-effects model, multilevel random-effects model) in the context of multilevel data. In three simulation studies, we show that estimates based on calibration weights, which prioritize balancing the sample distribution of level-1 and (unmeasured) level-2 covariates, should be preferred under many scenarios (i.e., treatment effect heterogeneity, presence of strong level-2 confounding) and can accommodate covariate-by-cluster interactions. However, when level-1 covariate effects vary strongly across clusters (i.e., under random slopes), and this variation is present in both the treatment and outcome data-generating mechanisms, large cluster sizes are needed to obtain accurate estimates of the treatment effect. We also discuss the implementation of survey weights and present a real-data example that illustrates the different methods.
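Several of the weighting strategies compared here follow simple closed-form rules once a propensity score e(x) has been estimated. A minimal sketch of three of them, using the standard textbook formulas rather than the authors' code (function names and the trimming cutoffs are illustrative):

```python
def ipw_weights(treat, ps):
    """Inverse probability (ATE) weights: 1/e for treated units,
    1/(1 - e) for controls."""
    return [t / e + (1 - t) / (1 - e) for t, e in zip(treat, ps)]

def overlap_weights(treat, ps):
    """Overlap weights: 1 - e for treated units, e for controls;
    upweights units in the region of common support."""
    return [t * (1 - e) + (1 - t) * e for t, e in zip(treat, ps)]

def trim_ps(ps, lo=0.05, hi=0.95):
    """Trim extreme propensity scores to [lo, hi] to avoid the very
    large weights produced by scores close to 0 or 1."""
    return [min(max(e, lo), hi) for e in ps]
```

Calibration weights, by contrast, have no closed form: they are obtained by solving a covariate-balancing optimization problem.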
Affiliation(s)
- Alvaro Fuentes
- Centre for International Student Assessment, Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Oliver Lüdtke
- Centre for International Student Assessment, Leibniz Institute for Science and Mathematics Education, Kiel, Germany
5
Grund S, Lüdtke O, Robitzsch A. Using synthetic data to improve the reproducibility of statistical results in psychological research. Psychol Methods 2022:2022-87072-001. [PMID: 35925728] [DOI: 10.1037/met0000526]
Abstract
In recent years, psychological research has faced a credibility crisis, and open data are often regarded as an important step toward a more reproducible psychological science. However, privacy concerns are among the main reasons that prevent data sharing. Synthetic data procedures, which are based on the multiple imputation (MI) approach to missing data, can be used to replace sensitive data with simulated values, which can be analyzed in place of the original data. One crucial requirement of this approach is that the synthesis model is correctly specified. In this article, we investigated the statistical properties of synthetic data with a particular emphasis on the reproducibility of statistical results. To this end, we compared conventional approaches to synthetic data based on MI with a data-augmented approach (DA-MI) that attempts to combine the advantages of masking methods and synthetic data, thus making the procedure more robust to misspecification. In multiple simulation studies, we found that the good properties of the MI approach strongly depend on the correct specification of the synthesis model, whereas the DA-MI approach can provide useful results even under various types of misspecification. This suggests that the DA-MI approach to synthetic data can provide an important tool that can be used to facilitate data sharing and improve reproducibility in psychological research. In a working example, we also demonstrate the implementation of these approaches in widely available software, and we provide recommendations for practice.
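The basic idea, generating replacement values from a synthesis model fitted to the original data, can be illustrated with a deliberately simple normal model. This is a toy stand-in for the MI-based procedures studied in the article; the function name and data are invented:

```python
import random
from statistics import mean, stdev

def synthesize_normal(values, n, seed=0):
    """Draw n synthetic values from a normal synthesis model fitted to
    the observed data; the synthetic values, not the originals, would
    then be shared and analyzed."""
    rng = random.Random(seed)
    mu, sigma = mean(values), stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

If the synthesis model is misspecified (e.g., the real data are skewed), analyses of the synthetic values will be biased; this sensitivity is exactly the robustness problem that motivates the data-augmented DA-MI approach.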
Affiliation(s)
- Simon Grund
- Leibniz Institute for Science and Mathematics Education
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education
6
Abstract
The bivariate Stable Trait, AutoRegressive Trait, and State (STARTS) model provides a general approach for estimating reciprocal effects between constructs over time. However, previous research has shown that this model is difficult to estimate using the maximum likelihood (ML) method (e.g., nonconvergence). In this article, we introduce a Bayesian approach for estimating the bivariate STARTS model and implement it in the software Stan. We discuss issues of model parameterization and show how appropriate prior distributions for model parameters can be selected. Specifically, we propose the four-parameter beta distribution as a flexible prior distribution for the autoregressive and cross-lagged effects. Using a simulation study, we show that the proposed Bayesian approach provides more accurate estimates than ML estimation in challenging data constellations. An example is presented to illustrate how the Bayesian approach can be used to stabilize the parameter estimates of the bivariate STARTS model.
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Esther Ulitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
7
Ulitzsch E, Lüdtke O, Robitzsch A. Alleviating estimation problems in small sample structural equation modeling-A comparison of constrained maximum likelihood, Bayesian estimation, and fixed reliability approaches. Psychol Methods 2021:2022-13410-001. [PMID: 34928675] [DOI: 10.1037/met0000435]
Abstract
Small sample structural equation modeling (SEM) may exhibit serious estimation problems, such as failure to converge, inadmissible solutions, and unstable parameter estimates. A vast literature has compared the performance of different solutions for small sample SEM against unconstrained maximum likelihood (ML) estimation. Less is known, however, about the gains and pitfalls of these solutions relative to each other. We bridge this gap by focusing on three current solutions: constrained ML, Bayesian methods using Markov chain Monte Carlo techniques, and fixed reliability single indicator (SI) approaches. In doing so, we evaluate the potential and boundaries of different parameterizations, constraints, and weakly informative prior distributions for improving the quality of the estimation procedure and stabilizing parameter estimates. The performance of all approaches is compared in a simulation study. Under conditions with low reliabilities, Bayesian methods without additional prior information by far outperform constrained ML and the worst-performing fixed reliability SI approach in terms of accuracy of parameter estimates, and they do not perform worse than the best-performing fixed reliability SI approach. Under conditions with high reliabilities, constrained ML shows good performance. Both constrained ML and Bayesian methods exhibit conservative to acceptable Type I error rates, whereas fixed reliability SI approaches are prone to undercoverage and severe inflation of Type I error rates. Stabilizing effects on Bayesian parameter estimates can be achieved even with mildly incorrect prior information. In an empirical example, we illustrate the practical importance of carefully choosing the method of analysis for small sample SEM.
8
Robitzsch A. On the Treatment of Missing Item Responses in Educational Large-Scale Assessment Data: An Illustrative Simulation Study and a Case Study Using PISA 2018 Mathematics Data. Eur J Investig Health Psychol Educ 2021; 11:1653-1687. [PMID: 34940395] [PMCID: PMC8700118] [DOI: 10.3390/ejihpe11040117]
Abstract
Missing item responses are prevalent in educational large-scale assessment studies such as the programme for international student assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians have advocated for a model-based treatment based on the latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model, which characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item itself. The scoring of missing item responses as wrong and the latent ignorable model are submodels of the Mislevy-Wu model. In an illustrative simulation study, it is shown that the Mislevy-Wu model provides unbiased parameter estimates. Moreover, the simulation replicates the finding from various simulation studies in the literature that scoring missing item responses as wrong provides biased estimates if the latent ignorability assumption holds in the data-generating model. However, if missing item responses can only arise from incorrect item responses, applying an item response model that relies on latent ignorability also results in biased estimates. In contrast, the more general Mislevy-Wu model guarantees unbiased parameter estimates whenever it holds in the data-generating model. In addition, this article uses the PISA 2018 mathematics dataset as a case study to investigate the consequences of different missing data treatments on country means and country standard deviations, which can differ substantially across the different scaling models.
In contrast to previous statements in the literature, the scoring of missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item on the item itself, after conditioning on the latent response propensity, was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be rejected for two reasons. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, in the discussion section, we argue that model fit should only play a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that countries (and, hence, their students) can simply manipulate result in unfair country comparisons.
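The practical difference between the two simplest treatments can be seen in a toy scoring function. This illustrates only the competing scoring rules, not the Mislevy-Wu model; the response vector is invented:

```python
def proportion_correct(responses, missing_as_wrong=True):
    """Proportion correct for one student, where None marks an omitted
    item: either score omissions as wrong (0) or drop them."""
    if missing_as_wrong:
        # None -> 0, observed responses kept as-is
        return sum(r or 0 for r in responses) / len(responses)
    observed = [r for r in responses if r is not None]
    return sum(observed) / len(observed)
```

A student who omits items they would likely have answered incorrectly fares very differently under the two rules, which is exactly the lever for manipulation of country comparisons discussed above.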
Affiliation(s)
- Alexander Robitzsch
- IPN—Leibniz Institute for Science and Mathematics Education, University of Kiel, Olshausenstraße 62, 24118 Kiel, Germany;
- Centre for International Student Assessment (ZIB), University of Kiel, Olshausenstraße 62, 24118 Kiel, Germany
9
Nestler S, Lüdtke O, Robitzsch A. Erratum to: Maximum Likelihood Estimation of a Social Relations Structural Equation Model. Psychometrika 2021; 86:842. [PMID: 34331189] [PMCID: PMC8502134] [DOI: 10.1007/s11336-021-09793-y]
Affiliation(s)
- Steffen Nestler
- Institut für Psychologie, University of Münster, Fliednerstr. 21, 48149, Münster, Germany.
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
10
Köhler C, Robitzsch A, Fährmann K, von Davier M, Hartig J. A semiparametric approach for item response function estimation to detect item misfit. Br J Math Stat Psychol 2021; 74 Suppl 1:157-175. [PMID: 33332585] [DOI: 10.1111/bmsp.12224]
Abstract
When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs via semiparametric adaptation of the IRFs, using the extended basis function approach of Ramsay and Silverman (2005). The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term and determines whether all parameters of the basis functions are fixed at zero or freely estimated; it thus serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRFs of misfitting items can then be determined by describing the fitting items with the parametric form of the IRF and the misfitting items with the semiparametric approach. In a simulation study, we demonstrate that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).
Affiliation(s)
- Carmen Köhler
- DIPF - Leibniz Institute for Research and Information in Education, Frankfurt, Germany
- Alexander Robitzsch
- IPN - Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment (ZIB), Munich, Germany
- Katharina Fährmann
- DIPF - Leibniz Institute for Research and Information in Education, Frankfurt, Germany
- Johannes Hartig
- DIPF - Leibniz Institute for Research and Information in Education, Frankfurt, Germany
11
Lüdtke O, Ulitzsch E, Robitzsch A. A Comparison of Penalized Maximum Likelihood Estimation and Markov Chain Monte Carlo Techniques for Estimating Confirmatory Factor Analysis Models With Small Sample Sizes. Front Psychol 2021; 12:615162. [PMID: 33995176] [PMCID: PMC8118082] [DOI: 10.3389/fpsyg.2021.615162]
Abstract
With small to modest sample sizes and complex models, maximum likelihood (ML) estimation of confirmatory factor analysis (CFA) models can show serious estimation problems such as non-convergence or parameter estimates outside the admissible parameter space. In this article, we distinguish different Bayesian estimators that can be used to stabilize the parameter estimates of a CFA: the mode of the joint posterior distribution that is obtained from penalized maximum likelihood (PML) estimation, and the mean (EAP), median (Med), or mode (MAP) of the marginal posterior distribution that are calculated by using Markov Chain Monte Carlo (MCMC) methods. In two simulation studies, we evaluated the performance of the Bayesian estimators from a frequentist point of view. The results show that the EAP produced more accurate estimates of the latent correlation in many conditions and outperformed the other Bayesian estimators in terms of root mean squared error (RMSE). We also argue that it is often advantageous to choose a parameterization in which the main parameters of interest are bounded, and we suggest the four-parameter beta distribution as a prior distribution for loadings and correlations. Using simulated data, we show that selecting weakly informative four-parameter beta priors can further stabilize parameter estimates, even in cases when the priors were mildly misspecified. Finally, we derive recommendations and propose directions for further research.
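The four-parameter beta distribution suggested here as a prior is a beta density linearly rescaled from (0, 1) to an arbitrary interval, which makes it convenient for bounded parameters such as correlations on (-1, 1). A minimal sketch of its density using the standard formula (the function and argument names are ours):

```python
from math import gamma

def beta4_pdf(x, shape1, shape2, lower, upper):
    """Density of the four-parameter beta distribution: a Beta(shape1,
    shape2) density linearly rescaled to the interval (lower, upper)."""
    if not lower < x < upper:
        return 0.0
    z = (x - lower) / (upper - lower)  # map back to the unit interval
    beta_fn = gamma(shape1) * gamma(shape2) / gamma(shape1 + shape2)
    return z ** (shape1 - 1) * (1 - z) ** (shape2 - 1) / (beta_fn * (upper - lower))
```

With shape1 = shape2 = 2 on (-1, 1), for example, the prior is symmetric around zero and vanishes at the boundaries, gently pulling loadings and correlations away from the inadmissible region.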
Affiliation(s)
- Oliver Lüdtke
- IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
- Esther Ulitzsch
- IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Alexander Robitzsch
- IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
12
Nestler S, Lüdtke O, Robitzsch A. Maximum likelihood estimation of a social relations structural equation model. Psychometrika 2020; 85:870-889. [PMID: 33094388] [PMCID: PMC8502151] [DOI: 10.1007/s11336-020-09728-z]
Abstract
The social relations model (SRM) is widely used in psychology to investigate the components that underlie interpersonal perceptions, behaviors, and judgments. SRM researchers are often interested in investigating the multivariate relations between SRM effects. However, at present, it is not possible to investigate such relations without relying on a two-step approach that depends on potentially unreliable estimates of the true SRM effects. Here, we introduce a way to combine the SRM with the structural equation modeling (SEM) framework and show how the parameters of our combination can be estimated with a maximum likelihood (ML) approach. We illustrate the model with an example from personality psychology. We also investigate the statistical properties of the model in a small simulation study showing that our approach performs well in most simulation conditions. An R package (called srm) is available implementing the proposed methods.
Affiliation(s)
- Steffen Nestler
- Institut für Psychologie, University of Münster, Fliednerstr. 21, 48149, Münster, Germany.
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Kiel, Germany
13
Jansen M, Lüdtke O, Robitzsch A. Disentangling different sources of stability and change in students' academic self-concepts: An integrative data analysis using the STARTS model. Journal of Educational Psychology 2020. [DOI: 10.1037/edu0000448]
14
Robitzsch A. Regularized Latent Class Analysis for Polytomous Item Responses: An Application to SPM-LS Data. J Intell 2020; 8:E30. [PMID: 32823949] [PMCID: PMC7555561] [DOI: 10.3390/jintelligence8030030]
Abstract
The last series of Raven's standard progressive matrices (SPM-LS) test was studied with respect to its psychometric properties in a series of recent papers. In this paper, the SPM-LS dataset is analyzed with regularized latent class models (RLCMs). For dichotomous item response data, an alternative estimation approach based on fused regularization for RLCMs is proposed. For polytomous item responses, different alternative fused regularization penalties are presented. The usefulness of the proposed methods is demonstrated in a simulated data illustration and for the SPM-LS dataset. For the SPM-LS dataset, it turned out that the regularized latent class model resulted in five partially ordered latent classes. Three of the five latent classes are ordered for all items; for the remaining two classes, violations were found for two and three items, respectively, which can be interpreted as a kind of latent differential item functioning.
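The fused penalties referred to here have a simple generic form: they sum the absolute differences between corresponding parameters of adjacent latent classes, shrinking neighboring classes toward shared values (and merging them when a difference is driven to zero). A one-line sketch of the generic fused-lasso penalty, not the article's estimation code:

```python
def fused_penalty(params, lam):
    """Fused (total-variation) penalty lam * sum |theta_c - theta_{c-1}|
    over adjacent latent classes; zero differences merge classes."""
    return lam * sum(abs(a - b) for a, b in zip(params, params[1:]))
```

In estimation, this term is added to the negative log-likelihood, so larger values of the tuning parameter lam produce fewer effectively distinct classes.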
Affiliation(s)
- Alexander Robitzsch
- IPN—Leibniz Institute for Science and Mathematics Education, D-24098 Kiel, Germany;
- Centre for International Student Assessment (ZIB), D-24098 Kiel, Germany
15
Robitzsch A, Lüdtke O, Goldhammer F, Kroehne U, Köller O. Reanalysis of the German PISA Data: A Comparison of Different Approaches for Trend Estimation With a Particular Emphasis on Mode Effects. Front Psychol 2020; 11:884. [PMID: 32528352] [PMCID: PMC7264417] [DOI: 10.3389/fpsyg.2020.00884]
Abstract
International large-scale assessments, such as the Program for International Student Assessment (PISA), are conducted to provide information on the effectiveness of education systems. In PISA, the target population of 15-year-old students is assessed every 3 years. Trends show whether competencies have changed in the countries between PISA cycles. In order to provide valid trend estimates, it is desirable to retain the same test conditions and statistical methods in all PISA cycles. In PISA 2015, however, the test mode changed from paper-based to computer-based tests, and the scaling method was changed. In this paper, we investigate the effects of these changes on trend estimation in PISA using German data from all PISA cycles (2000–2015). Our findings suggest that the change from paper-based to computer-based tests could have a severe impact on trend estimation but that the change of the scaling model did not substantially change the trend estimates.
Affiliation(s)
- Alexander Robitzsch
- IPN - Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment (ZIB), Kiel, Germany
- Oliver Lüdtke
- IPN - Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment (ZIB), Kiel, Germany
- Frank Goldhammer
- DIPF - Leibniz Institute for Research and Information in Education, Frankfurt, Germany
- Centre for International Student Assessment (ZIB), Frankfurt, Germany
- Ulf Kroehne
- DIPF - Leibniz Institute for Research and Information in Education, Frankfurt, Germany
- Olaf Köller
- IPN - Leibniz Institute for Science and Mathematics Education, Kiel, Germany
16
Lüdtke O, Robitzsch A, West SG. Analysis of Interactions and Nonlinear Effects with Missing Data: A Factored Regression Modeling Approach Using Maximum Likelihood Estimation. Multivariate Behav Res 2020; 55:361-381. [PMID: 31366241] [DOI: 10.1080/00273171.2019.1640104]
Abstract
When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x2) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.
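The core idea, factorizing the likelihood as f(y, x) = f(y | x) f(x) so that an incomplete predictor can be integrated out rather than the case being discarded, can be sketched for a single observation in a simple normal regression. This is a toy numerical version of the factorization, not the mdmb implementation; all names and values are ours:

```python
from math import exp, log, pi, sqrt

def normal_logpdf(v, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * log(2 * pi) - log(sd) - 0.5 * ((v - mu) / sd) ** 2

def factored_loglik(y, x, beta0, beta1, sd_y, mu_x, sd_x):
    """Log-likelihood of one case under f(y, x) = f(y | x) * f(x).
    If the predictor x is missing (None), it is integrated out
    numerically on a grid instead of the case being dropped."""
    if x is not None:
        return (normal_logpdf(y, beta0 + beta1 * x, sd_y)
                + normal_logpdf(x, mu_x, sd_x))
    # marginal f(y) = integral of f(y|x) f(x) dx, on a +/- 4 SD grid
    grid = [mu_x + sd_x * (-4 + 0.01 * i) for i in range(801)]
    step = grid[1] - grid[0]
    total = sum(exp(normal_logpdf(y, beta0 + beta1 * g, sd_y)
                    + normal_logpdf(g, mu_x, sd_x)) for g in grid)
    return log(total * step)
```

For an observed case the two factors simply add on the log scale; for a missing case the grid integral recovers the correct marginal density of y, here N(beta0 + beta1 * mu_x, sqrt(sd_y^2 + beta1^2 * sd_x^2)).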
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education
- Centre for International Student Assessment
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education
- Centre for International Student Assessment
17
Robitzsch A. Book Review: Modern Psychometrics With R. Front Psychol 2020. [PMCID: PMC7174676] [DOI: 10.3389/fpsyg.2020.00606]
Affiliation(s)
- Alexander Robitzsch
- IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment (ZIB), Kiel, Germany
18
Lüdtke O, Robitzsch A, West SG. Regression models involving nonlinear effects with missing data: A sequential modeling approach using Bayesian estimation. Psychol Methods 2019; 25:157-181. [PMID: 31478719] [DOI: 10.1037/met0000233]
Abstract
When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a joint normal distribution, the default in many statistical software packages. This distribution will in general be misspecified if the predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x·z). In the present article, we discuss a sequential modeling approach that can be applied to decompose the joint distribution of the variables into two parts: (a) a part that is due to the model of interest and (b) a part that is due to the model for the incomplete predictors. We demonstrate how the sequential modeling approach can be used to implement a multiple imputation strategy based on Bayesian estimation techniques that can accommodate rather complex substantive regression models with nonlinear effects and also allows a flexible treatment of auxiliary variables. In four simulation studies, we showed that the sequential modeling approach can be applied to estimate nonlinear effects in regression models with missing values on continuous, categorical, or skewed predictor variables under a broad range of conditions, and we investigated the robustness of the proposed approach against distributional misspecifications. We developed the R package mdmb, which facilitates a user-friendly application of the sequential modeling approach, and we present a real-data example that illustrates the flexibility of the software.
Affiliation(s)
- Oliver Lüdtke
- Department of Educational Measurement, Leibniz Institute for Science and Mathematics Education
- Alexander Robitzsch
- Department of Educational Measurement, Leibniz Institute for Science and Mathematics Education

19
Wagner J, Lüdtke O, Robitzsch A. Does personality become more stable with age? Disentangling state and trait effects for the big five across the life span using local structural equation modeling. J Pers Soc Psychol 2019; 116:666-680. [DOI: 10.1037/pspp0000203]
20
Lüdtke O, Robitzsch A, Trautwein U. Integrating Covariates into Social Relations Models: A Plausible Values Approach for Handling Measurement Error in Perceiver and Target Effects. Multivariate Behav Res 2018; 53:102-124. [PMID: 29304292] [DOI: 10.1080/00273171.2017.1406793]
Abstract
The Social Relations Model (SRM) is a conceptual and analytical approach to examining dyadic behaviors and interpersonal perceptions within groups. In an SRM, the perceiver effect describes a person's tendency to perceive other group members in a certain way, whereas the target effect measures the tendency to be perceived by others in certain ways. In SRM research, it is often of interest to relate these individual SRM effects to covariates. However, the estimated individual SRM effects might not provide a very reliable measure of the true, unobserved SRM effects, resulting in distorted estimates of associations with other variables. This article introduces a plausible values approach that allows users to correct for measurement error when assessing the association of individual SRM effects with other individual difference variables. In the plausible values approach, the latent, true individual SRM effects are treated as missing values and are imputed from an imputation model by applying Bayesian estimation techniques. In a simulation study, the statistical properties of the plausible values approach are compared with two approaches that have been used in previous research. A data example from educational psychology is presented to illustrate how the plausible values approach can be implemented with the software WinBUGS.
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Ulrich Trautwein
- Hector Research Institute of Education Sciences and Psychology, University of Tübingen, Germany

21
Lüdtke O, Robitzsch A, Wagner J. More stable estimation of the STARTS model: A Bayesian approach using Markov chain Monte Carlo techniques. Psychol Methods 2017; 23:570-593. [PMID: 29172612] [DOI: 10.1037/met0000155]
Abstract
The STARTS (Stable Trait, AutoRegressive Trait, and State) model decomposes individual differences in psychological measurement across time into three sources of variation: a time-invariant stable component, a time-varying autoregressive component, and an occasion-specific state component. Previous simulation research and applications of the STARTS model have shown that serious estimation problems such as nonconvergence or inadmissible estimates (e.g., negative variances) frequently occur for STARTS model parameters. This article introduces a general approach to estimating the parameters of the STARTS model by employing Bayesian methods that use Markov chain Monte Carlo (MCMC) techniques. With the specification of appropriate prior distributions, the Bayesian approach offers the advantage that the model estimates will be within the admissible range, and it should be possible to avoid estimation problems. Furthermore, we show how Bayesian methods can be used to stabilize STARTS model estimates by specifying weakly informative prior distributions for the model parameters. In a simulation study, the statistical properties (bias, root mean square error, coverage rate) of the parameter estimates obtained from the Bayesian approach are compared with those of the maximum-likelihood approach. A data example is presented to illustrate how the Bayesian approach can be used to estimate the STARTS model. Finally, further extensions of the STARTS model are discussed, and suggestions for applied research are made.
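The three-component decomposition implies a simple autocovariance structure, sketched below (symbols and values are illustrative, not the article's notation): the covariance between occasions h apart equals the stable-trait variance plus the autoregressive variance damped by φ^h, with the state variance entering only at lag 0.

```python
import numpy as np

def starts_autocov(var_trait, var_ar, var_state, phi, lag):
    """Model-implied covariance between two occasions `lag` apart
    under a stationary STARTS decomposition (illustrative sketch)."""
    cov = var_trait + (phi ** lag) * var_ar
    if lag == 0:
        cov += var_state  # occasion-specific state variance
    return cov

# Lag-0 variance is the sum of the three components; covariances
# decay toward the stable-trait variance as the lag grows.
print([round(starts_autocov(0.5, 0.3, 0.2, 0.8, h), 4) for h in range(4)])
# -> [1.0, 0.74, 0.692, 0.6536]
```

This long-run separation of the trait plateau from the decaying autoregressive part is exactly what makes the variance components weakly identified in short panels and motivates the weakly informative priors discussed in the abstract.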
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education
- Jenny Wagner
- Leibniz Institute for Science and Mathematics Education

22
Wagner J, Lüdtke O, Robitzsch A, Göllner R, Trautwein U. Self-esteem development in the school context: The roles of intrapersonal and interpersonal social predictors. J Pers 2017; 86:481-497. [PMID: 28555752] [DOI: 10.1111/jopy.12330]
Abstract
OBJECTIVE: When considering that social inclusion is a basic human need, it makes sense that self-esteem is fueled by social feedback and the sense of being liked by others. This is particularly true in early adolescence, when peers become increasingly important. In the current article, we tested which components of social inclusion are particularly beneficial for the development of self-esteem by differentiating between intrapersonal components (i.e., self-perceptions of social inclusion) and interpersonal components (i.e., perceiver and target effects of liking).
METHOD: Using longitudinal data from 2,281 fifth graders and 1,766 eighth graders (TRAIN; Jonkmann et al., 2013), we tested mean-level self-esteem development and the role of intrapersonal components in this development. Using classroom round-robin data on liking from subsamples of 846 fifth-grade and 689 eighth-grade students nested in 46 and 39 classes, respectively, we tested effects of interpersonal relationship components on self-esteem development in the classroom context.
RESULTS: The three major findings demonstrated, first, no consistent trends in mean levels of self-esteem from early to middle adolescence; second, constant positive effects of intrapersonal components between students and within students across time; and third, no stable effects of interpersonal components.
CONCLUSIONS: The discussion highlights the role of intrapersonal components and the methodological challenges of our study.
Affiliation(s)
- Jenny Wagner
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany; Humboldt-University, Berlin, Germany
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany; Centre for International Student Assessment (ZIB), Germany

23
Abstract
Measurements used in psychological research to assess constructs are usually affected by measurement error. These measurement errors lead to biased estimates of population parameters and their standard errors. In recent decades, the plausible values technique has become established in large-scale assessment as a method for correcting error-prone associations between latent variables and observed covariates. Using a simple example from classical test theory, this article introduces this complex statistical procedure. It is shown that alternative methods for estimating person scores generally lead to biased estimates of associations at the population level. In a simulation study, these findings are extended to an IRT model for dichotomous indicators. From a diagnostic perspective, it is emphasized that plausible values should not be used to estimate individual ability levels. Finally, methodological challenges in applying the plausible values technique and its potential for psychological research are discussed. [Translated from German.]
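A minimal sketch of the plausible-values logic under a classical-test-theory setup (all numbers illustrative): point estimates such as EAP scores are shrunken and understate the population variance of the latent trait, whereas draws from the posterior of the true score restore it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Classical test theory setup (illustrative numbers):
# true score theta ~ N(0, 1), observed x = theta + e, e ~ N(0, var_e).
n, var_theta, var_e = 100_000, 1.0, 0.5
theta = rng.normal(0.0, np.sqrt(var_theta), n)
x = theta + rng.normal(0.0, np.sqrt(var_e), n)

# Posterior of theta given x: mean = rel * x, variance = rel * var_e,
# where rel is the reliability (shrinkage factor).
rel = var_theta / (var_theta + var_e)
eap = rel * x                                        # posterior means (EAP)
pv = eap + rng.normal(0.0, np.sqrt(rel * var_e), n)  # one plausible value each

# EAP scores are shrunken: their variance underestimates Var(theta) = 1.
# Plausible values restore the lost variance.
print(round(eap.var(), 3), round(pv.var(), 3))
```

In operational large-scale assessments, M such draws per person are analyzed separately and pooled with Rubin's rules; this sketch draws a single plausible value per person only to show the variance correction.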
Affiliation(s)
- Oliver Lüdtke
- Leibniz-Institut für Pädagogik der Naturwissenschaften und Mathematik, Kiel
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), München
- Alexander Robitzsch
- Leibniz-Institut für Pädagogik der Naturwissenschaften und Mathematik, Kiel
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), München

24
Affiliation(s)
- Simon Grund
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany

25
Abstract
International large-scale assessments such as the Programme for International Student Assessment (PISA) provide participating countries with information about the performance of their school systems. In PISA, the target population (15-year-old students) is tested every 3 years. Of particular importance is the trend information indicating whether the target population's performance has changed relative to earlier assessment cycles. For such trends to be interpreted validly, the PISA assessments should be administered under conditions that are as comparable as possible, and the statistical procedures used should remain comparable. In PISA 2015, testing was computer-based for the first time, whereas previous cycles had used paper-and-pencil tests; in addition, the scaling model was changed, and new item formats were introduced in science. In the present article, we use the German national PISA samples from 2000 to 2015 to examine the extent to which the change in test mode and the change in scaling model affect the interpretation of the trend estimates. The analyses show that the switch from paper-and-pencil to computer-based testing may have biased the trend estimate for Germany. [Translated from German.]
Affiliation(s)
- Alexander Robitzsch
- IPN – Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik, Kiel
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), 80333 München
- Oliver Lüdtke
- IPN – Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik, Kiel
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), 80333 München
- Olaf Köller
- IPN – Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik, Kiel
- Ulf Kröhne
- Deutsches Institut für Internationale Pädagogische Forschung (DIPF), Bildungsqualität und Evaluation, Frankfurt am Main
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), 80333 München
- Frank Goldhammer
- Deutsches Institut für Internationale Pädagogische Forschung (DIPF), Bildungsqualität und Evaluation, Frankfurt am Main
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), 80333 München
- Jörg-Henrik Heine
- Technische Universität München, TUM School of Education
- Zentrum für internationale Bildungsvergleichsstudien (ZIB), 80333 München

26
Hülür G, Gasimova F, Robitzsch A, Wilhelm O. Change in Fluid and Crystallized Intelligence and Student Achievement: The Role of Intellectual Engagement. Child Dev 2017; 89:1074-1087. [PMID: 28369877] [DOI: 10.1111/cdev.12791]
Abstract
Intellectual engagement (IE) refers to enjoyment of intellectual activities and is proposed as causal for knowledge acquisition. The role of IE for cognitive development was examined utilizing 2-year longitudinal data from 112 ninth graders (average baseline age: 14.7 years). Higher baseline IE predicted higher baseline crystallized ability but not changes therein, and was not associated with fluid ability. Furthermore, IE predicted change in school grades in language but not in mathematics grades or in standardized tests. These findings suggest that IE is not a major predictor of knowledge acquisition in adolescence, where degree of self-determination in intellectual behaviors may be relatively limited. Open questions for future research are addressed, including reciprocal longitudinal associations between IE and academic and cognitive development.
27
Lüdtke O, Robitzsch A, Grund S. Multiple imputation of missing data in multilevel designs: A comparison of different strategies. Psychol Methods 2016; 22:141-165. [PMID: 27607544] [DOI: 10.1037/met0000096]
Abstract
Multiple imputation is a widely recommended means of addressing the problem of missing data in psychological research. An often-neglected requirement of this approach is that the imputation model used to generate the imputed values must be at least as general as the analysis model. For multilevel designs in which lower level units (e.g., students) are nested within higher level units (e.g., classrooms), this means that the multilevel structure must be taken into account in the imputation model. In the present article, we compare different strategies for multiply imputing incomplete multilevel data using mathematical derivations and computer simulations. We show that ignoring the multilevel structure in the imputation may lead to substantial negative bias in estimates of intraclass correlations as well as biased estimates of regression coefficients in multilevel models. We also demonstrate that an ad hoc strategy that includes dummy indicators in the imputation model to represent the multilevel structure may be problematic under certain conditions (e.g., small groups, low intraclass correlations). Imputation based on a multivariate linear mixed effects model was the only strategy to produce valid inferences under most of the conditions investigated in the simulation study. Data from an educational psychology research project are also used to illustrate the impact of the various multiple imputation strategies.
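The attenuation of the intraclass correlation can be sketched in a few lines (a deliberately simplified stand-in: single-level normal imputation versus imputation around observed group means; the mixed-effects imputation models compared in the article are more general, and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-level data (illustrative numbers): 100 groups of 20,
# true ICC = var_b / (var_b + var_w) = 0.3
G, n, var_b, var_w = 100, 20, 0.3, 0.7
group = np.repeat(np.arange(G), n)
y = rng.normal(0, np.sqrt(var_b), G)[group] + rng.normal(0, np.sqrt(var_w), G * n)

miss = rng.random(G * n) < 0.5  # 50% missing completely at random
obs = ~miss

def icc(values, groups, n_per_group):
    """One-way ANOVA estimator of the intraclass correlation."""
    means = np.array([values[groups == g].mean() for g in range(G)])
    msb = n_per_group * means.var(ddof=1)
    msw = np.mean([values[groups == g].var(ddof=1) for g in range(G)])
    return (msb - msw) / (msb + (n_per_group - 1) * msw)

# Single-level imputation: draw from the pooled observed distribution,
# ignoring the grouping -- imputed values carry no between-group signal.
y_flat = y.copy()
y_flat[miss] = rng.normal(y[obs].mean(), y[obs].std(), miss.sum())

# Group-mean imputation: draw around each group's observed mean,
# preserving the multilevel structure (true var_w is plugged in here
# purely to keep the sketch short).
y_dummy = y.copy()
for g in range(G):
    sel = miss & (group == g)
    mu_g = y[obs & (group == g)].mean()
    y_dummy[sel] = rng.normal(mu_g, np.sqrt(var_w), sel.sum())

print(round(icc(y_flat, group, n), 3), round(icc(y_dummy, group, n), 3))
```

The flat imputation pulls every imputed value toward the grand mean, so the estimated ICC drops well below the generating value of 0.3, while the group-aware imputation stays close to it.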
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education (IPN)
- Simon Grund
- Leibniz Institute for Science and Mathematics Education (IPN)

28
Abstract
The analysis of variance (ANOVA) is frequently used to examine whether a number of groups differ on a variable of interest. The global hypothesis test of the ANOVA can be reformulated as a regression model in which all group differences are simultaneously tested against zero. Multiple imputation offers reliable and effective treatment of missing data; however, recommendations differ with regard to what procedures are suitable for pooling ANOVA results from multiply imputed datasets. In this article, we compared several procedures (known as D1, D2, and D3) using Monte Carlo simulations. Even though previous recommendations have advocated that D2 should be avoided in favor of D1 or D3, our results suggest that all procedures provide a suitable test of the ANOVA's global null hypothesis in many plausible research scenarios. In more extreme settings, D1 was most reliable, whereas D2 and D3 suffered from different limitations. We provide guidelines on how the different methods can be applied in one- and two-factorial ANOVA designs and information about the conditions under which some procedures may perform better than others. Computer code is supplied for each method to be used in freely available statistical software.
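As a sketch of one of the compared procedures: D2 pools the chi-square test statistics themselves across imputations. The function below follows the standard Li, Meng, Raghunathan, and Rubin pooling formulas (the name `pool_d2` is illustrative; for production use, see packages such as mitml rather than this sketch).

```python
import numpy as np

def pool_d2(chisq, df):
    """Pool chi-square statistics from m imputed data sets with the D2
    procedure: returns the pooled F statistic and its denominator df.
    The numerator df equals `df`, the df of each chi-square."""
    d = np.asarray(chisq, dtype=float)
    m = len(d)
    # Between-imputation variability of the sqrt-transformed statistics.
    r2 = (1 + 1 / m) * np.var(np.sqrt(d), ddof=1)
    d2 = (d.mean() / df - (m + 1) / (m - 1) * r2) / (1 + r2)
    nu2 = df ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return d2, nu2

stat, nu2 = pool_d2([10.0, 12.0, 14.0], df=2)
print(round(stat, 4))  # compare with an F(2, nu2) reference distribution
```

D2 needs only the m test statistics, which makes it attractive when the fitted models themselves are unavailable; the abstract's caveat is that this convenience can cost accuracy in more extreme settings.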
Affiliation(s)
- Simon Grund
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Centre for International Student Assessment, Germany

29
Hildebrandt A, Lüdtke O, Robitzsch A, Sommer C, Wilhelm O. Exploring Factor Model Parameters across Continuous Variables with Local Structural Equation Models. Multivariate Behav Res 2016; 51:257-258. [PMID: 27049892] [DOI: 10.1080/00273171.2016.1142856]
Abstract
Using an empirical data set, we investigated variation in factor model parameters across a continuous moderator variable and demonstrated three modeling approaches: multiple-group mean and covariance structure (MGMCS) analyses, local structural equation modeling (LSEM), and moderated factor analysis (MFA). We focused on how to study variation in factor model parameters as a function of continuous variables such as age, socioeconomic status, ability levels, acculturation, and so forth. Specifically, we formalized the LSEM approach in detail as compared with previous work and investigated its statistical properties with an analytical derivation and a simulation study. We also provide code for the easy implementation of LSEM. The illustration of methods was based on cross-sectional cognitive ability data from individuals ranging in age from 4 to 23 years. Variations in factor loadings across age were examined with regard to the age differentiation hypothesis. LSEM and MFA converged with respect to the conclusions. When there was a broad age range within groups and varying relations between the indicator variables and the common factor across age, MGMCS produced distorted parameter estimates. We discuss the pros of LSEM compared with MFA and recommend using the two tools as complementary approaches for investigating moderation in factor model parameters.
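The core LSEM idea, replacing hard group boundaries with kernel weights around focal values of the moderator, can be sketched with a weighted correlation standing in for a full factor model (the data-generating numbers, the bandwidth, and the function names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate an indicator pair whose correlation declines with age,
# a toy stand-in for age-differentiated factor loadings.
n = 20_000
age = rng.uniform(4, 23, n)
rho = np.clip(0.9 - 0.03 * (age - 4), 0.2, 0.9)  # true correlation at each age
z = rng.normal(size=n)
u = rng.normal(size=n)
x1 = z
x2 = rho * z + np.sqrt(1 - rho ** 2) * u

def weighted_corr(a, b, w):
    ma, mb = np.average(a, weights=w), np.average(b, weights=w)
    cov = np.average((a - ma) * (b - mb), weights=w)
    va = np.average((a - ma) ** 2, weights=w)
    vb = np.average((b - mb) ** 2, weights=w)
    return cov / np.sqrt(va * vb)

def lsem_curve(focal_points, bandwidth=2.0):
    """Re-estimate the statistic at each focal age using Gaussian
    kernel weights instead of hard group boundaries."""
    out = []
    for a0 in focal_points:
        w = np.exp(-0.5 * ((age - a0) / bandwidth) ** 2)
        out.append(weighted_corr(x1, x2, w))
    return out

curve = lsem_curve([6, 12, 18])
print([round(c, 2) for c in curve])  # correlation declines across age
```

At each focal age the statistic is re-estimated with observations down-weighted by their distance on the moderator, yielding a smooth parameter curve analogous to the loading curves examined for the age differentiation hypothesis; MGMCS instead fixes hard age-group boundaries, which is what distorts estimates when parameters vary within groups.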
Affiliation(s)
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel University
- Centre for International Student Assessment, Technische Universität München
- Alexander Robitzsch
- Leibniz Institute for Science and Mathematics Education, Kiel University
- Centre for International Student Assessment, Technische Universität München
- Oliver Wilhelm
- Department of Psychology and Education, Universität Ulm

30
van den Heuvel-Panhuizen M, Elia I, Robitzsch A. Effects of reading picture books on kindergartners' mathematics performance. Educ Psychol (Lond) 2016; 36:323-346. [PMID: 26855457] [PMCID: PMC4720050] [DOI: 10.1080/01443410.2014.963029]
Abstract
This article describes a field experiment with a pretest-posttest control-group design that investigated the potential of reading picture books to children for supporting their mathematical understanding. The study involved 384 children from 18 kindergarten classes in 18 schools in the Netherlands. Over a period of three months, the children in the nine experimental classes were read picture books. Data analysis revealed that, when controlling for relevant covariates, the picture book reading programme had a positive effect (d = .13) on kindergartners' mathematics performance as measured by a project test containing items on number, measurement, and geometry. Compared to the increase from pretest to posttest in the control group, the increase in the experimental group was 22% larger. No significant differential intervention effects were found between subgroups based on kindergarten year, age, home language, socio-economic status, or mathematics and language ability, but a significant intervention effect was found for girls and not for boys.
Affiliation(s)
- Marja van den Heuvel-Panhuizen
- Faculty of Science & Faculty of Social and Behavioural Sciences, Freudenthal Institute for Science and Mathematics Education, Utrecht University, Utrecht, The Netherlands
- Iliada Elia
- Department of Education, University of Cyprus, Nicosia, Cyprus
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation and Development of the Austrian School System, Salzburg, Austria

34
Abstract
Multilevel analyses are often used to estimate the effects of group-level constructs. However, when using aggregated individual data (e.g., student ratings) to assess a group-level construct (e.g., classroom climate), the observed group mean might not provide a reliable measure of the unobserved latent group mean. In the present article, we propose a Bayesian approach that can be used to estimate a multilevel latent covariate model, which corrects for the unreliable assessment of the latent group mean when estimating the group-level effect. A simulation study was conducted to evaluate the choice of different priors for the group-level variance of the predictor variable and to compare the Bayesian approach with the maximum likelihood approach implemented in the software Mplus. Results showed that, under problematic conditions (i.e., small number of groups, predictor variable with a small ICC), the Bayesian approach produced more accurate estimates of the group-level effect than the maximum likelihood approach did.
Affiliation(s)
- Oliver Lüdtke
- Centre for International Student Assessment, Leibniz Institute for Science and Mathematics Education
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation, and Development of the Austrian School System

35
Bakker M, van den Heuvel-Panhuizen M, Robitzsch A. Effects of playing mathematics computer games on primary school students' multiplicative reasoning ability. Contemporary Educational Psychology 2015. [DOI: 10.1016/j.cedpsych.2014.09.001]
36
Schroeders U, Robitzsch A, Schipolowski S. A Comparison of Different Psychometric Approaches to Modeling Testlet Structures: An Example with C-Tests. Journal of Educational Measurement 2014. [DOI: 10.1111/jedm.12054]
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation & Development of the Austrian Schooling System (BIFIE Salzburg)
- Stefan Schipolowski
- Institute for Educational Quality Improvement, Humboldt-Universität zu Berlin

37
Gasimova F, Robitzsch A, Wilhelm O, Boker SM, Hu Y, Hülür G. Dynamical systems analysis applied to working memory data. Front Psychol 2014; 5:687. [PMID: 25071657] [PMCID: PMC4080465] [DOI: 10.3389/fpsyg.2014.00687]
Abstract
In the present paper, we investigate weekly fluctuations in working memory capacity (WMC) assessed over a period of 2 years. We use dynamical systems analysis, specifically a second-order linear differential equation, to model weekly variability in WMC in a sample of 112 ninth graders. To deal with missing data in our longitudinal records, we use a B-spline imputation method. The results show a significant negative frequency parameter, indicating a cyclical pattern in weekly memory updating (MU) performance across time. We use a multilevel modeling approach to capture individual differences in model parameters and find that a higher initial performance level and a slower improvement on the MU task are associated with a slower frequency of oscillation. Additionally, we conduct a simulation study examining the analysis procedure's performance for different numbers of B-spline knots and different time-delay embedding dimensions. Results show that the number of knots in the B-spline imputation influences accuracy more than the number of embedding dimensions does.
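The role of the frequency parameter can be sketched outside the full time-delay-embedding machinery (a simplified stand-in with illustrative values, not the article's estimator): for a second-order linear differential equation ẍ = η·x, a negative η implies oscillation with angular frequency √(−η), which a finite-difference regression recovers.

```python
import numpy as np

# Simulate an undamped oscillator x'' = eta * x with eta = -omega^2,
# an illustrative stand-in for weekly working-memory fluctuations.
omega, dt = 1.3, 0.05
t = np.arange(0, 40, dt)
x = np.cos(omega * t)

# Finite-difference second derivative at interior time points.
d2x = (x[2:] - 2 * x[1:-1] + x[:-2]) / dt ** 2

# Least-squares estimate of eta from x'' = eta * x; a negative eta
# indicates a cyclical process with frequency sqrt(-eta).
eta_hat = np.sum(d2x * x[1:-1]) / np.sum(x[1:-1] ** 2)
print(round(eta_hat, 3), round(np.sqrt(-eta_hat), 3))  # eta_hat close to -omega**2
```

In the article's setting the signal is noisy and irregularly observed, which is why the authors smooth with B-splines before differentiating and embed the series in time-delay coordinates; this sketch shows only the core identity linking η to the oscillation frequency.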
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation and Development of the Austrian Schooling System (BIFIE Salzburg), Salzburg, Austria
- Steven M Boker
- Department of Psychology, University of Virginia, Charlottesville, VA, USA
- Yueqin Hu
- Department of Psychology, Texas State University, San Marcos, TX, USA
- Gizem Hülür
- Department of Psychology, Humboldt University, Berlin, Germany

38
Wilhelm O, Hülür G, Gasimova F, Robitzsch A. Correlates and consequences of status and change in intellectual engagement. Personality and Individual Differences 2014. [DOI: 10.1016/j.paid.2013.07.392]
39
Gasimova F, Robitzsch A, Wilhelm O, Hülür G. A Hierarchical Bayesian Model With Correlated Residuals for Investigating Stability and Change in Intensive Longitudinal Data Settings. Methodology 2014. [DOI: 10.1027/1614-2241/a000083]
Abstract
The present paper’s focus is the modeling of interindividual and intraindividual variability in longitudinal data. We propose a hierarchical Bayesian model with correlated residuals, employing an autoregressive parameter AR(1) for focusing on intraindividual variability. The hierarchical model possesses four individual random effects: intercept, slope, variability, and autocorrelation. The performance of the proposed Bayesian estimation is investigated in simulated longitudinal data with three different sample sizes (N = 100, 200, 500) and three different numbers of measurement points (T = 10, 20, 40). The initial simulation values are selected according to the results of the first 20 measurement occasions from a longitudinal study on working memory capacity in 9th graders. Within this simulation study, we investigate the root mean square error (RMSE), bias, relative percentage bias, and the 90% coverage probability of parameter estimates. Results indicate that more accurate estimates are associated with a larger sample size. One exception to this tendency is the autocorrelation parameter, which shows more sensitivity to an increasing number of time points.
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation & Development of the Austrian Schooling System (BIFIE), Salzburg, Austria

40
Lüdtke O, Robitzsch A, Kenny DA, Trautwein U. A general and flexible approach to estimating the social relations model using Bayesian methods. Psychol Methods 2012; 18:101-119. [PMID: 22799626] [DOI: 10.1037/a0029252]
Abstract
The social relations model (SRM) is a conceptual, methodological, and analytical approach that is widely used to examine dyadic behaviors and interpersonal perception within groups. This article introduces a general and flexible approach to estimating the parameters of the SRM that is based on Bayesian methods using Markov chain Monte Carlo techniques. The Bayesian approach overcomes several statistical problems that have plagued SRM researchers. First, it provides a single unified approach to estimating SRM parameters that can be easily extended to more specialized models (e.g., measurement models, moderator variables, categorical outcome variables). Second, sampling-based Bayesian methods allow statistically reliable inferences to be made about variance components and correlations, even with small sample sizes. Third, the Bayesian approach is able to handle designs with missing data. In a simulation study, the statistical properties (bias, root-mean-square error, coverage rate) of the parameter estimates produced by the Bayesian approach are compared with those of the method of moment estimates that have been used in previous research. A data example is presented to illustrate how discrete person moderators can be included in SRM analyses using the Bayesian approach. Finally, further extensions of the SRM are discussed, and suggestions for applied research are made.
Affiliation(s)
- Oliver Lüdtke
- Department of Psychology, Humboldt University, Berlin, Germany.
|
41
|
Abstract
This article investigates the extent to which different content-related and process-related mathematical competencies can be separated analytically. Samples of N = 10,328 and N = 6,638 students in the 3rd and 4th grades of primary school worked on extensive item pools that can be assigned to five content-related competencies (numbers and operations; space and shape; patterns and structure; quantities and measurement; and data, frequency, and probability) and six process-related competencies (basic skills, problem solving, communicating, reasoning, modeling, and representing). Dimensionality analyses show that a model with five content-related factors represents the data best. The content-related scales prove to be highly reliable, and correlations with other instruments (e.g., DEMAT 3 and 4) attest to their high validity. Analyses of the relationship with tests of basic cognitive abilities show that mathematical competencies and basic cognitive abilities constitute distinct factors. The findings are discussed with regard to the question of which constructs school achievement tests actually measure.
|
42
|
Lüdtke O, Marsh HW, Robitzsch A, Trautwein U. A 2 × 2 taxonomy of multilevel latent contextual models: Accuracy–bias trade-offs in full and partial error correction models. Psychol Methods 2011; 16:444-67. [DOI: 10.1037/a0024376] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
43
|
Hülür G, Wilhelm O, Robitzsch A. Multivariate Veränderungsmodelle für Schulnoten und Schülerleistungen in Deutsch und Mathematik [Multivariate change models for school grades and student achievement in German and mathematics]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie 2011. [DOI: 10.1026/0049-8637/a000051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Student achievement on standardized achievement tests shows a moderate association with school grades in the same subject. Across subjects, the associations are lower but still positive. Little is known, however, about longitudinal relationships between school grades and student achievement. In this study, we examine longitudinal relationships between student achievement and school grades in German and mathematics. The analyses were conducted on pre- and post-measurements from a sample of academic-track (Gymnasium) students (N = 168) who belonged to the study and control groups of an intensive longitudinal study. Multigroup analyses indicate full measurement invariance between the two groups, and no retest effects emerged. The longitudinal relationships were analyzed with a change-score model. Changes in student achievement in the two school subjects do not correlate significantly, whereas changes in school grades in the two subjects correlate positively. In both subjects, changes in student achievement and changes in school grades show a positive association. This correlation is only moderate because, due to factors such as frame-of-reference effects, school grades correlate only moderately with student achievement even cross-sectionally. Achievement gains are most effectively quantified with norm-referenced scales.
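As a rough illustration of the correlated-change idea (simulated data only; observed difference scores stand in here as a simplification of the paper's latent change-score model, and all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 168  # sample size matching the abstract; all scores below are simulated

# Hypothetical pre/post test scores and grades in one subject, where both
# outcomes share a common change component so their changes correlate.
change_common = rng.normal(0, 1, n)
test_pre = rng.normal(0, 1, n)
grade_pre = 0.5 * test_pre + rng.normal(0, 1, n)
test_post = test_pre + 0.6 * change_common + rng.normal(0, 0.5, n)
grade_post = grade_pre + 0.6 * change_common + rng.normal(0, 0.5, n)

# Observed change scores and their correlation.
d_test, d_grade = test_post - test_pre, grade_post - grade_pre
r = np.corrcoef(d_test, d_grade)[0, 1]
print(f"correlation of change scores: r = {r:.2f}")
```

A latent change-score model would additionally separate true change from measurement error; with observed differences, the correlation above is attenuated by the error in both difference scores.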
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation & Development of the Austrian Schooling System (BIFIE Salzburg), Austria
|
44
|
Robitzsch A, Dörfler T, Pfost M, Artelt C. Die Bedeutung der Itemauswahl und der Modellwahl für die längsschnittliche Erfassung von Kompetenzen [The significance of item selection and model choice for the longitudinal assessment of competencies]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie 2011. [DOI: 10.1026/0049-8637/a000052] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
This article examines the development of reading competence in primary school students using the ELFE test (Lenhard & Schneider, 2005). It discusses the importance of item selection in the test used (item sampling) as well as the choice of statistical models (multi-model inference) for effect sizes of change, and it quantifies the variability these choices induce. It is argued that (at least) three facets play an important role in a generalizability framework for tests: the sampling or selection of persons, of items, and of statistical models. The empirical findings of this article show that the sources of variation due to item sampling and model choice, which are mostly neglected in publications, are not negligible relative to the sampling of persons.
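The generalizability argument can be illustrated with a small bootstrap sketch (hypothetical data, not from the ELFE study): the same pre-post effect size is recomputed under resampling of persons and, separately, under resampling of items, so that both sources of variability can be compared.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 300, 20

# Hypothetical pre/post binary item responses; ability shifts up at post-test.
theta = rng.normal(0, 1, n_persons)   # person abilities
diff = rng.normal(0, 1, n_items)      # item difficulties

def responses(shift):
    p = 1 / (1 + np.exp(-(theta[:, None] + shift - diff[None, :])))
    return (rng.random((n_persons, n_items)) < p).astype(float)

pre, post = responses(0.0), responses(0.5)

def effect_size(pre, post):
    gain = post.mean(axis=1) - pre.mean(axis=1)  # per-person gain in proportion correct
    return gain.mean() / gain.std(ddof=1)        # standardized mean change

def bootstrap_sd(axis, B=500):
    """SD of the effect size under resampling along one facet (0 = persons, 1 = items)."""
    n = pre.shape[axis]
    ds = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        ds.append(effect_size(np.take(pre, idx, axis), np.take(post, idx, axis)))
    return np.std(ds, ddof=1)

print("SD over person sampling:", bootstrap_sd(0))
print("SD over item sampling:  ", bootstrap_sd(1))
```

The third facet named in the abstract, model choice, would correspond to repeating this exercise under different scaling models (e.g., sum scores versus IRT-based scores) and comparing the resulting effect sizes as well.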
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation & Development of the Austrian Schooling System (BIFIE Salzburg), Austria
- Tobias Dörfler
- Chair of Empirical Educational Research, Otto-Friedrich-Universität Bamberg
- Maximilian Pfost
- Chair of Empirical Educational Research, Otto-Friedrich-Universität Bamberg
- Cordula Artelt
- Chair of Empirical Educational Research, Otto-Friedrich-Universität Bamberg
|
45
|
|
46
|
Abstract
According to the age differentiation hypothesis, cognitive abilities become more differentiated with increasing age during childhood. Using data from the German standardization of the SON-R 2½–7 intelligence test, we examined age-related differentiation of cognitive abilities from age 2½ to age 7. The SON-R 2½–7 is a nonverbal intelligence test for children and consists of six subtests. It is assumed to have a two-factor structure, with a reasoning factor and a performance factor. We used age-weighted measurement models to describe the age gradients of model parameter estimates. In line with the differentiation hypothesis, we observed a decrease in the correlation between the two factors with increasing age. We tested the significance of this observed decrease using a permutation test: in 1,000 permutation datasets, ages were randomly reassigned to participants, and age-weighted measurement models were re-estimated to obtain the age gradients of the factor correlation under the null hypothesis. The results of the permutation test show that the decrease in the correlation observed in the real dataset is significant but of small magnitude. The findings provide some support for intelligence differentiation with increasing childhood age.
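The permutation logic can be sketched as follows (simulated scores stand in for the two SON-R factor estimates, and a simple median split replaces the study's age-weighted measurement models; all data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
age = rng.uniform(2.5, 7.0, n)

# Simulated reasoning/performance scores whose true correlation declines with age.
rho = 0.9 - 0.08 * (age - 2.5)  # falls from .90 at age 2.5 toward ~.54 at age 7
z = rng.normal(0, 1, n)
performance = z
reasoning = rho * z + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)

def corr_drop(age):
    """Correlation in the younger half minus correlation in the older half."""
    young, old = age < np.median(age), age >= np.median(age)
    r = lambda m: np.corrcoef(reasoning[m], performance[m])[0, 1]
    return r(young) - r(old)  # positive if the correlation decreases with age

observed = corr_drop(age)
# Permutation test: randomly reassign ages and recompute the drop each time.
perm = np.array([corr_drop(rng.permutation(age)) for _ in range(1000)])
p_value = (np.sum(perm >= observed) + 1) / (len(perm) + 1)
print(f"observed drop = {observed:.3f}, p = {p_value:.3f}")
```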
Affiliation(s)
- Alexander Robitzsch
- Federal Institute for Education Research, Innovation & Development of the Austrian Schooling System (BIFIE Salzburg), Austria
|
47
|
Marsh HW, Lüdtke O, Robitzsch A, Trautwein U, Asparouhov T, Muthén B, Nagengast B. Doubly-Latent Models of School Contextual Effects: Integrating Multilevel and Structural Equation Approaches to Control Measurement and Sampling Error. Multivariate Behav Res 2009; 44:764-802. [PMID: 26801796 DOI: 10.1080/00273170903333665] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
This article is a methodological-substantive synergy. Methodologically, we demonstrate latent-variable contextual models that integrate structural equation models (with multiple indicators) and multilevel models. These models simultaneously control for and unconfound measurement error due to the sampling of items at the individual (L1) and group (L2) levels and sampling error due to the sampling of persons in the aggregation of L1 characteristics to form L2 constructs. We consider a set of models that are latent or manifest in relation to sampling items (measurement error) and sampling of persons (sampling error) and discuss when different models might be most useful. We demonstrate the flexibility of these 4 core models by extending them to include random slopes, latent (single-level or cross-level) interactions, and latent quadratic effects. Substantively, we use these models to test the big-fish-little-pond effect (BFLPE), showing that individual student levels of academic self-concept (L1-ASC) are positively associated with individual level achievement (L1-ACH) and negatively associated with school-average achievement (L2-ACH)—a finding with important policy implications for the way schools are structured. Extending tests of the BFLPE in new directions, we show that the nonlinear effects of the L1-ACH (a latent quadratic effect) and the interaction between gender and L1-ACH (an L1 × L1 latent interaction) are not significant. Although random-slope models show no significant school-to-school variation in relations between L1-ACH and L1-ASC, the negative effects of L2-ACH (the BFLPE) do vary somewhat with individual L1-ACH. We conclude with implications for diverse applications of the set of latent contextual models, including recommendations about their implementation, effect size estimates (and confidence intervals) appropriate to multilevel models, and directions for further research in contextual effect analysis.
Affiliation(s)
- Oliver Lüdtke
- Max Planck Institute for Human Development, Berlin, Germany
|
48
|
Lüdtke O, Robitzsch A, Trautwein U, Kunter M. Assessing the impact of learning environments: How to use student ratings of classroom or school characteristics in multilevel modeling. Contemporary Educational Psychology 2009. [DOI: 10.1016/j.cedpsych.2008.12.001] [Citation(s) in RCA: 264] [Impact Index Per Article: 17.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
49
|
Lüdtke O, Marsh HW, Robitzsch A, Trautwein U, Asparouhov T, Muthén B. The multilevel latent covariate model: a new, more reliable approach to group-level effects in contextual studies. Psychol Methods 2008; 13:203-29. [PMID: 18778152 DOI: 10.1037/a0012869] [Citation(s) in RCA: 260] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In multilevel modeling (MLM), group-level (L2) characteristics are often measured by aggregating individual-level (L1) characteristics within each group so as to assess contextual effects (e.g., group-average effects of socioeconomic status, achievement, climate). Most previous applications have used a multilevel manifest covariate (MMC) approach, in which the observed (manifest) group mean is assumed to be perfectly reliable. This article demonstrates mathematically and with simulation results that this MMC approach can result in substantially biased estimates of contextual effects and can substantially underestimate the associated standard errors, depending on the number of L1 individuals per group, the number of groups, the intraclass correlation, the sampling ratio (the percentage of cases within each group sampled), and the nature of the data. To address this pervasive problem, the authors introduce a new multilevel latent covariate (MLC) approach that corrects for unreliability at L2 and results in unbiased estimates of L2 constructs under appropriate conditions. However, under some circumstances when the sampling ratio approaches 100%, the MMC approach provides more accurate estimates. Based on 3 simulations and 2 real-data applications, the authors evaluate the MMC and MLC approaches and suggest when researchers should most appropriately use one, the other, or a combination of both approaches.
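The attenuation mechanism described above can be sketched with a small simulation (hypothetical data and a simple moment-based correction, not the authors' full maximum-likelihood estimator): the manifest group mean has reliability λ = τ / (τ + σ²/n), and the manifest between-group slope is pulled toward the within-group slope by exactly that factor.

```python
import numpy as np

rng = np.random.default_rng(3)
G, n = 200, 10              # number of groups, persons sampled per group
tau, sigma2 = 1.0, 4.0      # between-group and within-group variance of x
beta_w, beta_b = 0.5, 1.5   # true within-group and between-group effects

# Two-level data: latent group means mu, individual x around them.
mu = rng.normal(0, np.sqrt(tau), G)
x = mu[:, None] + rng.normal(0, np.sqrt(sigma2), (G, n))
y = beta_w * (x - mu[:, None]) + beta_b * mu[:, None] + rng.normal(0, 1, (G, n))

xbar = x.mean(axis=1)       # manifest (observed) group means
slope = lambda a, b: np.cov(a, b)[0, 1] / np.var(a, ddof=1)

# Within-group slope from group-centered data; manifest between-group slope.
bw_hat = slope((x - xbar[:, None]).ravel(), (y - y.mean(axis=1)[:, None]).ravel())
bb_mmc = slope(xbar, y.mean(axis=1))   # biased toward beta_w (MMC approach)

lam = tau / (tau + sigma2 / n)         # reliability of the manifest group mean
bb_mlc = bw_hat + (bb_mmc - bw_hat) / lam  # moment correction (latent-covariate logic)
print(f"within {bw_hat:.2f}; manifest between {bb_mmc:.2f}; corrected {bb_mlc:.2f}")
```

In practice λ is unknown and must itself be estimated (here the true τ and σ² are plugged in), which is one reason the article's model-based MLC approach is preferable to ad hoc corrections.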
Affiliation(s)
- Oliver Lüdtke
- Center for Educational Research, Max Planck Institute for Human Development, Berlin, Germany.
|
50
|
Lüdtke O, Robitzsch A, Trautwein U, Köller O. Steht Transparenz einer adäquaten Datenauswertung im Wege? [Does transparency stand in the way of adequate data analysis?] Psychologische Rundschau 2008. [DOI: 10.1026/0012-1924.59.3.180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
|