1
Palmer AR, Stallworthy IC, DeJoseph ML, Berry D. Early childhood measurement invariance of the Strengths and Difficulties Questionnaire across age, race, sex, and socioeconomic status. Psychol Assess 2025; 37:201-213. [PMID: 40063397 PMCID: PMC12011538 DOI: 10.1037/pas0001372] [Indexed: 03/21/2025]
Abstract
Research suggests there are differences in children's internalizing and externalizing symptoms as a function of age, race, sex, and socioeconomic status (SES). Males, Black children, and children experiencing lower SES have been rated as having more externalizing problems. Females and older children have been rated as having more internalizing symptoms. The validity of these findings rests on the assumption that the measures mean the same thing across groups and developmental time (i.e., measurement invariance [MI]). Without establishing MI, results may reflect differences in measurement rather than true differences in the underlying construct. The Strengths and Difficulties Questionnaire (SDQ) is a widely used tool to measure internalizing and externalizing symptoms. Previous papers have evaluated MI of the SDQ in school-aged children. However, to our knowledge, no studies of young children have examined MI across Black and White families from diverse SES backgrounds. Data from the Family Life Project were used to evaluate MI of the SDQ across child age (35 to 90 months), race, sex, and SES. Using moderated nonlinear factor analysis (MNLFA), multiple SDQ items demonstrated measurement noninvariance as a function of child demographic variables. Results suggest that it is important to test and adjust for noninvariance with the SDQ when applied to early childhood populations comprising Black and White children from diverse SES backgrounds. An MNLFA approach improves our ability to validly measure and compare symptoms of psychopathology in diverse early childhood populations. This could have implications for our understanding of rates of mental health challenges and treatment in early childhood. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
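In MNLFA, item parameters (intercepts and loadings) and factor parameters (mean and variance) can change as functions of background covariates. This is how noninvariance is tested for a continuous variable like age alongside categorical ones. Below is a minimal Python sketch of that parameterization for one continuous item.

It rests on several assumptions:
- The covariates `male` and centered age `age_c` are hypothetical.
- The coefficient values are hypothetical.
- The residual variance is held constant.

The sketch only computes model-implied item moments. It estimates nothing and does not reproduce the paper's model. SDQ items are ordinal, so a real analysis would use an ordinal link rather than this linear one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Moderated:
    """A parameter written as base + sum(coef * covariate), as in MNLFA moderation."""
    base: float
    coefs: dict = field(default_factory=dict)  # covariate name -> effect (hypothetical values)

    def at(self, x: dict) -> float:
        return self.base + sum(c * x.get(name, 0.0) for name, c in self.coefs.items())


def implied_item_moments(x, intercept, loading, factor_mean, factor_logvar, resid_var):
    """Model-implied mean and variance of one continuous item for covariate profile x.

    Intercept, loading and factor mean are linear in the covariates; the factor
    variance is log-linear (exp of a linear predictor) so it stays positive.
    """
    nu, lam = intercept.at(x), loading.at(x)
    alpha, psi = factor_mean.at(x), math.exp(factor_logvar.at(x))
    return nu + lam * alpha, lam ** 2 * psi + resid_var


# Hypothetical parameters: intercept DIF by sex, loading DIF by centered age,
# a latent mean that changes with age, and latent variance fixed at exp(0) = 1.
intercept = Moderated(1.0, {"male": 0.3})
loading = Moderated(0.8, {"age_c": 0.05})
factor_mean = Moderated(0.0, {"age_c": 0.1})
factor_logvar = Moderated(0.0)

boy = {"male": 1, "age_c": 0.0}
girl = {"male": 0, "age_c": 0.0}
m_boy, v_boy = implied_item_moments(boy, intercept, loading, factor_mean, factor_logvar, 0.5)
m_girl, v_girl = implied_item_moments(girl, intercept, loading, factor_mean, factor_logvar, 0.5)
# Same latent mean and variance, yet the expected item score differs by the intercept DIF.
print(round(m_boy - m_girl, 3))  # 0.3
```

Both children share the same latent mean and variance. The 0.3 gap in expected item scores is therefore pure intercept noninvariance. If it were left unmodelled, it would surface as a spurious group difference.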
Affiliation(s)
- Alyssa R. Palmer
  - Institute of Child Development, University of Minnesota
  - Department of Psychiatry and Biobehavioral Sciences, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles
- Isabella C. Stallworthy
  - Institute of Child Development, University of Minnesota
  - Department of Bioengineering, University of Pennsylvania
- Meriah L. DeJoseph
  - Institute of Child Development, University of Minnesota
  - Graduate School of Education, Stanford University
- Daniel Berry
  - Institute of Child Development, University of Minnesota

2
McNeish D. Practical Implications of Sum Scores Being Psychometrics' Greatest Accomplishment. Psychometrika 2024; 89:1148-1169. [PMID: 39031300 DOI: 10.1007/s11336-024-09988-z] [Received: 04/01/2024] [Indexed: 07/22/2024]
Abstract
This paper reflects on some practical implications of the excellent treatment of sum scoring and classical test theory (CTT) by Sijtsma et al. (Psychometrika 89(1):84-117, 2024). I have no major disagreements about the content they present and found it to be an informative clarification of the properties and possible extensions of CTT. In this paper, I focus on whether sum scores, despite their mathematical justification, are positioned to improve psychometric practice in empirical studies in psychology, education, and adjacent areas. First, I summarize recent reviews of psychometric practice in empirical studies, subsequent calls for greater psychometric transparency and validity, and how sum scores may or may not be positioned to adhere to such calls. Second, I consider limitations of sum scores for prediction, especially in the presence of common features like ordinal or Likert response scales, multidimensional constructs, and moderated or heterogeneous associations. Third, I review previous research outlining potential limitations of using sum scores as outcomes in subsequent analyses where rank ordering is not always sufficient to successfully characterize group differences or change over time. Fourth, I cover potential challenges for providing validity evidence for whether sum scores represent a single construct, particularly if one wishes to maintain minimal CTT assumptions. I conclude with thoughts about whether sum scores, even if mathematically justified, are positioned to improve psychometric practice in empirical studies.
Affiliation(s)
- Daniel McNeish
  - Department of Psychology, Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA

3
Aitken M, Plamondon A, Krzeczkowski J, Kil H, Andrade BF. Systematic Integration of Multi-Informant Externalizing Ratings in Clinical Settings. Res Child Adolesc Psychopathol 2024; 52:635-644. [PMID: 37787879 DOI: 10.1007/s10802-023-01119-z] [Accepted: 08/27/2023] [Indexed: 10/04/2023]
Abstract
Best practice clinical assessment of externalizing problems often necessitates collection of information from parents, youth themselves, and teachers. The present study tested the predictive validity of a psychometrically driven scoring procedure to integrate multi-informant, dimensional ratings of externalizing problems. Participants were 2264 clinic-referred youth ages 6-18. Parents, teachers, and youth completed questionnaire ratings of externalizing problems (hyperactivity-inattention, conduct problems, and oppositionality-defiance) prior to an initial clinical appointment. The predictive validity of simple (highest informant rating; and all informant ratings separately) and more complex (latent S-1 bifactor model with specific informant factors; and moderated nonlinear factor analysis accounting for child age and sex) methods of informant integration was tested in predicting impairment, comorbidity, and number of clinical encounters. A simple model, in which all informant ratings were included, showed the best predictive validity across outcomes, performing as well as or better than the use of the highest informant ratings or more complex latent variable models. The addition of child age and sex as moderators in the factor model did not improve predictive validity. Each informant (parent, teacher, and youth) contributes important information to the prediction of clinically relevant outcomes. There is insufficient evidence at present to suggest that complex latent variable models should be favored over simpler models that preserve each informant's ratings.
Affiliation(s)
- Madison Aitken
  - Centre for Addiction and Mental Health, Toronto, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Canada
  - Department of Psychology, York University, Toronto, Canada
- André Plamondon
  - Département des Fondements et Pratiques en Éducation, Université Laval, Québec, Canada
- John Krzeczkowski
  - Department of Health Sciences, Brock University, St. Catharines, Canada
- Hali Kil
  - Department of Psychology, Simon Fraser University, Burnaby, Canada
- Brendan F Andrade
  - Centre for Addiction and Mental Health, Toronto, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Canada

4
Fish GA, Leite WL. Unreliable Continuous Treatment Indicators in Propensity Score Analysis. Multivariate Behav Res 2024; 59:187-205. [PMID: 37524119 DOI: 10.1080/00273171.2023.2235697] [Indexed: 08/02/2023]
Abstract
Propensity score analyses (PSA) of continuous treatments often operationalize the treatment as a multi-indicator composite whose reliability goes unreported. Latent variables or factor scores that account for this unreliability are seldom used as alternatives to composites. This study examines the effects of the unreliability of indicators of a latent treatment in PSA using the generalized propensity score (GPS). A Monte Carlo simulation study was conducted varying composite reliability, continuous treatment representation, variability of factor loadings, sample size, and number of treatment indicators to assess whether average treatment effect (ATE) estimates differed in their relative bias, root mean squared error, and coverage rates. Results indicate that low composite reliability leads to underestimation of the ATE of latent continuous treatments, while the number of treatment indicators and variability of factor loadings show little effect on ATE estimates after controlling for overall composite reliability. The results also show that, in correctly specified GPS models, the effects of low composite reliability can be somewhat ameliorated by using factor scores that were estimated with the covariates included. An illustrative example is provided using survey data to estimate the effect of teacher adoption of a workbook related to a virtual learning environment in the classroom.
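The abstract finds that low composite reliability leads to an underestimated effect of a latent continuous treatment. That is classical attenuation, and it can be shown without propensity scores.

The sketch below is not a generalized propensity score analysis. It rests on these assumptions:
- There is no confounding, so an ordinary least-squares slope stands in for the GPS-based ATE.
- The loadings are hypothetical.
- The true effect of 0.5 is hypothetical.

The unit-weighted composite is rescaled to the latent metric by dividing by Σλ. Its variance is then 1/ω, where ω = (Σλ)² / ((Σλ)² + Σψ), so its OLS slope converges to β·ω.

```python
import numpy as np


def composite_reliability(loadings, uniquenesses):
    """Omega for an unweighted sum of congeneric indicators: (sum λ)^2 / ((sum λ)^2 + sum ψ)."""
    s = float(np.sum(loadings))
    return s ** 2 / (s ** 2 + float(np.sum(uniquenesses)))


def attenuated_slope(loadings, beta=0.5, n=200_000, seed=1):
    """Simulate a latent continuous treatment measured by several indicators and
    return (OLS slope of the outcome on the composite, composite reliability)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    psi = 1.0 - lam ** 2                           # standardized indicators: unique variance
    eta = rng.standard_normal(n)                   # latent treatment, variance 1
    noise = rng.standard_normal((n, lam.size)) * np.sqrt(psi)
    items = eta[:, None] * lam + noise
    composite = items.sum(axis=1) / lam.sum()      # sum score rescaled to the latent metric
    y = beta * eta + rng.standard_normal(n)        # outcome driven by the latent treatment
    slope = np.cov(composite, y)[0, 1] / np.var(composite, ddof=1)
    return slope, composite_reliability(lam, psi)


slope, omega = attenuated_slope([0.7, 0.6, 0.5, 0.4])
print(f"omega = {omega:.3f}, OLS slope = {slope:.3f}, beta * omega = {0.5 * omega:.3f}")
```

Lowering the loadings lowers ω, and the recovered slope falls by the same proportion.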
Affiliation(s)
- Gail A Fish
  - Strategic Research Development, UF Research, University of Florida
- Walter L Leite
  - School of Human Development and Organizational Studies, College of Education, University of Florida

5
McNeish D. Psychometric properties of sum scores and factor scores differ even when their correlation is 0.98: A response to Widaman and Revelle. Behav Res Methods 2023; 55:4269-4290. [PMID: 36394821 DOI: 10.3758/s13428-022-02016-x] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Commentary in Widaman and Revelle (2022) argued that sum scoring is justified as long as unidimensionality holds because sum score reliability is defined. My response begins with a review of the literature supporting the perspective we adopted in the original article. I then conduct simulation studies to assess the psychometric properties of sum scores created using Widaman and Revelle's justification relative to scores created by the weighted factor score approach in the original article. In my simulations, I generate data where sum and factor scores are correlated at 0.96 or 0.98 because high factor-sum score correlations are often used to support the contention that sum and factor scores have interchangeable psychometric properties. I explore (a) correlations between estimated scores and true scores, (b) classification accuracy of sum and factor scores, and (c) reliability of sum and factor scores. Results show that factor scores have (a) higher correlations with true scores (Δ = 0.02-0.04), (b) higher sensitivity (Δ = 4-8 percentage points), and (c) higher reliability (Δ = 0.04-0.07). Factor score performance metrics also have less sampling variability in most conditions. Psychometric properties of sum scores, even when highly correlated with factor scores, remain less desirable than those of factor scores. Additional considerations like models with multiple factors and measurement invariance are also discussed. Essentially, even if accepting Widaman and Revelle's justification for sum scoring, it is uncertain whether researchers generally would want to sum score after fitting a factor analysis unless sum and factor scores correlate at (and not merely close to) 1.00.
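The abstract claims that sum scores and factor scores can correlate at 0.96 or higher yet differ in how closely they track the true scores. A small simulation can check this.

The sketch below rests on these assumptions:
- There is one factor with hypothetical, unequal standardized loadings.
- The population parameters are treated as known, so there is no estimation error.
- Regression (Thurstone) factor scores are used, with weights Σ⁻¹λ.

It is not McNeish's simulation design. Under these assumptions, the regression factor score is the best linear predictor of the latent variable. No unit-weighted sum can therefore correlate more highly with the true score.

```python
import numpy as np


def score_truth_correlations(loadings, n=50_000, seed=7):
    """Correlate sum scores and regression factor scores with the true latent scores,
    treating the one-factor population parameters as known."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    psi = 1.0 - lam ** 2                               # unique variances (standardized items)
    eta = rng.standard_normal(n)                       # true latent scores
    x = eta[:, None] * lam + rng.standard_normal((n, lam.size)) * np.sqrt(psi)

    sigma = np.outer(lam, lam) + np.diag(psi)          # model-implied item covariance matrix
    weights = np.linalg.solve(sigma, lam)              # regression (Thurstone) weights: Sigma^-1 lambda
    factor_score = x @ weights
    sum_score = x.sum(axis=1)

    def r(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    return {
        "sum_vs_factor": round(r(sum_score, factor_score), 3),
        "sum_vs_true": round(r(sum_score, eta), 3),
        "factor_vs_true": round(r(factor_score, eta), 3),
    }


print(score_truth_correlations([0.8, 0.7, 0.6, 0.5, 0.4, 0.3]))
```

With these loadings, the population correlation between the sum and factor scores is about 0.96. The factor score still correlates about 0.04 more strongly with the true score (about 0.89 versus 0.85). That gap is of the same order as the Δ = 0.02-0.04 reported in the abstract.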
Affiliation(s)
- Daniel McNeish
  - Department of Psychology, Arizona State University, PO Box 871104, Tempe, AZ, 85287, USA

6
Cole VT, Hussong AM, Gottfredson NC, Bauer DJ, Curran PJ. Informing Harmonization Decisions in Integrative Data Analysis: Exploring the Measurement Multiverse. Prev Sci 2023; 24:1595-1607. [PMID: 36441362 DOI: 10.1007/s11121-022-01466-1] [Accepted: 10/27/2022] [Indexed: 11/29/2022]
Abstract
Combining datasets in an integrative data analysis (IDA) requires researchers to make a number of decisions about how best to harmonize item responses across datasets. This entails two sets of steps: logical harmonization, which involves combining items that appear similar across datasets, and analytic harmonization, which involves using psychometric models to find and account for cross-study differences in measurement. Embedded in logical and analytic harmonization are many decisions, from deciding whether items can be combined prima facie to how best to find covariate effects on specific items. Researchers may not have specific hypotheses about these decisions, and each individual choice may seem arbitrary, but the cumulative effects of these decisions are unknown. In the current study, we conducted an IDA of the relationship between alcohol use and delinquency using three datasets (total N = 2245). For analytic harmonization, we used moderated nonlinear factor analysis (MNLFA) to generate factor scores for delinquency. We conducted both logical and analytic harmonization 72 times, each time making a different set of decisions. We assessed the cumulative influence of these decisions on MNLFA parameter estimates, factor scores, and estimates of the relationship between delinquency and alcohol use. There were differences across paths in MNLFA parameter estimates, but fewer differences in estimates of factor scores and regression parameters linking delinquency to alcohol use. These results suggest that factor scores may be relatively robust to subtly different decisions in data harmonization, whereas measurement model parameters are less so.
Affiliation(s)
- Veronica T Cole
  - Department of Psychology, Wake Forest University, 1834 Wake Forest Road, Winston-Salem, NC, 27109, USA
- Andrea M Hussong
  - Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Nisha C Gottfredson
  - Department of Health Behavior, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Daniel J Bauer
  - Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Patrick J Curran
  - Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

7
Bauer DJ. Enhancing measurement validity in diverse populations: Modern approaches to evaluating differential item functioning. Br J Math Stat Psychol 2023; 76:435-461. [PMID: 37431154 DOI: 10.1111/bmsp.12316] [Received: 02/14/2023] [Revised: 06/05/2023] [Accepted: 06/09/2023] [Indexed: 07/12/2023]
Abstract
When developing and evaluating psychometric measures, a key concern is to ensure that they accurately capture individual differences on the intended construct across the entire population of interest. Inaccurate assessments of individual differences can occur when responses to some items reflect not only the intended construct but also construct-irrelevant characteristics, like a person's race or sex. Unaccounted for, this item bias can lead to apparent differences in the scores that do not reflect true differences, invalidating comparisons between people with different backgrounds. Accordingly, empirically identifying which items manifest bias through the evaluation of differential item functioning (DIF) has been a longstanding focus of much psychometric research. The majority of this work has focused on evaluating DIF across two (or a few) groups. Modern conceptualizations of identity, however, emphasize its multi-determined and intersectional nature, with some aspects better represented as dimensional than categorical. Fortunately, many model-based approaches to modelling DIF now exist that allow for simultaneous evaluation of multiple background variables, including both continuous and categorical variables, and potential interactions among background variables. This paper provides a comparative, integrative review of these new approaches to modelling DIF and clarifies both the opportunities and challenges associated with their application in psychometric research.
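The abstract notes that DIF can be evaluated for several continuous and categorical background variables, and their interactions, at once. The classic logistic-regression DIF test extended this way illustrates the idea. It is used here as a compact stand-in, not as one of the specific model-based approaches the paper reviews.

The sketch below rests on these assumptions:
- The item is binary.
- The covariates are hypothetical: a centered continuous `age_c`, a binary `female`, and their product.
- For clarity only, the model conditions on the simulated true trait. A real analysis must condition on an estimated trait, which is itself affected by DIF.

Only uniform DIF is tested. Testing nonuniform DIF would add trait-by-covariate terms.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    lin = X @ beta
    return beta, float(np.sum(y * lin - np.logaddexp(0.0, lin)))


def dif_lr_statistic(trait, background, y):
    """Likelihood-ratio statistic for uniform DIF: item ~ trait vs. item ~ trait + background."""
    ones = np.ones_like(trait)
    _, ll_reduced = fit_logistic(np.column_stack([ones, trait]), y)
    _, ll_full = fit_logistic(np.column_stack([ones, trait, background]), y)
    return 2.0 * (ll_full - ll_reduced), background.shape[1]   # (statistic, df)


rng = np.random.default_rng(3)
n = 5000
trait = rng.standard_normal(n)
age_c = rng.uniform(-1.0, 1.0, n)                  # hypothetical centered continuous covariate
female = rng.integers(0, 2, n).astype(float)       # hypothetical binary covariate
background = np.column_stack([age_c, female, age_c * female])  # includes an interaction


def simulate_item(dif_shift):
    """Draw binary responses from a logistic item model plus an optional DIF shift."""
    logit = -0.2 + 1.2 * trait + dif_shift
    return (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)


y_fair = simulate_item(0.0)
y_biased = simulate_item(0.5 * female + 0.5 * age_c)   # hypothetical uniform DIF

stat_fair, df = dif_lr_statistic(trait, background, y_fair)
stat_biased, _ = dif_lr_statistic(trait, background, y_biased)
print(f"df = {df}; LR (fair item) = {stat_fair:.1f}; LR (biased item) = {stat_biased:.1f}")
```

Each statistic is compared with a χ² distribution on `df` degrees of freedom. The critical value is 7.81 for df = 3 at α = .05. The fair item's statistic varies by chance on that scale, while the biased item's should far exceed it.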
Affiliation(s)
- Daniel J Bauer
  - Department of Psychology and Neuroscience, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

8
Janulis P, Luo J, Tang X, Schalet BD. Can severity of substance use be measured across drug classes? Estimating differential item functioning by drug class in two general measures of substance use severity. Drug Alcohol Depend 2023; 250:110877. [PMID: 37441960 PMCID: PMC10530475 DOI: 10.1016/j.drugalcdep.2023.110877] [Received: 12/21/2022] [Revised: 05/31/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
BACKGROUND Substance use severity is frequently measured using generic (i.e., non-drug-specific) items. Yet the measurement properties of these items must be evaluated for measurement invariance across individuals who use different substances to ensure that total scores can be compared across groups. METHOD This study used data from two independent samples (n1 = 474; n2 = 5183) and two measures of general substance use severity with generic items, the Patient Reported Outcomes Measurement Information System (PROMIS) Severity of Substance Use and DAST-10, to examine differential item functioning (DIF) across substances (i.e., sedatives, opioids, amphetamines, cocaine, and cannabis). We used moderated nonlinear factor analysis to estimate DIF. Finally, we compared factor scores from estimation methods with and without accounting for DIF to examine the impact of DIF. RESULTS A minority of items showed statistically significant DIF in each scale (items with DIF: PROMIS Sample 1: 5/37; PROMIS Sample 2: 7/20; DAST-10 Sample 2: 3/10). Factor scores from the different scoring methods were extremely highly correlated (0.994-0.999), and estimates of mean differences across substance groups did not vary considerably across scoring methods; however, measurement differences were correlated with factor scores. DISCUSSION These findings suggest that these two measures of substance use severity can be used across individuals using different substances. Factor scores appear similar across scoring methods, and mean differences do not appear to be substantially biased. Measures with generic items may offer a parsimonious alternative to measures with drug-specific items, but more research is needed to evaluate the robustness of these findings.
Affiliation(s)
- Patrick Janulis
  - Northwestern University, Department of Medical Social Sciences, United States
  - Northwestern University, Institute for Sexual and Gender Minority Health and Wellbeing, United States
- Jing Luo
  - Northwestern University, Department of Medical Social Sciences, United States
- Xiaodan Tang
  - Northwestern University, Department of Medical Social Sciences, United States
- Benjamin D Schalet
  - Amsterdam University Medical Centers, Department of Epidemiology and Data Science, The Netherlands
  - Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, The Netherlands

9
Padrutt ER, DeJoseph ML, Wilson S, Mills-Koonce R, Berry D. Measurement invariance of maternal depressive symptoms across the first 2 years since birth and across racial group, education, income, primiparity, and age. Psychol Assess 2023; 35:646-658. [PMID: 37227837 PMCID: PMC10718185 DOI: 10.1037/pas0001242] [Indexed: 05/27/2023]
Abstract
Up to 19% of postpartum mothers experience depressive symptoms, which are associated with infant development. Thus, research examining postpartum depression has implications for mothers' and infants' well-being. However, this research relies on the often-untested assumption of measurement invariance, that is, that measures capture the same construct across time and sociodemographic characteristics. In the absence of invariance, measurement bias may confound differences across time and group, contributing to invalid inferences. In a sociodemographically diverse (40.7% African American, 58.9% White; 67.9% below two times the federal poverty line; 19.4% with less than high school education), rural, longitudinal sample (N = 1,275) of mothers, we used moderated nonlinear factor analysis (MNLFA) to examine measurement invariance of the Brief Symptom Inventory-18 (BSI-18) Depressive Symptoms subscale across time since birth, racial group, education, income, primiparity, and maternal age at childbirth. We identified evidence of differential item functioning (DIF; i.e., measurement noninvariance) as a function of racial group and education. Subsequent analyses indicated, however, that the DIF-induced bias had minimal impacts on substantive comparisons examining change over time since birth and group differences. Thus, the presence of measurement noninvariance does not appear to bias substantive comparisons using the BSI-18 Depressive Symptoms subscale across the first 2 years since birth in a sample comprising primarily African American and White mothers living in predominantly rural, low-income communities. This study demonstrates the importance of assessing measurement invariance and highlights MNLFA as a tool for evaluating the impact of noninvariance, a preliminary step that increases confidence in the validity of substantive inferences. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Affiliation(s)
- Sylia Wilson
  - Institute of Child Development, University of Minnesota
- Roger Mills-Koonce
  - Department of Human Development and Family Studies, University of North Carolina at Chapel Hill
- Daniel Berry
  - Institute of Child Development, University of Minnesota

10
Willoughby MT, Camerota M, King KM, Nduku T, Piper B. Leveraging item-level accuracy and reaction time to address ceiling effects in the measurement of inhibitory control in preschool-aged children. Front Psychol 2023; 14:861441. [PMID: 36818066 PMCID: PMC9937058 DOI: 10.3389/fpsyg.2023.861441] [Received: 01/24/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023]
Abstract
Preschool-aged children's performance on inhibitory control tasks is typically represented by the overall accuracy of their item responses (e.g., mean proportion correct). However, in settings where children vary widely in age or ability level, inhibitory control tasks are susceptible to ceiling effects, which undermine measurement precision. We have previously demonstrated a general approach for scoring inhibitory control tasks that combines item-level accuracy and reaction-time information to minimize ceiling effects. Here, we extend that approach by incorporating additional item-level reaction time data from an adjunct (simple reaction time) task. We contrast three approaches for scoring inhibitory control tasks: two rely exclusively on item accuracy information, and a third also considers item reaction time information. We demonstrate the impacts of these different approaches to scoring with two inhibitory control tasks that were included in a recent evaluation of the Red Light, Purple Light intervention in preprimary classrooms in Nairobi County, Kenya. We limited our study to children who met inclusion criteria at pre-test (N = 418; 51% male; mean age = 4.8 years) or post-test (N = 386; 51% male; mean age = 4.8 years). Children's performance on individual inhibitory control tasks was strongly correlated regardless of the scoring approach (rs = 0.73-0.97 across two tasks). However, the combined accuracy and reaction time scores eliminated ceiling effects that were common when only accuracy information was used. The combined accuracy and reaction time models also decomposed item-level reaction time (RT) into inhibitory control and processing speed components, which are distinct constructs. Results are discussed with respect to the challenges and nuances of the estimation and interpretation of inhibitory control task scores with children of varied ages and ability levels.
Affiliation(s)
- Michael T. Willoughby (corresponding author)
  - Education and Workforce Development, RTI International, Research Triangle Park, NC, United States
- Marie Camerota
  - Department of Psychiatry and Human Behavior, Alpert Medical School of Brown University, Providence, RI, United States
- Tabitha Nduku
  - International Education, RTI International, Nairobi, Kenya
- Benjamin Piper
  - International Education, RTI International, Nairobi, Kenya

11
Kush JM, Masyn KE, Amin-Esmaeili M, Susukida R, Wilcox HC, Musci RJ. Utilizing Moderated Non-linear Factor Analysis Models for Integrative Data Analysis: A Tutorial. Struct Equ Modeling 2022; 30:149-164. [PMID: 36818015 PMCID: PMC9937431 DOI: 10.1080/10705511.2022.2070753] [Received: 10/31/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 06/18/2023]
Abstract
Integrative data analysis (IDA) is an analytic tool that allows researchers to combine raw data across multiple, independent studies, providing improved measurement of latent constructs as compared to single-study analysis or meta-analyses. This is often achieved through implementation of moderated nonlinear factor analysis (MNLFA), an advanced modeling approach that allows for covariate moderation of item and factor parameters. The current paper provides an overview of this modeling technique, highlighting the distinct advantages most relevant to IDA. We further illustrate the complex model-building process involved in MNLFA by providing a tutorial using empirical data from five separate prevention trials. The code and data used for analyses are also provided.

12
Barker DH, Dahabreh IJ, Steingrimsson JA, Houck C, Donenberg G, DiClemente R, Brown LK. Causally Interpretable Meta-analysis: Application in Adolescent HIV Prevention. Prev Sci 2022; 23:403-414. [PMID: 34241752 PMCID: PMC8742835 DOI: 10.1007/s11121-021-01270-3] [Accepted: 06/15/2021] [Indexed: 12/30/2022]
Abstract
Endowing meta-analytic results with a causal interpretation is challenging when there are differences in the distribution of effect modifiers among the populations underlying the included trials and the target population where the results of the meta-analysis will be applied. Recent work on transportability methods has described identifiability conditions under which the collection of randomized trials in a meta-analysis can be used to draw causal inferences about the target population. When the conditions hold, the methods enable estimation of causal quantities such as the average treatment effect and conditional average treatment effect in target populations that differ from the populations underlying the trial samples. The methods also facilitate comparison of treatments not directly compared in a head-to-head trial and assessment of comparative effectiveness within subgroups of the target population. We briefly describe these methods and present a worked example using individual participant data from three HIV prevention trials among adolescents in mental health care. We describe practical challenges in defining the target population, obtaining individual participant data from included trials and a sample of the target population, and addressing systematic missing data across datasets. When fully realized, methods for causally interpretable meta-analysis can provide decision-makers with valid estimates of how treatments will work in target populations of substantive interest as well as in subgroups of these populations.
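The transport idea in this abstract can be shown in its simplest form. Suppose a single binary effect modifier is distributed differently in the trial and in the target population, and the trial identifies the treatment effect within each level of that modifier. Then the target-population ATE can be estimated by standardization: average the trial's stratum-specific effects over the target's covariate distribution.

This is a minimal Python sketch with simulated data and hypothetical numbers. It is one simple transport estimator, not necessarily the one the authors used. It also ignores multiple trials, cross-trial treatment comparisons, and the missing-data problems the paper describes.

```python
import numpy as np


def transported_ate(x_trial, a_trial, y_trial, x_target):
    """Standardize stratum-specific trial effects (binary modifier x) to the target's x distribution."""
    ate = 0.0
    for level in (0, 1):
        stratum = x_trial == level
        effect = y_trial[stratum & (a_trial == 1)].mean() - y_trial[stratum & (a_trial == 0)].mean()
        ate += np.mean(x_target == level) * effect     # weight = target share of this stratum
    return float(ate)


rng = np.random.default_rng(11)
n_trial, n_target = 20_000, 20_000
x_trial = rng.binomial(1, 0.2, n_trial)           # effect modifier is uncommon among trial participants
x_target = rng.binomial(1, 0.7, n_target)         # ...but common in the (hypothetical) target population
a_trial = rng.binomial(1, 0.5, n_trial)           # randomized treatment assignment
true_effect = np.where(x_trial == 1, 2.0, 0.5)    # hypothetical effect modification
y_trial = 1.0 + 0.3 * x_trial + true_effect * a_trial + rng.standard_normal(n_trial)

trial_ate = y_trial[a_trial == 1].mean() - y_trial[a_trial == 0].mean()
target_ate = transported_ate(x_trial, a_trial, y_trial, x_target)
# Truth by construction: trial ATE = 0.2*2.0 + 0.8*0.5 = 0.8; target ATE = 0.7*2.0 + 0.3*0.5 = 1.55
print(f"trial ATE = {trial_ate:.2f}, transported ATE = {target_ate:.2f}")
```

The unadjusted trial estimate recovers the trial population's effect, about 0.8. Reweighting the stratum effects recovers the target's effect, about 1.55. This works only because the sketch assumes the modifier fully captures how the two populations differ.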
Affiliation(s)
- David H Barker
  - Department of Psychiatry, Rhode Island Hospital, Providence, RI, USA
  - Department of Psychiatry and Human Behavior, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Issa J Dahabreh
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christopher Houck
  - Department of Psychiatry, Rhode Island Hospital, Providence, RI, USA
  - Department of Psychiatry and Human Behavior, The Warren Alpert Medical School of Brown University, Providence, RI, USA
- Geri Donenberg
  - School of Public Health, University of Illinois at Chicago, Chicago, IL, USA
- Ralph DiClemente
  - New York University College of Global Public Health, New York, NY, USA
- Larry K Brown
  - Department of Psychiatry, Rhode Island Hospital, Providence, RI, USA
  - Department of Psychiatry and Human Behavior, The Warren Alpert Medical School of Brown University, Providence, RI, USA

13
Detection of Fake Job Postings by Utilizing Machine Learning and Natural Language Processing Approaches. Neural Process Lett 2022. [DOI: 10.1007/s11063-021-10727-z] [Indexed: 11/25/2022]

14
Kruger ES, Serier KN, Pfund RA, McKay JR, Witkiewitz K. Integrative data analysis of self-efficacy in 4 clinical trials for alcohol use disorder. Alcohol Clin Exp Res 2021; 45:2347-2356. [PMID: 34523721 DOI: 10.1111/acer.14713] [Received: 12/09/2020] [Revised: 09/06/2021] [Accepted: 09/09/2021] [Indexed: 01/03/2023]
Abstract
BACKGROUND Self-efficacy has been proposed as a key predictor of alcohol treatment outcomes and a potential mechanism of success in achieving abstinence or drinking reductions following alcohol treatment. Integrative data analysis, where data from multiple studies are combined for analyses, can be used to synthesize analyses across multiple alcohol treatment trials by creating a commensurate measure and controlling for differential item functioning (DIF) to determine whether alcohol treatments improve self-efficacy. METHOD The current study used moderated nonlinear factor analysis (MNLFA) to examine the effect of treatment on self-efficacy across four different treatment studies (N = 3720; 72.5% male, 68.4% non-Hispanic white). Self-efficacy was measured using the Alcohol Abstinence Self-Efficacy Scale (AASE) in the COMBINE Study (n = 1383) and Project MATCH (n = 1726), and the Drug Taking Confidence Questionnaire (DTCQ) in two studies of Telephone Continuing Care (TEL Study 1: n = 303; TEL Study 2: n = 212). DIF was examined across time, study, treatment condition, marital status, age, and sex. RESULTS We identified 12 items from the AASE and DTCQ to create a commensurate measure of self-efficacy using MNLFA. All active treatments, including cognitive-behavioral treatment, a combined behavioral intervention, medication management, motivation enhancement treatment, telephone continuing care, twelve-step facilitation, and relapse prevention, were associated with significant increases in self-efficacy from baseline to posttreatment that were maintained for up to a year. Importantly, treatment as usual in community settings, which consisted of weekly group therapy that included addiction counseling and twelve-step recovery support, was not associated with significant increases in self-efficacy. 
CONCLUSIONS Alcohol self-efficacy increases following treatment and numerous evidence-based treatments are associated with significant increases in self-efficacy, which are maintained over time. Community treatment that focuses solely on addiction counseling and twelve-step support may not promote increases in self-efficacy.
Affiliation(s)
- Eric S Kruger
- University of New Mexico, Albuquerque, New Mexico, USA
- Rory A Pfund
- University of New Mexico, Albuquerque, New Mexico, USA
- James R McKay
- University of Pennsylvania, Philadelphia, Pennsylvania, USA; Philadelphia VA Medical Center, Philadelphia, Pennsylvania, USA
15
Hawrilenko M, Masyn KE, Cerutti J, Dunn EC. Individual Differences in the Stability and Change of Childhood Depression: A Growth Mixture Model With Structured Residuals. Child Dev 2021; 92:e343-e363. [PMID: 33423273 DOI: 10.1111/cdev.13502] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
Abstract
Studies of developmental trajectories of depression are important for understanding depression etiology. Existing studies have been limited by short time frames, and no studies have explored a key factor: differential patterns of responding to life events. This article introduces a novel analytic technique, growth mixture modeling with structured residuals, to examine the course of youth depression in a large, prospective cohort (N = 11,641, ages 4-16.5, 96% White). Age-specific critical points were identified at ages 8 and 13, where depression symptoms spiked for a minority of children. Most depression risk was due to dynamic responses to environmental events, drawn not from a small pool of persistently depressed children but from a larger pool of children who varied between higher and lower symptom levels.
Affiliation(s)
- Erin C Dunn
- Massachusetts General Hospital; Harvard Medical School; Center on the Developing Child at Harvard University
16
DeJoseph ML, Sifre RD, Raver CC, Blair CB, Berry D. Capturing Environmental Dimensions of Adversity and Resources in the Context of Poverty Across Infancy Through Early Adolescence: A Moderated Nonlinear Factor Model. Child Dev 2021; 92:e457-e475. [PMID: 33411404 DOI: 10.1111/cdev.13504] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/14/2022]
Abstract
Income, education, and cumulative-risk indices likely obscure meaningful heterogeneity in the mechanisms through which poverty impacts child outcomes. This study draws from contemporary theory to specify multiple dimensions of poverty-related adversity and resources, with the aim of better capturing these nuances. Using data from the Family Life Project (N = 1,292), we leveraged moderated nonlinear factor analysis (Bauer, 2017) to establish group- and longitudinally invariant environmental measures from infancy to early adolescence. Results indicated that three latent factors (material deprivation, psychosocial threat, and sociocognitive resources) were distinct from each other and from family income. Each was largely invariant across site, racial group, and development and showed convergent and discriminant relations with age-twelve criterion measures. Implications for ensuring socioculturally valid measurements of poverty are discussed.
Affiliation(s)
- Clancy B Blair
- New York University; New York University School of Medicine
17
Curran PJ, Georgeson AR, Bauer DJ, Hussong AM. Psychometric Models for Scoring Multiple Reporter Assessments: Applications to Integrative Data Analysis in Prevention Science and Beyond. Int J Behav Dev 2021; 45:40-50. [PMID: 33758447 DOI: 10.1177/0165025419896620] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
Conducting valid and reliable empirical research in the prevention sciences is an inherently difficult task that poses many challenges. Chief among these is the need to obtain numerical scores of underlying theoretical constructs for use in subsequent analysis. This challenge is further exacerbated by the increasingly common need to consider multiple reporter assessments, particularly when using integrative data analysis to fit models to data that have been pooled across two or more independent samples. The current paper uses both simulated and real data to examine the utility of a recently proposed psychometric model for multiple reporter data called the trifactor model (TFM) in settings that might be commonly found in prevention research. Results suggest that numerical scores obtained using the TFM are superior to more traditional methods, particularly when pooling samples that contribute different reporter perspectives.
18
Yen KT, Cherng S. Secondary Prevention of Depressive Prodrome in Adolescents: Before and After Attending a Jogging Program on Campus. Int J Environ Res Public Health 2020; 17:E7705. [PMID: 33105575 PMCID: PMC7659965 DOI: 10.3390/ijerph17217705] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/03/2020] [Revised: 09/27/2020] [Accepted: 10/20/2020] [Indexed: 12/13/2022]
Abstract
The adolescent depressive prodrome has been conceptualized as an early, integrated sign of depressive symptoms, which may either progress to a first episode of depression or return to normal. In this study, the depressive prodrome referred to the early self-rated depressive symptoms of the participants. Drawing on the Kutcher Adolescent Depression Scale and the psychometric characteristics of the Adolescent Depression Scale (ADR), we proposed a self-rated questionnaire to assess the severity of depressive symptoms in adolescents before and after they attended a jogging program on a high school campus in Taiwan. With informed consent signed by both the students and their parents, 284 high school students with an average age under 15 years participated in this study in March 2019. Using IBM SPSS 25, we applied a binary logistic model, principal component analysis (PCA), multidimensional analysis, and receiver operating characteristic (ROC) curves to analyze the severity of the depressive prodrome via the threshold severity score (SC) and false positive rate (FPR). Findings revealed that attending the 15-week campus jogging program (3 times a week, 45 min each) can change severity status and reduce the prevalence of moderate-to-severe depressive prodrome by 26%. The two-dimensional approach identified three symptoms (crying spells, loss of pleasure in daily activities, and a perceived decline in memory) that remained invariant throughout the depressive prodrome assessment. In this study, the campus jogging program appeared to affect the FPR of the depressive prodrome measure. Compared with subthreshold depression, the depressive prodrome emphasizes assessment from a secondary-prevention perspective, representing the change from a person's premorbid functioning up to either the first onset of depression or a return to normal.
Subthreshold depression, by contrast, is a form of minor depression under DSM-5 criteria that varies in the number of symptoms and duration required, and it is highly prevalent in primary care.
Affiliation(s)
- Ke Tien Yen
- Department of Leisure and Sports Management, Chengshiu University, Kaohsiung 83347, Taiwan
- Center for Environmental Toxin and Emerging-Contaminant Research, Chengshiu University, Kaohsiung 83347, Taiwan
- Shen Cherng
- Department of Computer Science and Information Engineering, Chengshiu University, Kaohsiung 83347, Taiwan
19
Harmonizing altered measures in integrative data analysis: A methods analogue study. Behav Res Methods 2020; 53:1031-1045. [PMID: 32939683 DOI: 10.3758/s13428-020-01472-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
In the current study, we used an analogue integrative data analysis (IDA) design to test optimal scoring strategies for harmonizing alcohol- and drug-use consequence measures with varying degrees of alteration across four study conditions. We evaluated performance of mean, confirmatory factor analysis (CFA), and moderated nonlinear factor analysis (MNLFA) scores based on traditional indices of reliability (test-retest, internal, and score recovery or parallel forms) and validity. Participants in the analogue study included 854 college students (46% male; 21% African American, 5% Hispanic/Latino, 56% European American) who completed two versions of the altered measures at two sessions, separated by 2 weeks. As expected, mean, CFA, and MNLFA scores all resulted in scales with lower reliability given increasing scale alteration (with less fidelity to formerly developed scales) and shorter scale length. MNLFA and CFA scores, however, showed greater validity than mean scores, demonstrating stronger relationships with external correlates. Implications for measurement harmonization in the context of IDA are discussed.
20
Falk CF, Ju U. Estimation of Response Styles Using the Multidimensional Nominal Response Model: A Tutorial and Comparison With Sum Scores. Front Psychol 2020; 11:72. [PMID: 32116902 PMCID: PMC7017717 DOI: 10.3389/fpsyg.2020.00072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2019] [Accepted: 01/10/2020] [Indexed: 11/16/2022]
Abstract
Recent years have seen a dramatic increase in item response models for measuring response styles on Likert-type items. These model-based approaches stand in contrast to traditional sum-score-based methods where researchers count the number of times that participants selected certain response options. The multidimensional nominal response model (MNRM) offers a flexible model-based approach that may be intuitive to those familiar with sum score approaches. This paper presents a tutorial on the model along with code for estimating it using three different software packages: flexMIRT®, mirt, and Mplus. We focus on specification and interpretation of response functions. In addition, we provide analytical details on how sum score to scale score conversion can be done with the MNRM. In the context of a real data example, three different scoring approaches are then compared. This example illustrates how sum-score-based approaches can sometimes yield scores that are confounded with substantive content. We expect that the current paper will facilitate further investigations as to whether different substantive conclusions are reached under alternative approaches to measuring response styles.
Affiliation(s)
- Carl F Falk
- Department of Psychology, McGill University, Montreal, QC, Canada
- Unhee Ju
- Riverside Insights, Itasca, IL, United States
21
Bauer DJ, Belzak WCM, Cole V. Simplifying the Assessment of Measurement Invariance over Multiple Background Variables: Using Regularized Moderated Nonlinear Factor Analysis to Detect Differential Item Functioning. Struct Equ Modeling 2019; 27:43-55. [PMID: 33132679 PMCID: PMC7596881 DOI: 10.1080/10705511.2019.1642754] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 05/31/2023]
Abstract
Determining whether measures are equally valid for all individuals is a core component of psychometric analysis. Traditionally, the evaluation of measurement invariance (MI) involves comparing independent groups defined by a single categorical covariate (e.g., men and women) to determine if there are any items that display differential item functioning (DIF). More recently, Moderated Nonlinear Factor Analysis (MNLFA) has been advanced as an approach for evaluating MI/DIF simultaneously over multiple background variables, categorical and continuous. Unfortunately, conventional procedures for detecting DIF do not scale well to the more complex MNLFA. The current manuscript therefore proposes a regularization approach to MNLFA estimation that penalizes the likelihood for DIF parameters (i.e., rewarding sparse DIF). This procedure avoids the pitfalls of sequential inference tests, is automated for end users, and is shown to perform well in both a small-scale simulation and an empirical validation study.
Affiliation(s)
- Daniel J Bauer
- Department of Psychology and Neuroscience, The University of North Carolina at Chapel Hill
- Center for Developmental Science, The University of North Carolina at Chapel Hill
- William C M Belzak
- Department of Psychology and Neuroscience, The University of North Carolina at Chapel Hill
- Veronica Cole
- Center for Developmental Science, The University of North Carolina at Chapel Hill
22
Ferrando PJ, Lorenzo-Seva U. An External Validity Approach for Assessing Essential Unidimensionality in Correlated-Factor Models. Educ Psychol Meas 2019; 79:437-461. [PMID: 31105318 PMCID: PMC6506987 DOI: 10.1177/0013164418824755] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Many psychometric measures yield data that are compatible with (a) an essentially unidimensional factor analysis solution and (b) a correlated-factor solution. Deciding which of these structures is the most appropriate and useful is of considerable importance, and various procedures have been proposed to help in this decision. The only fully developed procedures available to date, however, are internal, and they use only the information contained in the item scores. In contrast, this article proposes an external auxiliary procedure in which primary factor scores and general factor scores are related to relevant external variables. Our proposal consists of two groups of procedures. The procedures in the first group (differential validity procedures) assess the extent to which the primary factor scores relate differentially to the external variables. Procedures in the second group (incremental validity procedures) assess the extent to which the primary factor scores yield predictive validity increments with respect to the single general factor scores. Both groups of procedures are based on a second-order structural model with latent variables from which new methodological results are obtained. The functioning of the proposal is assessed by means of a simulation study, and its usefulness is illustrated with a real-data example in the personality domain.
23
Hussong AM, Gottfredson NC, Bauer DJ, Curran PJ, Haroon M, Chandler R, Kahana SY, Delaney JAC, Altice FL, Beckwith CG, Feaster DJ, Flynn PM, Gordon MS, Knight K, Kuo I, Ouellet LJ, Quan VM, Seal DW, Springer SA. Approaches for creating comparable measures of alcohol use symptoms: Harmonization with eight studies of criminal justice populations. Drug Alcohol Depend 2019; 194:59-68. [PMID: 30412898 PMCID: PMC6312501 DOI: 10.1016/j.drugalcdep.2018.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/04/2018] [Revised: 10/16/2018] [Accepted: 10/17/2018] [Indexed: 11/28/2022]
Abstract
BACKGROUND With growing data archives composed of studies with similar measurement, optimal methods for data harmonization and measurement scoring are a pressing need. We compare three methods for harmonizing and scoring the AUDIT as administered with minimal variation across 11 samples from eight study sites within the STTR (Seek-Test-Treat-Retain) Research Harmonization Initiative. Descriptive statistics and predictive validity results for cut-scores, sum scores, and Moderated Nonlinear Factor Analysis scores (MNLFA; a psychometric harmonization method) are presented. METHODS Across the eight study sites, sample sizes ranged from 50 to 2405, and target populations varied based on sampling frame, location, and inclusion/exclusion criteria. The pooled sample included 4667 participants (82% male, 52% Black, 24% White, 13% Hispanic, and 8% Asian/Pacific Islander; mean age of 38.9 years). Participants completed the AUDIT at baseline in all studies. RESULTS After logical harmonization of items, we scored the AUDIT using three methods: published cut-scores, sum scores, and MNLFA. Compared with cut-scores and sum scores, MNLFA scores showed greater variation, fewer floor effects, and the ability to directly address missing data. MNLFA scores also showed stronger associations with binge drinking and clearer study differences than did the other scores. CONCLUSIONS MNLFA scores are a promising tool for data harmonization and scoring in pooled data analysis. Model complexity with large multi-study applications, however, may require new statistical advances to fully realize the benefits of this approach.
Affiliation(s)
- Dan J Bauer
- University of North Carolina at Chapel Hill, United States
- Maleeha Haroon
- University of North Carolina at Chapel Hill, United States
- Redonna Chandler
- National Institute on Drug Abuse/National Institutes of Health, United States
- Shoshana Y Kahana
- National Institute on Drug Abuse/National Institutes of Health, United States
- Irene Kuo
- The George Washington University, United States
- Vu M Quan
- Johns Hopkins University, United States
- David W Seal
- Tulane University School of Public Health and Tropical Medicine, United States