1
Plessen CY, Fischer F, Hartmann C, Liegl G, Schalet B, Kaat AJ, Pesantez R, Joeris A, Heng M, Rose M. Differential item functioning between English, German, and Spanish PROMIS® physical function ceiling items. Qual Life Res 2025; 34:1377-1391. PMID: 39680276; PMCID: PMC12064622; DOI: 10.1007/s11136-024-03866-y.
Abstract
PURPOSE We investigated the validity of the German and Spanish translations of 35 new high functioning items added to the Patient-Reported Outcomes Measurement Information System (PROMIS®) Physical Function item bank 2.0. We assessed differential item functioning (DIF) between three general population samples from Argentina, Germany, and the United States. METHODS PROMIS Physical Function data were collected in online panels from 3601 individuals (mean age, 41.6 years; range, 18-88 years; 53.7% female). Of these, 1001 participants completed the Spanish version, 1000 the German version, and 1600 the English version. DIF was assessed by a multiverse analysis that systematically varied analytic choices across the entire range of plausible options within the logistic ordinal regression framework. RESULTS Translated items generally met the assumptions of unidimensionality, monotonicity, and local independence. The 272 different analyses suggested consistent DIF between languages for four items. Test characteristic curves indicated that the magnitude and impact of DIF on test scores were negligible for all items at the test level. After correcting for potential DIF, we observed higher physical function scores in Argentina compared with the US, Cohen's d = 0.25 [0.17, 0.33], and in Argentina compared with Germany, Cohen's d = 0.23 [0.15, 0.32]. CONCLUSIONS Our findings support the universal applicability of PROMIS Physical Function items across general populations in Argentina, Germany, and the U.S. The sensitivity analyses indicate that the identification of DIF items was robust across different data-analytic decisions. Multiverse analysis is a promising approach to address the lack of clear cutoffs in DIF identification.
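The logistic ordinal regression DIF framework referenced above reduces, in the binary-item case, to comparing nested logistic models with and without a group term. The sketch below is a minimal, dependency-free illustration of that likelihood-ratio test on simulated data; the simulated DIF effect, sample size, and all names are illustrative assumptions, not the authors' code or data.

```python
import math
import random

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, iters=1500):
    """Fit a logistic regression by plain gradient ascent; return the maximized log-likelihood."""
    w = [0.0] * len(X[0])
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return sum(
        yi * math.log(_sigmoid(sum(wj * xj for wj, xj in zip(w, xi))))
        + (1 - yi) * math.log(1.0 - _sigmoid(sum(wj * xj for wj, xj in zip(w, xi))))
        for xi, yi in zip(X, y)
    )

def uniform_dif_stat(theta, group, resp):
    """Likelihood-ratio statistic for uniform DIF:
    item ~ trait  vs.  item ~ trait + group (chi-square, 1 df, under no DIF)."""
    ll0 = fit_logistic([[1.0, t] for t in theta], resp)
    ll1 = fit_logistic([[1.0, t, g] for t, g in zip(theta, group)], resp)
    return 2.0 * (ll1 - ll0)

random.seed(1)
theta = [random.gauss(0.0, 1.0) for _ in range(400)]   # latent trait levels
group = [i % 2 for i in range(400)]                    # focal-group indicator
# simulate an item that is 1 logit harder for the focal group (built-in uniform DIF)
resp = [int(random.random() < _sigmoid(t - 1.0 * g)) for t, g in zip(theta, group)]
stat = uniform_dif_stat(theta, group, resp)
print(round(stat, 2))
```

A multiverse analysis in this spirit would rerun such a test while varying the analytic choices (matching criterion, purification, DIF-flagging threshold) and inspect the distribution of conclusions.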
Affiliation(s)
- Constantin Yves Plessen
- Center for Patient-Centered Outcomes Research, Medizinische Klinik mit Schwerpunkt für Psychosomatik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10097, Berlin, Germany
- Felix Fischer
- Center for Patient-Centered Outcomes Research, Medizinische Klinik mit Schwerpunkt für Psychosomatik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10097, Berlin, Germany
- German Center for Mental Health (DZPG), Berlin, Germany
- Claudia Hartmann
- Center for Patient-Centered Outcomes Research, Medizinische Klinik mit Schwerpunkt für Psychosomatik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10097, Berlin, Germany
- Gregor Liegl
- Center for Patient-Centered Outcomes Research, Medizinische Klinik mit Schwerpunkt für Psychosomatik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10097, Berlin, Germany
- Ben Schalet
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Aaron J Kaat
- Department of Medical Social Science, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Alexander Joeris
- AO Innovation Translation Center, AO Foundation, Davos, Switzerland
- Marilyn Heng
- Department of Orthopaedics, University of Miami Miller School of Medicine, Miami, FL, USA
- Matthias Rose
- Center for Patient-Centered Outcomes Research, Medizinische Klinik mit Schwerpunkt für Psychosomatik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitéplatz 1, 10097, Berlin, Germany
- German Center for Mental Health (DZPG), Berlin, Germany
2
Abel GA, Hays RD, Campbell JL, Elliott MN. The Use of External Anchors When Examining Differences in Scale Performance in Patient Experience Surveys. Med Care 2025; 63:311-316. PMID: 39927873; DOI: 10.1097/mlr.0000000000002135.
Abstract
OBJECTIVES To present an example of using vignettes as an external anchor to assess measurement equivalence for patient experience measures. BACKGROUND Evaluating measurement equivalence and differences in scale use is helpful for identifying disparities in patient experience based on patient surveys. External anchors, often in the form of scored vignettes, provide an attractive approach to examining differences in scale use but are not commonly used. METHODS We analyzed a UK dataset based on the General Practice Patient Survey and a U.S. dataset based on the Consumer Assessment of Healthcare Providers and Systems (CAHPS) Clinician and Group survey. A total of 560 White British and 560 Pakistani adults were recruited from various locations across England; 575 Asian American and 505 non-Hispanic White patients were recruited from an internet panel in the United States. Patients rated the quality of communication in the depicted encounters using 5 General Practice Patient Survey questions and 3 CAHPS Clinician and Group questions. RESULTS Using an external anchor in both the U.S. and UK data produced substantial evidence of differential item functioning (DIF). However, an "internal" DIF analysis (without an external anchor) produced little evidence of DIF. CONCLUSIONS Using an external anchor does not require the assumption, made by internal methods, that some items do not display between-group DIF. These assumptions may not hold for patient experience items if a single factor, such as an extreme or negative response tendency, governs all items equally.
Affiliation(s)
- Gary A Abel
- Department of Health and Community Sciences, University of Exeter, St Luke's Campus, Exeter, UK
- Ron D Hays
- Department of Medicine, University of California, Los Angeles, Los Angeles, CA
- John L Campbell
- Department of Health and Community Sciences, University of Exeter, St Luke's Campus, Exeter, UK
3
Kaptur DC, Liu Y, Kaptur B, Peterman N, Zhang J, Kern JL, Anderson C. Examining differential item functioning in self-reported health survey data via multilevel modeling. Qual Life Res 2025 (epub ahead of print). PMID: 40021525; DOI: 10.1007/s11136-025-03936-9.
Abstract
Few health-related constructs or measures, including self-reported health survey data, have received a critical evaluation of measurement equivalence. Differential item functioning (DIF) analysis is crucial for evaluating measurement equivalence in self-reported health surveys, which are often hierarchical in structure. Traditional single-level DIF methods fall short in this setting, making multilevel models a better alternative. We highlight the benefits of multilevel modeling for DIF analysis by applying multilevel binary logistic regression (for binary response data) and multilevel multinomial logistic regression (for polytomous response data) to a health survey data set, and comparing them with their single-level counterparts. Our findings show that the multilevel models fit better and explain more variance than the single-level models. This article is expected to raise awareness of multilevel modeling and help healthcare researchers and practitioners understand its use for DIF analysis.
Affiliation(s)
- Jinming Zhang
- University of Illinois Urbana-Champaign, Champaign, IL, USA
- Justin L Kern
- University of Illinois Urbana-Champaign, Champaign, IL, USA
4
Sinharay S, Monroe S. Assessment of fit of item response theory models: A critical review of the status quo and some future directions. Br J Math Stat Psychol 2025. PMID: 39760153; DOI: 10.1111/bmsp.12378.
Abstract
This paper provides a literature review of the assessment of fit of item response theory (IRT) models. Various types of fit procedures are reviewed, with a focus on their advantages and disadvantages, and real data examples are used to demonstrate some of them. Recommendations are provided for researchers and practitioners interested in assessing the fit of IRT models.
Affiliation(s)
- Scott Monroe
- University of Massachusetts Amherst, Amherst, Massachusetts, USA
5
Nørkær E, Halai AD, Woollams A, Lambon Ralph MA, Schumacher R. Establishing and evaluating the gradient of item naming difficulty in post-stroke aphasia and semantic dementia. Cortex 2024; 179:103-111. PMID: 39167916; PMCID: PMC11413477; DOI: 10.1016/j.cortex.2024.07.007.
Abstract
Anomia is a common consequence of brain damage and a central symptom in, for instance, semantic dementia (SD) and post-stroke aphasia (PSA). Picture naming tests are often used in clinical assessments, and experience suggests that items vary systematically in their difficulty. Despite clinical intuitions and theoretical accounts, however, the existence and determinants of such a naming difficulty gradient remain to be empirically established and evaluated. Seizing the unique opportunity of two large-scale datasets of SD and PSA patients assessed with the same picture naming test, we applied an Item Response Theory (IRT) approach and (a) established that an item naming difficulty gradient exists, which (b) partly differs between patient groups and (c) relates in part to a limited number of psycholinguistic properties: frequency and familiarity for SD, frequency and word length for PSA. Our findings offer exciting future avenues for new, adaptive, time-efficient, and patient-tailored approaches to naming assessment and therapy.
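The item difficulty gradient described above can be illustrated with a toy calculation: a crude Rasch-style difficulty for each picture-naming item is the logit of its error rate, and sorting items by it yields the gradient, which can then be related to psycholinguistic predictors such as frequency. The item names, accuracies, and log-frequencies below are invented for illustration and are not from the Cortex datasets.

```python
import math

# hypothetical per-item naming accuracy and log word frequency (illustrative only)
items = {
    "cat":       (0.98, 4.2),
    "apple":     (0.95, 3.8),
    "anchor":    (0.80, 2.9),
    "sphinx":    (0.55, 1.6),
    "metronome": (0.35, 1.1),
}

def rasch_difficulty(p_correct):
    """Crude Rasch-style difficulty: logit of the error rate (higher = harder)."""
    return math.log((1.0 - p_correct) / p_correct)

# the naming difficulty gradient: items ordered from easiest to hardest
gradient = sorted(items, key=lambda w: rasch_difficulty(items[w][0]))
print(gradient)  # easiest -> hardest
```

In this toy set, difficulty rises monotonically as log-frequency falls, mirroring the kind of frequency effect the paper reports; a full IRT analysis would of course estimate difficulties from response patterns rather than raw accuracies.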
Affiliation(s)
- Erling Nørkær
- Department of Psychology, University of Copenhagen, Copenhagen, Denmark
- Ajay D Halai
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Anna Woollams
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester, United Kingdom
- Matthew A Lambon Ralph
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Rahel Schumacher
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Clinic for Neurology and Neurorehabilitation, Luzerner Kantonsspital, University Teaching and Research Hospital, and University of Lucerne, Lucerne, Switzerland
6
Wellhagen GJ, Yassen A, Garmann D, Bröker A, Solms A, Zhang Y, Kjellsson MC, Karlsson MO. Evaluation of covariate effects in item response theory models. CPT Pharmacometrics Syst Pharmacol 2024; 13:812-822. PMID: 38436514; PMCID: PMC11098156; DOI: 10.1002/psp4.13120.
Abstract
Item response theory (IRT) models are usually the best way to analyze composite or rating scale data. Standard methods to evaluate covariate or treatment effects in IRT models do not allow item-specific effects to be identified. Finding subgroups of patients who respond differently to certain items could be very important when designing inclusion or exclusion criteria for clinical trials, and could aid in understanding different treatment responses across disease manifestations. We present a new method to investigate item-specific effects in IRT models, based on inspection of residuals. The method was investigated in a simulation exercise with a model for the Epworth Sleepiness Scale. We also provide a detailed discussion as guidance on how to build a robust covariate IRT model.
7
Langer ÁI, Ponce FP, Ordóñez-Carrasco JL, Fuentes-Ferrada R, Mac-Ginty S, Gaete J, Núñez D. Psychometric evidence of the Acceptance and Action Questionnaire-II (AAQ-II): an item response theory analysis in university students from Chile. BMC Psychol 2024; 12:111. PMID: 38429801; PMCID: PMC10908082; DOI: 10.1186/s40359-024-01608-w.
Abstract
BACKGROUND Experiential avoidance (EA) is a psychological mechanism associated with several mental health disorders and is regarded as a relevant target by third-generation cognitive behavioral therapies. It has mainly been assessed through self-report questionnaires, of which the AAQ-II is the most widely used. Its psychometric evidence has mostly been tested through classical test theory (CTT) and only scarcely through Item Response Theory (IRT). METHODS We used the Graded Response Model to examine its psychometric properties in Spanish-speaking university students (n = 1503; women = 995 (66.2%); mean age = 19.29, SD = 2.45). We tested whether the empirical data fit the model's predictions and estimated the dispersion of persons and items along the experiential avoidance continuum. Moreover, we examined category probability curves to identify the response probability of each answer option. Likewise, an item-person map was constructed to jointly display persons and items on the same scale along the experiential avoidance continuum. Finally, we tested the gender invariance of the scale. RESULTS We found that the values of the persons and the items were within the established range for the AAQ-II to be considered an adequate measure of EA. Additionally, we observed high discrimination indices for all items. The current version with seven answer options may not be optimal and should be tested in future studies. Finally, we found evidence of differential functioning by gender in one of the seven items of the instrument. CONCLUSIONS Our results indicate that the AAQ-II is a suitable tool for measuring EA and accurately classifying and differentiating EA levels in university students.
Affiliation(s)
- Álvaro I Langer
- Millennium Nucleus to Improve the Mental Health of Adolescents and Youths, Imhay, Santiago, Chile
- Faculty of Psychology and Humanities, Universidad San Sebastián, Valdivia, Chile
- Fernando P Ponce
- Faculty of Psychology, Universidad de Talca, s/n, Talca, Chile
- Millennium Nucleus on Intergenerational Mobility: From Modelling to Policy (MOVI), Santiago, Chile
- Reiner Fuentes-Ferrada
- Millennium Nucleus to Improve the Mental Health of Adolescents and Youths, Imhay, Santiago, Chile
- Faculty of Psychology and Humanities, Universidad San Sebastián, Valdivia, Chile
- Scarlett Mac-Ginty
- Millennium Nucleus to Improve the Mental Health of Adolescents and Youths, Imhay, Santiago, Chile
- Department of Health Service and Population Research, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Jorge Gaete
- Millennium Nucleus to Improve the Mental Health of Adolescents and Youths, Imhay, Santiago, Chile
- Research Center for Students Mental Health (ISME), Faculty of Education, Universidad de los Andes, Santiago, Chile
- Daniel Núñez
- Millennium Nucleus to Improve the Mental Health of Adolescents and Youths, Imhay, Santiago, Chile
- Faculty of Psychology, Universidad de Talca, s/n, Talca, Chile
8
Wallin G, Chen Y, Moustaki I. DIF Analysis with Unknown Groups and Anchor Items. Psychometrika 2024; 89:267-295. PMID: 38383880; PMCID: PMC11062998; DOI: 10.1007/s11336-024-09948-7.
Abstract
Ensuring fairness in instruments like survey questionnaires or educational tests is crucial. One way to address this is through a Differential Item Functioning (DIF) analysis, which examines whether different subgroups respond differently to a particular item, controlling for their overall latent construct level. DIF analysis is typically conducted to assess measurement invariance at the item level. Traditional DIF analysis methods require knowing the comparison groups (reference and focal groups) and anchor items (a subset of DIF-free items). Such prior knowledge may not always be available, and psychometric methods have been proposed for DIF analysis when one piece of information is unknown. More specifically, when the comparison groups are unknown while anchor items are known, latent DIF analysis methods have been proposed that estimate the unknown groups by latent classes. When anchor items are unknown while comparison groups are known, methods have also been proposed, typically under a sparsity assumption (the number of DIF items is not too large). However, DIF analysis when both pieces of information are unknown has not received much attention. This paper proposes a general statistical framework for this setting. In the proposed framework, we model the unknown groups by latent classes and introduce item-specific DIF parameters to capture the DIF effects. Assuming the number of DIF items is relatively small, an L1-regularised estimator is proposed to simultaneously identify the latent classes and the DIF items. A computationally efficient Expectation-Maximisation (EM) algorithm is developed to solve the non-smooth optimisation problem for the regularised estimator. The performance of the proposed method is evaluated by simulation studies and an application to item response data from a real-world educational test.
Affiliation(s)
- Gabriel Wallin
- Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Yunxiao Chen
- Department of Statistics, London School of Economics and Political Science, Columbia House, Room 5.16, Houghton Street, London, WC2A 2AE, UK
- Irini Moustaki
- Department of Statistics, London School of Economics and Political Science, Columbia House, Houghton Street, London, WC2A 2AE, UK
9
Accuracy of mixture item response theory models for identifying sample heterogeneity in patient-reported outcomes: a simulation study. Qual Life Res 2022; 31:3423-3432. PMID: 35716223; DOI: 10.1007/s11136-022-03169-0.
Abstract
PURPOSE Mixture item response theory (MixIRT) models can be used to uncover heterogeneity in responses to items that comprise patient-reported outcome measures (PROMs). This is accomplished by identifying relatively homogeneous latent subgroups in heterogeneous populations. Misspecification of the number of latent subgroups may affect model accuracy. This study evaluated the impact of specifying too many latent subgroups on the accuracy of MixIRT models. METHODS Monte Carlo methods were used to assess MixIRT accuracy. Simulation conditions included the number of items and latent classes, class size ratio, sample size, number of non-invariant items, and magnitude of between-class difference in item parameters. Bias and mean square error (MSE) in item parameters and accuracy of latent class recovery were assessed. RESULTS When the number of latent classes was correctly specified, the average bias and MSE in model parameters decreased as the number of items and latent classes increased; specification of too many latent classes resulted in only a modest decrease (i.e., < 10%) in the accuracy of latent class recovery. CONCLUSION The accuracy of MixIRT models was largely robust to overspecification of the number of latent classes. Appropriate choice of goodness-of-fit measures, study design considerations, and a priori contextual understanding of the degree of sample heterogeneity can guide model selection.
10
Cheville AL, Basford JR. A View of the Development of Patient-Reported Outcomes Measures, Their Clinical Integration, Electronification, and Potential Impact on Rehabilitation Service Delivery. Arch Phys Med Rehabil 2022; 103:S24-S33. PMID: 34896403; DOI: 10.1016/j.apmr.2021.10.031.
Abstract
Recognition of the importance of patients' perceptions of their status and experience has become central to medical care and its evaluation. This recognition has led to a growing reliance on patient-reported outcome measures (PROMs). Nevertheless, although awareness of PROMs and acceptance of their utility have increased markedly, few of us have good insight into their development, their utility relative to clinician-rated and performance measures such as the FIM and the 6-minute walk test, or how their "electronification" and incorporation into electronic health records (EHRs) may improve the individualization, value, and quality of medical care. The goal of this commentary is to provide some insight into the historical factors and technology developments that we believe have shaped modern clinical PROMs, as they relate to medicine in general and to rehabilitation in particular. In addition, we speculate that while the growth of PROM use may have been triggered by an increased emphasis on the centrality of the patient in their care, future uptake will be shaped by their embedding in EHRs and their use in clinical decision support through integration with other sources of clinical and sociodemographic data.
Affiliation(s)
- Andrea L Cheville
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, Minnesota
- Jeffrey R Basford
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, Minnesota
11
Lim L, Chapman E. Validation of the Moral Reasoning Questionnaire against Rasch Measurement Theory. J Pac Rim Psychol 2022. DOI: 10.1177/18344909221087418.
Abstract
To support teachers in facilitating students' moral reasoning development as specified within the Singapore Ministry of Education Character and Citizenship Education curriculum, the Moral Reasoning Questionnaire (MRQ) was developed and underwent preliminary validation. Based on expert reviews, cognitive interviews, and a classical test theory-based factor analytic approach, this preliminary validation found evidence (i.e., content appropriateness, response processes, and internal structure) to support the validity and reliability of the MRQ. This study extends that validation by examining the MRQ items and scale at a deeper level against Rasch Measurement Theory, given that it is the only model with the properties of interval measurement on a log-linear scale. The Rasch analysis found anomalies, including differential item functioning and disordered thresholds, in the initial set of items. After remediation and a second Rasch analysis, the MRQ responses were consistent with the Rasch model (i.e., for a given respondent, items that are harder to endorse have a lower probability of being endorsed than easier items); hence, there was sufficient evidence to support measurement invariance and to conclude that MRQ scores characterise persons invariantly across a continuum.
Affiliation(s)
- Lyndon Lim
- Teaching and Learning Centre, Singapore University of Social Sciences, Singapore
- Assessment and Research Group, Singapore Examinations and Assessment Board, Singapore
- Elaine Chapman
- Graduate School of Education, The University of Western Australia, Perth, Australia
12
Abstract
Patient-reported outcomes are recognized as essential for the evaluation of medical and public health interventions. Over the last 50 years, health-related quality of life (HRQoL) research has grown exponentially from 0 to more than 17,000 papers published annually. We provide an overview of generic HRQoL measures used widely in epidemiological studies, health services research, population studies, and randomized clinical trials [e.g., the Medical Outcomes Study SF-36 and the Patient-Reported Outcomes Measurement Information System (PROMIS®)-29]. In addition, we review methods used for economic analysis and calculation of the quality-adjusted life year (QALY), including the EQ-5D, the Health Utilities Index (HUI), the self-administered Quality of Well-being Scale (QWB-SA), and the Health and Activities Limitation Index (HALex). Furthermore, we consider hybrid measures such as the SF-6D and the PROMIS-Preference (PROPr) score. The plethora of HRQoL measures has impeded cumulative science because incomparable measures have been used in different studies. Linking among different measures and consensus on standard HRQoL measurement should now be prioritized. In addition, enabling widespread access to common measures is necessary to accelerate future progress.
Affiliation(s)
- Robert M Kaplan
- Clinical Excellence Research Center, Department of Medicine, Stanford University, Stanford, California, USA
- Ron D Hays
- Division of General Internal Medicine, Department of Medicine, University of California, Los Angeles, California, USA
13
De Boeck P, Cho SJ. Not all DIF is shaped similarly. Psychometrika 2021; 86:712-716. PMID: 34089430; DOI: 10.1007/s11336-021-09772-3.
Abstract
In response to the target article by Teresi et al. (2021), we explain why the article is useful and we also present a different approach. An alternative category of differential item functioning (DIF) is presented with a corresponding way of modeling DIF, based on random person and random item effects and explanatory covariates.
Affiliation(s)
- Paul De Boeck
- Department of Psychology, The Ohio State University, 240 Lazenby Hall, 1827 Neil Avenue, Columbus, OH, 43210, USA
14
Reeve BB, Hays RD. Guest Editors' Introduction to the Invited Special Section. Psychometrika 2021; 86:671-673. PMID: 34390454; DOI: 10.1007/s11336-021-09795-w.
Affiliation(s)
- Bryce B Reeve
- Duke University School of Medicine, Durham, NC, USA
15
Examination of the measurement equivalence of the Functional Assessment in Acute Care MCAT (FAMCAT) mobility item bank using differential item functioning analyses. Arch Phys Med Rehabil 2021; 103:S84-S107.e38. PMID: 34146534; DOI: 10.1016/j.apmr.2021.03.044.
Abstract
OBJECTIVE To assess differential item functioning (DIF) in an item pool measuring the mobility of hospitalized patients across educational, age, and gender groups. DESIGN Measurement evaluation cohort study. Content experts generated DIF hypotheses to guide the interpretation. The graded response item response theory (IRT) model was used. Primary DIF tests were Wald statistics; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude and impact were evaluated by examining group differences in expected item and scale score functions. SETTING Hospital-based rehabilitation. PARTICIPANTS 2216 hospitalized patients. MAIN OUTCOME MEASURES 111 self-reported mobility items. RESULTS Two linking items among those used to set the metric across forms evidenced DIF for gender and age: 'difficulty climbing stairs step-over-step without a handrail (alternating feet)' and 'difficulty climbing 3 to 5 steps without a handrail'. Conditional on the mobility state, these items were more difficult for women and for older people (aged 65 and over). An additional 18 items were identified with DIF. Items with both high DIF magnitude and age-related hypotheses were: difficulty 'crossing a road at a 4-lane traffic light with curbs'; 'jumping/landing on one leg'; 'strenuous activities'; and 'descending 3-5 steps with no handrail'. Although DIF of higher magnitude was observed for several items, the scale-level impact was relatively small and the exposure rate for the most problematic items was low (0.35, 0.27, and 0.20). CONCLUSIONS This was the first study to evaluate the measurement equivalence of the hospital-based rehabilitation mobility item bank. Although 20 items evidenced high-magnitude DIF, five of them related to stairs, the scale-level impact was minimal; it is nonetheless recommended that such items be avoided in the development of short-form measures. No items with salient DIF were removed from calibrations, supporting the use of the item bank across groups differing in education, age, and gender. The bank may thus be useful in clinical assessment and decision-making regarding risk for specific mobility restrictions at discharge, as well as in identifying mobility-related functions targeted for post-discharge interventions. Additionally, given the goal of avoiding long and burdensome assessments for patients and clinical staff, these results could inform those using the item bank to construct short forms.
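The magnitude-and-impact evaluation described above compares expected item score functions between groups. A minimal sketch of one such index follows, assuming a 2PL item whose difficulty shifts for the focal group; the item, parameter values, and trait grid are illustrative assumptions, not FAMCAT calibration results.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of endorsement at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score_gap(a, b_ref, b_foc, thetas):
    """Average absolute gap between the group-specific expected item scores
    over a grid of trait values -- one simple index of DIF magnitude."""
    gaps = [abs(p_correct(t, a, b_ref) - p_correct(t, a, b_foc)) for t in thetas]
    return sum(gaps) / len(gaps)

grid = [x / 10.0 for x in range(-30, 31)]   # theta grid from -3 to 3
# hypothetical stair-climbing item: 0.4 logits harder in the focal group
gap = expected_score_gap(1.5, 0.0, 0.4, grid)
print(round(gap, 3))
```

Summing such item-level gaps over a form approximates the scale-level (test characteristic curve) impact; a small total, as in the study above, is what justifies retaining flagged items in the calibrated bank.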