1
Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Patel RC, Shepherd BE. Three-phase generalized raking and multiple imputation estimators to address error-prone data. Stat Med 2024; 43:379-394. [PMID: 37987515 PMCID: PMC10842111 DOI: 10.1002/sim.9967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 09/23/2023] [Accepted: 11/09/2023] [Indexed: 11/22/2023]
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
Affiliation(s)
- Gustavo Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Sarah Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, North Carolina, USA
- Pamela A Shaw
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Rena C Patel
  - Department of Medicine, University of Washington, Seattle, Washington, USA
- Bryan E Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
2
Lotspeich SC, Shepherd BE, Amorim GGC, Shaw PA, Tao R. Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics 2022; 78:1674-1685. [PMID: 34213008 PMCID: PMC8720323 DOI: 10.1111/biom.13512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/08/2020] [Revised: 05/19/2021] [Accepted: 06/17/2021] [Indexed: 12/30/2022]
Abstract
Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error-prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost-effective solution is the two-phase design, under which the error-prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error-prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error-prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
Affiliation(s)
- Sarah C. Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, U.S.A
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, U.S.A
- Gustavo G. C. Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, U.S.A
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, U.S.A
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, U.S.A
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, U.S.A
3
Tao R, Lotspeich SC, Amorim G, Shaw PA, Shepherd BE. Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors. Stat Med 2021; 40:725-738. [PMID: 33145800 PMCID: PMC8214478 DOI: 10.1002/sim.8799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2020] [Revised: 09/07/2020] [Accepted: 10/20/2020] [Indexed: 11/07/2022]
Abstract
In modern observational studies using electronic health records or other routinely collected data, both the outcome and covariates of interest can be error-prone and their errors often correlated. A cost-effective solution is the two-phase design, under which the error-prone outcome and covariates are observed for all subjects during the first phase and that information is used to select a validation subsample for accurate measurements of these variables in the second phase. Previous research on two-phase measurement error problems largely focused on scenarios where there are errors in covariates only or the validation sample is a simple random sample of study subjects. Herein, we propose a semiparametric approach to general two-phase measurement error problems with a quantitative outcome, allowing for correlated errors in the outcome and covariates and arbitrary second-phase selection. We devise a computationally efficient and numerically stable expectation-maximization algorithm to maximize the nonparametric likelihood function. The resulting estimators possess desired statistical properties. We demonstrate the superiority of the proposed methods over existing approaches through extensive simulation studies, and we illustrate their use in an observational HIV study.
Affiliation(s)
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Sarah C. Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Gustavo Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
4
Lotspeich SC, Giganti MJ, Maia M, Vieira R, Machado DM, Succi RC, Ribeiro S, Pereira MS, Rodriguez MF, Julmiste G, Luque MT, Caro-Vega Y, Mejia F, Shepherd BE, McGowan CC, Duda SN. Self-audits as alternatives to travel-audits for improving data quality in the Caribbean, Central and South America network for HIV epidemiology. J Clin Transl Sci 2020; 4:125-132. [PMID: 32313702 PMCID: PMC7159809 DOI: 10.1017/cts.2019.442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2019] [Revised: 11/19/2019] [Accepted: 11/25/2019] [Indexed: 11/25/2022]
Abstract
INTRODUCTION Audits play a critical role in maintaining the integrity of observational cohort data. While previous work has validated the audit process, sending trained auditors to sites ("travel-audits") can be costly. We investigate the efficacy of training sites to conduct "self-audits." METHODS In 2017, eight research groups in the Caribbean, Central, and South America network for HIV Epidemiology each audited a subset of their patient records randomly selected by the data coordinating center at Vanderbilt. Designated investigators at each site compared abstracted research data to the original clinical source documents and captured audit findings electronically. Additionally, two Vanderbilt investigators performed on-site travel-audits at three randomly selected sites (one adult and two pediatric) in late summer 2017. RESULTS Self- and travel-auditors, respectively, reported that 93% and 92% of 8919 data entries, captured across 28 unique clinical variables on 65 patients, were entered correctly. Across all entries, 8409 (94%) received the same assessment from self- and travel-auditors (7988 correct and 421 incorrect). Of 421 entries mutually assessed as "incorrect," 304 (72%) were corrected by both self- and travel-auditors and 250 of these (82%) received the same corrections. Reason for changing antiretroviral therapy (ART) regimen, ART end date, viral load value, CD4%, and HIV diagnosis date had the most mismatched corrections. CONCLUSIONS With similar overall error rates, findings suggest that data audits conducted by trained local investigators could provide an alternative to on-site audits by external auditors to ensure continued data quality. However, discrepancies observed between corrections illustrate challenges in determining correct values even with audits.
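The agreement proportions in the RESULTS can be recomputed from the counts stated in the abstract. The following is a minimal sketch of that arithmetic only; the variable names and the `pct` helper are illustrative assumptions, not part of the cited study.

```python
# Counts exactly as stated in the abstract's RESULTS section.
total_entries = 8919        # data entries assessed by both auditor types
same_assessment = 8409      # entries given the same assessment by both
agreed_correct = 7988       # of those, assessed "correct" by both
agreed_incorrect = 421      # of those, assessed "incorrect" by both
corrected_by_both = 304     # mutually "incorrect" entries corrected by both
same_correction = 250       # of those, given the same correction by both


def pct(part: int, whole: int) -> int:
    """Hypothetical helper: part/whole as a whole-number percentage."""
    return round(100 * part / whole)


# The two agreement categories should add up to the overall agreement count.
counts_consistent = agreed_correct + agreed_incorrect == same_assessment

agreement_pct = pct(same_assessment, total_entries)
corrected_pct = pct(corrected_by_both, agreed_incorrect)
same_correction_pct = pct(same_correction, corrected_by_both)

print(counts_consistent, agreement_pct, corrected_pct, same_correction_pct)
```

Each percentage is taken relative to the count that immediately precedes it in the text, which is the denominator the phrasing "of these" implies.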
Affiliation(s)
- Sarah C. Lotspeich
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Mark J. Giganti
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Marcelle Maia
  - Departamento de Pediatria, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Renalice Vieira
  - Departamento de Pediatria, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Daisy Maria Machado
  - Departamento de Pediatria, Universidade Federal de São Paulo, São Paulo, Brazil
- Regina Célia Succi
  - Departamento de Pediatria, Universidade Federal de São Paulo, São Paulo, Brazil
- Sayonara Ribeiro
  - Instituto Nacional de Infectologia Evandro Chagas, Rio de Janeiro, Brazil
- Gaetane Julmiste
  - Le Groupe Haïtien d’Etude du Sarcome de Kaposi et des Infections Opportunistes, Port-au-Prince, Haiti
- Marco Tulio Luque
  - Instituto Hondureño de Seguridad Social and Hospital Escuela Universitario, Tegucigalpa, Honduras
- Yanink Caro-Vega
  - Departamento de Enfermedades Infecciosas, El Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Fernando Mejia
  - Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Catherine C. McGowan
  - Division of Infectious Diseases, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Stephany N. Duda
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
5
Jobson G, Murphy J, van Huyssteen M, Myburgh H, Hurter T, Grobbelaar CJ, Struthers HE, McIntyre JA, Peters RPH. Understanding health worker data use in a South African antiretroviral therapy register. Trop Med Int Health 2018; 23:1207-1212. [PMID: 30176094 DOI: 10.1111/tmi.13146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Abstract
OBJECTIVE To evaluate how electronic data management systems affect data use practices in antiretroviral therapy (ART) programs within local health districts, and individual health facilities. METHODS We used a data quality audit to establish a baseline of the quality of data in the electronic register alongside in-depth interviews with health workers and managers, to understand perceptions of data quality, data use by facility staff and challenges affecting data use. RESULTS The findings provide a four-level continuum of data use that can be applied to other settings and recommendations for optimising facility-level data use. CONCLUSION By defining four levels of data use our findings suggest the potential to encourage a structured process of moving from passive data use, to more active and engaged data use, where data could be used to anticipate patient behaviour and link that behaviour to differentiated care plans.
Affiliation(s)
- Mea van Huyssteen
  - Faculty of Natural Science, School of Pharmacy, University of the Western Cape, Bellville, South Africa
- Helen E Struthers
  - Anova Health Institute, Johannesburg, South Africa
  - Division of Infectious Diseases & HIV Medicine, Department of Medicine, University of Cape Town, Cape Town, South Africa
- James A McIntyre
  - Anova Health Institute, Johannesburg, South Africa
  - School of Public Health and Family Medicine, University of Cape Town, Cape Town, South Africa
- Remco P H Peters
  - Anova Health Institute, Johannesburg, South Africa
  - Department of Medical Microbiology, University of Pretoria, Pretoria, South Africa