1
Ross RK, Cole SR, Edwards JK, Zivich PN, Westreich D, Daniels JL, Price JT, Stringer JSA. Leveraging External Validation Data: The Challenges of Transporting Measurement Error Parameters. Epidemiology 2024; 35:196-207. [PMID: 38079241 PMCID: PMC10841744 DOI: 10.1097/ede.0000000000001701]
Abstract
Approaches to address measurement error frequently rely on validation data to estimate measurement error parameters (e.g., sensitivity and specificity). Acquisition of validation data can be costly, thus secondary use of existing data for validation is attractive. To use these external validation data, however, we may need to address systematic differences between these data and the main study sample. Here, we derive estimators of the risk and the risk difference that leverage external validation data to account for outcome misclassification. If misclassification is differential with respect to covariates that themselves are differentially distributed in the validation and study samples, the misclassification parameters are not immediately transportable. We introduce two ways to account for such covariates: (1) standardize by these covariates or (2) iteratively model the outcome. If conditioning on a covariate for transporting the misclassification parameters induces bias of the causal effect (e.g., M-bias), the former but not the latter approach is biased. We provide proof of identification, describe estimation using parametric models, and assess performance in simulations. We also illustrate implementation to estimate the risk of preterm birth and the effect of maternal HIV infection on preterm birth. Measurement error should not be ignored and it can be addressed using external validation data via transportability methods.
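As a rough illustration of the standardization idea in this abstract, the sketch below applies the classical sensitivity/specificity correction for a misclassified binary outcome within covariate strata, then averages over the main study's covariate distribution. It is not the authors' estimator: the function names, the stratum tuple layout, and all numbers are hypothetical, and the stratum-specific sensitivity and specificity are assumed to come from external validation data.

```python
from typing import Iterable, Tuple

# (weight, observed_risk, sensitivity, specificity) for one covariate stratum.
# The weight is the share of the MAIN study sample in that stratum (assumed layout).
Stratum = Tuple[float, float, float, float]


def corrected_risk(observed_risk: float, sensitivity: float, specificity: float) -> float:
    """Classical correction: solve p_obs = Se * p + (1 - Sp) * (1 - p) for p."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (observed_risk + specificity - 1.0) / denom
    # Truncate to [0, 1]; sampling noise can push the raw value outside.
    return min(1.0, max(0.0, corrected))


def standardized_corrected_risk(strata: Iterable[Stratum]) -> float:
    """Correct within each stratum, then average using main-study weights."""
    strata = list(strata)
    total_weight = sum(weight for weight, _, _, _ in strata)
    return sum(
        weight * corrected_risk(p_obs, se, sp) for weight, p_obs, se, sp in strata
    ) / total_weight


# Hypothetical example: two equally sized strata with different error rates.
example_strata = [(0.5, 0.20, 0.80, 0.90), (0.5, 0.30, 0.90, 0.95)]
print(round(standardized_corrected_risk(example_strata), 4))
```

Correcting within strata before averaging matters when sensitivity or specificity varies with a covariate whose distribution differs between the validation data and the study sample. Applying a single pooled pair of parameters would then give a different, generally biased, answer.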
Affiliation(s)
- Rachael K Ross
  - Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Stephen R Cole
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Jessie K Edwards
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Paul N Zivich
  - Institute of Global Health and Infectious Diseases, School of Medicine, University of North Carolina at Chapel Hill, NC
- Daniel Westreich
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Julie L Daniels
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
- Joan T Price
  - Department of Obstetrics and Gynecology, School of Medicine, University of North Carolina, Chapel Hill, NC
- Jeffrey S A Stringer
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC
  - Department of Obstetrics and Gynecology, School of Medicine, University of North Carolina, Chapel Hill, NC
2
Lanes S, Beachler DC. Validation to correct for outcome misclassification bias. Pharmacoepidemiol Drug Saf 2023; 32:700-703. [PMID: 36751117 DOI: 10.1002/pds.5601] [Received: 09/01/2022] [Revised: 01/27/2023] [Accepted: 02/03/2023]
Affiliation(s)
- Stephan Lanes
  - Department of Safety and Epidemiology, HealthCore, Wilmington, Delaware, USA
- Daniel C Beachler
  - Department of Safety and Epidemiology, HealthCore, Wilmington, Delaware, USA
3
Wu Q, Zhang Z, Ma T, Waltz J, Milton D, Chen S. Link predictions for incomplete network data with outcome misclassification. Stat Med 2021; 40:1519-1534. [PMID: 33482688 DOI: 10.1002/sim.8856] [Received: 03/25/2020] [Revised: 10/05/2020] [Accepted: 11/24/2020]
Abstract
Link prediction is a fundamental problem in network analysis. In a complex network, links can be unreported and/or under detection limits due to heterogeneous sources of noise and technical challenges during data collection. Incomplete network data can lead to inaccurate inference in network-based data analysis. We propose a parametric link prediction model and consider latent links as misclassified binary outcomes. We develop new algorithms to optimize model parameters and yield robust predictions of unobserved links. Theoretical properties of the predictive model are also discussed. We apply the new method to partially observed social network data and incomplete brain network data. The results demonstrate that our method outperforms the existing latent-link prediction methods.
Affiliation(s)
- Qiong Wu
  - Department of Mathematics, University of Maryland, College Park, Maryland, USA
- Zhen Zhang
  - Department of Accounting, College of Business and Economics, Towson University, Towson, Maryland, USA
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, University of Maryland School of Public Health, College Park, Maryland, USA
- James Waltz
  - Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Donald Milton
  - Maryland Institute for Applied Environmental Health, University of Maryland School of Public Health, College Park, Maryland, USA
- Shuo Chen
  - Maryland Psychiatric Research Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
  - Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, USA
4
Beesley LJ, Mukherjee B. Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 2020; 78:214-226. [PMID: 33179768 DOI: 10.1111/biom.13400] [Received: 12/18/2019] [Revised: 10/26/2020] [Accepted: 10/29/2020]
Abstract
Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-varying factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies. For all methods proposed, we derive valid standard error estimators and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative, a longitudinal EHR-linked biorepository.
Affiliation(s)
- Lauren J Beesley
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
5
Hall GC, Lanes S, Bollaerts K, Zhou X, Ferreira G, Gini R. Outcome misclassification: Impact, usual practice in pharmacoepidemiology database studies and an online aid to correct biased estimates of risk ratio or cumulative incidence. Pharmacoepidemiol Drug Saf 2020; 29:1450-1455. [PMID: 32860317 DOI: 10.1002/pds.5109] [Received: 07/19/2019] [Revised: 07/09/2020] [Accepted: 07/30/2020]
Abstract
PURPOSE It is well documented that outcome misclassification can bias a point estimate. We aimed to understand current practice in addressing this bias in pharmacoepidemiology database studies and to develop an open source application (app) from existing methodology to demonstrate the impact and mechanism of this bias on results. METHODS Studies of an exposure and a clinical outcome were selected from all Pharmacoepidemiology and Drug Safety publications during 2017 and any reference to outcome misclassification described. An app to correct risk ratio (RR) and cumulative incidence for outcome misclassification was developed from a published methodology and used to demonstrate the impact of correction on point estimates. RESULTS Eight (19%) of 43 papers selected reported estimates of outcome ascertainment accuracy with positive predictive value (PPV) the most commonly reported measure (7 of 8 studies). Three studies (7%) corrected for the bias, 1 by exposure strata, and 5 (12%) restricted analyses to confirmed cases. The app (http://apps.p-95.com/ISPE/) uses values of PPV and sensitivity (or a range of possible values) in each exposure stratum and returns corrected point estimates and confidence intervals. The app demonstrates that small differences between comparison groups in PPV or sensitivity can introduce bias even when accuracy estimates are high. CONCLUSIONS Outcome misclassification is not usually corrected in pharmacoepidemiology database studies although correction methods using routinely measured indices are available. Error indices are needed for each comparison group to correct RR estimates for these errors. The app should encourage understanding of this bias and increase adjustment.
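The kind of correction this abstract describes can be sketched with the standard identity that the true number of cases equals observed cases × PPV / sensitivity, applied separately in each exposure group. This is only an illustration under that assumption: the function names and counts are hypothetical, the app's actual implementation is not reproduced here, and confidence intervals are omitted.

```python
def corrected_cases(observed_cases: float, ppv: float, sensitivity: float) -> float:
    """True positives = observed * PPV; true cases = true positives / sensitivity."""
    if not (0.0 < ppv <= 1.0 and 0.0 < sensitivity <= 1.0):
        raise ValueError("PPV and sensitivity must lie in (0, 1]")
    return observed_cases * ppv / sensitivity


def corrected_risk_ratio(
    cases_exposed: float, n_exposed: float, ppv_exposed: float, se_exposed: float,
    cases_unexposed: float, n_unexposed: float, ppv_unexposed: float, se_unexposed: float,
) -> float:
    """Risk ratio after correcting case counts within each exposure group."""
    risk_exposed = corrected_cases(cases_exposed, ppv_exposed, se_exposed) / n_exposed
    risk_unexposed = corrected_cases(cases_unexposed, ppv_unexposed, se_unexposed) / n_unexposed
    return risk_exposed / risk_unexposed


# Hypothetical counts: the crude risk ratio is (100/1000) / (50/1000) = 2.0.
same_accuracy = corrected_risk_ratio(100, 1000, 0.90, 0.80, 50, 1000, 0.90, 0.80)
# A modest PPV difference between groups (0.95 vs 0.90) shifts the corrected estimate.
different_ppv = corrected_risk_ratio(100, 1000, 0.95, 0.80, 50, 1000, 0.90, 0.80)
print(same_accuracy, different_ppv)
```

When PPV and sensitivity are equal across groups they cancel in the ratio, which is why group-specific values are needed to detect and correct bias in the RR.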
Affiliation(s)
- Xiaofeng Zhou
  - Global Medical Epidemiology, Pfizer Inc., New York, US
- Rosa Gini
  - Osservatorio di Epidemiologia, Agenzia Regionale di Sanità Della Toscana, Florence, Italy
6
Desai RJ, Levin R, Lin KJ, Patorno E. Bias Implications of Outcome Misclassification in Observational Studies Evaluating Association Between Treatments and All-Cause or Cardiovascular Mortality Using Administrative Claims. J Am Heart Assoc 2020; 9:e016906. [PMID: 32844711 PMCID: PMC7660765 DOI: 10.1161/jaha.120.016906]
Abstract
Background The bias implications of outcome misclassification arising from imperfect capture of mortality in claims‐based studies are not well understood. Methods and Results We identified 2 cohorts of patients: (1) type 2 diabetes mellitus (n=8.6 million), and (2) heart failure (n=3.1 million), from Medicare claims (2012–2016). Within the 2 cohorts, mortality was identified from claims using the following approaches: (1) all‐place all‐cause mortality, (2) in‐hospital all‐cause mortality, (3) all‐place cardiovascular mortality (based on diagnosis codes for a major cardiovascular event within 30 days of death date), or (4) in‐hospital cardiovascular mortality, and compared against National Death Index identified mortality. Empirically identified sensitivity and specificity based on observed values in the 2 cohorts were used to conduct Monte Carlo simulations for treatment effect estimation under differential and nondifferential misclassification scenarios. From National Death Index, 1 544 805 deaths (549 996 [35.6%] cardiovascular deaths) in the type 2 diabetes mellitus cohort and 1 175 202 deaths (523 430 [44.5%] cardiovascular deaths) in the heart failure cohort were included. Sensitivity was 99.997% and 99.207% for the all‐place all‐cause mortality approach, whereas it was 27.71% and 33.71% for the in‐hospital all‐cause mortality approach in the type 2 diabetes mellitus and heart failure cohorts, respectively, with perfect positive predictive values. For all‐place cardiovascular mortality, sensitivity was 52.01% in the type 2 diabetes mellitus cohort and 53.83% in the heart failure cohort with positive predictive values of 49.98% and 54.45%, respectively. Simulations suggested a possibility for substantial bias in treatment effects. Conclusions Approaches to identify mortality from claims had variable performance compared with the National Death Index. Investigators should anticipate the potential for bias from outcome misclassification when using administrative claims to capture mortality.
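To make the differential versus nondifferential distinction in this abstract concrete, the sketch below computes the expected observed risk ratio in closed form from assumed true risks, sensitivities, and specificities. It is a deterministic stand-in, not the Monte Carlo simulation the study ran. The function names and true risks are hypothetical; only the 27.71% sensitivity is taken from the abstract.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as events: true positives plus false positives."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def observed_risk_ratio(
    risk_treated: float,
    risk_control: float,
    se_treated: float,
    se_control: float,
    sp_treated: float = 1.0,
    sp_control: float = 1.0,
) -> float:
    """Expected risk ratio when each arm's outcome is measured with error."""
    return (
        observed_risk(risk_treated, se_treated, sp_treated)
        / observed_risk(risk_control, se_control, sp_control)
    )


# Hypothetical true risks of 10% vs 5%, so the true risk ratio is 2.0.
# Nondifferential low sensitivity with perfect specificity leaves the ratio intact.
nondifferential = observed_risk_ratio(0.10, 0.05, 0.2771, 0.2771)
# Sensitivity that differs by arm distorts the ratio.
differential = observed_risk_ratio(0.10, 0.05, 0.35, 0.25)
# Imperfect specificity, even if nondifferential, pulls the ratio toward 1.
imperfect_specificity = observed_risk_ratio(0.10, 0.05, 0.50, 0.50, 0.99, 0.99)
print(nondifferential, differential, imperfect_specificity)
```

With perfect specificity the sensitivity cancels out of the ratio when it is equal in both arms, so low sensitivity alone does not bias the risk ratio. Risk differences, by contrast, would still be attenuated.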
Affiliation(s)
- Rishi J Desai
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital & Harvard Medical School, Boston, MA
- Raisa Levin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital & Harvard Medical School, Boston, MA
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital & Harvard Medical School, Boston, MA
- Elisabetta Patorno
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital & Harvard Medical School, Boston, MA
7
Beesley LJ, Fritsche LG, Mukherjee B. An analytic framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records. Stat Med 2020; 39:1965-1979. [PMID: 32198773 DOI: 10.1002/sim.8524] [Received: 05/29/2019] [Revised: 02/14/2020] [Accepted: 02/14/2020]
Abstract
Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well-characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.
Affiliation(s)
- Lauren J Beesley
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Lars G Fritsche
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
8
French B, Sadakane A, Cologne J, Mabuchi K, Ozasa K, Preston DL. Misclassification of primary liver cancer in the Life Span Study of atomic bomb survivors. Int J Cancer 2020; 147:1294-1299. [PMID: 31985032 DOI: 10.1002/ijc.32887] [Received: 11/11/2019] [Revised: 12/23/2019] [Accepted: 01/21/2020]
Abstract
Primary liver cancer is difficult to diagnose accurately at death, due to metastases from nearby organs and to concomitant diseases, such as chronic hepatitis and cirrhosis. Trends in diagnostic accuracy could affect radiation risk estimates for incident liver cancer by altering background rates or by impacting risk modification by sex and age. We quantified the potential impact of death-certificate inaccuracies on radiation risk estimates for liver cancer in the Life Span Study of atomic bomb survivors. True-positive and false-negative rates were obtained from a previous study that compared death-certificate causes of death with those based on pathological review, from 1958 to 1987. We assumed various scenarios for misclassification rates after 1987. We obtained estimated true positives and estimated false negatives by stratified sampling from binomial distributions with probabilities given by the true-positive and false-negative rates, respectively. Poisson regression methods were applied to highly stratified person-year tables of corrected case counts and accrued person years. During the study period (1958-2009), there were 1,885 cases of liver cancer, which included 383 death-certificate-only (DCO) cases; 1,283 cases with chronic liver disease as the underlying cause of death; and 150 DCO cases of pancreatic cancer among 105,444 study participants. Across the range of scenarios considered, radiation risk estimates based on corrected case counts were attenuated, on average, by 13-30%. Our results indicated that radiation risk estimates for liver cancer were potentially sensitive to death-certificate inaccuracies. Additional data are needed to inform misclassification rates in recent years.
Affiliation(s)
- Benjamin French
  - Department of Statistics, Radiation Effects Research Foundation, Hiroshima, Japan
- Atsuko Sadakane
  - Department of Epidemiology, Radiation Effects Research Foundation, Hiroshima, Japan
- John Cologne
  - Department of Statistics, Radiation Effects Research Foundation, Hiroshima, Japan
- Kiyohiko Mabuchi
  - Radiation Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
- Kotaro Ozasa
  - Department of Epidemiology, Radiation Effects Research Foundation, Hiroshima, Japan
9
Falasinnu T, Rossides M, Chaichian Y, Simard JF. Do Death Certificates Underestimate the Burden of Rare Diseases? The Example of Systemic Lupus Erythematosus Mortality, Sweden, 2001-2013. Public Health Rep 2018; 133:481-488. [PMID: 29928843 PMCID: PMC6055290 DOI: 10.1177/0033354918777253]
Abstract
OBJECTIVES Mortality due to rare diseases, which are substantial sources of premature mortality, is underreported in mortality studies. The objective of this study was to determine the completeness of reporting systemic lupus erythematosus (SLE) as a cause of death. METHODS In 2017, we linked data on a Swedish population-based cohort (the Swedish Lupus Linkage, 2001-2013) comprising people with SLE (n = 8560) and their matched general population comparators (n = 37 717) to data from the Cause of Death Register. We reviewed death records of deceased people from the cohort (n = 5110) and extracted data on patient demographic characteristics and causes of death. We estimated odds ratios (ORs) and 95% confidence intervals (CIs) for not reporting SLE as a cause of death by using multivariable-adjusted logistic regression models. RESULTS Of 1802 deaths among SLE patients in the study, 1071 (59%) did not have SLE reported on their death records. Most SLE decedents were aged 75-84 at death (n = 584, 32%), female (n = 1462, 81%), and born in Nordic countries (n = 1730, 96%). Decedents aged ≥85 at death were more likely to have SLE not reported on their death records than were decedents aged <50 (OR = 2.34; 95% CI, 1.48-3.68). Having renal failure listed as a cause of death decreased the likelihood of SLE not being reported on the death record (OR = 0.54; 95% CI, 0.40-0.73), whereas having cancer listed as a cause of death increased this likelihood (OR = 2.39; 95% CI, 1.85-3.07). CONCLUSIONS SLE was greatly underreported as a cause of mortality on death records of SLE patients, particularly in older decedents and those with cancer, thereby underestimating the true burden of this disease. Public health resources need to focus on improving the recording of rare diseases in order to enhance the epidemiological utility of mortality data.
Affiliation(s)
- Titilola Falasinnu
  - Department of Health Research and Policy, Stanford School of Medicine, Stanford, CA, USA
- Marios Rossides
  - Clinical Epidemiology Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden
- Yashaar Chaichian
  - Department of Medicine, Division of Immunology & Rheumatology, Stanford School of Medicine, Stanford, CA, USA
- Julia F. Simard
  - Department of Health Research and Policy, Stanford School of Medicine, Stanford, CA, USA
  - Department of Medicine, Division of Immunology & Rheumatology, Stanford School of Medicine, Stanford, CA, USA
11
Högg T, Petkau J, Zhao Y, Gustafson P, Wijnands JM, Tremlett H. Bayesian analysis of pair-matched case-control studies subject to outcome misclassification. Stat Med 2017; 36:4196-4213. [PMID: 28783882 DOI: 10.1002/sim.7427] [Received: 12/20/2016] [Revised: 05/03/2017] [Accepted: 06/29/2017]
Abstract
We examine the impact of nondifferential outcome misclassification on odds ratios estimated from pair-matched case-control studies and propose a Bayesian model to adjust these estimates for misclassification bias. The model relies on access to a validation subgroup with confirmed outcome status for all case-control pairs as well as prior knowledge about the positive and negative predictive value of the classification mechanism. We illustrate the model's performance on simulated data and apply it to a database study examining the presence of ten morbidities in the prodromal phase of multiple sclerosis.
Affiliation(s)
- Tanja Högg
  - Department of Statistics, University of British Columbia, 2207 Main Mall, Vancouver, V6T 1Z4, British Columbia, Canada
- John Petkau
  - Department of Statistics, University of British Columbia, 2207 Main Mall, Vancouver, V6T 1Z4, British Columbia, Canada
- Yinshan Zhao
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
  - BC Centre for Improved Cardiovascular Health, Vancouver, British Columbia, Canada
- Paul Gustafson
  - Department of Statistics, University of British Columbia, 2207 Main Mall, Vancouver, V6T 1Z4, British Columbia, Canada
- José Ma Wijnands
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Helen Tremlett
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
12
Wang L, Damrauer SM, Zhang H, Zhang AX, Xiao R, Moore JH, Chen J. Phenotype validation in electronic health records based genetic association studies. Genet Epidemiol 2017; 41:790-800. [PMID: 29023970 DOI: 10.1002/gepi.22080] [Received: 12/19/2016] [Revised: 06/30/2017] [Accepted: 08/01/2017]
Abstract
The linkage between electronic health records (EHRs) and genotype data makes it plausible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
Affiliation(s)
- Lu Wang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Scott M Damrauer
  - Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America
- Hong Zhang
  - Institute of Biostatistics, Fudan University, Shanghai, P.R. China
- Alan X Zhang
  - Sidwell Friends School, Washington, DC, United States of America
- Rui Xiao
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jason H Moore
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
  - Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
13
Nakata K, Fujieda M, Miki H, Fukushima W, Ohfuji S, Maeda A, Kase T, Hirota Y. Detection of influenza vaccine effectiveness among nursery school children: Lesson from a season with cocirculating respiratory syncytial virus. Hum Vaccin Immunother 2015; 11:545-52. [PMID: 25714791 DOI: 10.1080/21645515.2015.1011982]
Abstract
In the winter influenza epidemic season, patients with respiratory illnesses including respiratory syncytial virus (RSV) infections increase among young children. Therefore, we evaluated the effectiveness of influenza vaccine against influenza-like illness (ILI) using a technique to identify outbreaks of RSV infection and to distinguish those patients from ILI patients. The study subjects were 101 children aged 12 to 84 months attending nursery school. We classified the cases into 6 levels based on the definitions of ILI for outcomes. We established observation periods according to information obtained from regional surveillance and rapid diagnostic tests among children. Multivariate odds ratios (ORs) for each case classification were obtained using a logistic regression model for each observation period. For the entire observation period, ORs for cases with fever plus respiratory symptoms were reduced with marginal significance. For the local influenza epidemic period, only the OR for the most serious cases was significantly decreased (0.20 [95% CI: 0.04-0.94]). During the influenza outbreak among the nursery school children, multivariate ORs for fever plus respiratory symptoms decreased significantly (≥ 38.0°C plus ≥ 1 symptom: 0.23 [0.06-0.91], ≥ 38.0°C plus ≥ 2 symptoms: 0.21 [0.05-0.85], ≥ 39.0°C plus ≥ 1 symptom: 0.18 [0.04-0.93] and ≥ 39.0°C plus ≥ 2 symptoms: 0.16 [0.03-0.87]). These results suggest that confining observation to the peak influenza epidemic period and adoption of a strict case classification system can minimize outcome misclassification when evaluating the effectiveness of influenza vaccine against ILI, even if influenza and RSV cocirculate in the same season.
Affiliation(s)
- Keiko Nakata
- Department of Public Health, Osaka City University Graduate School of Medicine, Osaka, Japan
|
14
|
Sinnott JA, Dai W, Liao KP, Shaw SY, Ananthakrishnan AN, Gainer VS, Karlson EW, Churchill S, Szolovits P, Murphy S, Kohane I, Plenge R, Cai T. Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records. Hum Genet 2014; 133:1369-82. [PMID: 25062868 PMCID: PMC4185241 DOI: 10.1007/s00439-014-1466-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 02/18/2014] [Accepted: 06/29/2014] [Indexed: 01/04/2023]
Abstract
To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case-control status, and this estimated case-control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.
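The bias described in this abstract can be illustrated with a small calculation. The sketch below is not the authors' method; it only shows, under assumed values, how nondifferential outcome misclassification pulls an odds ratio toward the null. The sensitivity, specificity, and case probabilities are hypothetical numbers chosen for illustration, and the function names are not from the paper.

```python
# Illustrative only: effect of nondifferential outcome misclassification
# on an odds ratio. All numeric values below are hypothetical assumptions.

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def observed_prob(p_true, sensitivity, specificity):
    """Probability of being *classified* as a case, given the true case
    probability and the classifier's sensitivity and specificity."""
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)

def odds_ratio(p_group1, p_group0):
    """Odds ratio comparing group 1 with group 0."""
    return odds(p_group1) / odds(p_group0)

# Hypothetical true case probabilities in risk-allele carriers and non-carriers.
p_carrier, p_noncarrier = 0.20, 0.10
# Hypothetical accuracy of a thresholded phenotype algorithm.
sensitivity, specificity = 0.80, 0.95

true_or = odds_ratio(p_carrier, p_noncarrier)
obs_or = odds_ratio(
    observed_prob(p_carrier, sensitivity, specificity),
    observed_prob(p_noncarrier, sensitivity, specificity),
)

print(f"true OR = {true_or:.2f}, OR with misclassified outcome = {obs_or:.2f}")
```

With these assumed inputs the odds ratio computed from the misclassified outcome is closer to 1 than the true odds ratio, which is the attenuation the abstract refers to.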
Affiliation(s)
- Jennifer A Sinnott
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115, USA
|
15
|
Sal Y Rosas VG, Hughes JP. Nonparametric and Semiparametric Analysis of Current Status Data Subject to Outcome Misclassification. Stat Commun Infect Dis 2010; 2010:364. [PMID: 22408713 PMCID: PMC3298195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
In this article, we present nonparametric and semiparametric methods to analyze current status data subject to outcome misclassification. Our methods use nonparametric maximum likelihood estimation (NPMLE) to estimate the distribution function of the failure time when sensitivity and specificity are known and may vary among subgroups. A nonparametric test is proposed for two-sample hypothesis testing. In regression analysis, we apply the Cox proportional hazards model and propose likelihood ratio-based confidence intervals for the regression coefficients. Our methods are motivated and demonstrated by data collected from an infectious disease study in Seattle, WA.
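As a rough sketch of the setting (the notation here is assumed for illustration and is not taken from the article): in current status data each subject $i$ is examined once, at time $c_i$, and a binary test result $y_i$ is recorded. If the failure time has distribution function $F$ and the test has known sensitivity $\alpha$ and specificity $\beta$, the probability of a positive result and the corresponding likelihood can be written as:

```latex
\begin{align}
  \Pr(Y_i = 1 \mid C_i = c_i)
    &= \alpha\, F(c_i) + (1 - \beta)\,\bigl(1 - F(c_i)\bigr), \\
  L(F)
    &= \prod_{i=1}^{n}
       \bigl[\alpha F(c_i) + (1 - \beta)(1 - F(c_i))\bigr]^{y_i}
       \bigl[1 - \alpha F(c_i) - (1 - \beta)(1 - F(c_i))\bigr]^{1 - y_i}.
\end{align}
```

With a perfect test ($\alpha = \beta = 1$) this reduces to the usual current status likelihood $\prod_i F(c_i)^{y_i}\{1 - F(c_i)\}^{1 - y_i}$. Allowing $\alpha$ and $\beta$ to differ by subgroup would replace them with subgroup-specific values in each factor.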
|