1
Boe LA, Lumley T, Shaw PA. Practical Considerations for Sandwich Variance Estimation in 2-Stage Regression Settings. Am J Epidemiol 2024; 193:798-810. [PMID: 38012109 PMCID: PMC11484631 DOI: 10.1093/aje/kwad234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Revised: 11/09/2023] [Accepted: 11/16/2023] [Indexed: 11/29/2023]
Abstract
In this paper, we present a practical approach for computing the sandwich variance estimator in 2-stage regression model settings. As a motivating example for 2-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has rarely been applied in regression calibration, despite its requiring less computation time than popular resampling approaches for variance estimation, specifically the bootstrap. This is probably because it requires specialized statistical coding. Here we first outline the steps needed to compute the sandwich variance estimator. We then develop a convenient method of computation in R for sandwich variance estimation, which leverages standard regression model outputs and existing R functions and can be applied in the case of a simple random sample or complex survey design. We use a simulation study to compare the sandwich estimator to a resampling variance approach for both settings. Finally, we further compare these 2 variance estimation approaches in data examples from the Women's Health Initiative (1993-2005) and the Hispanic Community Health Study/Study of Latinos (2008-2011). In our simulations, the sandwich variance estimator typically had good numerical performance, but simple Wald bootstrap confidence intervals were unstable or overcovered in certain settings, particularly when there was high correlation between covariates or large measurement error.
Affiliation(s)
- Lillian A Boe
  - Correspondence to Dr. Lillian A. Boe, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 633 3rd Avenue, 3rd Floor, New York, NY 10017
2
Zhang Y, Dai R, Huang Y, Prentice R, Zheng C. USING SIMULTANEOUS REGRESSION CALIBRATION TO STUDY THE EFFECT OF MULTIPLE ERROR-PRONE EXPOSURES ON DISEASE RISK UTILIZING BIOMARKERS DEVELOPED FROM A CONTROLLED FEEDING STUDY. Ann Appl Stat 2024; 18:125-143. [PMID: 38313601 PMCID: PMC10836829 DOI: 10.1214/23-aoas1782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Systematic measurement error in self-reported data creates important challenges in association studies between dietary intakes and chronic disease risks, especially when multiple dietary components are studied jointly. The joint regression calibration method has been developed for measurement error correction when objectively measured biomarkers are available for all dietary components of interest. Unfortunately, objectively measured biomarkers are only available for very few dietary components, which limits the application of the joint regression calibration method. Recently, for single dietary components, controlled feeding studies have been performed to develop new biomarkers for many more dietary components. However, it is unclear whether the biomarkers separately developed for single dietary components are valid for joint calibration. In this paper, we show that biomarkers developed for single dietary components cannot be used for joint regression calibration. We propose new methods to utilize controlled feeding studies to develop valid biomarkers for joint regression calibration to estimate the association between multiple dietary components simultaneously with the disease of interest. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulations are performed to study the finite sample performance of the proposed estimators. We apply our methods to examine the joint effects of sodium and potassium intakes on cardiovascular disease incidence using the Women's Health Initiative cohort data. We identify positive associations between sodium intake and cardiovascular diseases as well as negative associations between potassium intake and cardiovascular disease.
Affiliation(s)
- Yiwen Zhang
  - Zilber School of Public Health, University of Wisconsin-Milwaukee
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center
- Ying Huang
  - Public Health Science Division, Fred Hutchinson Cancer Research Center
- Ross Prentice
  - Public Health Science Division, Fred Hutchinson Cancer Research Center
- Cheng Zheng
  - Department of Biostatistics, University of Nebraska Medical Center
3
Zhang Y, Dai R, Huang Y, Prentice RL, Zheng C. Regression calibration utilizing biomarkers developed from high-dimensional metabolites. Front Nutr 2023; 10:1215768. [PMID: 37599686 PMCID: PMC10433218 DOI: 10.3389/fnut.2023.1215768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 07/17/2023] [Indexed: 08/22/2023]
Abstract
Addressing systematic measurement errors in self-reported data is a critical challenge in association studies of dietary intake and chronic disease risk. The regression calibration method has been utilized for error correction when an objectively measured biomarker is available; however, biomarkers for only a few dietary components have been developed. This paper proposes to use high-dimensional objective measurements to construct biomarkers for many more dietary components and to estimate the diet-disease associations. It also discusses the challenges in variance estimation in high-dimensional regression methods and presents a variety of techniques to address this issue, including cross-validation, degrees-of-freedom corrected estimators, and refitted cross-validation (RCV). Extensive simulation is performed to study the finite sample performance of the proposed estimators. The proposed method is applied to the Women's Health Initiative cohort data to examine the associations between the sodium/potassium intake ratio and total cardiovascular disease.
Affiliation(s)
- Yiwen Zhang
  - Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, United States
- Ying Huang
  - Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, United States
- Ross L. Prentice
  - Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, United States
- Cheng Zheng
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, NE, United States
4
Using Controlled Feeding Study for Biomarker Development in Regression Calibration for Disease Association Estimation. STATISTICS IN BIOSCIENCES 2022. [DOI: 10.1007/s12561-022-09349-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
5
Boe LA, Tinker LF, Shaw PA. An approximate quasi-likelihood approach for error-prone failure time outcomes and exposures. Stat Med 2021; 40:5006-5024. [PMID: 34519082 PMCID: PMC8963256 DOI: 10.1002/sim.9108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2020] [Revised: 04/21/2021] [Accepted: 06/03/2021] [Indexed: 11/08/2022]
Abstract
Measurement error arises commonly in clinical research settings that rely on data from electronic health records or large observational cohorts. In particular, self-reported outcomes are typical in cohort studies for chronic diseases such as diabetes in order to avoid the burden of expensive diagnostic tests. Dietary intake, which is also commonly collected by self-report and subject to measurement error, is a major factor linked to diabetes and other chronic diseases. These errors can bias exposure-disease associations that ultimately can mislead clinical decision-making. We have extended an existing semiparametric likelihood-based method for handling error-prone, discrete failure time outcomes to also address covariate error. We conduct an extensive numerical study to compare the proposed method to the naive approach that ignores measurement error in terms of bias and efficiency in the estimation of the regression parameter of interest. In all settings considered, the proposed method showed minimal bias and maintained coverage probability, thus outperforming the naive analysis which showed extreme bias and low coverage. This method is applied to data from the Women's Health Initiative to assess the association between energy and protein intake and the risk of incident diabetes mellitus. Our results show that correcting for errors in both the self-reported outcome and dietary exposures leads to considerably different hazard ratio estimates than those from analyses that ignore measurement error, which demonstrates the importance of correcting for both outcome and covariate error.
Affiliation(s)
- Lillian A. Boe
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Lesley F. Tinker
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
6
Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and regression calibration: Methods to address bias from correlated covariate and time-to-event error. Stat Med 2021; 40:631-649. [PMID: 33140432 PMCID: PMC7874496 DOI: 10.1002/sim.8793] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/17/2020] [Revised: 08/05/2020] [Accepted: 10/11/2020] [Indexed: 11/11/2022]
Abstract
Medical studies that depend on electronic health records (EHR) data are often subject to measurement error, as the data are not collected to support research questions under study. These data errors, if not accounted for in study analyses, can obscure or cause spurious associations between patient exposures and disease risk. Methodology to address covariate measurement error has been well developed; however, time-to-event error has also been shown to cause significant bias, but methods to address it are relatively underdeveloped. More generally, it is possible to observe errors in both the covariate and the time-to-event outcome that are correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models, such as the Cox model. Thus, we additionally propose raking estimators which are consistent estimators of the parameter defined by the population estimating equation. Raking can improve upon RC in certain settings with failure-time data, require no explicit modeling of the error structure, and can be utilized under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect the degree of improvement the raking estimator has over the RC approach. Detailed simulation studies are presented to examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated on observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
Affiliation(s)
- Eric J. Oh
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Bryan E. Shepherd
  - Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
7
Shepherd BE, Shaw PA. Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities. STATISTICAL COMMUNICATIONS IN INFECTIOUS DISEASES 2020; 12:20190015. [PMID: 35880997 PMCID: PMC9204761 DOI: 10.1515/scid-2019-0015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/15/2019] [Accepted: 08/21/2020] [Indexed: 06/15/2023]
Abstract
Objectives: Observational data derived from patient electronic health records (EHR) data are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. There are challenges to using these data, in particular with regard to data quality; some are recognized, some unrecognized, and some recognized but ignored. There are great opportunities for the statistical community to improve inference by incorporating validation subsampling into analyses of EHR data. Methods: Methods to address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many of the existing statistical methods for measurement error, for example, only address relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study. Results/Conclusion: We will discuss some preliminary methods in this area with a particular focus on time-to-event outcomes and outline areas of future research.
Affiliation(s)
- Bryan E. Shepherd
  - Biostatistics, Vanderbilt University, 2525 West End, Suite 11000, 37203 Nashville, Tennessee, USA
- Pamela A. Shaw
  - Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
8
Chen B, Yuan A, Yi GY. Variable selection for proportional hazards models with high-dimensional covariates subject to measurement error. CAN J STAT 2020. [DOI: 10.1002/cjs.11568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Affiliation(s)
- Baojiang Chen
  - Department of Biostatistics and Data Sciences, University of Texas Health Science Center at Houston, School of Public Health in Austin, Austin, TX, U.S.A.
- Ao Yuan
  - Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC, U.S.A.
- Grace Y. Yi
  - Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, Ontario, Canada
9
Keogh RH, Shaw PA, Gustafson P, Carroll RJ, Deffner V, Dodd KW, Küchenhoff H, Tooze JA, Wallace MP, Kipnis V, Freedman LS. STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology: Part 1-Basic theory and simple methods of adjustment. Stat Med 2020; 39:2197-2231. [PMID: 32246539 PMCID: PMC7450672 DOI: 10.1002/sim.8532] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 12/09/2018] [Revised: 02/25/2020] [Accepted: 02/28/2020] [Indexed: 11/11/2022]
Abstract
Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can impact strongly on results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models, and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimating distributions. We outline types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
Affiliation(s)
- Ruth H Keogh
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Pamela A Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Paul Gustafson
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, College Station, Texas, USA
  - School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway, New South Wales, Australia
- Veronika Deffner
  - Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
- Kevin W Dodd
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Helmut Küchenhoff
  - Department of Statistics, Statistical Consulting Unit StaBLab, Ludwig-Maximilians-Universität, Munich, Germany
- Janet A Tooze
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michael P Wallace
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Victor Kipnis
  - Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, USA
- Laurence S Freedman
  - Biostatistics and Biomathematics Unit, Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel
  - Information Management Services Inc., Rockville, Maryland, USA
10
Giganti MJ, Shaw PA, Chen G, Bebawy SS, Turner MM, Sterling TR, Shepherd BE. ACCOUNTING FOR DEPENDENT ERRORS IN PREDICTORS AND TIME-TO-EVENT OUTCOMES USING ELECTRONIC HEALTH RECORDS, VALIDATION SAMPLES, AND MULTIPLE IMPUTATION. Ann Appl Stat 2020; 14:1045-1061. [PMID: 32999698 PMCID: PMC7523695 DOI: 10.1214/20-aoas1343] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
Abstract
Data from electronic health records (EHR) are prone to errors, which are often correlated across multiple variables. The error structure is further complicated when analysis variables are derived as functions of two or more error-prone variables. Such errors can substantially impact estimates, yet we are unaware of methods that simultaneously account for errors in covariates and time-to-event outcomes. Using EHR data from 4217 patients, the hazard ratio for an AIDS-defining event associated with a 100 cells/mm³ increase in CD4 count at ART initiation was 0.74 (95% CI: 0.68-0.80) using unvalidated data and 0.60 (95% CI: 0.53-0.68) using fully validated data. Our goal is to obtain unbiased and efficient estimates after validating a random subset of records. We propose fitting discrete failure time models to the validated subsample and then multiply imputing values for unvalidated records. We demonstrate how this approach simultaneously addresses dependent errors in predictors, time-to-event outcomes, and inclusion criteria. Using the fully validated dataset as a gold standard, we compare the mean squared error of our estimates with those from the unvalidated dataset and the corresponding subsample-only dataset for various subsample sizes. By incorporating reasonably sized validated subsamples and appropriate imputation models, our approach had improved estimation over both the naive analysis and the analysis using only the validation subsample.
Affiliation(s)
- Pamela A. Shaw
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin
11
Korth AL, Bhutani S, Neuhouser ML, Beresford SA, Snetselaar L, Tinker LF, Schoeller DA. Comparison of Methods Used to Correct Self-Reported Protein Intake for Systematic Variation in Reported Energy Intake Using Quantitative Biomarkers of Dietary Intake. J Nutr 2020; 150:1330-1336. [PMID: 32030414 PMCID: PMC7198304 DOI: 10.1093/jn/nxaa007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/21/2019] [Revised: 09/30/2019] [Accepted: 01/08/2020] [Indexed: 01/01/2023]
Abstract
BACKGROUND Multiple methods of correcting nutrient intake for misreported energy intake have been proposed but have not been extensively compared. The availability of the Women's Health Initiative (WHI) data set, which includes several objective recovery biomarkers, offers an opportunity to compare these corrections with respect to protein intake. OBJECTIVE We compared 5 energy-correction methods for self-reported dietary protein against urinary nitrogen-derived protein intake. METHODS As part of the WHI Nutritional Biomarkers Study (NBS) 544 participants (50- to 80-y-old women) completed a FFQ and biomarker assessments using doubly labeled water (DLW) for total energy expenditure (TEE) and 24-h urinary nitrogen. Correction methods evaluated were as follows: 1) DLW-TEE; 2) the Institute of Medicine's (IOM's) estimated energy requirement (EER) TEE prediction equation based on sex, height, weight, and age; 3) published NBS total energy TEE prediction (WHI-NBS-TEE) using age, BMI, race, and income; 4) reported protein versus reported energy linear regression-based residual method; and 5) a Goldberg cutoff to exclude subjects reporting energy intakes <1.35 times their basal metabolic rate. Efficacy was evaluated using correlations obtained by regressing corrected protein against biomarker protein (6.25 × urinary nitrogen/0.81). RESULTS Unadjusted self-reported protein intake from the FFQ (mean = 66.7 g) correlated weakly (r = 0.31) with biomarker protein (mean = 74.9 g). DLW-TEE-corrected self-reported protein intake (mean = 90.7 g) had the strongest correlation with biomarker protein (r = 0.47). Other energy corrections yielded lower, but still significant correlations: EER, r = 0.44 (mean = 92.1 g); WHI-NBS-TEE, r = 0.37 (mean = 90.4 g); Goldberg cutoff, r = 0.36 (mean = 88.4 g); and residual method, r = 0.35 (mean = 66.7 g). 
CONCLUSIONS Our data indicate that proportional correction of reported protein intake using a measure of energy requirement from DLW-TEE or IOM-EER performed modestly better than other methods in this cohort. These energy adjustments, however, yielded corrected protein exceeding the biomarker protein, indicating that energy adjustment alone does not eliminate all self-reported protein reporting bias.
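Two of the quantities named in this abstract are simple enough to sketch in code: the biomarker protein conversion (6.25 × urinary nitrogen/0.81) and the Goldberg cutoff (exclude subjects reporting energy intake below 1.35 times basal metabolic rate). The Python below is a minimal illustration only. The function names, argument units, and sample values are assumptions made for this sketch and do not come from the paper.

```python
# Illustrative sketch only: function names, units, and sample values are
# hypothetical and are not taken from the paper.

def biomarker_protein_g(urinary_nitrogen_g: float) -> float:
    """Biomarker protein intake from 24-h urinary nitrogen: 6.25 x N / 0.81."""
    return 6.25 * urinary_nitrogen_g / 0.81


def passes_goldberg_cutoff(reported_energy: float, bmr: float,
                           cutoff: float = 1.35) -> bool:
    """Return True if reported energy is at least cutoff x BMR (subject is kept).

    Both arguments must use the same energy unit (assumed here: kcal/day).
    """
    return reported_energy >= cutoff * bmr


if __name__ == "__main__":
    # Hypothetical subject: 10 g urinary nitrogen, 1500 kcal reported, BMR 1300 kcal.
    print(round(biomarker_protein_g(10.0), 1))
    print(passes_goldberg_cutoff(1500.0, 1300.0))
```

In this sketch the hypothetical subject would be excluded, because 1500 kcal is below 1.35 × 1300 kcal.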
Affiliation(s)
- Amy L Korth
  - Department of Nutritional Sciences, University of Wisconsin, Madison, WI, USA
  - School of Medicine and Public Health, University of Wisconsin, Madison, WI, USA
- Surabhi Bhutani
  - Department of Nutritional Sciences, University of Wisconsin, Madison, WI, USA
  - School of Exercise and Nutritional Sciences, San Diego State University, San Diego, CA, USA
- Marian L Neuhouser
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Linda Snetselaar
  - Department of Epidemiology, University of Iowa, Iowa City, IA, USA
- Lesley F Tinker
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Dale A Schoeller
  - Department of Nutritional Sciences, University of Wisconsin, Madison, WI, USA
12
Gu X, Ma Y, Balasubramanian R. SEMIPARAMETRIC TIME TO EVENT MODELS IN THE PRESENCE OF ERROR-PRONE, SELF-REPORTED OUTCOMES-WITH APPLICATION TO THE WOMEN'S HEALTH INITIATIVE. Ann Appl Stat 2015; 9:714-730. [PMID: 26834908 PMCID: PMC4729390 DOI: 10.1214/15-aoas810] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
Abstract
The onset of several silent, chronic diseases such as diabetes can be detected only through diagnostic tests. Due to cost considerations, self-reported outcomes are routinely collected in lieu of expensive diagnostic tests in large-scale prospective investigations such as the Women's Health Initiative. However, self-reported outcomes are subject to imperfect sensitivity and specificity. Using a semiparametric likelihood-based approach, we present time to event models to estimate the association of one or more covariates with an error-prone, self-reported outcome. We present simulation studies to assess the effect of error in self-reported outcomes with regard to bias in the estimation of the regression parameter of interest. We apply the proposed methods to prospective data from 152,830 women enrolled in the Women's Health Initiative to evaluate the association of statin use with the risk of incident diabetes mellitus among postmenopausal women. The current analysis is based on follow-up through 2010, with a median duration of follow-up of 12.1 years. The methods proposed in this paper are readily implemented using our freely available R software package icensmis, which is available at the Comprehensive R Archive Network (CRAN) website.
Affiliation(s)
- Xiangdong Gu
  - Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Yunsheng Ma
  - Department of Medicine, Division of Preventive and Behavioral Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01655, USA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, Massachusetts 01003, USA
13
Abstract
INTRODUCTION AND AIMS To provide a current perspective on nutrition and physical activity influence on breast cancer. METHODS AND RESULTS A comprehensive literature review was conducted and selective presentation of findings follows. While some observational studies have associated higher dietary fat intake with higher breast cancer incidence, two full-scale randomized clinical trials of dietary fat intake reduction programs were negative. However, a lifestyle intervention targeting fat intake reduction in the Women's Intervention Nutrition Study (WINS) resulted in weight loss and also reduced breast cancer recurrences in women with early-stage disease. Observational studies evaluating specific nutrient intakes and dietary supplements have provided mixed results. Several observational studies find women with early-stage breast cancer with lower 25-hydroxyvitamin D levels at higher recurrence risk, a finding requiring cautious interpretation. The lifestyle factor most strongly and consistently associated with both breast cancer incidence and breast cancer recurrence risk is physical activity. A meta-analysis of observational studies supports the concept that moderate recreational physical activity (about 3-4 h walking per week) may reduce breast cancer incidence and that women with early-stage breast cancer who increase or maintain their physical activity may have lower recurrence risk as well. Feasibility of achieving increased physical activity and weight loss in women with early-stage breast cancer has been established. Two full-scale randomized clinical trials are evaluating weight loss/maintenance and increased physical activity in relation to recurrence risk in women with early-stage, resected breast cancer. DISCUSSION/CONCLUSIONS Dietary intake may influence breast cancer, but its influence is difficult to separate from that of body weight. A consistent body of observational study evidence suggests higher physical activity has a favorable influence on breast cancer incidence and outcome. While awaiting definitive evidence from ongoing randomized trials, breast cancer patients can reasonably be counseled to avoid weight gain, reduce body weight if overweight or obese, and increase or maintain a moderate level of physical activity.
Affiliation(s)
- Rowan T Chlebowski
  - Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, 1124 W. Carson Street, Building J-3, Torrance, CA 90502, USA.
14
Prentice RL, Pettinger M, Tinker LF, Huang Y, Thomson CA, Johnson KC, Beasley J, Anderson G, Shikany JM, Chlebowski RT, Neuhouser ML. Regression calibration in nutritional epidemiology: example of fat density and total energy in relationship to postmenopausal breast cancer. Am J Epidemiol 2013; 178:1663-72. [PMID: 24064741 DOI: 10.1093/aje/kwt198] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
Regression calibration using biomarkers provides an attractive approach to strengthening nutritional epidemiology. We consider this approach to assessing the relationship of fat and total energy consumption with postmenopausal breast cancer. In analyses that included fat density data, biomarker-calibrated total energy was positively associated with postmenopausal breast cancer incidence in cohorts of the US Women's Health Initiative from 1994 to 2010. The estimated hazard ratio for a 20% increment in calibrated food frequency questionnaire (FFQ) energy was 1.22 (95% confidence interval (CI): 1.15, 1.30). This association was not evident without biomarker calibration, and it ceased to be apparent following control for body mass index (weight (kg)/height (m)²), suggesting that the association is mediated by body fat deposition over time. The hazard ratio for a corresponding 40% increment in FFQ fat density was 1.05 (95% CI: 1.00, 1.09). A stronger fat density association, with a hazard ratio of 1.19 (95% CI: 1.00, 1.41), emerged from analyses that used 4-day food records for dietary assessment. FFQ-based analyses were also carried out by using a second dietary assessment in place of the biomarker for calibration. This type of calibration did not correct for systematic bias in energy assessment, but may be able to accommodate the "noise" component of dietary measurement error. Implications for epidemiologic applications more generally are described.