Borkowf CB, Albert PS. Efficient estimation of the risk of a disease by quantile-categories of a key predictor variable using generalized additive models. Stat Med 2005; 24:623-45. [PMID: 15678439 DOI: 10.1002/sim.2041]
Abstract
Suppose that one wishes to make inferences about the risk of a disease by the population quartile-categories of a key continuous predictor variable. When one collects data on a prospective cohort, the standard method is simply to categorize the key predictor variable by the empirical quartiles. One may then include indicator variables for these empirical quartile-categories as predictors, along with other covariates, in a generalized linear model (GLM), with the observed health status of each subject as the response. The standard GLM method, however, is relatively inefficient, because it treats all observations that fall in the same quartile-category of the predictor variable identically, regardless of whether they lie in the centre or near the boundaries of that category. Alternatively, one may include the key predictor variable, along with other covariates, in a generalized additive model (GAM), again with the observed health status of each subject as the response. The alternative GAM method non-parametrically estimates the functional relationship between the key predictor variable and the response. One may then compute statistics of interest, such as proportions and odds ratios, from the fitted GAM equation using the empirical quartile-categories. Simulations show that both the GLM and GAM methods are nearly unbiased, but the latter method produces smaller variances and narrower bootstrap confidence intervals. An example from nutritional epidemiology illustrates the use of these methods.
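The contrast between the two approaches can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: the data are simulated from a hypothetical logistic risk curve, there are no other covariates, and a Gaussian kernel smoother (Nadaraya-Watson) stands in for the GAM smoother. With quartile indicators as the only predictors, the fitted GLM probabilities equal the observed proportions in each category, so those proportions are computed directly. All function names and the bandwidth are illustrative choices.

```python
import math
import random
import statistics


def simulate(n, seed=0):
    """Simulate a continuous predictor x and a binary outcome y (hypothetical data)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed true risk curve, logistic in x; chosen only for illustration.
    ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(1.0 - 0.8 * x)) else 0
          for x in xs]
    return xs, ys


def quartile_labels(xs):
    """Assign each observation to an empirical quartile-category 0..3."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return [(x > q1) + (x > q2) + (x > q3) for x in xs]


def category_means(values, labels):
    """Average the given values within each quartile-category."""
    sums, counts = [0.0] * 4, [0] * 4
    for v, k in zip(values, labels):
        sums[k] += v
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


def kernel_smooth(xs, ys, bandwidth=0.5):
    """Nadaraya-Watson estimate of P(y = 1 | x) at each observed x.

    This is a stand-in for the GAM smoother, not a GAM fit.
    """
    fitted = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return fitted


def odds_ratios(props):
    """Odds ratio of each category relative to the lowest quartile-category."""
    base = props[0] / (1.0 - props[0])
    return [(p / (1.0 - p)) / base for p in props]


xs, ys = simulate(400)
labels = quartile_labels(xs)

# Indicator-variable approach: observed proportion in each category.
glm_props = category_means(ys, labels)

# Smoothing approach: average the fitted curve within each category.
smooth_props = category_means(kernel_smooth(xs, ys), labels)

print("indicator proportions:", [round(p, 3) for p in glm_props])
print("smoothed proportions: ", [round(p, 3) for p in smooth_props])
print("indicator odds ratios:", [round(r, 2) for r in odds_ratios(glm_props)])
print("smoothed odds ratios: ", [round(r, 2) for r in odds_ratios(smooth_props)])
```

Repeating the simulation over many seeds and comparing the spread of the two sets of estimates would be one way to examine the variance comparison that the abstract reports; this sketch does not attempt to reproduce those results.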