1. Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Patel RC, Shepherd BE. Three-phase generalized raking and multiple imputation estimators to address error-prone data. Stat Med 2024; 43:379-394. [PMID: 37987515 PMCID: PMC10842111 DOI: 10.1002/sim.9967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 09/23/2023] [Accepted: 11/09/2023] [Indexed: 11/22/2023]
Abstract
Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to inefficient estimators since the information available from intermediate validation steps is only partially considered or even completely ignored. In this paper, we present two novel extensions of multiple imputation and generalized raking estimators that make full use of all available data. We show through simulations that incorporating information from intermediate steps can lead to substantial gains in efficiency. This work is motivated by and illustrated in a study of contraceptive effectiveness among 83 671 women living with HIV, whose data were originally extracted from electronic medical records, of whom 4732 had their charts reviewed, and a subsequent 1210 also had a telephone interview to validate key study variables.
Affiliation(s)
- Gustavo Amorim
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Sarah Lotspeich
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Statistical Sciences, Wake Forest University, Winston-Salem, North Carolina, USA
- Pamela A Shaw
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Rena C Patel
  - Department of Medicine, University of Washington, Seattle, Washington, USA
- Bryan E Shepherd
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
2. Schuermans A, Nakao T, Uddin MM, Hornsby W, Ganesh S, Shadyab AH, Liu S, Haring B, Shufelt CL, Taub MA, Mathias RA, Kooperberg C, Reiner AP, Bick AG, Manson JE, Natarajan P, Honigberg MC. Age at Menopause, Leukocyte Telomere Length, and Coronary Artery Disease in Postmenopausal Women. Circ Res 2023; 133:376-386. [PMID: 37489536 PMCID: PMC10528840 DOI: 10.1161/circresaha.123.322984] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 07/14/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND Premature menopause is a risk factor for accelerated cardiovascular aging, but underlying mechanisms remain incompletely understood. This study investigated the role of leukocyte telomere length (LTL), a marker of cellular aging and genomic instability, in the association of premature menopause with cardiovascular disease. METHODS Participants from the UK Biobank and Women's Health Initiative with complete reproductive history and LTL measurements were included. Primary analyses tested the association between age at menopause and LTL using multivariable-adjusted linear regression. Secondary analyses stratified women by history of gynecologic surgery. Mendelian randomization was used to infer causal relationships between LTL and age at natural menopause. Multivariable-adjusted Cox regression and mediation analyses tested the joint associations of premature menopause and LTL with incident coronary artery disease. RESULTS This study included 130 254 postmenopausal women (UK Biobank: n=122 224; Women's Health Initiative: n=8030), of whom 4809 (3.7%) had experienced menopause before age 40. Earlier menopause was associated with shorter LTL (meta-analyzed β=-0.02 SD/5 years of earlier menopause [95% CI, -0.02 to -0.01]; P=7.2×10⁻¹²). This association was stronger and significant in both cohorts for women with natural/spontaneous menopause (meta-analyzed β=-0.04 SD/5 years of earlier menopause [95% CI, -0.04 to -0.03]; P<2.2×10⁻¹⁶) and was independent of hormone therapy use. Mendelian randomization supported a causal association of shorter genetically predicted LTL with earlier age at natural menopause. LTL and age at menopause were independently associated with incident coronary artery disease, and mediation analyses indicated small but significant mediation effects of LTL in the association of menopausal age with coronary artery disease. CONCLUSIONS Earlier age at menopause is associated with shorter LTL, especially among women with natural menopause.
Accelerated telomere shortening may contribute to the heightened cardiovascular risk associated with premature menopause.
Affiliation(s)
- Art Schuermans
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Faculty of Medicine, KU Leuven, Leuven, Belgium
- Tetsushi Nakao
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Md Mesbah Uddin
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Whitney Hornsby
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Shriie Ganesh
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Aladdin H. Shadyab
  - Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
- Simin Liu
  - Department of Epidemiology and Brown Center for Global Cardiometabolic Health, Brown University, Providence, RI, USA
- Bernhard Haring
  - Department of Medicine III, Saarland University Medical Center, Homburg, Saarland, Germany
  - Department of Medicine I, University of Wuerzburg, Bavaria, Germany
- Chrisandra L. Shufelt
  - Division of Internal Medicine, Women’s Health Research Center, Mayo Clinic, Jacksonville, Florida
- Margaret A. Taub
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Rasika A. Mathias
  - GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Alexander P. Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Alexander G. Bick
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- JoAnn E. Manson
  - Division of Preventive Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Pradeep Natarajan
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cardiology Division, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Michael C. Honigberg
  - Program in Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Cardiology Division, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
3. Rivera-Rodriguez C, Clark TC, Fleming T, Archer D, Crengle S, Peiris-John R, Lewycka S. National estimates from the Youth '19 Rangatahi smart survey: A survey calibration approach. PLoS One 2021; 16:e0251177. [PMID: 33989300 PMCID: PMC8121344 DOI: 10.1371/journal.pone.0251177] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/23/2020] [Accepted: 04/21/2021] [Indexed: 12/31/2022]
Abstract
BACKGROUND Significant progress has been made addressing adolescent health needs in New Zealand, but issues remain in some areas, such as mental health, particularly for rangatahi Māori (indigenous Māori young people). Little is known about how contemporary Māori whānau (families) and communities influence health outcomes, health literacy and access to services. Previous nationally representative secondary school surveys were conducted in New Zealand in 2001, 2007 and 2012, as part of the Youth2000 survey series. This paper focuses on a fourth survey conducted in 2019 (https://www.youth19.ac.nz/). In 2019, the survey also included kura kaupapa Māori schools (Māori language immersion schools), and questions exploring the role of family connections in health and wellbeing. This paper presents the overall study methodology, and a weighting and calibration framework in order to provide estimates that reflect the national student population, and enable comparisons with the previous surveys to monitor trends. METHODS Youth19 was a cross-sectional, self-administered health and wellbeing survey of New Zealand high school students. The target population was the adolescent population of New Zealand (school years 9-13). The study population was drawn from three education regions: Auckland, Tai Tokerau (Northland) and Waikato. These are the most ethnically diverse regions in New Zealand. The sampling design was two-stage clustered stratified, where schools were the clusters, and strata were defined by kura schools and educational regions. There were four strata, formed as follows: kura schools (Tai Tokerau, Auckland and Waikato regions combined), mainstream-Auckland, mainstream-Tai Tokerau and mainstream-Waikato. From each stratum, 50% of the schools were randomly sampled and then 30% of students from the selected schools were invited to participate. All students in the kura kaupapa schools were invited to participate.
In order to make more precise estimates and adjust for differential non-response, as well as to make nationally relevant estimates and allow comparisons with the previous national surveys, we calibrated the sampling weights to reflect the national secondary school student population. RESULTS There were 45 mainstream and 4 kura schools included in the final sample, and 7,374 mainstream and 347 kura students participated in the survey. There were differences between the sampled population and the national secondary school student population, particularly in terms of sex and ethnicity, with a higher proportion of females and Asian students in the study sample than in the national student population. We calculated estimates of the totals and proportions for key variables that describe risk and protective factors or health and wellbeing factors. Rates of risk-taking behaviours were lower in the sampled population than what would be expected nationally, based on the demographic profile of the national student population. For the regional estimates, calibrated weights yield standard errors lower than those obtained with the unadjusted sampling weights. This leads to significantly narrower confidence intervals for all the variables in the analysis. The calibrated estimates of national quantities provide similar results. Additionally, the national estimates for 2019 serve as a tool to compare to previous surveys, where the sampling population was national. CONCLUSIONS One of the main goals of this paper is to improve the estimates at the regional level using calibrated weights to adjust for oversampling of some groups, or non-response bias. Additionally, we also recommend the use of calibrated estimators as they provide nationally adjusted estimates, which allow inferences about the whole adolescent population of New Zealand. They also yield confidence intervals that are significantly narrower than those obtained using the original sampling weights.
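The weighting steps described in this abstract can be illustrated with a minimal sketch. It assumes the simplest form of calibration (post-stratification on one categorical variable), which may differ from the calibration method the authors actually used, and it ignores non-response. The function names, the group labels, and the population totals are all hypothetical; only the 50% and 30% sampling fractions come from the abstract.

```python
from collections import defaultdict


def design_weight(p_school, p_student):
    """Inverse of the overall selection probability under two-stage sampling."""
    return 1.0 / (p_school * p_student)


def poststratify(weights, groups, population_totals):
    """Scale weights within each group so weighted counts match known totals."""
    weighted = defaultdict(float)
    for w, g in zip(weights, groups):
        weighted[g] += w
    return [w * population_totals[g] / weighted[g] for w, g in zip(weights, groups)]


# Mainstream strata: 50% of schools sampled, then 30% of students invited.
mainstream_weight = design_weight(0.5, 0.3)
# Kura stratum: 50% of schools sampled, all students invited.
kura_weight = design_weight(0.5, 1.0)

# Toy respondents from a mainstream stratum (hypothetical group labels).
groups = ["F", "F", "F", "M", "M"]
weights = [mainstream_weight] * len(groups)

# Hypothetical known population counts for each group.
totals = {"F": 20.0, "M": 20.0}
calibrated = poststratify(weights, groups, totals)
```

After calibration, the weighted count in each group equals the supplied population total, which is the property that lets a regional sample be adjusted toward a national profile.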
Affiliation(s)
- C. Rivera-Rodriguez
  - Department of Statistics, The University of Auckland, Auckland, New Zealand
- T. C. Clark
  - School of Nursing, University of Auckland, Auckland, New Zealand
- T. Fleming
  - School of Health, Victoria University of Wellington, Wellington, New Zealand
- D. Archer
  - School of Health, Victoria University of Wellington, Wellington, New Zealand
- S. Crengle
  - Department of Preventive and Social Medicine, University of Otago, Dunedin, New Zealand
- R. Peiris-John
  - Department of Epidemiology and Biostatistics, The University of Auckland, Auckland, New Zealand
- S. Lewycka
  - Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
  - Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
4. Rivera-Rodriguez C, Cheung G, Cullum S. Using Big Data to Estimate Dementia Prevalence in New Zealand: Protocol for an Observational Study. JMIR Res Protoc 2021; 10:e20225. [PMID: 33404510 PMCID: PMC7817360 DOI: 10.2196/20225] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2020] [Revised: 09/06/2020] [Accepted: 11/24/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND Dementia describes a cluster of symptoms that includes memory loss; difficulties with thinking, problem solving, or language; and functional impairment. Dementia can be caused by a number of neurodegenerative diseases, such as Alzheimer disease and cerebrovascular disease. Currently in New Zealand, most of the systematically collected and detailed information on dementia is obtained through a suite of International Residential Assessment Instrument (interRAI) assessments, including the home care, contact assessment, and long-term care facility versions. These versions of interRAI are standardized comprehensive geriatric assessments. Patients are referred to have an interRAI assessment by the Needs Assessment and Service Coordination (NASC) services after a series of screening processes. Previous estimates of the prevalence and costs of dementia in New Zealand have been based on international studies with different populations and health and social care systems. This new local knowledge will have implications for estimating the demographic distribution and socioeconomic impact of dementia in New Zealand. OBJECTIVE This study investigates the prevalence of dementia, risk factors for dementia, and drivers of the informal cost of dementia among people registered in the NASC database in New Zealand. METHODS This study aims to analyze secondary data routinely collected by the NASC and interRAI (home care and contact assessment versions) databases between July 1, 2014, and July 1, 2019, in New Zealand. The databases will be linked to produce an integrated data set, which will be used to (1) investigate the sociodemographic and clinical risk factors associated with dementia and other neurological conditions, (2) estimate the prevalence of dementia using weighting methods for complex samples, and (3) identify the cost of informal care per client (in number of hours of care provided by unpaid carers) and the drivers of such costs. 
We will use design-based survey methods for the estimation of prevalence and generalized estimating equations for regression models and correlated and longitudinal data. RESULTS The results will provide much needed statistics regarding dementia prevalence and risk factors and the cost of informal care for people living with dementia in New Zealand. Potential health inequities for different ethnic groups will be highlighted, which can then be used by decision makers to inform the development of policy and practice. CONCLUSIONS As of November 2020, there were no dementia prevalence studies or studies on informal care costs of dementia using national data from New Zealand. All existing studies have used data from other populations with substantially different demographic distributions. This study will give insight into the actual prevalence, risk factors, and informal care costs of dementia for the population with support needs in New Zealand. It will provide valuable information to improve health outcomes and better inform policy and planning. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/20225.
Affiliation(s)
- Gary Cheung
  - Department of Psychological Medicine, University of Auckland, Auckland, New Zealand
- Sarah Cullum
  - Department of Psychological Medicine, University of Auckland, Auckland, New Zealand
5. Honigberg MC, Zekavat SM, Niroula A, Griffin GK, Bick AG, Pirruccello JP, Nakao T, Whitsel EA, Farland LV, Laurie C, Kooperberg C, Manson JE, Gabriel S, Libby P, Reiner AP, Ebert BL, Natarajan P. Premature Menopause, Clonal Hematopoiesis, and Coronary Artery Disease in Postmenopausal Women. Circulation 2020; 143:410-423. [PMID: 33161765 DOI: 10.1161/circulationaha.120.051775] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Indexed: 12/26/2022]
Abstract
BACKGROUND Premature menopause is an independent risk factor for cardiovascular disease in women, but mechanisms underlying this association remain unclear. Clonal hematopoiesis of indeterminate potential (CHIP), the age-related expansion of hematopoietic cells with leukemogenic mutations without detectable malignancy, is associated with accelerated atherosclerosis. Whether premature menopause is associated with CHIP is unknown. METHODS We included postmenopausal women from the UK Biobank (n=11 495) aged 40 to 70 years with whole exome sequences and from the Women's Health Initiative (n=8111) aged 50 to 79 years with whole genome sequences. Premature menopause was defined as natural or surgical menopause occurring before age 40 years. Co-primary outcomes were the presence of any CHIP and CHIP with variant allele frequency >0.1. Logistic regression tested the association of premature menopause with CHIP, adjusted for age, race, the first 10 principal components of ancestry, smoking, diabetes, and hormone therapy use. Secondary analyses considered natural versus surgical premature menopause and gene-specific CHIP subtypes. Multivariable-adjusted Cox models tested the association between CHIP and incident coronary artery disease. RESULTS The sample included 19 606 women, including 418 (2.1%) with natural premature menopause and 887 (4.5%) with surgical premature menopause. Across cohorts, CHIP prevalence in postmenopausal women with versus without a history of premature menopause was 8.8% versus 5.5% (P<0.001), respectively. After multivariable adjustment, premature menopause was independently associated with CHIP (all CHIP: odds ratio, 1.36 [95% CI, 1.10-1.68]; P=0.004; CHIP with variant allele frequency >0.1: odds ratio, 1.40 [95% CI, 1.10-1.79]; P=0.007).
Associations were larger for natural premature menopause (all CHIP: odds ratio, 1.73 [95% CI, 1.23-2.44]; P=0.001; CHIP with variant allele frequency >0.1: odds ratio, 1.91 [95% CI, 1.30-2.80]; P<0.001) but smaller and nonsignificant for surgical premature menopause. In gene-specific analyses, only DNMT3A CHIP was significantly associated with premature menopause. Among postmenopausal middle-aged women, CHIP was independently associated with incident coronary artery disease (hazard ratio associated with all CHIP: 1.36 [95% CI, 1.07-1.73]; P=0.012; hazard ratio associated with CHIP with variant allele frequency >0.1: 1.48 [95% CI, 1.13-1.94]; P=0.005). CONCLUSIONS Premature menopause, especially natural premature menopause, is independently associated with CHIP among postmenopausal women. Natural premature menopause may serve as a risk signal for predilection to develop CHIP and CHIP-associated cardiovascular disease.
Affiliation(s)
- Michael C Honigberg
  - Cardiology Division (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Department of Medicine (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Cardiovascular Research Center and Center for Genomic Medicine (M.C.H., J.P.P., T.N., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
- Seyedeh M Zekavat
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Yale University School of Medicine, New Haven, CT (S.M.Z.)
- Abhishek Niroula
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA (A.N., T.N., B.L.E.)
- Gabriel K Griffin
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Department of Pathology (G.K.G., T.N.), Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Alexander G Bick
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University, Nashville, TN (A.G.B.)
- James P Pirruccello
  - Cardiology Division (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Department of Medicine (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Cardiovascular Research Center and Center for Genomic Medicine (M.C.H., J.P.P., T.N., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
- Tetsushi Nakao
  - Cardiovascular Research Center and Center for Genomic Medicine (M.C.H., J.P.P., T.N., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA (A.N., T.N., B.L.E.)
  - Department of Pathology (G.K.G., T.N.), Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Eric A Whitsel
  - Gillings School of Global Public Health and School of Medicine, University of Chapel Hill, NC (E.A.W.)
- Leslie V Farland
  - Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson (L.V.F.)
- Cecelia Laurie
  - Department of Biostatistics, University of Washington, Seattle (C.L.)
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA (C.K., A.P.R.)
- JoAnn E Manson
  - Division of Preventive Medicine (J.E.M.), Brigham and Women's Hospital, Harvard Medical School, Boston, MA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA (J.E.M.)
- Stacey Gabriel
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
- Peter Libby
  - Cardiovascular Division, Brigham and Women's Hospital Heart & Vascular Center, Boston, MA (P.L.)
- Alexander P Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA (C.K., A.P.R.)
- Benjamin L Ebert
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA (A.N., T.N., B.L.E.)
- Pradeep Natarajan
  - Cardiology Division (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Department of Medicine (M.C.H., J.P.P., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Cardiovascular Research Center and Center for Genomic Medicine (M.C.H., J.P.P., T.N., P.N.), Massachusetts General Hospital, Harvard Medical School, Boston
  - Broad Institute of Harvard and MIT, Cambridge, MA (M.C.H., S.M.Z., A.N., G.K.G., A.G.B., J.P.P., T.N., S.G., B.L.E., P.N.)
6. Rivera-Rodriguez C, Spiegelman D, Haneuse S. On the analysis of two-phase designs in cluster-correlated data settings. Stat Med 2019; 38:4611-4624. [PMID: 31359448 PMCID: PMC6736737 DOI: 10.1002/sim.8321] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/21/2018] [Revised: 06/04/2019] [Accepted: 06/21/2019] [Indexed: 11/06/2022]
Abstract
In public health research, information that is readily available may be insufficient to address the primary question(s) of interest. One cost-efficient way forward, especially in resource-limited settings, is to conduct a two-phase study in which the population is initially stratified, at phase I, by the outcome and/or some categorical risk factor(s). At phase II, detailed covariate data are ascertained on a subsample within each phase I stratum. While analysis methods for two-phase designs are well established, they have focused exclusively on settings in which participants are assumed to be independent. As such, when participants are naturally clustered (eg, patients within clinics), these methods may yield invalid inference. To address this, we develop a novel analysis approach based on inverse-probability weighting that permits researchers to specify some working covariance structure, appropriately accounts for the sampling design, and ensures valid inference via a robust sandwich estimator, for which a closed-form expression is provided. To enhance statistical efficiency, we propose a calibrated inverse-probability weighting estimator that makes use of information available at phase I but not used in the design. In addition to describing the technique, practical guidance is provided for the cluster-correlated data settings that we consider. A comprehensive simulation study is conducted to evaluate small-sample operating characteristics, including the impact of using naïve methods that ignore correlation due to clustering, as well as to investigate design considerations. Finally, the methods are illustrated using data from a one-time survey of the national antiretroviral treatment program in Malawi.
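The basic inverse-probability weighting idea behind a two-phase design can be sketched as follows. This is a minimal illustration only: it computes a weighted mean with weights equal to the phase I stratum size divided by the phase II sample size, and it does not implement the working covariance structure, the sandwich variance estimator, or the calibrated estimator described in the abstract. The function name `ipw_mean`, the stratum labels, and all counts are hypothetical.

```python
def ipw_mean(values, strata, phase1_counts):
    """Weighted mean from a stratified phase II subsample.

    Each phase II unit gets weight N_s / n_s, where N_s is the phase I
    size of its stratum and n_s is the number sampled from that stratum.
    """
    sampled = {}
    for s in strata:
        sampled[s] = sampled.get(s, 0) + 1
    weights = [phase1_counts[s] / sampled[s] for s in strata]
    total_weight = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total_weight
    return estimate, weights


# Hypothetical phase I stratum sizes (stratified by outcome status).
phase1_counts = {"case": 100, "control": 900}

# Hypothetical phase II subsample: two units per stratum, binary covariate.
strata = ["case", "case", "control", "control"]
values = [1.0, 0.0, 0.0, 0.0]

estimate, weights = ipw_mean(values, strata, phase1_counts)
naive = sum(values) / len(values)  # unweighted mean ignores the design
```

In this toy example cases are heavily oversampled at phase II, so the unweighted mean overstates the covariate's prevalence relative to the weighted estimate.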
Affiliation(s)
- D. Spiegelman
  - Center on Methods for Implementation and Dissemination Science, Department of Biostatistics, Yale University School of Public Health, CT, USA
  - Department of Epidemiology, Harvard School of Public Health, MA, USA
  - Department of Biostatistics, Harvard School of Public Health, MA, USA
- S. Haneuse
  - Department of Biostatistics, Harvard School of Public Health, MA, USA
7. Nelson JC, Ulloa-Pérez E, Bobb JF, Maro JC. Leveraging the entire cohort in drug safety monitoring: part 1 methods for sequential surveillance that use regression adjustment or weighting to control confounding in a multisite, rare event, distributed data setting. J Clin Epidemiol 2019; 112:77-86. [PMID: 31108199 DOI: 10.1016/j.jclinepi.2019.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2017] [Revised: 03/01/2019] [Accepted: 04/04/2019] [Indexed: 10/26/2022]
Abstract
OBJECTIVE Study designs involving self-controlled or exposure-matched samples are commonly used to monitor postmarket vaccine and drug safety, and they use a subset of the available larger cohort. This article overviews group sequential methods designed for observational data safety monitoring that use the whole exposed and unexposed cohorts by implementing regression adjustment or weighting to control confounding. METHODS We summarize what is known about the performance of "whole cohort" methods in multisite health plan data networks such as the Sentinel System of the Food and Drug Administration, where outcomes are rare, individual-level patient data cannot be pooled across sites, site heterogeneity is large, and data are dynamically updated over time. RESULTS Group sequential estimation and testing methods that use regression or weighting can flexibly handle electronic health care data's unpredictability, including an uncertain rate of new product uptake, variable composition of the population over time, and data changes due to dynamic administrative updates. Regression and weighting methods generally have higher power, faster signal detection, and fewer practical challenges compared with some design-based confounder adjustment methods. CONCLUSION Group sequential regression adjustment and weighting approaches are feasible and underused in practice. They leverage more information than designs that involved sampling and increase power to detect rare adverse effects without increasing bias.
Affiliation(s)
- Jennifer C Nelson
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ernesto Ulloa-Pérez
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jennifer F Bobb
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Judith C Maro
  - Department of Population Medicine, Harvard Medical School, Boston, MA, USA
  - Harvard Pilgrim Health Care Institute, Boston, MA, USA
8
|
A review of analysis methods for secondary outcomes in case-control studies. Communications for Statistical Applications and Methods 2019. [DOI: 10.29220/csam.2019.26.2.103] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
|
9
|
Improved calibration estimators for the total cost of health programs and application to immunization in Brazil. PLoS One 2019; 14:e0212401. [PMID: 30840645 PMCID: PMC6402677 DOI: 10.1371/journal.pone.0212401] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/16/2018] [Accepted: 02/03/2019] [Indexed: 11/19/2022]
Abstract
Multi-stage/level sampling designs have been widely used by survey statisticians as a means of obtaining reliable and efficient estimates at a reasonable implementation cost. This method has been particularly useful in national surveys to assess the costs of delivering public health programs, which generally originate at different levels of service management and delivery. Unbiased and efficient estimates of costs are essential to adequately allocate resources and inform policy and planning. In recent years, the global health community has become increasingly interested in estimating the costs of immunization programs. In such programs, part of the cost corresponds to vaccines, which in most countries are procured at the central level, while the rest of the costs are incurred in states, municipalities and health facilities. As such, total program cost is the sum of these costs, and its variance should account for the relation between the totals at the different levels. An additional challenge is missing information at the various levels. A variety of methods have been developed to compensate for such missing data. Weighting adjustments are often used to make the estimates consistent with readily available information. For estimation of total program costs, this implies adjusting the estimates at each level to comply with the characteristics of the country. In 2014, a national study to estimate the costs of the Brazilian National Immunization Program was initiated, requested by the Ministry of Health and with the support of international partners. We formulate a quick and useful way to compute the variance and deal with missing values at the various levels. Our approach involves calibrating the weights at each level using additional readily available information such as the total number of doses administered. Taking the Brazilian immunization costing study as an example, this approach results in substantial gains in both efficiency and precision of the cost estimate.
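The idea of calibrating weights to a known auxiliary total can be illustrated with a minimal ratio-calibration sketch in Python. This is not the estimator developed in the paper: it handles a single level and a single auxiliary variable, and the facility weights, dose counts, costs, and the known national dose total are all made-up values chosen for illustration.

```python
def ratio_calibrate(weights, aux, known_total):
    """Rescale design weights so the weighted auxiliary total equals known_total."""
    weighted_aux = sum(w * x for w, x in zip(weights, aux))
    if weighted_aux <= 0:
        raise ValueError("weighted auxiliary total must be positive")
    g = known_total / weighted_aux  # common calibration factor
    return [w * g for w in weights]


def weighted_total(weights, values):
    """Weighted estimate of a population total."""
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical sample of four health facilities (all numbers are illustrative).
design_weights = [10.0, 10.0, 20.0, 20.0]  # inverse sampling probabilities
doses = [500.0, 700.0, 300.0, 400.0]       # auxiliary variable, known for everyone
costs = [1000.0, 1500.0, 800.0, 900.0]     # variable of interest
known_doses_total = 27000.0                # assumed known population total of doses

calibrated = ratio_calibrate(design_weights, doses, known_doses_total)

uncalibrated_cost = weighted_total(design_weights, costs)
calibrated_cost = weighted_total(calibrated, costs)
print(uncalibrated_cost, calibrated_cost)
```

After calibration, the weighted dose total reproduces the known total exactly, and the cost estimate is shifted by the same factor. Calibration tends to improve precision when the auxiliary variable is strongly related to cost.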
|
10
|
Rivera-Rodriguez C, Haneuse S, Wang M, Spiegelman D. Augmented pseudo-likelihood estimation for two-phase studies. Stat Methods Med Res 2019; 29:344-358. [PMID: 30834815 DOI: 10.1177/0962280219833415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In many public health and medical research settings, information on key covariates may not be readily available or may be too expensive to gather for all individuals in the study. In such settings, the two-phase design provides a way forward by first stratifying an initial (large) phase I sample on the basis of readily available covariates (including, possibly, the outcome), and then sub-sampling participants at phase II to collect the expensive measure(s). When the outcome of interest is binary, several methods have been proposed for estimation and inference for the parameters of a logistic regression model, including weighted likelihood, pseudo-likelihood and maximum likelihood. Although these methods yield consistent estimation and valid inference, they do so solely on the basis of the phase I stratification and the detailed covariate information obtained at phase II. Moreover, they ignore any additional information that is readily available at phase I but was not used as part of the stratified sampling design. Motivated by the potential for efficiency gains, especially concerning parameters corresponding to the additional phase I covariates, we propose a novel augmented pseudo-likelihood estimator for two-phase studies that makes use of all available information. In contrast to recently proposed weighted likelihood-based methods that calibrate to the influence function of the model of interest, the methods we propose do not require the development of additional models and, therefore, enjoy a degree of robustness. In addition, we expand the broader framework for pseudo-likelihood based estimation and inference to permit link functions for binary regression other than the logit link. Comprehensive simulations, based on a one-time cross-sectional survey of 82,887 patients undergoing anti-retroviral therapy in Malawi between 2005 and 2007, illustrate finite sample properties of the proposed methods and compare their performance with competing approaches. The proposed method yields the lowest standard errors when the model is correctly specified. Finally, the methods are applied to a large implementation science project examining the effect of an enhanced community health worker program to improve adherence to WHO guidelines for at least four antenatal visits, in Dar es Salaam, Tanzania.
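The two-phase design and the weighted approach mentioned in this abstract can be sketched in Python as follows. This is a minimal illustration of stratum-specific inverse-probability weights only, not the augmented pseudo-likelihood estimator the authors propose. The participant IDs, strata, and covariate values are hypothetical.

```python
from collections import Counter


def two_phase_weights(phase1_strata, phase2_ids):
    """Weight each phase II participant by N_h / n_h for their phase I stratum h."""
    n_phase1 = Counter(phase1_strata.values())
    n_phase2 = Counter(phase1_strata[i] for i in phase2_ids)
    return {
        i: n_phase1[phase1_strata[i]] / n_phase2[phase1_strata[i]]
        for i in phase2_ids
    }


def weighted_mean(weights, values):
    """Inverse-probability-weighted mean over the phase II sample."""
    total_weight = sum(weights.values())
    return sum(weights[i] * values[i] for i in weights) / total_weight


# Hypothetical phase I sample: stratum (here the outcome) is known for everyone.
phase1_strata = {
    1: "case", 2: "case",
    3: "control", 4: "control", 5: "control",
    6: "control", 7: "control", 8: "control",
}
# Hypothetical phase II sub-sample: all cases, two of the six controls.
phase2_ids = [1, 2, 3, 4]
# Expensive covariate, measured only at phase II (illustrative values).
expensive = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0}

weights = two_phase_weights(phase1_strata, phase2_ids)
estimate = weighted_mean(weights, expensive)
print(weights, estimate)
```

Each sampled control stands in for three phase I controls, so the weighted mean differs from the unweighted phase II mean of 0.75. The same weights could be passed to a weighted logistic regression to obtain a weighted likelihood fit.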
Affiliation(s)
- Sebastien Haneuse
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Molin Wang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna Spiegelman
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biostatistics, Center on Methods for Implementation and Dissemination Science, Yale University School of Public Health, New Haven, CT, USA
|
11
|
Enders D, Kollhorst B, Engel S, Linder R, Pigeot I. Comparison of multiple imputation and two-phase logistic regression to analyse two-phase case–control studies with rich phase 1: a simulation study. J Stat Comput Sim 2018. [DOI: 10.1080/00949655.2018.1452926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Dirk Enders
- Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- Bianca Kollhorst
- Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- Susanne Engel
- Scientific Institute of TK for Benefit and Efficiency in Health Care – WINEG, Hamburg, Germany
- Roland Linder
- Scientific Institute of TK for Benefit and Efficiency in Health Care – WINEG, Hamburg, Germany
- Iris Pigeot
- Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany; Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
|
12
|
Noma H, Tanaka S. Analysis of case-cohort designs with binary outcomes: Improving efficiency using whole-cohort auxiliary information. Stat Methods Med Res 2014; 26:691-706. [PMID: 25348675 DOI: 10.1177/0962280214556175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
The case-cohort design has been widely adopted for reducing the cost of covariate measurements in large prospective cohort studies. Under the case-cohort design, complete covariate data are collected only on randomly sampled cases and a subcohort randomly selected from the whole cohort. For the analysis of case-cohort studies with binary outcomes, logistic regression analysis has been routinely used. However, in many applications, certain covariates are readily measured on all samples from the whole cohort, and the case-cohort design may be regarded as a two-phase sampling design. Using this auxiliary covariate information, estimators for the regression parameters can be substantially improved. In this article, we discuss the theoretical basis of the case-cohort design derived from the formulation of the two-phase design and the improved estimators using whole-cohort auxiliary variable information. In particular, we show that the sampling scheme of the case-cohort design is substantially equivalent to that of conventional two-phase case-control studies (also known as two-stage case-control studies for epidemiologists), i.e., the methodologies of two-phase case-control studies can be directly applied to case-cohort data. Under this framework, we review and apply the following improved estimators to the case-cohort design with binary outcomes: (i) weighted estimators, (ii) a semiparametric maximum likelihood estimator, and (iii) a multiple imputation estimator. In addition, based on the framework of the two-phase design, we can obtain risk ratio and risk difference estimators without the rare-disease assumption. We illustrate these methodologies via simulations and the National Wilms Tumor Study data.
Affiliation(s)
- Hisashi Noma
- Department of Data Science, The Institute of Statistical Mathematics, Tokyo, Japan; Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, Tokyo, Japan
- Shiro Tanaka
- Department of Pharmacoepidemiology, Kyoto University School of Public Health, Kyoto, Japan
|