1
Samartsidis P, Seaman SR, Harrison A, Alexopoulos A, Hughes GJ, Rawlinson C, Anderson C, Charlett A, Oliver I, De Angelis D. A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes. Biostatistics 2023:kxad030. [PMID: 38058013 DOI: 10.1093/biostatistics/kxad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 10/15/2023] [Accepted: 10/16/2023] [Indexed: 12/08/2023]
Abstract
Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and intractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.
Affiliation(s)
- Pantelis Samartsidis
- MRC Biostatistics Unit, East Forvie Building, Cambridge Biomedical Campus, Cambridge, CB2 0SR, UK
- Shaun R Seaman
- MRC Biostatistics Unit, East Forvie Building, Cambridge Biomedical Campus, Cambridge, CB2 0SR, UK
- Angelos Alexopoulos
- MRC Biostatistics Unit, East Forvie Building, Cambridge Biomedical Campus, Cambridge, CB2 0SR, UK
- Department of Economics, Athens University of Economics and Business, Athens, 104 34, Greece
- Daniela De Angelis
- MRC Biostatistics Unit, East Forvie Building, Cambridge Biomedical Campus, Cambridge, CB2 0SR, UK
- UK Health Security Agency, London, E14 4PU, UK
2
Pascall DJ, Vink E, Blacow R, Bulteel N, Campbell A, Campbell R, Clifford S, Davis C, da Silva Filipe A, El Sakka N, Fjodorova L, Forrest R, Goldstein E, Gunson R, Haughney J, Holden MTG, Honour P, Hughes J, James E, Lewis T, MacLean O, McHugh M, Mollett G, Nyberg T, Onishi Y, Parcell B, Ray S, Robertson DL, Seaman SR, Shabaan S, Shepherd JG, Smollett K, Templeton K, Wastnedge E, Wilkie C, Williams T, Thomson EC. Directions of change in intrinsic case severity across successive SARS-CoV-2 variant waves have been inconsistent. J Infect 2023; 87:128-135. [PMID: 37270070 PMCID: PMC10234362 DOI: 10.1016/j.jinf.2023.05.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 03/27/2023] [Accepted: 05/24/2023] [Indexed: 06/05/2023]
Abstract
OBJECTIVES To determine how the intrinsic severity of successively dominant SARS-CoV-2 variants changed over the course of the pandemic. METHODS A retrospective cohort analysis in the NHS Greater Glasgow and Clyde (NHS GGC) Health Board. All sequenced non-nosocomial adult COVID-19 cases in NHS GGC with relevant SARS-CoV-2 lineages (B.1.177/Alpha, Alpha/Delta, AY.4.2 Delta/non-AY.4.2 Delta, non-AY.4.2 Delta/Omicron, and BA.1 Omicron/BA.2 Omicron) during the analysis periods were included. Outcome measures were hospital admission, ICU admission, or death within 28 days of a positive COVID-19 test. We report the cumulative odds ratio: the ratio, comparing the replacement with the resident variant after adjustment, of the odds that an individual experiences a severity event of a given level vs all lower severity levels. RESULTS After adjustment for covariates, the cumulative odds ratio was 1.51 (95% CI: 1.08-2.11) for Alpha versus B.1.177, 2.09 (95% CI: 1.42-3.08) for Delta versus Alpha, 0.99 (95% CI: 0.76-1.27) for AY.4.2 Delta versus non-AY.4.2 Delta, 0.49 (95% CI: 0.22-1.06) for Omicron versus non-AY.4.2 Delta, and 0.86 (95% CI: 0.68-1.09) for BA.2 Omicron versus BA.1 Omicron. CONCLUSIONS The direction of change in intrinsic severity between successively emerging SARS-CoV-2 variants was inconsistent, reminding us that the intrinsic severity of future SARS-CoV-2 variants remains uncertain.
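A cumulative odds ratio can be illustrated with crude (unadjusted) counts. The sketch below is a minimal, hypothetical example, not the authors' covariate-adjusted analysis. It assumes the severity levels are ordered none < hospital admission < ICU admission < death, uses made-up counts, and at each cut-point divides the replacement variant's odds of reaching at least that level (versus any lower level) by the resident variant's corresponding odds.

```python
from typing import List, Sequence

# Hypothetical ordering of severity levels (least to most severe):
# 0 = no event, 1 = hospital admission, 2 = ICU admission, 3 = death.

def cumulative_odds(counts: Sequence[int], k: int) -> float:
    """Odds of severity level >= k versus any level < k."""
    at_or_above = sum(counts[k:])
    below = sum(counts[:k])
    return at_or_above / below

def cumulative_odds_ratios(resident: Sequence[int], replacement: Sequence[int]) -> List[float]:
    """Crude cumulative odds ratio (replacement vs resident) at each cut-point k = 1..K-1."""
    if len(resident) != len(replacement):
        raise ValueError("both variants need counts for the same severity levels")
    return [cumulative_odds(replacement, k) / cumulative_odds(resident, k)
            for k in range(1, len(resident))]

# Made-up counts per severity level, for illustration only.
resident = [900, 60, 20, 20]
replacement = [850, 90, 30, 30]
print([round(r, 2) for r in cumulative_odds_ratios(resident, replacement)])
# prints [1.59, 1.53, 1.52]
```

Broadly similar ratios across cut-points are what a proportional-odds model expects when it summarises them by a single common ratio; the values reported in the abstract are adjusted for covariates, which this crude calculation is not.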
Affiliation(s)
- David J Pascall
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom; Joint Universities Pandemic and Epidemiological Research (JUNIPER) Consortium, United Kingdom.
- Elen Vink
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom; NHS Lothian, Edinburgh EH1 3EG, United Kingdom.
- Rachel Blacow
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom; NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom.
- Chris Davis
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Ana da Silva Filipe
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Emily Goldstein
- NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom.
- Rory Gunson
- NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom.
- John Haughney
- NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom.
- Matthew T G Holden
- Public Health Scotland, Edinburgh EH12 9EB, United Kingdom; School of Medicine, University of St Andrews, St Andrews KY16 9TF, United Kingdom.
- Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Tim Lewis
- NHS Lothian, Edinburgh EH1 3EG, United Kingdom.
- Oscar MacLean
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Guy Mollett
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom; NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom.
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom.
- Ben Parcell
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom.
- Surajit Ray
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8TA, United Kingdom.
- David L Robertson
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom.
- Sharif Shabaan
- Public Health Scotland, Edinburgh EH12 9EB, United Kingdom.
- James G Shepherd
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Katherine Smollett
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom.
- Craig Wilkie
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8TA, United Kingdom.
- Thomas Williams
- NHS Lothian, Edinburgh EH1 3EG, United Kingdom; Royal Hospital for Children and Young People, University of Edinburgh, Edinburgh EH16 4TJ, United Kingdom.
- Emma C Thomson
- MRC-University of Glasgow Centre for Virus Research (CVR), Glasgow G61 1QH, United Kingdom; NHS Greater Glasgow and Clyde, Glasgow G12 0XH, United Kingdom; London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom.
3
Keogh RH, Gran JM, Seaman SR, Davies G, Vansteelandt S. Causal inference in survival analysis using longitudinal observational data: Sequential trials and marginal structural models. Stat Med 2023; 42:2191-2225. [PMID: 37086186 PMCID: PMC7614580 DOI: 10.1002/sim.9718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2021] [Revised: 01/26/2023] [Accepted: 03/14/2023] [Indexed: 04/23/2023]
Abstract
Longitudinal observational data on patients can be used to investigate causal effects of time-varying treatments on time-to-event outcomes. Several methods have been developed for estimating such effects by controlling for the time-dependent confounding that typically occurs. The most commonly used approach is marginal structural models (MSM) estimated using inverse probability of treatment weights (IPTW) (MSM-IPTW). An alternative, the sequential trials approach, is increasingly popular, and involves creating a sequence of "trials" from new time origins and comparing treatment initiators and non-initiators. Individuals are censored when they deviate from their treatment assignment at the start of each "trial" (initiator or non-initiator), which is accounted for using inverse probability of censoring weights. The analysis uses data combined across trials. We show that the sequential trials approach can estimate the parameters of a particular MSM. The causal estimand that we focus on is the marginal risk difference between the sustained treatment strategies of "always treat" vs "never treat." We compare how the sequential trials approach and MSM-IPTW estimate this estimand, and discuss their assumptions and how data are used differently. The performance of the two approaches is compared in a simulation study. The sequential trials approach, which tends to involve less extreme weights than MSM-IPTW, results in greater efficiency for estimating the marginal risk difference at most follow-up times, but this can, in certain scenarios, be reversed at later time points and relies on modelling assumptions. We apply the methods to longitudinal observational data from the UK Cystic Fibrosis Registry to estimate the effect of dornase alfa on survival.
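As a minimal sketch of the weighting ingredient of MSM-IPTW (not the authors' code), stabilized weights for a time-varying treatment are commonly formed as a cumulative product over visits of P(observed treatment | past treatment) divided by P(observed treatment | past treatment and time-varying confounders). In practice the per-visit probabilities come from fitted treatment models; here they are hypothetical inputs supplied directly.

```python
import operator
from itertools import accumulate
from typing import List, Sequence

def stabilized_iptw(p_num: Sequence[float], p_den: Sequence[float]) -> List[float]:
    """Cumulative stabilized weights for one individual, one per visit.

    p_num[t]: probability of the treatment actually received at visit t given
              past treatment only (numerator model; hypothetical input).
    p_den[t]: probability of the same treatment given past treatment and
              time-varying confounders (denominator model; hypothetical input).
    """
    if len(p_num) != len(p_den):
        raise ValueError("p_num and p_den must have one entry per visit")
    ratios = [n / d for n, d in zip(p_num, p_den)]
    # The weight at visit t is the product of the ratios up to and including visit t.
    return list(accumulate(ratios, operator.mul))

# An individual whose confounder history made their observed treatment choices unlikely.
print(stabilized_iptw([0.5, 0.8, 0.7], [0.25, 0.4, 0.35]))
```

Because the weight multiplies across visits, a run of unlikely treatment choices makes it grow quickly, which is one way the extreme weights mentioned in the abstract arise.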
Affiliation(s)
- Ruth H. Keogh
- Department of Medical Statistics and Centre for Statistical Methodology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Jon Michael Gran
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, P.O. Box 1122 Blindern, Oslo, 0317, Norway
- Shaun R. Seaman
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge, CB2 0SR, UK
- Gwyneth Davies
- Population, Policy and Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, University College London, WC1N 1EH, London, UK
- Stijn Vansteelandt
- Department of Medical Statistics and Centre for Statistical Methodology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
4
Deakin CT, De Stavola BL, Littlejohn G, Griffiths H, Ciciriello S, Youssef P, Mathers D, Bird P, Smith T, O'Sullivan C, Freeman T, Segelov D, Hoffman D, Seaman SR. Comparative Effectiveness of Adalimumab vs Tofacitinib in Patients With Rheumatoid Arthritis in Australia. JAMA Netw Open 2023; 6:e2320851. [PMID: 37382956 DOI: 10.1001/jamanetworkopen.2023.20851] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/30/2023]
Abstract
Importance There is a need for observational studies to supplement evidence from clinical trials, and the target trial emulation (TTE) framework can help avoid biases that can be introduced when treatments are compared crudely using observational data by applying design principles for randomized clinical trials. Adalimumab (ADA) and tofacitinib (TOF) were shown to be equivalent in patients with rheumatoid arthritis (RA) in a randomized clinical trial, but to our knowledge, these drugs have not been compared head-to-head using routinely collected clinical data and the TTE framework. Objective To emulate a randomized clinical trial comparing ADA vs TOF in patients with RA who were new users of a biologic or targeted synthetic disease-modifying antirheumatic drug (b/tsDMARD). Design, Setting, and Participants This comparative effectiveness study emulating a randomized clinical trial of ADA vs TOF included Australian adults aged 18 years or older with RA in the Optimising Patient Outcomes in Australian Rheumatology (OPAL) data set. Patients were included if they initiated ADA or TOF between October 1, 2015, and April 1, 2021; were new b/tsDMARD users; and had at least 1 component of the disease activity score in 28 joints using C-reactive protein (DAS28-CRP) recorded at baseline or during follow-up. Intervention Treatment with either ADA (40 mg every 14 days) or TOF (10 mg daily). Main Outcomes and Measures The main outcome was the estimated average treatment effect, defined as the difference in mean DAS28-CRP among patients receiving TOF compared with those receiving ADA at 3 and 9 months after initiating treatment. Missing DAS28-CRP data were multiply imputed. Stable balancing weights were used to account for nonrandomized treatment assignment. 
Results A total of 842 patients were identified, including 569 treated with ADA (387 [68.0%] female; median age, 56 years [IQR, 47-66 years]) and 273 treated with TOF (201 [73.6%] female; median age, 59 years [IQR, 51-68 years]). After applying stable balancing weights, mean DAS28-CRP in the ADA group was 5.3 (95% CI, 5.2-5.4) at baseline, 2.6 (95% CI, 2.5-2.7) at 3 months, and 2.3 (95% CI, 2.2-2.4) at 9 months; in the TOF group, it was 5.3 (95% CI, 5.2-5.4) at baseline, 2.4 (95% CI, 2.2-2.5) at 3 months, and 2.3 (95% CI, 2.1-2.4) at 9 months. The estimated average treatment effect was -0.2 (95% CI, -0.4 to -0.03; P = .02) at 3 months and -0.03 (95% CI, -0.2 to 0.1; P = .60) at 9 months. Conclusions and Relevance In this study, there was a modest but statistically significant reduction in DAS28-CRP at 3 months for patients receiving TOF compared with those receiving ADA and no difference between treatment groups at 9 months. Three months of treatment with either drug led to clinically relevant average reductions in mean DAS28-CRP, consistent with remission.
Affiliation(s)
- Claire T Deakin
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Centre for Adolescent Rheumatology Versus Arthritis at University College London, University College London Hospitals, Great Ormond Street Hospital and University College London, London, United Kingdom
- National Institute of Health Research Biomedical Centre at Great Ormond Street Hospital, London, United Kingdom
- Bianca L De Stavola
- Population, Policy and Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, London, United Kingdom
- Geoffrey Littlejohn
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Department of Medicine, Monash University, Clayton, Victoria, Australia
- Hedley Griffiths
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Barwon Rheumatology Service, Geelong, Victoria, Australia
- Sabina Ciciriello
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Peter Youssef
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Royal Prince Alfred Hospital, Sydney, New South Wales, Australia
- University of Sydney, Sydney, New South Wales, Australia
- David Mathers
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Georgetown Arthritis, Newcastle, New South Wales, Australia
- Paul Bird
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- University of New South Wales, Kensington, New South Wales, Australia
- Tegan Smith
- OPAL Rheumatology Ltd, Sydney, New South Wales, Australia
- Tim Freeman
- Software for Specialists Pty Ltd, Sydney, New South Wales, Australia
- Dana Segelov
- Software for Specialists Pty Ltd, Sydney, New South Wales, Australia
- David Hoffman
- Software for Specialists Pty Ltd, Sydney, New South Wales, Australia
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
5
Seaman SR, Nyberg T, Overton CE, Pascall DJ, Presanis AM, De Angelis D. Adjusting for time of infection or positive test when estimating the risk of a post-infection outcome in an epidemic. Stat Methods Med Res 2022; 31:1942-1958. [PMID: 35695245 PMCID: PMC7613654 DOI: 10.1177/09622802221107105] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
Abstract
When comparing the risk of a post-infection binary outcome, for example, hospitalisation, for two variants of an infectious pathogen, it is important to adjust for calendar time of infection. Typically, the infection time is unknown and positive test time is used as a proxy for it. Positive test time may also be used when assessing how risk of the outcome changes over calendar time. We show that if time from infection to positive test is correlated with the outcome, the risk conditional on positive test time is a function of the trajectory of infection incidence. Hence, a risk ratio adjusted for positive test time can be quite different from the risk ratio adjusted for infection time. We propose a simple sensitivity analysis that indicates how risk ratios adjusted for positive test time and infection time may differ. This involves adjusting for a shifted positive test time, shifted to make the difference between it and infection time uncorrelated with the outcome. We illustrate this method by reanalysing published results on the relative risk of hospitalisation following infection with the Alpha versus pre-existing variants of SARS-CoV-2. Results indicate the relative risk adjusted for infection time may be lower than that adjusted for positive test time.
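The claim that risk conditional on positive test time depends on the incidence trajectory can be checked numerically. The sketch below rests on simplifying assumptions of our own, not the paper's model: the true risk is constant over infection time (0.10), cases with the outcome test exactly 2 days after infection and cases without it exactly 4 days after, and incidence grows, stays flat, or declines exponentially. It computes P(outcome | positive test on day t) by summing over the infection days that could produce a positive test on day t.

```python
import math
from typing import Callable, Sequence

def risk_given_test_day(
    t: int,
    incidence: Callable[[int], float],
    risk: float,
    delay_pmf_outcome: Sequence[float],
    delay_pmf_no_outcome: Sequence[float],
) -> float:
    """P(outcome | positive test on day t); delay_pmf_*[d] = P(delay = d days | outcome status)."""
    # Expected day-t positives with the outcome: infections on day t - d times P(delay = d).
    with_outcome = risk * sum(p * incidence(t - d) for d, p in enumerate(delay_pmf_outcome))
    without_outcome = (1 - risk) * sum(
        p * incidence(t - d) for d, p in enumerate(delay_pmf_no_outcome)
    )
    return with_outcome / (with_outcome + without_outcome)

# Hypothetical delays: outcome cases test 2 days after infection, others 4 days after.
delay_outcome = [0.0, 0.0, 1.0]
delay_no_outcome = [0.0, 0.0, 0.0, 0.0, 1.0]
true_risk = 0.10

for label, growth in [("growing", 0.1), ("flat", 0.0), ("declining", -0.1)]:
    inc = lambda s, g=growth: math.exp(g * s)  # exponential incidence curve
    r = risk_given_test_day(30, inc, true_risk, delay_outcome, delay_no_outcome)
    print(f"{label:9s} incidence: P(outcome | test day) = {r:.4f}")
```

Although the true risk is 0.10 throughout, conditioning on test day gives a higher value while incidence grows and a lower one while it declines. In this toy setup, shifting the outcome cases' test times 2 days later would give everyone the same 4-day gap to infection, and the discrepancy would disappear; that is the intuition behind adjusting for a shifted positive test time.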
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Shaun Seaman, MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge, CB2 0SR, UK.
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Christopher E Overton
- Department of Mathematics, University of Manchester, UK
- Clinical Data Science Unit, Manchester University NHS Foundation Trust, UK
- Joint Universities Pandemic and Epidemiological Research (JUNIPER) consortium, Cambridge, UK
- David J Pascall
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Joint Universities Pandemic and Epidemiological Research (JUNIPER) consortium, Cambridge, UK
- Anne M Presanis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Joint Universities Pandemic and Epidemiological Research (JUNIPER) consortium, Cambridge, UK
- Statistics, Modelling and Economics Department, UKHSA, London, UK
6
Jackson CH, Tom BD, Kirwan PD, Mandal S, Seaman SR, Kunzmann K, Presanis AM, De Angelis D. A comparison of two frameworks for multi-state modelling, applied to outcomes after hospital admissions with COVID-19. Stat Methods Med Res 2022; 31:1656-1674. [PMID: 35837731 PMCID: PMC9294033 DOI: 10.1177/09622802221106720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Abstract
We compare two multi-state modelling frameworks that can be used to represent dates of events following hospital admission for people infected during an epidemic. The methods are applied to data from people admitted to hospital with COVID-19, to estimate the probability of admission to intensive care unit, the probability of death in hospital for patients before and after intensive care unit admission, the lengths of stay in hospital, and how all these vary with age and gender. One modelling framework is based on defining transition-specific hazard functions for competing risks. A less commonly used framework defines partially-latent subpopulations who will experience each subsequent event, and uses a mixture model to estimate the probability that an individual will experience each event, and the distribution of the time to the event given that it occurs. We compare the advantages and disadvantages of these two frameworks, in the context of the COVID-19 example. The issues include the interpretation of the model parameters, the computational efficiency of estimating the quantities of interest, implementation in software and assessing goodness of fit. In the example, we find that some groups appear to be at very low risk of some events, in particular intensive care unit admission, and these are best represented by using 'cure-rate' models to define transition-specific hazards. We provide general-purpose software to implement all the models we describe in the flexsurv R package, which allows arbitrarily flexible distributions to be used to represent the cause-specific hazards or times to events.
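To see how the two frameworks relate, consider the simplest case of constant (exponential) transition-specific hazards out of a single state. This is a hypothetical illustration, not the flexsurv implementation. With competing hazards h_k and total hazard H, the mixture framework's probability that the next event is k equals h_k / H, and the time to that event, given that it occurs, is exponential with rate H. Both representations then give the same cumulative incidence.

```python
import math
from typing import Dict, Tuple

def hazards_to_mixture(hazards: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Map constant cause-specific hazards to (event probabilities, rate of time-to-event given event)."""
    total = sum(hazards.values())
    probs = {event: h / total for event, h in hazards.items()}
    return probs, total

def cif_from_hazards(hazards: Dict[str, float], event: str, t: float) -> float:
    """Cumulative incidence of `event` by time t under constant competing hazards."""
    total = sum(hazards.values())
    return hazards[event] / total * (1.0 - math.exp(-total * t))

def cif_from_mixture(probs: Dict[str, float], rate: float, event: str, t: float) -> float:
    """P(event occurs) times P(time to event <= t | event occurs), exponential time to event."""
    return probs[event] * (1.0 - math.exp(-rate * t))

# Hypothetical daily hazards of leaving a hospital ward by each route.
hazards = {"icu": 0.02, "death": 0.01, "discharge": 0.12}
probs, rate = hazards_to_mixture(hazards)
for event in hazards:
    print(event, round(probs[event], 3),
          round(cif_from_hazards(hazards, event, 10.0), 4),
          round(cif_from_mixture(probs, rate, event, 10.0), 4))
```

With non-exponential distributions or cure-rate components, the two parameterisations no longer map onto each other this simply, so parameter interpretation and fitting differ between the frameworks.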
Affiliation(s)
- Brian Dm Tom
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Peter D Kirwan
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Public Health England, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Kevin Kunzmann
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Anne M Presanis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Public Health England, London, UK
7
Seaman SR, Samartsidis P, Kall M, De Angelis D. Nowcasting COVID-19 deaths in England by age and region. J R Stat Soc Ser C Appl Stat 2022; 71:RSSC12576. [PMID: 35942006 PMCID: PMC9349735 DOI: 10.1111/rssc.12576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Accepted: 05/11/2022] [Indexed: 11/29/2022]
Abstract
Understanding the trajectory of the daily number of COVID-19 deaths is essential to decisions on how to respond to the pandemic, but estimating this trajectory is complicated by the delay between deaths occurring and being reported. In England the delay is typically several days, but it can be weeks. This causes considerable uncertainty about how many deaths occurred in recent days. Here we estimate the deaths per day in five age strata within seven English regions, using a Bayesian model that accounts for reporting-day effects and longer-term changes in the delay distribution. We show how the model can be fitted in a computationally efficient way when the delay distribution is the same in multiple strata, for example, over a wide range of ages.
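A far simpler delay adjustment than the paper's Bayesian model conveys the basic idea of nowcasting. If F(d) is the probability that a death is reported within d days, then the count reported so far for a day d days in the past can be inflated by 1/F(d). The sketch below assumes a known, fixed delay distribution with hypothetical values. It ignores the reporting-day effects, the changes in the delay distribution over time, and the uncertainty quantification that the paper's model provides.

```python
from typing import List, Sequence

def naive_nowcast(reported: Sequence[int], reported_within: Sequence[float]) -> List[float]:
    """Delay-adjusted point estimates of deaths per day.

    reported[i]: deaths on day i reported so far (oldest day first, last entry = today).
    reported_within[d]: assumed probability that a death is reported within d days.
    """
    n = len(reported)
    estimates = []
    for i, count in enumerate(reported):
        delay = n - 1 - i  # days elapsed since day i
        # Beyond the tabulated delays, treat reporting as complete.
        frac = reported_within[delay] if delay < len(reported_within) else 1.0
        estimates.append(count / frac)
    return estimates

# Hypothetical: 20% of deaths reported on the day, 60% within 1 day, 90% within 2, 95% within 3.
print(naive_nowcast([100, 90, 60, 20], [0.2, 0.6, 0.9, 0.95]))
```

Here the raw counts suggest a steep decline, while the delay-adjusted estimates are roughly flat, which illustrates why unadjusted recent counts can be misleading.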
Affiliation(s)
- Shaun R. Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, Cambridgeshire, UK
- Meaghan Kall
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, Cambridgeshire, UK
- Statistics, Modelling and Economics Department, Data, Analytics and Surveillance, UK Health Security Agency, London, UK
8
Su L, Seaman SR, Yiu S. Sensitivity analysis for calibrated inverse probability-of-censoring weighted estimators under non-ignorable dropout. Stat Methods Med Res 2022; 31:1374-1391. [PMID: 35410545 PMCID: PMC9253927 DOI: 10.1177/09622802221090763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Inverse probability of censoring weighting is a popular approach to handling dropout in longitudinal studies. However, inverse probability-of-censoring weighted estimators (IPCWEs) can be inefficient and unstable if the weights are estimated by maximum likelihood. To alleviate these problems, calibrated IPCWEs have been proposed, which use calibrated weights that directly optimize covariate balance in finite samples rather than the weights from maximum likelihood. However, the existing calibrated IPCWEs are all based on the unverifiable assumption of sequential ignorability, and sensitivity analysis strategies under non-ignorable dropout are lacking. In this paper, we fill this gap by developing an approach to sensitivity analysis for calibrated IPCWEs under non-ignorable dropout. A simple technique is proposed to speed up the computation of bootstrap and jackknife confidence intervals and thus facilitate sensitivity analyses. We evaluate the finite-sample performance of the proposed methods using simulations and apply our methods to data from an international inception cohort study of systemic lupus erythematosus. An R Markdown tutorial to demonstrate the implementation of the proposed methods is provided.
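For orientation, a basic (uncalibrated) IPCW estimator of a mean outcome weights each observed outcome by the inverse of its estimated probability of remaining under observation. The minimal sketch below uses hypothetical inputs, a single follow-up time, and the normalized (Hájek) form. The calibrated IPCWEs described in the abstract would replace these inverse-probability weights with weights chosen to balance covariates, and the proposed sensitivity analysis would further relax the ignorability assumption; neither is implemented here.

```python
from typing import Sequence

def ipcw_mean(y: Sequence[float], observed: Sequence[bool], p_observed: Sequence[float]) -> float:
    """Normalized IPCW estimate of E[Y].

    y[i]: outcome (ignored when not observed).
    observed[i]: True if individual i did not drop out.
    p_observed[i]: estimated probability that individual i is observed (hypothetical input).
    """
    num = 0.0
    den = 0.0
    for y_i, r_i, p_i in zip(y, observed, p_observed):
        if r_i:
            w = 1.0 / p_i  # inverse probability-of-censoring weight
            num += w * y_i
            den += w
    return num / den

# Hypothetical data: the third individual dropped out.
print(ipcw_mean([1.0, 2.0, 3.0, 4.0], [True, True, False, True], [0.5, 1.0, 0.5, 0.25]))
```

The weighted estimate (20/7, about 2.86) differs from the complete-case mean (7/3, about 2.33) because the fourth individual, who had a low probability of being observed, stands in for similar individuals who dropped out.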
Affiliation(s)
- Li Su
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, UK
- Shaun R Seaman
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, UK
- Sean Yiu
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, UK
9
Nyberg T, Ferguson NM, Nash SG, Webster HH, Flaxman S, Andrews N, Hinsley W, Bernal JL, Kall M, Bhatt S, Blomquist P, Zaidi A, Volz E, Aziz NA, Harman K, Funk S, Abbott S, Hope R, Charlett A, Chand M, Ghani AC, Seaman SR, Dabrera G, De Angelis D, Presanis AM, Thelwall S. Comparative analysis of the risks of hospitalisation and death associated with SARS-CoV-2 omicron (B.1.1.529) and delta (B.1.617.2) variants in England: a cohort study. Lancet 2022; 399:1303-1312. [PMID: 35305296 PMCID: PMC8926413 DOI: 10.1016/s0140-6736(22)00462-7] [Citation(s) in RCA: 694] [Impact Index Per Article: 347.0] [Received: 02/03/2022] [Revised: 02/17/2022] [Accepted: 02/25/2022] [Indexed: 02/06/2023]
Abstract
BACKGROUND The omicron variant (B.1.1.529) of SARS-CoV-2 has demonstrated partial vaccine escape and high transmissibility, with early studies indicating lower severity of infection than that of the delta variant (B.1.617.2). We aimed to better characterise omicron severity relative to delta by assessing the relative risk of hospital attendance, hospital admission, or death in a large national cohort. METHODS Individual-level data on laboratory-confirmed COVID-19 cases resident in England between Nov 29, 2021, and Jan 9, 2022, were linked to routine datasets on vaccination status, hospital attendance and admission, and mortality. The relative risk of hospital attendance or admission within 14 days, or death within 28 days after confirmed infection, was estimated using proportional hazards regression. Analyses were stratified by test date, 10-year age band, ethnicity, residential region, and vaccination status, and were further adjusted for sex, index of multiple deprivation decile, evidence of a previous infection, and year of age within each age band. A secondary analysis estimated variant-specific and vaccine-specific vaccine effectiveness and the intrinsic relative severity of omicron infection compared with delta (ie, the relative risk in unvaccinated cases). FINDINGS The adjusted hazard ratio (HR) of hospital attendance (not necessarily resulting in admission) with omicron compared with delta was 0·56 (95% CI 0·54-0·58); for hospital admission and death, HR estimates were 0·41 (0·39-0·43) and 0·31 (0·26-0·37), respectively. Omicron versus delta HR estimates varied with age for all endpoints examined. The adjusted HR for hospital admission was 1·10 (0·85-1·42) in those younger than 10 years, decreasing to 0·25 (0·21-0·30) in 60-69-year-olds, and then increasing to 0·47 (0·40-0·56) in those aged at least 80 years. 
For both variants, past infection gave some protection against death both in vaccinated (HR 0·47 [0·32-0·68]) and unvaccinated (0·18 [0·06-0·57]) cases. In vaccinated cases, past infection offered no additional protection against hospital admission beyond that provided by vaccination (HR 0·96 [0·88-1·04]); however, for unvaccinated cases, past infection gave moderate protection (HR 0·55 [0·48-0·63]). Omicron versus delta HR estimates were lower for hospital admission (0·30 [0·28-0·32]) in unvaccinated cases than the corresponding HR estimated for all cases in the primary analysis. Booster vaccination with an mRNA vaccine was highly protective against hospitalisation and death in omicron cases (HR for hospital admission 8-11 weeks post-booster vs unvaccinated: 0·22 [0·20-0·24]), with the protection afforded after a booster not being affected by the vaccine used for doses 1 and 2. INTERPRETATION The risk of severe outcomes following SARS-CoV-2 infection is substantially lower for omicron than for delta, with higher reductions for more severe endpoints and significant variation with age. Underlying the observed risks is a larger reduction in intrinsic severity (in unvaccinated individuals) counterbalanced by a reduction in vaccine effectiveness. Documented previous SARS-CoV-2 infection offered some protection against hospitalisation and high protection against death in unvaccinated individuals, but only offered additional protection in vaccinated individuals for the death endpoint. Booster vaccination with mRNA vaccines maintains over 70% protection against hospitalisation and death in breakthrough confirmed omicron infections. FUNDING Medical Research Council, UK Research and Innovation, Department of Health and Social Care, National Institute for Health Research, Community Jameel, and Engineering and Physical Sciences Research Council.
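Under the common convention that protection (vaccine effectiveness, VE) equals one minus the hazard ratio, which is our assumption about how the percentage is derived here, the reported HR for hospital admission 8-11 weeks post-booster reads as follows:

```latex
\mathrm{VE} = 1 - \mathrm{HR} = 1 - 0.22 = 0.78,
\qquad 95\%\ \text{CI: } 1 - 0.24 = 0.76 \text{ to } 1 - 0.20 = 0.80
```

This gives roughly 76% to 80% protection against hospital admission, consistent with the "over 70%" statement.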
Affiliation(s)
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Neil M Ferguson
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK
- Sophie G Nash
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Harriet H Webster
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Seth Flaxman
- Department of Computer Science, University of Oxford, Oxford, UK
- Nick Andrews
- COVID-19 Surveillance Cell, UK Health Security Agency, London, UK
- Wes Hinsley
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK
- Jamie Lopez Bernal
- NIHR Health Protection Research Unit for Respiratory Infections, Imperial College London, London, UK; COVID-19 Surveillance Cell, UK Health Security Agency, London, UK
- Meaghan Kall
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Samir Bhatt
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK
- Paula Blomquist
- Outbreak Surveillance Team, UK Health Security Agency, London, UK
- Asad Zaidi
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Erik Volz
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK
- Nurin Abdul Aziz
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Katie Harman
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Sebastian Funk
- Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Sam Abbott
- Centre for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Russell Hope
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Andre Charlett
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK; Statistics, Modelling and Economics Department, UK Health Security Agency, London, UK; Joint Modelling Team, UK Health Security Agency, London, UK; NIHR Health Protection Research Unit for Behavioural Science and Evaluation at the University of Bristol, University of the West of England, and University of Cambridge, Bristol, UK
- Meera Chand
- COVID-19 Genomics Cell, UK Health Security Agency, London, UK
- Azra C Ghani
- NIHR Health Protection Research Unit for Modelling and Health Economics, MRC Centre for Global Infectious Disease Analysis, Jameel Institute, Imperial College London, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Gavin Dabrera
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK; Statistics, Modelling and Economics Department, UK Health Security Agency, London, UK; Joint Modelling Team, UK Health Security Agency, London, UK; NIHR Health Protection Research Unit for Behavioural Science and Evaluation at the University of Bristol, University of the West of England, and University of Cambridge, Bristol, UK
- Anne M Presanis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Simon Thelwall
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, UK
10
Nyberg T, Harman K, Zaidi A, Seaman SR, Andrews N, Nash SG, Charlett A, Lopez Bernal J, Myers R, Groves N, Gallagher E, Gharbia S, Chand M, Thelwall S, De Angelis D, Dabrera G, Presanis AM. Hospitalisation and mortality risk for COVID-19 cases with SARS-CoV-2 AY.4.2 (VUI-21OCT-01) compared to non-AY.4.2 Delta variant sub-lineages. J Infect Dis 2022; 226:808-811. [PMID: 35184201 PMCID: PMC8903446 DOI: 10.1093/infdis/jiac063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Accepted: 02/16/2022] [Indexed: 11/28/2022]
Abstract
To investigate whether the AY.4.2 sublineage of the SARS-CoV-2 delta variant is associated with hospitalization and mortality risks that differ from those of non-AY.4.2 delta cases, we performed a retrospective cohort study of sequencing-confirmed COVID-19 cases in England based on linkage of routine health care datasets. Using stratified Cox regression, we estimated adjusted hazard ratios (aHR) of hospital admission (aHR = 0.85; 95% confidence interval [CI], .77–.94), hospital admission or emergency care attendance (aHR = 0.87; 95% CI, .81–.94), and COVID-19 mortality (aHR = 0.85; 95% CI, .71–1.03). The results indicate that the risks of hospitalization and mortality for AY.4.2 cases are similar to or lower than those for cases with other delta sublineages.
Affiliation(s)
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, CB2 0SR, United Kingdom
- Katie Harman
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Asad Zaidi
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, CB2 0SR, United Kingdom
- Nick Andrews
- Immunisation and Countermeasures Division, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Sophie G Nash
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Andre Charlett
- National Infection Service, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Jamie Lopez Bernal
- Immunisation and Countermeasures Division, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Richard Myers
- Genomics Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Natalie Groves
- Genomics Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Eileen Gallagher
- Genomics Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Saheer Gharbia
- Genomics Programme, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Meera Chand
- Genomics Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Simon Thelwall
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, CB2 0SR, United Kingdom
- Gavin Dabrera
- COVID-19 National Epidemiology Cell, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Anne M Presanis
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, CB2 0SR, United Kingdom
11
Abstract
We offer a natural and extensible measure-theoretic treatment of missingness at random. Within the standard missing-data framework, we give a novel characterization of the observed data as a stopping-set sigma algebra. We demonstrate that the usual missingness-at-random conditions are equivalent to requiring particular stochastic processes to be adapted to a set-indexed filtration. These measurability conditions ensure the usual factorization of likelihood ratios. We illustrate how the theory can be extended easily to incorporate explanatory variables, to describe longitudinal data in continuous time, and to admit more general coarsening of observations.
Affiliation(s)
- D M Farewell
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff CF14 4YS, U.K.
- R M Daniel
- Division of Population Medicine, School of Medicine, College of Biomedical and Life Sciences, Cardiff University, Cardiff CF14 4YS, U.K.
- S R Seaman
- MRC Biostatistics Unit, University of Cambridge, Robinson Way, Cambridge CB2 0SR, U.K.
12
Twohig KA, Nyberg T, Zaidi A, Thelwall S, Sinnathamby MA, Aliabadi S, Seaman SR, Harris RJ, Hope R, Lopez-Bernal J, Gallagher E, Charlett A, De Angelis D, Presanis AM, Dabrera G. Hospital admission and emergency care attendance risk for SARS-CoV-2 delta (B.1.617.2) compared with alpha (B.1.1.7) variants of concern: a cohort study. Lancet Infect Dis 2022; 22:35-42. [PMID: 34461056 PMCID: PMC8397301 DOI: 10.1016/s1473-3099(21)00475-8] [Citation(s) in RCA: 474] [Impact Index Per Article: 237.0] [Received: 06/24/2021] [Revised: 07/19/2021] [Accepted: 07/23/2021] [Indexed: 01/19/2023]
Abstract
BACKGROUND The SARS-CoV-2 delta (B.1.617.2) variant was first detected in England in March, 2021. It has since rapidly become the predominant lineage, owing to high transmissibility. It is suspected that the delta variant is associated with more severe disease than the previously dominant alpha (B.1.1.7) variant. We aimed to characterise the severity of the delta variant compared with the alpha variant by determining the relative risk of hospital attendance outcomes. METHODS This cohort study was done among all patients with COVID-19 in England between March 29 and May 23, 2021, who were identified as being infected with either the alpha or delta SARS-CoV-2 variant through whole-genome sequencing. Individual-level data on these patients were linked to routine health-care datasets on vaccination, emergency care attendance, hospital admission, and mortality (data from Public Health England's Second Generation Surveillance System and COVID-19-associated deaths dataset; the National Immunisation Management System; and NHS Digital Secondary Uses Services and Emergency Care Data Set). The risks of hospital admission and emergency care attendance were compared between patients with sequencing-confirmed delta and alpha variants for the whole cohort and by vaccination status subgroups. Stratified Cox regression was used to adjust for age, sex, ethnicity, deprivation, recent international travel, area of residence, calendar week, and vaccination status. FINDINGS Individual-level data on 43 338 COVID-19-positive patients (8682 with the delta variant, 34 656 with the alpha variant; median age 31 years [IQR 17-43]) were included in our analysis. 196 (2·3%) patients with the delta variant versus 764 (2·2%) patients with the alpha variant were admitted to hospital within 14 days after the specimen was taken (adjusted hazard ratio [HR] 2·26 [95% CI 1·32-3·89]). 498 (5·7%) patients with the delta variant versus 1448 (4·2%) patients with the alpha variant were admitted to hospital or attended emergency care within 14 days (adjusted HR 1·45 [1·08-1·95]). Most patients were unvaccinated (32 078 [74·0%] across both groups). The HRs for vaccinated patients with the delta variant versus the alpha variant (adjusted HR for hospital admission 1·94 [95% CI 0·47-8·05] and for hospital admission or emergency care attendance 1·58 [0·69-3·61]) were similar to the HRs for unvaccinated patients (2·32 [1·29-4·16] and 1·43 [1·04-1·97]; p=0·82 for both), but the precision for the vaccinated subgroup was low. INTERPRETATION This large national study found a higher hospital admission or emergency care attendance risk for patients with COVID-19 infected with the delta variant compared with the alpha variant. Results suggest that outbreaks of the delta variant in unvaccinated populations might lead to a greater burden on health-care services than the alpha variant. FUNDING Medical Research Council; UK Research and Innovation; Department of Health and Social Care; and National Institute for Health Research.
Affiliation(s)
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Asad Zaidi
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Simon Thelwall
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Shirin Aliabadi
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Ross J Harris
- Statistics, Modelling and Economics Department, Public Health England, London, UK
- Russell Hope
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Jamie Lopez-Bernal
- Immunisation and Countermeasures Division, Public Health England, London, UK
- Eileen Gallagher
- COVID-19 Genomic Analysis Cell, Public Health England, London, UK
- Andre Charlett
- Statistics, Modelling and Economics Department, Public Health England, London, UK; Joint Modelling Team, Public Health England, London, UK
- Daniela De Angelis
- Statistics, Modelling and Economics Department, Public Health England, London, UK; Joint Modelling Team, Public Health England, London, UK; MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Anne M Presanis
- Joint Modelling Team, Public Health England, London, UK; MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Gavin Dabrera
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
13
Abstract
Time-to-event data are right-truncated if only individuals who have experienced
the event by a certain time can be included in the sample. For example, we may
be interested in estimating the distribution of time from onset of disease
symptoms to death and only have data on individuals who have died. This may be
the case, for example, at the beginning of an epidemic. Right truncation causes
the distribution of times to event in the sample to be biased towards shorter
times compared to the population distribution, and appropriate statistical
methods should be used to account for this bias. This article is a review of
such methods, particularly in the context of an infectious disease epidemic,
like COVID-19. We consider methods for estimating the marginal time-to-event
distribution, and compare their efficiencies. (Non-)identifiability of the
distribution is an important issue with right-truncated data, particularly at
the beginning of an epidemic, and this is discussed in detail. We also review
methods for estimating the effects of covariates on the time to event. An
illustration of the application of many of these methods is provided, using data
on individuals who had died with coronavirus disease by 5 April 2020.
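The bias towards shorter times described above can be seen in a small simulation. This is a minimal sketch, not a method from the review: it assumes (hypothetically) that symptom onsets are uniform over a 30-day window, that onset-to-death delays are exponential with mean 14 days, and that only deaths occurring by the end of the window (the analysis date) are observed. The function name and all parameter values are illustrative.

```python
import numpy as np

def simulate_right_truncation(n=100_000, mean_delay=14.0, window=30.0, seed=1):
    """Compare population and right-truncated sample means of a delay.

    Hypothetical setup: symptom onsets are uniform on [0, window] days and
    onset-to-death delays are exponential with mean `mean_delay`. A death is
    only observed if it occurs by the analysis date (end of the window).
    """
    rng = np.random.default_rng(seed)
    onset = rng.uniform(0.0, window, size=n)
    delay = rng.exponential(mean_delay, size=n)
    # Right truncation: an individual enters the sample only if the event
    # (death) has already happened by the analysis date.
    observed = onset + delay <= window
    return delay.mean(), delay[observed].mean(), observed.mean()

pop_mean, sample_mean, frac = simulate_right_truncation()
print(f"population mean delay:            {pop_mean:.1f} days")
print(f"mean delay among observed deaths: {sample_mean:.1f} days")
print(f"fraction of deaths observed:      {frac:.2f}")
```

Under these settings the naive mean among observed deaths is roughly half the population mean. In a growing epidemic, where recent onsets dominate, the bias would typically be larger, which is why methods for right-truncated data work with the delay distribution conditional on the event having occurred by the truncation time rather than with naive sample summaries.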
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, UK
- Anne Presanis
- MRC Biostatistics Unit, University of Cambridge, UK
14
Keogh RH, Seaman SR, Gran JM, Vansteelandt S. Simulating longitudinal data from marginal structural models using the additive hazard model. Biom J 2021; 63:1526-1541. [PMID: 33983641 PMCID: PMC7612178 DOI: 10.1002/bimj.202000040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/07/2020] [Revised: 12/28/2020] [Accepted: 01/05/2021] [Indexed: 12/05/2022]
Abstract
Observational longitudinal data on treatments and covariates are increasingly used to investigate treatment effects, but are often subject to time-dependent confounding. Marginal structural models (MSMs), estimated using inverse probability of treatment weighting or the g-formula, are popular for handling this problem. With increasing development of advanced causal inference methods, it is important to be able to assess their performance in different scenarios to guide their application. Simulation studies are a key tool for this, but their use to evaluate causal inference methods has been limited. This paper focuses on the use of simulations for evaluations involving MSMs in studies with a time-to-event outcome. In a simulation, it is important to be able to generate the data in such a way that the correct forms of any models to be fitted to those data are known. However, this is not straightforward in the longitudinal setting because it is natural for data to be generated in a sequential conditional manner, whereas MSMs involve fitting marginal rather than conditional hazard models. We provide general results that enable the form of the correctly specified MSM to be derived based on a conditional data generating procedure, and show how the results can be applied when the conditional hazard model is an Aalen additive hazard or Cox model. Using conditional additive hazard models is advantageous because they imply additive MSMs that can be fitted using standard software. We describe and illustrate a simulation algorithm. Our results will help researchers to effectively evaluate causal inference methods via simulation.
Affiliation(s)
- Ruth H. Keogh
- Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK
- Shaun R. Seaman
- MRC Biostatistics Unit, University of Cambridge, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, UK
- Jon Michael Gran
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Blindern, Oslo, Norway
- Stijn Vansteelandt
- Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
15
Nyberg T, Twohig KA, Harris RJ, Seaman SR, Flannagan J, Allen H, Charlett A, De Angelis D, Dabrera G, Presanis AM. Risk of hospital admission for patients with SARS-CoV-2 variant B.1.1.7: cohort analysis. BMJ 2021; 373:n1412. [PMID: 34130987 PMCID: PMC8204098 DOI: 10.1136/bmj.n1412] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Accepted: 06/01/2021] [Indexed: 01/06/2023]
Abstract
OBJECTIVE To evaluate the relation between diagnosis of covid-19 with SARS-CoV-2 variant B.1.1.7 (also known as variant of concern 202012/01) and the risk of hospital admission compared with diagnosis with wild-type SARS-CoV-2 variants. DESIGN Retrospective cohort analysis. SETTING Community based SARS-CoV-2 testing in England, individually linked with hospital admission data. PARTICIPANTS 839 278 patients with laboratory confirmed covid-19, of whom 36 233 had been admitted to hospital within 14 days, tested between 23 November 2020 and 31 January 2021 and analysed at a laboratory with an available TaqPath assay that enables assessment of S-gene target failure (SGTF), a proxy test for the B.1.1.7 variant. Patient data were stratified by age, sex, ethnicity, deprivation, region of residence, and date of positive test. MAIN OUTCOME MEASURES Hospital admission between one and 14 days after the first positive SARS-CoV-2 test. RESULTS 27 710 (4.7%) of 592 409 patients with SGTF variants and 8523 (3.5%) of 246 869 patients without SGTF variants had been admitted to hospital within one to 14 days. The stratum adjusted hazard ratio of hospital admission was 1.52 (95% confidence interval 1.47 to 1.57) for patients with covid-19 infected with SGTF variants, compared with those infected with non-SGTF variants. The effect was modified by age (P<0.001), with hazard ratios of 0.93-1.21 in patients younger than 20 years with versus without SGTF variants, 1.29 in those aged 20-29, and 1.45-1.65 in those aged ≥30 years. The adjusted absolute risk of hospital admission within 14 days was 4.7% (95% confidence interval 4.6% to 4.7%) for patients with SGTF variants and 3.5% (3.4% to 3.5%) for those with non-SGTF variants. CONCLUSIONS The results suggest that the risk of hospital admission is higher for people infected with the B.1.1.7 variant compared with wild-type SARS-CoV-2, likely reflecting a more severe disease. The higher severity may be specific to adults older than 30 years.
Affiliation(s)
- Tommy Nyberg
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Ross J Harris
- National Infection Service, Public Health England, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Joe Flannagan
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Hester Allen
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Andre Charlett
- National Infection Service, Public Health England, London, UK
- Daniela De Angelis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- National Infection Service, Public Health England, London, UK
- Gavin Dabrera
- COVID-19 National Epidemiology Cell, Public Health England, London, UK
- Anne M Presanis
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
16
Seaman SR, Keogh RH, Dukes O, Vansteelandt S. Using generalized linear models to implement g-estimation for survival data with time-varying confounding. Stat Med 2021; 40:3779-3790. [PMID: 33942919 PMCID: PMC7612171 DOI: 10.1002/sim.8997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Revised: 04/01/2021] [Accepted: 04/06/2021] [Indexed: 11/17/2022]
Abstract
Using data from observational studies to estimate the causal effect of a time-varying exposure, repeatedly measured over time, on an outcome of interest requires careful adjustment for confounding. Standard regression adjustment for observed time-varying confounders is unsuitable, as it can eliminate part of the causal effect and induce bias. Inverse probability weighting, g-computation, and g-estimation have been proposed as more suitable methods. G-estimation has some advantages over the other two methods, but until recently there has been a lack of flexible g-estimation methods for a survival time outcome. The recently proposed Structural Nested Cumulative Survival Time Model (SNCSTM) is such a method. Efficient estimation of the parameters of this model required bespoke software. In this article we show how the SNCSTM can be fitted efficiently via g-estimation using standard software for fitting generalised linear models. The ability to implement g-estimation for a survival outcome using standard statistical software greatly increases the potential uptake of this method. We illustrate the use of this method of fitting the SNCSTM by reanalyzing data from the UK Cystic Fibrosis Registry, and provide example R code to facilitate the use of this approach by other researchers.
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Oliver Dukes
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Stijn Vansteelandt
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
17
Pavlou M, Qu C, Omar RZ, Seaman SR, Steyerberg EW, White IR, Ambler G. Estimation of required sample size for external validation of risk models for binary outcomes. Stat Methods Med Res 2021; 30:2187-2206. [PMID: 33881369 PMCID: PMC8529102 DOI: 10.1177/09622802211007522] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/17/2022]
Abstract
Risk-prediction models for health outcomes are used in practice as part of
clinical decision-making, and it is essential that their performance be
externally validated. An important aspect in the design of a validation study is
choosing an adequate sample size. In this paper, we investigate the sample size
requirements for validation studies with binary outcomes to estimate measures of
predictive performance (C-statistic for discrimination and calibration slope and
calibration in the large). We aim for sufficient precision in the estimated
measures. In addition, we investigate the sample size to achieve sufficient
power to detect a difference from a target value. Under normality assumptions on
the distribution of the linear predictor, we obtain simple estimators for sample
size calculations based on the measures above. Simulation studies show that the
estimators perform well for common values of the C-statistic and outcome
prevalence when the linear predictor is marginally Normal. Their performance
deteriorates only slightly when the normality assumptions are violated. We also
propose estimators which do not require normality assumptions but require
specification of the marginal distribution of the linear predictor and require
the use of numerical integration. These estimators were also seen to perform
very well under marginal normality. Our sample size equations require a
specified standard error (SE) and the anticipated C-statistic and outcome
prevalence. The sample size requirement varies according to the prognostic
strength of the model, outcome prevalence, choice of the performance measure and
study objective. For example, to achieve an SE < 0.025 for the C-statistic,
60–170 events are required if the true C-statistic and outcome prevalence are
between 0.64–0.85 and 0.05–0.3, respectively. For the calibration slope and
calibration in the large, achieving SE < 0.15 would require 40–280 and 50–100 events, respectively. Our
estimators may also be used for survival outcomes when the proportion of
censored observations is high.
Affiliation(s)
- Menelaos Pavlou
- Department of Statistical Science, University College London, UK
- Chen Qu
- Department of Statistical Science, University College London, UK
- Rumana Z Omar
- Department of Statistical Science, University College London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge, UK
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- Ian R White
- MRC Clinical Trials Unit, University College London, London, UK
- Gareth Ambler
- Department of Statistical Science, University College London, UK
18
Samartsidis P, Seaman SR, Montagna S, Charlett A, Hickman M, De Angelis D. A Bayesian multivariate factor analysis model for evaluating an intervention by using observational time series data on multiple outcomes. J R Stat Soc Ser A Stat Soc 2021; 28:155-166. [PMID: 34949904 PMCID: PMC7612111 DOI: 10.1111/rssa.12569] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/14/2023]
Abstract
A problem that is frequently encountered in many areas of scientific research is that of estimating the effect of a non-randomized binary intervention on an outcome of interest by using time series data on units that received the intervention ('treated') and units that did not ('controls'). One popular estimation method in this setting is based on the factor analysis (FA) model. The FA model is fitted to the preintervention outcome data on treated units and all the outcome data on control units, and the counterfactual treatment-free post-intervention outcomes of the former are predicted from the fitted model. Intervention effects are estimated as the observed outcomes minus these predicted counterfactual outcomes. We propose a model that extends the FA model for estimating intervention effects by jointly modelling the multiple outcomes to exploit shared variability, and assuming an auto-regressive structure on factors to account for temporal correlations in the outcome. Using simulation studies, we show that the method proposed can improve the precision of the intervention effect estimates and achieve better control of the type I error rate (compared with the FA model), especially when either the number of preintervention measurements or the number of control units is small. We apply our method to estimate the effect of stricter alcohol licensing policies on alcohol-related harms.
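The estimation strategy summarised in this abstract (fit a factor model to the control data and the treated unit's pre-intervention data, predict the treated unit's treatment-free counterfactual, then subtract) can be sketched in a few lines. This is a simplified frequentist stand-in, not the authors' Bayesian multivariate model: it assumes a single outcome and a single treated unit, estimates the factors by principal components of the control series, and ignores the autoregressive factor structure and all uncertainty quantification. The helper `fa_counterfactual` and the synthetic data are hypothetical.

```python
import numpy as np

def fa_counterfactual(y_controls, y_treated, t0, n_factors):
    """Predict treatment-free outcomes for one treated unit (hypothetical helper).

    y_controls: (T, J) array, outcomes of J control units at T time points.
    y_treated:  (T,) array, outcomes of the treated unit; intervention starts at index t0.
    """
    # Estimate T x n_factors latent factors by principal components of ALL control data.
    u, s, _ = np.linalg.svd(y_controls, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]
    # Estimate the treated unit's loadings from its PRE-intervention outcomes only.
    loadings, *_ = np.linalg.lstsq(factors[:t0], y_treated[:t0], rcond=None)
    counterfactual = factors @ loadings
    # Effect = observed minus predicted counterfactual, post-intervention.
    effects = y_treated[t0:] - counterfactual[t0:]
    return counterfactual, effects

# Synthetic, noise-free example: 2 true factors and a constant effect of 2.0.
rng = np.random.default_rng(0)
T, J, t0 = 40, 20, 30
true_factors = rng.normal(size=(T, 2))
y_controls = true_factors @ rng.normal(size=(2, J))
y_treated = true_factors @ np.array([1.0, -0.5])
y_treated[t0:] += 2.0
_, effects = fa_counterfactual(y_controls, y_treated, t0, n_factors=2)
print(np.round(effects, 3))  # each entry should be (numerically) 2.0
```

Because the synthetic data follow an exact two-factor structure without noise, the recovered effects match the injected effect; with real, noisy data the estimates carry uncertainty that this sketch does not quantify.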
19
Tompsett D, Sutton S, Seaman SR, White IR. A general method for elicitation, imputation, and sensitivity analysis for incomplete repeated binary data. Stat Med 2020; 39:2921-2935. [PMID: 32677726 PMCID: PMC7612109 DOI: 10.1002/sim.8584] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/21/2019] [Revised: 04/06/2020] [Accepted: 04/29/2020] [Indexed: 01/28/2023]
Abstract
We develop and demonstrate methods for assessing the sensitivity of conclusions to plausible departures from missing at random (MAR) in incomplete repeated binary outcome data. We use multiple imputation in the not-at-random fully conditional specification (NARFCS) framework, which includes one or more sensitivity parameters (SPs) for each incomplete variable. We demonstrate the use of an online elicitation questionnaire to obtain expert opinion on the SPs, and highest prior density regions are used alongside opinion pooling methods to display credible regions for SPs. We demonstrate that substantive conclusions can be far more sensitive to departures from MAR when control and intervention nonresponders depart from MAR differently, and show that the correlation of arm-specific SPs in expert opinion is particularly important. We illustrate these methods on the iQuit in Practice smoking cessation trial, which compared the impact of a tailored text messaging system versus standard care on smoking cessation. We show that conclusions about the effect of the intervention on smoking cessation outcomes at 8 weeks and 6 months are broadly insensitive to departures from MAR, with conclusions significantly affected only when the differences in behavior between the nonresponders in the two trial arms are larger than expert opinion judges to be realistic.
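The idea of arm-specific sensitivity parameters can be illustrated with a simplified delta-adjustment imputation. In this sketch, which is an assumption and not the paper's procedure, each nonresponder's outcome is imputed from the responders' outcome proportion in the same arm, shifted on the log-odds scale by that arm's sensitivity parameter, so that delta = 0 corresponds to MAR. Unlike the NARFCS imputation described above, it uses no covariates and a single time point, and it reports only the pooled point estimate rather than applying Rubin's rules for variances. The function, outcome proportions and nonresponse rate are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def delta_adjusted_impute(y, arm, deltas, n_imputations=50, seed=0):
    """Impute missing binary outcomes with arm-specific sensitivity parameters.

    y: 0/1 outcomes with np.nan for nonresponders; arm: 0 (control) or 1 (intervention).
    deltas: dict mapping arm -> log-odds shift for that arm's nonresponders (0 = MAR).
    Returns the average, over imputations, of the difference in outcome proportions.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_imputations):
        y_imp = y.copy()
        for a in (0, 1):
            in_arm = arm == a
            obs = y[in_arm & ~np.isnan(y)]
            # Draw the responders' proportion from its Beta posterior (uniform prior).
            p_obs = rng.beta(1 + obs.sum(), 1 + len(obs) - obs.sum())
            # Shift the nonresponders' log-odds by this arm's sensitivity parameter.
            p_mis = expit(logit(p_obs) + deltas[a])
            missing = in_arm & np.isnan(y)
            y_imp[missing] = rng.binomial(1, p_mis, size=missing.sum())
        diffs.append(y_imp[arm == 1].mean() - y_imp[arm == 0].mean())
    return float(np.mean(diffs))

# Hypothetical trial: quit rates 0.15 (control) and 0.25 (intervention), 30% nonresponse.
rng = np.random.default_rng(42)
arm = np.repeat([0, 1], 500)
y = rng.binomial(1, np.where(arm == 1, 0.25, 0.15)).astype(float)
y[rng.random(arm.size) < 0.3] = np.nan
mar = delta_adjusted_impute(y, arm, {0: 0.0, 1: 0.0})
shifted = delta_adjusted_impute(y, arm, {0: 0.0, 1: -1.0})
print(f"MAR estimate: {mar:.3f}; intervention nonresponders shifted: {shifted:.3f}")
```

Applying a negative delta only to intervention-arm nonresponders (assuming they are less likely to have quit than similar responders) lowers the estimated difference, a simple instance of the arms departing from MAR differently.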
Affiliation(s)
- Daniel Tompsett
- Great Ormond Street Institute of Child Health, UCL, London, UK
- Stephen Sutton
- Institute of Public Health, University of Cambridge, Cambridge, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
20
Kiddle SJ, Whittaker HR, Seaman SR, Quint JK. Prediction of five-year mortality after COPD diagnosis using primary care records. PLoS One 2020; 15:e0236011. [PMID: 32692772 PMCID: PMC7373295 DOI: 10.1371/journal.pone.0236011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 06/26/2020] [Indexed: 11/18/2022]
Abstract
Accurate prognosis information after a diagnosis of chronic obstructive pulmonary disease (COPD) would facilitate earlier and better-informed decisions about the use of prevention strategies and advance care plans. We therefore aimed to develop and validate an accurate prognosis model for incident COPD cases using only information present in general practitioner (GP) records at the point of diagnosis. Incident COPD patients aged over 35 years and diagnosed between 2004 and 2012 were studied using records from 396 general practices in England. We developed a model to predict all-cause five-year mortality at the point of COPD diagnosis, using 47,964 English patients. Our model uses age, gender, smoking status, body mass index, forced expiratory volume in 1 second (FEV1) % predicted and 16 co-morbidities (the same number as in the Charlson Co-morbidity Index). The performance of our chosen model was validated in all countries of the UK (N = 48,304). Our model performed well, and performed consistently in validation data. The validation areas under the curve in each country ranged from 0.783 to 0.809, and the calibration slopes from 0.911 to 1.04. Our model performed better in this context than models based on the Charlson Co-morbidity Index or Cambridge Multimorbidity Score. We have developed and validated a model that outperforms general multimorbidity scores at predicting five-year mortality after COPD diagnosis. Our model includes only data routinely collected before COPD diagnosis, allowing it to be readily translated into clinical practice, and has been made available through an online risk calculator (https://skiddle.shinyapps.io/incidentcopdsurvival/).
Affiliation(s)
- Steven J. Kiddle
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- * E-mail: (SJK); (JKQ)
- Hannah R. Whittaker
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Shaun R. Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, United Kingdom
- Jennifer K. Quint
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
- * E-mail: (SJK); (JKQ)
21
Wason JMS, Seaman SR. A latent variable model for improving inference in trials assessing the effect of dose on toxicity and composite efficacy endpoints. Stat Methods Med Res 2020; 29:230-242. [PMID: 30799777 PMCID: PMC6986906 DOI: 10.1177/0962280219831038]
Abstract
It is often of interest to explore how dose affects the toxicity and efficacy properties of a novel treatment. In oncology, efficacy is often assessed through response, which is defined by a patient having no new tumour lesions and their tumour size shrinking by 30%. Usually response and toxicity are analysed as binary outcomes in early phase trials. Methods have been proposed to improve the efficiency of analysing response by utilising the continuous tumour size information instead of dichotomising it. However, these methods do not allow for toxicity or for different doses. Motivated by a phase II trial testing multiple doses of a treatment against placebo, we propose a latent variable model that can estimate the probability of response and no toxicity (or other related outcomes) for different doses. We assess the confidence interval coverage and efficiency properties of the method, compared to methods that do not use the continuous tumour size, in a simulation study and the real study. The coverage is close to nominal when model assumptions are met, although can be below nominal when the model is misspecified. Compared to methods that treat response as binary, the method has confidence intervals with 30-50% narrower widths. The method adds considerable efficiency but care must be taken that the model assumptions are reasonable.
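The core idea of using the continuous tumour size rather than its dichotomised version can be illustrated with a simplified sketch. Assuming, hypothetically, that the log ratio of post- to pre-treatment tumour size is normally distributed with mean `mu` and standard deviation `sigma`, the probability of shrinkage by at least 30% is Φ((log 0.7 − mu)/sigma), so every patient's measurement informs the estimate, not just whether the threshold was crossed. The sketch ignores new lesions, toxicity, dose and the joint latent variable structure of the published model; the function name and data are illustrative only.

```python
import math


def prob_shrinkage_at_least(mu, sigma, shrink=0.30):
    """P(size ratio <= 1 - shrink) when log(size ratio) ~ Normal(mu, sigma^2).

    Simplified sketch: ignores new lesions, toxicity and dose effects.
    """
    z = (math.log(1.0 - shrink) - mu) / sigma
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical observed log tumour-size ratios for one dose group.
log_ratios = [-0.6, -0.4, -0.2, 0.0, 0.1]
n = len(log_ratios)
mu_hat = sum(log_ratios) / n
sigma_hat = math.sqrt(sum((r - mu_hat) ** 2 for r in log_ratios) / (n - 1))
p_hat = prob_shrinkage_at_least(mu_hat, sigma_hat)
```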
Affiliation(s)
- James MS Wason
- Institute of Health and Society, Newcastle University, Newcastle upon Tyne, UK
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
22
Wen L, Terrera GM, Seaman SR. Methods for handling longitudinal outcome processes truncated by dropout and death. Biostatistics 2019; 19:407-425. [PMID: 29028922 PMCID: PMC5971107 DOI: 10.1093/biostatistics/kxx045] [Received: 08/16/2016] [Accepted: 08/22/2017]
Abstract
Cohort data are often incomplete because some subjects drop out of the study, and inverse probability weighting (IPW), multiple imputation (MI), and linear increments (LI) are methods that deal with such missing data. In cohort studies of ageing, missing data can arise from dropout or death. Methods that do not distinguish between these reasons for missingness typically provide inference about a hypothetical cohort where no one can die (immortal cohort). It has been suggested that inference about the cohort composed of those who are still alive at any time point (partly conditional inference) may be more meaningful. MI, LI, and IPW can all be adapted to provide partly conditional inference. In this article, we clarify and compare the assumptions required by these MI, LI, and IPW methods for partly conditional inference on continuous outcomes. We also propose augmented IPW estimators for making partly conditional inference. These are more efficient than IPW estimators and more robust to model misspecification. Our simulation studies show that the methods give approximately unbiased estimates of partly conditional estimands when their assumptions are met, but may be biased otherwise. We illustrate the application of the missing data methods using data from the ‘Origins of Variance in the Old–old’ Twin study.
Affiliation(s)
- Lan Wen
- MRC Biostatistics Unit, University of Cambridge, IPH Forvie Site, Robinson Way, Cambridge, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, IPH Forvie Site, Robinson Way, Cambridge, UK
23
Samartsidis P, Seaman SR, Presanis AM, Hickman M, De Angelis D. Assessing the Causal Effect of Binary Interventions from Observational Panel Data with Few Treated Units. Stat Sci 2019. [DOI: 10.1214/19-sts713]
24
Wason JMS, Seaman SR. A latent variable model for improving inference in trials assessing the effect of dose on toxicity and composite efficacy endpoints. Stat Methods Med Res 2019. [PMID: 30799777 PMCID: PMC6986906 DOI: 10.1177/0962280219831038]
Affiliation(s)
- James M. S. Wason
- Institute of Health and Society, Newcastle University
- MRC Biostatistics Unit, University of Cambridge
25
Wen L, Seaman SR. Semi-parametric methods of handling missing data in mortal cohorts under non-ignorable missingness. Biometrics 2018; 74:1427-1437. [PMID: 29772074 PMCID: PMC6481558 DOI: 10.1111/biom.12891] [Received: 08/01/2017] [Revised: 03/01/2018] [Accepted: 04/01/2018]
Abstract
We propose semi-parametric methods to model cohort data where repeated outcomes may be missing due to death and non-ignorable dropout. Our focus is to obtain inference about the cohort composed of those who are still alive at any time point (partly conditional inference). We propose: i) an inverse probability weighted method that upweights observed subjects to represent subjects who are still alive but are not observed; ii) an outcome regression method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) an augmented inverse probability method that combines the previous two methods and is double robust against model misspecification. These methods are described for both monotone and non-monotone missing data patterns, and are applied to a cohort of elderly adults from the Health and Retirement Study. Sensitivity analysis to departures from the assumption that missingness at some visit t is independent of the outcome at visit t given past observed data and time of death is used in the data application.
Affiliation(s)
- Shaun R. Seaman
- MRC Biostatistics Unit, University of Cambridge, IPH Forvie Site, Robinson Way, Cambridge CB2 0SR, U.K
26
Wen L, Terrera GM, Seaman SR. Erratum: Methods for handling longitudinal outcome processes truncated by dropout and death. Biostatistics 2018; 19:594. [PMID: 29462283 PMCID: PMC6180845 DOI: 10.1093/biostatistics/kxy001]
Affiliation(s)
- Lan Wen
- MRC Biostatistics Unit, University of Cambridge, IPH Forvie Site, Robinson Way, Cambridge, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, IPH Forvie Site, Robinson Way, Cambridge, UK
27
Keogh RH, Seaman SR, Bartlett JW, Wood AM. Multiple imputation of missing data in nested case-control and case-cohort studies. Biometrics 2018; 74:1438-1449. [PMID: 29870056 PMCID: PMC6481559 DOI: 10.1111/biom.12910] [Received: 06/01/2017] [Revised: 04/01/2018] [Accepted: 04/01/2018]
Abstract
The nested case-control and case-cohort designs are two main approaches for carrying out a substudy within a prospective cohort. This article adapts multiple imputation (MI) methods for handling missing covariates in full-cohort studies for nested case-control and case-cohort studies. We consider data missing by design and data missing by chance. MI analyses that make use of full-cohort data and MI analyses based on substudy data only are described, alongside an intermediate approach in which the imputation uses full-cohort data but the analysis uses only the substudy. We describe adaptations to two imputation methods: the approximate method (MI-approx) of White and Royston (2009) and the “substantive model compatible” (MI-SMC) method of Bartlett et al. (2015). We also apply the “MI matched set” approach of Seaman and Keogh (2015) to nested case-control studies, which does not require any full-cohort information. The methods are investigated using simulation studies and all perform well when their assumptions hold. Substantial gains in efficiency can be made by imputing data missing by design using the full-cohort approach or by imputing data missing by chance in analyses using the substudy only. The intermediate approach brings greater gains in efficiency relative to the substudy approach and is more robust to imputation model misspecification than the full-cohort approach. The methods are illustrated using the ARIC Study cohort. Supplementary Materials provide R and Stata code.
Affiliation(s)
- Ruth H Keogh
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, U.K
- Angela M Wood
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, U.K
28
Abstract
Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and 'design-consistent' estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
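The double robust (augmented inverse probability weighted) estimator of the mean of a partially observed outcome Y combines a model pi(X) for the probability that Y is observed with an outcome model m(X) for E(Y | X): mu_hat = (1/n) * sum over i of [R_i * Y_i / pi(X_i) + (1 - R_i / pi(X_i)) * m(X_i)]. It is consistent if either model is correct. The Python sketch below assumes both fitted models are already available as plain functions; the names `aipw_mean`, `pi_const` and `m_true` are hypothetical, and fitting the two models is not shown.

```python
from typing import Callable, Optional, Sequence


def aipw_mean(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    pi: Callable[[float], float],
    m: Callable[[float], float],
) -> float:
    """Augmented IPW (double robust) estimate of E[Y] when some Y are missing.

    y[i] is None when the outcome is unobserved (R_i = 0).
    pi(x) is the fitted probability that Y is observed given X = x.
    m(x) is the fitted outcome regression E[Y | X = x].
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if yi is None:
            # R_i = 0: the weighted term vanishes, leaving the prediction m(x_i).
            total += m(xi)
        else:
            # R_i = 1: inverse-probability-weighted outcome plus augmentation term.
            p = pi(xi)
            total += yi / p + (1.0 - 1.0 / p) * m(xi)
    return total / len(x)


def pi_const(_x: float) -> float:
    return 0.5  # hypothetical: half of all outcomes observed, regardless of X


def m_true(x: float) -> float:
    return 2.0 * x + 1.0  # hypothetical outcome model


x_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.0, None, 5.0, None]
mu_hat = aipw_mean(x_obs, y_obs, pi_const, m_true)
```

When every outcome is observed and pi is 1, the augmentation term drops out and the estimator reduces to the ordinary sample mean.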
Affiliation(s)
- Shaun R Seaman
- Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
- Stijn Vansteelandt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
29
Leyrat C, Seaman SR, White IR, Douglas I, Smeeth L, Kim J, Resche-Rigon M, Carpenter JR, Williamson EJ. Propensity score analysis with partially observed covariates: How should multiple imputation be used? Stat Methods Med Res 2017; 28:3-19. [PMID: 28573919 PMCID: PMC6313366 DOI: 10.1177/0962280217713032]
Abstract
Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement multiple imputation for propensity score analysis: (a) should we apply Rubin's rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after multiple imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third multiple imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas multiple imputation approaches were approximately unbiased as long as the outcome was included in the imputation model. Only MIte was unbiased in all the studied scenarios and Rubin's rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using multiple imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the imputation model is the preferred approach.
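Rubin's rules, which MIte applies to the treatment effect estimates from the imputed datasets, can be sketched as follows: the pooled estimate is the average of the M point estimates, and the total variance is W + (1 + 1/M)B, where W is the mean within-imputation variance and B the between-imputation variance. The function name and the numbers below are hypothetical; estimating the IPTW effect within each imputed dataset is not shown.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool M point estimates and their squared standard errors.

    Returns (pooled estimate, total variance) following Rubin's rules.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputed datasets")
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance
    b = variance(estimates)   # between-imputation variance (denominator M - 1)
    total = w + (1.0 + 1.0 / m) * b
    return q_bar, total


# Hypothetical log odds ratio estimates and variances from five imputed datasets.
pooled_est, pooled_var = rubins_rules(
    [0.41, 0.38, 0.45, 0.40, 0.43], [0.010, 0.011, 0.009, 0.010, 0.012]
)
```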
Affiliation(s)
- Clémence Leyrat
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK
- Shaun R Seaman
- MRC Biostatistics Unit, Cambridge Institute for Public Health, Cambridge, UK
- Ian R White
- MRC Biostatistics Unit, Cambridge Institute for Public Health, Cambridge, UK
- London Hub for Trials Methodology Research, MRC Clinical Trials Unit, UCL, London, UK
- Ian Douglas
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK
- Liam Smeeth
- Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK
- Joseph Kim
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK
- IMS Health, Real-World Evidence Solutions, UK
- Matthieu Resche-Rigon
- SBIM Biostatistics and Medical Information, Hôpital Saint-Louis, France
- ECSTRA Team (Epidémiologie Clinique et Statistiques pour la Recherche en Santé), UMR 1153 INSERM, Université Paris Diderot, France
- James R Carpenter
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK
- London Hub for Trials Methodology Research, MRC Clinical Trials Unit, UCL, London, UK
- Elizabeth J Williamson
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK
- Farr Institute of Health Informatics, London University College, London, UK
30
Seaman SR, Farewell D, White IR. Linear Increments with Non-monotone Missing Data and Measurement Error. Scand Stat Theory Appl 2016; 43:996-1018. [PMID: 27867251 PMCID: PMC5111617 DOI: 10.1111/sjos.12225] [Received: 03/13/2015] [Revised: 12/09/2015] [Accepted: 02/10/2016]
Abstract
Linear increments (LI) are used to analyse repeated outcome data with missing values. Previously, two LI methods have been proposed, one allowing non‐monotone missingness but not independent measurement error and one allowing independent measurement error but only monotone missingness. In both, it was suggested that the expected increment could depend on current outcome. We show that LI can allow non‐monotone missingness and either independent measurement error of unknown variance or dependence of expected increment on current outcome but not both. A popular alternative to LI is a multivariate normal model ignoring the missingness pattern. This gives consistent estimation when data are normally distributed and missing at random (MAR). We clarify the relation between MAR and the assumptions of LI and show that for continuous outcomes multivariate normal estimators are also consistent under (non‐MAR and non‐normal) assumptions not much stronger than those of LI. Moreover, when missingness is non‐monotone, they are typically more efficient.
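In its simplest form, the linear increments idea estimates the mean outcome at each visit by adding the average observed increment between consecutive visits to the baseline mean, on the assumption that the mean increment among subjects with a missing value equals that among subjects observed. The sketch below makes exactly that simplifying assumption, uses only pairs of consecutive observed values (so intermittent, non-monotone gaps are tolerated), and ignores both measurement error and dependence of the increment on the current outcome, which the abstract shows cannot be handled together. The function name and data layout are hypothetical.

```python
from typing import List, Optional


def linear_increments_means(data: List[List[Optional[float]]]) -> List[float]:
    """Estimate E[Y_t] for each visit t by cumulating mean observed increments.

    data[i][t] is subject i's outcome at visit t, or None if missing.
    Assumes every subject has an observed baseline (t = 0) value and that the
    mean increment is the same whether or not it is observed (a simplification).
    """
    n_visits = len(data[0])
    baseline = [row[0] for row in data]
    if any(v is None for v in baseline):
        raise ValueError("this sketch requires observed baseline values")
    means = [sum(baseline) / len(baseline)]
    for t in range(1, n_visits):
        # An increment is available only where both consecutive visits are observed.
        incs = [row[t] - row[t - 1] for row in data
                if row[t] is not None and row[t - 1] is not None]
        if not incs:
            raise ValueError(f"no observed increments at visit {t}")
        means.append(means[-1] + sum(incs) / len(incs))
    return means


visits = [
    [10.0, 11.0, 12.5],
    [12.0, None, 13.0],  # intermittent missing value (non-monotone pattern)
    [9.0, 10.5, None],   # dropout after visit 1
]
est_means = linear_increments_means(visits)
```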
Affiliation(s)
- Daniel Farewell
- Institute of Primary Care and Public Health, Cardiff University
31
Seaman SR, Hughes RA. Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model. Stat Methods Med Res 2016; 27:1603-1614. [PMID: 27597798 PMCID: PMC5496676 DOI: 10.1177/0962280216665872]
Abstract
Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint model multiple imputation and full-conditional specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with that joint model. We show that this asymptotic equivalence of imputation distributions does not imply that joint model multiple imputation and full-conditional specification multiple imputation will also yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they will be equally robust to misspecification of the joint model. When the conditional models used by full-conditional specification multiple imputation are linear, logistic and multinomial regressions, these are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, full-conditional specification multiple imputation is shown to be potentially much more robust than joint model multiple imputation using the restricted general location model to misspecification of that model when there is substantial missingness in the outcome variable.
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
- Shaun R Seaman, MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge CB2 0SR, UK.
- Rachael A Hughes
- School of Social and Community Medicine, University of Bristol, Bristol, UK
32
Yelland LN, Sullivan TR, Pavlou M, Seaman SR. Response to Klebanoff. Paediatr Perinat Epidemiol 2016; 30:206. [PMID: 26860448 DOI: 10.1111/ppe.12271]
Affiliation(s)
- Lisa N Yelland
- Women's and Children's Health Research Institute, North Adelaide, SA, Australia
- School of Population Health, The University of Adelaide, Adelaide, SA, Australia
- Thomas R Sullivan
- School of Population Health, The University of Adelaide, Adelaide, SA, Australia
- Menelaos Pavlou
- Department of Statistical Science, University College London, London, UK
33
Abstract
BACKGROUND Informative birth size occurs when the average outcome depends on the number of infants per birth. Although analysis methods have been proposed for handling informative birth size, their performance is not well understood. Our aim was to evaluate the performance of these methods and to provide recommendations for their application in randomised trials including infants from single and multiple births. METHODS Three generalised estimating equation (GEE) approaches were considered for estimating the effect of treatment on a continuous or binary outcome: cluster weighted GEEs, which produce treatment effects with a mother-level interpretation when birth size is informative; standard GEEs with an independence working correlation structure, which produce treatment effects with an infant-level interpretation when birth size is informative; and standard GEEs with an exchangeable working correlation structure, which do not account for informative birth size. The methods were compared through simulation and analysis of an example dataset. RESULTS Treatment effect estimates were affected by informative birth size in the simulation study when the effect of treatment in singletons differed from that in multiples (i.e. in the presence of a treatment group by multiple birth interaction). The strength of evidence supporting the effectiveness of treatment varied between methods in the example dataset. CONCLUSIONS Informative birth size is always a possibility in randomised trials including infants from both single and multiple births, and analysis methods should be pre-specified with this in mind. We recommend estimating treatment effects using standard GEEs with an independence working correlation structure to give an infant-level interpretation.
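For a continuous outcome and a model containing only a treatment indicator, the GEE approaches above reduce to weighted differences in arm means: an independence working correlation gives every infant weight 1 (infant-level effect), whereas cluster weighting gives each infant weight 1/n_i, where n_i is the number of infants in that birth (mother-level effect). The sketch below computes both point estimates from hypothetical data in which the treatment effect differs between singletons and twins, the situation in which the abstract reports the two interpretations diverging. It omits the robust (sandwich) standard errors that GEE software would also provide; the function names are illustrative.

```python
from typing import Dict, List


def arm_mean(clusters: List[List[float]], cluster_weighted: bool) -> float:
    """Mean outcome in one arm; clusters[i] holds the outcomes of mother i's infants."""
    num = den = 0.0
    for infants in clusters:
        # Cluster weighting down-weights each infant by the size of its birth.
        w = 1.0 / len(infants) if cluster_weighted else 1.0
        num += w * sum(infants)
        den += w * len(infants)
    return num / den


def treatment_effects(treated: List[List[float]],
                      control: List[List[float]]) -> Dict[str, float]:
    """Difference in means with infant-level and mother-level weighting."""
    return {
        "infant_level": arm_mean(treated, False) - arm_mean(control, False),
        "mother_level": arm_mean(treated, True) - arm_mean(control, True),
    }


# Hypothetical data: no effect in singletons, an effect of 2 in twins.
effects = treatment_effects(treated=[[1.0], [3.0, 3.0]], control=[[1.0], [1.0, 1.0]])
```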
Affiliation(s)
- Lisa N. Yelland
- Women’s and Children’s Health Research Institute, North Adelaide
- School of Population Health, The University of Adelaide, Adelaide, South Australia, Australia
- Thomas R. Sullivan
- School of Population Health, The University of Adelaide, Adelaide, South Australia, Australia
- Menelaos Pavlou
- Department of Statistical Science, University College London, London
34
Abstract
When the number of events is low relative to the number of predictors, standard regression could produce overfitted risk models that make inaccurate predictions. Use of penalised regression may improve the accuracy of risk prediction
Affiliation(s)
- Menelaos Pavlou
- Department of Statistical Science, University College London, WC1E 6BT London, UK
- Gareth Ambler
- Department of Statistical Science, University College London, WC1E 6BT London, UK
- Oliver Guttmann
- School of Life and Medical Sciences, Institute of Cardiovascular Science, University College London
- Perry Elliott
- Inherited Cardiac Disease Unit, the Heart Hospital, London
- Michael King
- Division of Psychiatry, University College London
- Rumana Z Omar
- Department of Statistical Science, University College London, WC1E 6BT London, UK
35
Seaman SR, Keogh RH. Handling missing data in matched case-control studies using multiple imputation. Biometrics 2015; 71:1150-9. [PMID: 26237003 DOI: 10.1111/biom.12358] [Received: 07/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015]
Abstract
Analysis of matched case-control studies is often complicated by missing data on covariates. Analysis can be restricted to individuals with complete data, but this is inefficient and may be biased. Multiple imputation (MI) is an efficient and flexible alternative. We describe two MI approaches. The first uses a model for the data on an individual and includes matching variables; the second uses a model for the data on a whole matched set and avoids the need to model the matching variables. Within each approach, we consider three methods: full-conditional specification (FCS), joint model MI using a normal model, and joint model MI using a latent normal model. We show that FCS MI is asymptotically equivalent to joint model MI using a restricted general location model that is compatible with the conditional logistic regression analysis model. The normal and latent normal imputation models are not compatible with this analysis model. All methods allow for multiple partially-observed covariates, non-monotone missingness, and multiple controls per case. They can be easily applied in standard statistical software and valid variance estimates obtained using Rubin's Rules. We compare the methods in a simulation study. The approach of including the matching variables is most efficient. Within each approach, the FCS MI method generally yields the least-biased odds ratio estimates, but normal or latent normal joint model MI is sometimes more efficient. All methods have good confidence interval coverage. Data on colorectal cancer and fibre intake from the EPIC-Norfolk study are used to illustrate the methods, in particular showing how efficiency is gained relative to just using individuals with complete data.
Affiliation(s)
- Ruth H Keogh
- London School of Hygiene and Tropical Medicine, London, WC1E 7HT, U.K
36
Seaman SR, White IR, Leacy FP. Comment on "analysis of longitudinal trials with protocol deviations: a framework for relevant, accessible assumptions, and inference via multiple imputation," by Carpenter, Roger, and Kenward. J Biopharm Stat 2015; 24:1358-62. [PMID: 24915418 PMCID: PMC4241629 DOI: 10.1080/10543406.2014.928306]
Affiliation(s)
- Shaun R Seaman
- Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
37
Abstract
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation.
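For a binary outcome, the C statistic is the proportion of (event, non-event) pairs in which the event has the higher predicted risk, with tied risks counted as one half. The sketch below computes this apparent C statistic directly from predicted risks. It does not implement the optimism adjustments compared in the abstract (cross-validation, bootstrapping, leave-pair-out cross-validation), which would wrap model refitting around this calculation; the function name is illustrative.

```python
from typing import Sequence


def c_statistic(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome; O(n_events * n_nonevents)."""
    events = [r for r, y in zip(risk, outcome) if y == 1]
    nonevents = [r for r, y in zip(risk, outcome) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for ne in nonevents:
            if e > ne:
                score += 1.0   # concordant pair
            elif e == ne:
                score += 0.5   # tied predicted risks
    return score / (len(events) * len(nonevents))
```

Leave-pair-out cross-validation evaluates exactly these pairwise comparisons, but with each (event, non-event) pair scored by a model refitted without that pair.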
Affiliation(s)
- Gordon C. S. Smith
- Correspondence to Dr. Gordon C. S. Smith, University of Cambridge, The Rosie Hospital, Cambridge, CB2 0SW, United Kingdom (e-mail: )
38
Hughes RA, White IR, Seaman SR, Carpenter JR, Tilling K, Sterne JAC. Joint modelling rationale for chained equations. BMC Med Res Methodol 2014; 14:28. [PMID: 24559129 PMCID: PMC3936896 DOI: 10.1186/1471-2288-14-28] [Received: 09/12/2013] [Accepted: 02/13/2014]
Abstract
Background: Chained equations imputation is widely used in medical research. It uses a set of conditional models, so it is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples. Methods: Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study that explores the consequences when the conditional models are compatible but the other conditions are not satisfied. Results: We provide an additional “non-informative margins” condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that, as a consequence of this violation, order effects can occur; that is, there can be systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak. Conclusions: Since chained equations imputation is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible.
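The chained equations algorithm can be sketched for the simplest case: two continuous variables, each imputed in turn from a normal linear regression on the other. With linear-normal conditionals, both conditional models are implied by a bivariate normal, so they are compatible. This is a minimal sketch under stated assumptions, not the paper's code. In particular, a proper imputation would also draw the regression parameters from their posterior at each step, whereas this sketch simply plugs in least-squares estimates. The function names are hypothetical. The order of the variables in the inner tuple is the "variable order" whose effects the abstract discusses.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least squares ys = a + b*xs; returns (a, b, residual SD with n - 2 df)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def chained_equations(col1: Sequence[Optional[float]], col2: Sequence[Optional[float]],
                      n_iter: int = 10, seed: int = 0) -> Tuple[List[float], List[float]]:
    """One chained-equations run for two continuous variables (None = missing)."""
    rng = random.Random(seed)
    miss1 = [v is None for v in col1]
    miss2 = [v is None for v in col2]
    # Initialise missing entries with the observed mean of each column.
    mean1 = statistics.fmean(v for v in col1 if v is not None)
    mean2 = statistics.fmean(v for v in col2 if v is not None)
    x1 = [mean1 if v is None else v for v in col1]
    x2 = [mean2 if v is None else v for v in col2]
    for _ in range(n_iter):
        # The order of this tuple is the variable order of the algorithm.
        for target, predictor, missing in ((x1, x2, miss1), (x2, x1, miss2)):
            obs = [i for i, m in enumerate(missing) if not m]
            a, b, sd = fit_line([predictor[i] for i in obs], [target[i] for i in obs])
            for i, m in enumerate(missing):
                if m:  # draw from the fitted normal conditional model
                    target[i] = a + b * predictor[i] + rng.gauss(0.0, sd)
    return x1, x2
```

For example, `chained_equations([1.0, 2.0, None, 4.0, 5.0, 6.0], [2.1, None, 6.2, 7.9, 10.1, 12.0])` returns two completed columns. Swapping the two entries of the inner tuple changes which variable is imputed first.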
Affiliation(s)
- Rachael A Hughes
- School of Social and Community Medicine, University of Bristol, Bristol, UK.
39
39
|
Bartlett JW, Seaman SR, White IR, Carpenter JR. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Stat Methods Med Res 2014; 24:462-87. [PMID: 24525487 PMCID: PMC4513015 DOI: 10.1177/0962280214521348]
Abstract
Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.
Affiliation(s)
- Jonathan W Bartlett
- Department of Medical Statistics, London School of Hygiene & Tropical Medicine, UK
- James R Carpenter
- Department of Medical Statistics, London School of Hygiene & Tropical Medicine, UK; MRC Clinical Trials Unit, London, UK
40
Seaman SR, Pavlou M, Copas AJ. Methods for observed-cluster inference when cluster size is informative: a review and clarifications. Biometrics 2014; 70:449-56. [PMID: 24479899 PMCID: PMC4312901 DOI: 10.1111/biom.12151] [Received: 04/01/2013] [Revised: 11/01/2013] [Accepted: 01/01/2014]
Abstract
Clustered data commonly arise in epidemiology. We assume each cluster member has an outcome Y and covariates X. When there are missing data in Y, the distribution of Y given X in all cluster members ("complete clusters") may be different from the distribution just in members with observed Y ("observed clusters"). Often the former is of interest, but when data are missing because in a fundamental sense Y does not exist (e.g., quality of life for a person who has died), the latter may be more meaningful (quality of life conditional on being alive). Weighted and doubly weighted generalized estimating equations and shared random-effects models have been proposed for observed-cluster inference when cluster size is informative, that is, the distribution of Y given X in observed clusters depends on observed cluster size. We show these methods can be seen as actually giving inference for complete clusters and may not also give observed-cluster inference. This is true even if observed clusters are complete in themselves rather than being the observed part of larger complete clusters: here methods may describe imaginary complete clusters rather than the observed clusters. We show under which conditions shared random-effects models proposed for observed-cluster inference do actually describe members with observed Y. A psoriatic arthritis dataset is used to illustrate the danger of misinterpreting estimates from shared random-effects models.
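Take the simplest case: an intercept-only model for the mean of Y, with no covariates. Here the weighting used in cluster-weighted estimating equations (assumed here to be weight 1/n_i for each member of a cluster of observed size n_i) reduces to averaging the cluster means. The unweighted equation instead averages over all members. The sketch below contrasts the two on hypothetical data. It is only illustrative: it says nothing about whether the weighted estimate describes complete or observed clusters, which is the question the abstract addresses.

```python
from typing import Sequence


def member_weighted_mean(clusters: Sequence[Sequence[float]]) -> float:
    """Unweighted estimating equation: every member counts equally."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)


def cluster_weighted_mean(clusters: Sequence[Sequence[float]]) -> float:
    """Solves sum_i sum_j (y_ij - mu) / n_i = 0, so every cluster counts equally."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


clusters = [[1.0], [3.0, 3.0, 3.0]]   # hypothetical: the larger cluster has larger outcomes
print(member_weighted_mean(clusters), cluster_weighted_mean(clusters))  # 2.5 2.0
```

The two estimates differ only because cluster size is associated with the outcome. When cluster size is non-informative, both estimate the same quantity.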
41
Morris TP, White IR, Royston P, Seaman SR, Wood AM. Multiple imputation for an incomplete covariate that is a ratio. Stat Med 2013; 33:88-104. [PMID: 23922236 PMCID: PMC3920636 DOI: 10.1002/sim.5935] [Received: 01/07/2013] [Accepted: 07/11/2013]
Abstract
We are concerned with multiple imputation of the ratio of two variables, which is to be used as a covariate in a regression analysis. If the numerator and denominator are not missing simultaneously, it seems sensible to make use of the observed variable in the imputation model. One such strategy is to impute missing values for the numerator and denominator, or the log-transformed numerator and denominator, and then calculate the ratio of interest; we call this ‘passive’ imputation. Alternatively, missing ratio values might be imputed directly, with or without the numerator and/or the denominator in the imputation model; we call this ‘active’ imputation. In two motivating datasets, one involving body mass index as a covariate and the other involving the ratio of total to high-density lipoprotein cholesterol, we assess the sensitivity of results to the choice of imputation model and, as an alternative, explore fully Bayesian joint models for the outcome and incomplete ratio. Fully Bayesian approaches using WinBUGS were unusable in both datasets because of computational problems. In our first dataset, multiple imputation results are similar regardless of the imputation model; in the second, results are sensitive to the choice of imputation model. Sensitivity depends strongly on the coefficient of variation of the ratio's denominator. A simulation study demonstrates that passive imputation without transformation is risky because it can lead to downward bias when the coefficient of variation of the ratio's denominator is larger than about 0.1. Active imputation or passive imputation after log-transformation is preferable. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
Affiliation(s)
- Tim P Morris
- Hub for Trials Methodology Research, MRC Clinical Trials Unit, Aviation House, 125 Kingsway, London WC2B 6NH, U.K.; MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, U.K
42
Wason JMS, Seaman SR. Using continuous data on tumour measurements to improve inference in phase II cancer studies. Stat Med 2013; 32:4639-50. [PMID: 23776143 PMCID: PMC4282550 DOI: 10.1002/sim.5867] [Received: 01/07/2013] [Accepted: 05/09/2013]
Abstract
In phase II cancer trials, tumour response is either the primary or an important secondary endpoint. Tumour response is a binary composite endpoint determined, according to the Response Evaluation Criteria in Solid Tumors, by (1) whether the percentage change in tumour size is greater than a prescribed threshold and (2) (binary) criteria such as whether a patient develops new lesions. Further binary criteria, such as death or serious toxicity, may be added to these criteria. The probability of tumour response (i.e. 'success' on the composite endpoint) would usually be estimated simply as the proportion of successes among patients. This approach uses the tumour size variable only through a discretised form, namely whether or not it is above the threshold. In this article, we propose a method that also estimates the probability of success but that gains precision by using the information on the undiscretised (i.e. continuous) tumour size variable. This approach can also be used to increase the power to detect a difference between the probabilities of success under two different treatments in a comparative trial. We demonstrate these increases in precision and power using simulated data. We also apply the method to real data from a phase II cancer trial and show that it results in a considerably narrower confidence interval for the probability of tumour response.
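The gain from using the undiscretised tumour size variable can be illustrated with a deliberately simplified sketch. This is not the authors' method. It assumes, for illustration only, that tumour shrinkage (the proportional reduction in tumour size) is normally distributed and independent of developing new lesions. Success is taken to be "shrinkage above the threshold and no new lesions". The usual estimate counts successes. The continuous version instead estimates P(shrinkage > threshold) from a fitted normal distribution. Function names and data are hypothetical.

```python
from statistics import NormalDist, fmean, stdev
from typing import Sequence


def binary_estimate(shrinkage: Sequence[float], new_lesions: Sequence[bool],
                    threshold: float) -> float:
    """Usual estimate: proportion with shrinkage above threshold and no new lesions."""
    successes = sum(1 for s, nl in zip(shrinkage, new_lesions) if s > threshold and not nl)
    return successes / len(shrinkage)


def continuous_estimate(shrinkage: Sequence[float], new_lesions: Sequence[bool],
                        threshold: float) -> float:
    """Illustrative estimate that uses the undiscretised shrinkage values.

    Assumes (for this sketch only) that shrinkage is normally distributed and
    independent of developing new lesions.
    """
    p_shrink = 1.0 - NormalDist(fmean(shrinkage), stdev(shrinkage)).cdf(threshold)
    p_no_new_lesion = sum(1 for nl in new_lesions if not nl) / len(new_lesions)
    return p_shrink * p_no_new_lesion


shrinkage = [0.1, 0.2, 0.3, 0.4, 0.5]   # hypothetical proportional reductions in size
new_lesions = [False] * 5
print(binary_estimate(shrinkage, new_lesions, 0.3),
      continuous_estimate(shrinkage, new_lesions, 0.3))
```

The binary estimate depends only on which side of the threshold each patient falls. The continuous estimate also uses how far each value lies from the threshold, and this extra information is the source of the precision gain that the abstract describes.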
43
44
Seaman SR, Bartlett JW, White IR. Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods. BMC Med Res Methodol 2012; 12:46. [PMID: 22489953 PMCID: PMC3403931 DOI: 10.1186/1471-2288-12-46] [Received: 10/06/2011] [Accepted: 04/10/2012]
Abstract
BACKGROUND: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X^2. In 'passive imputation' a value X* is imputed for X and then X^2 is imputed as (X*)^2. A recent proposal is to treat X^2 as 'just another variable' (JAV) and impute X and X^2 under multivariate normality. METHODS: We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X^2; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study. RESULTS: JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias. CONCLUSIONS: Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
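The sketch below shows what separates method 1 (passive imputation) from method 3 (JAV). This is a minimal sketch under stated assumptions. It performs a single "improper" imputation that plugs in least-squares estimates instead of drawing regression parameters from their posterior. It also does not show the PMM variant. In passive imputation, the imputed X^2 is always the square of the imputed X. In JAV, X and X^2 are drawn jointly as bivariate normal given Y, so the imputed X^2 need not equal the square of the imputed X. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Least-squares fit ys = a + b*xs; returns (a, b, residuals)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b, [y - (a + b * x) for x, y in zip(xs, ys)]


def passive_impute(x: Sequence[Optional[float]], y: Sequence[float],
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Method 1: impute X from a regression of X on Y, then set X^2 = (X*)^2."""
    obs = [i for i, v in enumerate(x) if v is not None]
    a, b, res = ols([y[i] for i in obs], [x[i] for i in obs])
    sd = math.sqrt(sum(r * r for r in res) / (len(res) - 2))
    x_imp = [v if v is not None else a + b * yi + rng.gauss(0.0, sd) for v, yi in zip(x, y)]
    return x_imp, [v * v for v in x_imp]


def jav_impute(x: Sequence[Optional[float]], y: Sequence[float],
               rng: random.Random) -> Tuple[List[float], List[float]]:
    """JAV: impute X and X^2 jointly, as bivariate normal given Y."""
    obs = [i for i, v in enumerate(x) if v is not None]
    y_obs = [y[i] for i in obs]
    a1, b1, r1 = ols(y_obs, [x[i] for i in obs])
    a2, b2, r2 = ols(y_obs, [x[i] ** 2 for i in obs])
    df = len(obs) - 2
    s11 = sum(u * u for u in r1) / df
    s22 = sum(v * v for v in r2) / df
    s12 = sum(u * v for u, v in zip(r1, r2)) / df
    # Cholesky factor of the 2 x 2 residual covariance matrix.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(max(s22 - l21 * l21, 0.0))
    x_out: List[float] = []
    x2_out: List[float] = []
    for v, yi in zip(x, y):
        if v is not None:
            x_out.append(v)
            x2_out.append(v * v)
        else:  # X^2 is drawn as its own variable, not computed from the imputed X
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x_out.append(a1 + b1 * yi + l11 * z1)
            x2_out.append(a2 + b2 * yi + l21 * z1 + l22 * z2)
    return x_out, x2_out
```

Passive imputation preserves the deterministic link between X and X^2 but uses a model for X given Y that is generally incompatible with the quadratic analysis model. JAV gives up the link in exchange for matching the first two moments of (X, X^2) given Y.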
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, UK.
45
Abstract
Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether the Rubin's rules variance estimator is valid for IPW/MI. We prove that the Rubin's rules variance estimator is valid for IPW/MI for linear regression with an imputed outcome, we present simulations supporting the use of this variance estimator in more general settings, and we demonstrate that IPW/MI can have advantages over alternatives. IPW/MI is applied to data from the National Child Development Study.
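In IPW/MI, each of the m imputed datasets is analysed (with the inverse-probability weights), and the m estimates and their variances are then pooled. The standard Rubin's rules pooling, whose validity for IPW/MI the abstract examines, can be sketched as below. The pooled estimate is the mean of the m estimates. The total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance and B is the between-imputation variance of the estimates. The weighted per-dataset analysis that produces the inputs is not shown, and the function name is hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def rubins_rules(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool m completed-data analyses: returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # mean within-imputation variance W
    between = statistics.variance(estimates)  # between-imputation variance B (m - 1 denominator)
    return q_bar, within + (1 + 1 / m) * between
```

For example, `rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3) x 1.0.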
46
Abstract
The simplest approach to dealing with missing data is to restrict the analysis to complete cases, i.e. individuals with no missing values. This can induce bias, however. Inverse probability weighting (IPW) is a commonly used method to correct this bias. It is also used to adjust for unequal sampling fractions in sample surveys. This article is a review of the use of IPW in epidemiological research. We describe how the bias in the complete-case analysis arises and how IPW can remove it. IPW is compared with multiple imputation (MI) and we explain why, despite MI generally being more efficient, IPW may sometimes be preferred. We discuss the choice of missingness model and methods such as weight truncation, weight stabilisation and augmented IPW. The use of IPW is illustrated on data from the 1958 British Birth Cohort.
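The way IPW removes complete-case bias can be sketched for the simplest setting. In this setting, Y is missing and the probability of being observed depends only on a fully observed categorical variable Z. For illustration, the missingness model is assumed to be saturated in Z. This means P(Y observed | Z) is estimated by the observed fraction within each stratum of Z. Each complete case is weighted by the inverse of that fraction, and the weighted mean is normalised by the sum of the weights. The function names and data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def complete_case_mean(y: Sequence[Optional[float]]) -> float:
    """Mean of the observed values only."""
    observed = [v for v in y if v is not None]
    return sum(observed) / len(observed)


def ipw_mean(y: Sequence[Optional[float]], z: Sequence[Hashable]) -> float:
    """Weighted complete-case mean, weights = 1 / estimated P(Y observed | Z)."""
    n_total = Counter(z)
    n_observed = Counter(zi for yi, zi in zip(y, z) if yi is not None)
    num = den = 0.0
    for yi, zi in zip(y, z):
        if yi is None:
            continue
        weight = n_total[zi] / n_observed[zi]  # inverse of the observed fraction in stratum zi
        num += weight * yi
        den += weight
    return num / den


z = ["a", "a", "b", "b", "b", "b"]
y = [1.0, None, 3.0, None, None, None]
print(complete_case_mean(y), ipw_mean(y, z))
```

Here the complete-case mean is 2.0 and the weighted mean is 7/3. The weighted mean recovers the population mean when, within each stratum of Z, the missing values of Y have the same mean as the observed ones. Stratum "b" is under-represented among complete cases, and the weights restore its share.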
Affiliation(s)
- Shaun R Seaman
- MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Robinson Way, Cambridge, UK.
47
48
Hensiek AE, Seaman SR, Barcellos LF, Oturai A, Eraksoi M, Cocco E, Vecsei L, Stewart G, Dubois B, Bellman-Strobl J, Leone M, Andersen O, Bencsik K, Booth D, Celius EG, Harbo HF, Hauser SL, Heard R, Hillert J, Myhr KM, Marrosu MG, Oksenberg JR, Rajda C, Sawcer SJ, Sørensen PS, Zipp F, Compston DAS. Familial effects on the clinical course of multiple sclerosis. Neurology 2007; 68:376-83. [PMID: 17261686 DOI: 10.1212/01.wnl.0000252822.53506.46]
Abstract
BACKGROUND: Familial factors influence susceptibility to multiple sclerosis (MS) but it is unknown whether there are additional effects on the natural history of the disease. METHOD: We evaluated 1,083 families with ≥2 first-degree relatives with MS for concordance of age at onset, clinical course, and disease severity and investigated transmission patterns of these clinical features in affected parent-child pairs. RESULTS: There is concordance for age at onset for all families (correlation coefficient 0.14; p < 0.001), as well as for affected siblings (correlation coefficient 0.15; p < 0.001), and affected parent-child pairs (correlation coefficient 0.12; p = 0.03) when each is evaluated separately. Concordance for year of onset is present among affected siblings (correlation coefficient 0.18; p < 0.001) but not the parent-child group (correlation coefficient 0.08; p = 0.15). The clinical course is similar between siblings (kappa 0.12; p < 0.001) but not affected parents and their children (kappa -0.04; p = 0.09). This influence on the natural history is present in all clinical subgroups of relapsing-remitting, and primary and secondary progressive MS, reflecting a familial effect on episodic and progressive phases of the disease. There is no concordance for disease severity within any of the considered family groups (correlation coefficients: all families analyzed together, 0.02, p = 0.53; affected sibling group, 0.02, p = 0.61; affected parent-child group, 0.02, p = 0.69). Furthermore, there are no apparent transmission patterns of any of the investigated clinical features in affected parent-child pairs and no evidence for anticipation or effects of genetic loading. CONCLUSION: Familial factors do not significantly affect eventual disease severity. However, they increase the probability of a progressive clinical course, either from onset or after a phase of relapsing-remitting disease. The familial effect is more likely to reflect genetic than environmental conditions. The results are relevant for counseling patients and have implications for the design of studies seeking to identify factors that influence the natural history of the disease.
Affiliation(s)
- A E Hensiek
- Department of Clinical Neuroscience, University of Cambridge Clinical School, Addenbrooke's Hospital, Box 165, Cambridge CB2 2QQ, UK
49
Höfler M, Seaman SR. Re-interpreting conventional interval estimates taking into account bias and extra-variation. BMC Med Res Methodol 2006; 6:51. [PMID: 17042949 PMCID: PMC1618852 DOI: 10.1186/1471-2288-6-51] [Received: 08/23/2006] [Accepted: 10/16/2006]
Abstract
BACKGROUND: The study design with the smallest bias for causal inference is a perfect randomized clinical trial. Since this design is often not feasible in epidemiologic studies, an important challenge is to model bias properly and take random and systematic variation properly into account. A value for a target parameter might be said to be "incompatible" with the data (under the model used) if the parameter's confidence interval excludes it. However, this "incompatibility" may be due to bias and/or extra-variation. DISCUSSION: We propose the following way of re-interpreting conventional results. Given a specified focal value for a target parameter (typically the null value, but possibly a non-null value like that representing a twofold risk), the difference between the focal value and the nearest boundary of the confidence interval for the parameter is calculated. This represents the maximum correction of the interval boundary, for bias and extra-variation, that would still leave the focal value outside the interval, so that the focal value remained "incompatible" with the data. We describe a short example application concerning a meta-analysis of air versus pure oxygen resuscitation treatment in newborn infants. Some general guidelines are provided for how to assess the probability that the appropriate correction for a particular study would be greater than this maximum (e.g. using knowledge of the general effects of bias and extra-variation from published bias-adjusted results). SUMMARY: Although this approach does not yet provide a method, because the latter probability cannot be objectively assessed, this paper aims to stimulate the re-interpretation of conventional confidence intervals, and more and better studies of the effects of different biases.
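The quantity proposed in the discussion is simple arithmetic: the distance from the focal value to the nearest confidence limit. The sketch below computes it. As an assumption not stated in the abstract, it offers an option to measure distances on the log scale. This is the usual scale for ratio measures such as risk or odds ratios. The function name and example numbers are hypothetical.

```python
import math


def max_correction(lower: float, upper: float, focal: float, log_scale: bool = False) -> float:
    """Distance from the focal value to the nearest confidence limit.

    Returns 0.0 when the focal value already lies inside the interval.
    With log_scale=True (e.g. for odds or risk ratios) distances are on the log scale.
    """
    if log_scale:
        lower, upper, focal = math.log(lower), math.log(upper), math.log(focal)
    if lower <= focal <= upper:
        return 0.0
    return lower - focal if focal < lower else focal - upper


# Hypothetical risk ratio with 95% CI 1.2 to 2.5, judged against the null value 1.
print(max_correction(1.2, 2.5, 1.0, log_scale=True))
```

Here the result is log(1.2), about 0.18 on the log scale. Any combined correction for bias and extra-variation that moves the lower limit down by more than this amount would make the null value "compatible" with the data.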
Affiliation(s)
- Michael Höfler
- Institute of Clinical Psychology and Psychotherapy, Dresden University of Technology, Chemnitzer Str. 46a, 01187 Dresden, Germany
- Shaun R Seaman
- Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK
50
Salyakina D, Seaman SR, Browning BL, Dudbridge F, Muller-Myhsok B. Evaluation of Nyholt’s Procedure for Multiple Testing Correction. Hum Hered 2005; 60:19-25; discussion 61-2. [PMID: 16118503 DOI: 10.1159/000087540] [Received: 01/18/2005] [Accepted: 06/27/2005]
Abstract
OBJECTIVE: A simple method for accounting efficiently for multiple testing of many SNPs in an association study was recently proposed by Nyholt, but its performance was not extensively evaluated. The method involves estimating an 'effective number' of independent tests and then adjusting the smallest observed p value using Šidák's formula based on this number of tests. We sought to carry out an empirical and theoretical evaluation of Nyholt's method. METHODS: Nyholt's method was applied to a sample of 31 genes typed at a total of 291 SNPs, and permutation was used to determine the type I error rate for each gene. Based on our empirical results, we algebraically investigated the effective number of independent tests for a simple model of haplotype block structure. RESULTS: The nominal 5% type I error rate varied from under 3% to over 7%, and was dependent on linkage disequilibrium. Theoretical considerations show further that the method can be very conservative in the presence of haplotype block structure. CONCLUSION: Although Nyholt's approach may be useful as an exploratory tool, it is not an adequate substitute for permutation tests.
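The two steps of Nyholt's procedure can be sketched as below. As we understand Nyholt's proposal, the effective number of tests is computed from the eigenvalues λ of the M x M SNP correlation matrix as Meff = 1 + (M - 1)(1 - Var(λ)/M), with Var the sample variance of the eigenvalues. The smallest p value is then adjusted by Šidák's formula, 1 - (1 - p_min)^Meff. Treat this as a sketch under that assumption. The eigenvalues are taken as input, and computing them from genotype data is not shown. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def nyholt_meff(eigenvalues: Sequence[float]) -> float:
    """Effective number of tests from the eigenvalues of the M x M SNP correlation matrix."""
    m = len(eigenvalues)
    return 1 + (m - 1) * (1 - statistics.variance(eigenvalues) / m)


def sidak_adjust(p_min: float, n_tests: float) -> float:
    """Šidák-adjusted p value for the smallest of n_tests independent tests."""
    return 1 - (1 - p_min) ** n_tests


print(nyholt_meff([1.0, 1.0, 1.0, 1.0]))  # 4.0: uncorrelated SNPs
print(nyholt_meff([4.0, 0.0, 0.0, 0.0]))  # 1.0: perfectly correlated SNPs
```

The two boundary cases behave as intended. Uncorrelated SNPs count as M independent tests, and perfectly correlated SNPs count as one. The abstract's point is that the values between these extremes, under haplotype block structure, do not reliably yield the nominal type I error rate.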