1. Smith MJ, Phillips RV, Luque-Fernandez MA, Maringe C. Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review. Ann Epidemiol 2023;86:34-48.e28. PMID: 37343734. DOI: 10.1016/j.annepidem.2023.06.004.
Abstract
PURPOSE: The targeted maximum likelihood estimation (TMLE) statistical data analysis framework integrates machine learning, statistical theory, and statistical inference to provide a least biased, efficient, and robust strategy for estimation and inference of a variety of statistical and causal parameters. We describe and evaluate the epidemiological applications that have benefited from recent methodological developments. METHODS: We conducted a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies. We summarized the epidemiological discipline, geographical location, expertise of the authors, and TMLE methods over time. We used the Roadmap of Targeted Learning and Causal Inference to extract key methodological aspects of the publications, and we showcase the contributions these TMLE results make to the literature. RESULTS: Of the 89 publications included, 33% originated from the University of California at Berkeley, where the framework was first developed by Professor Mark van der Laan. By 2022, 59% of the publications originated from outside the United States, and up to seven different epidemiological disciplines were represented in 2021-2022. Double robustness, bias reduction, and protection against model misspecification were the main motivations that drew researchers toward the TMLE framework. Over time, a wide variety of methodological, tutorial, and software-specific articles were cited, owing to the constant growth of methodological developments around TMLE. CONCLUSIONS: There is a clear trend of dissemination of the TMLE framework to various epidemiological disciplines and to an increasing number of geographical areas. The availability of R packages, the publication of tutorial papers, and the involvement of methodological experts in applied publications have contributed to an exponential increase in the number of studies that recognized the benefits of, and adopted, TMLE.
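The targeting step at the heart of TMLE can be illustrated with a short sketch. This is not code from the review or from any TMLE package: the initial outcome predictions `Q1`/`Q0` and propensity scores `g` are hard-coded toy values (in practice they would come from Super Learner or another machine-learning fit), and the fluctuation parameter is fitted here by a crude gradient ascent rather than the usual offset logistic regression.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def tmle_ate(A, Y, Q1, Q0, g, n_iter=50, lr=0.1):
    """One-dimensional TMLE targeting step for the average treatment effect.

    A: binary treatment; Y: binary outcome;
    Q1/Q0: initial predictions of E[Y|A=1,W] and E[Y|A=0,W];
    g: propensity scores P(A=1|W).
    """
    n = len(Y)
    QA = [q1 if a == 1 else q0 for a, q1, q0 in zip(A, Q1, Q0)]
    # Clever covariate: 1/g for treated, -1/(1-g) for untreated.
    H = [a / gi - (1 - a) / (1 - gi) for a, gi in zip(A, g)]
    eps = 0.0
    # Fit the fluctuation parameter eps by gradient ascent on the logistic
    # log-likelihood with offset logit(QA) (a crude stand-in for the usual
    # offset logistic regression).
    for _ in range(n_iter):
        grad = sum(h * (y - expit(logit(q) + eps * h))
                   for h, y, q in zip(H, Y, QA)) / n
        eps += lr * grad
    # Updated (targeted) counterfactual predictions under A=1 and A=0.
    Q1s = [expit(logit(q) + eps / gi) for q, gi in zip(Q1, g)]
    Q0s = [expit(logit(q) - eps / (1 - gi)) for q, gi in zip(Q0, g)]
    return sum(q1 - q0 for q1, q0 in zip(Q1s, Q0s)) / n

# Toy data: 6 subjects with pre-computed (illustrative) nuisance estimates.
A = [1, 0, 1, 0, 1, 0]
Y = [1, 0, 1, 1, 0, 0]
Q1 = [0.8, 0.6, 0.7, 0.6, 0.5, 0.4]
Q0 = [0.5, 0.4, 0.5, 0.5, 0.3, 0.2]
g = [0.6, 0.5, 0.7, 0.4, 0.5, 0.3]
ate = tmle_ate(A, Y, Q1, Q0, g)
print(round(ate, 3))
```

The updated predictions approximately solve the efficient-influence-function estimating equation, which is what gives TMLE its double robustness and efficiency properties.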
Affiliation(s)
- Matthew J Smith: Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK.
- Rachael V Phillips: Division of Biostatistics, School of Public Health, University of California at Berkeley, Berkeley, CA, USA.
- Miguel Angel Luque-Fernandez: Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK; Department of Statistics and Operations Research, University of Granada, Granada, Spain.
- Camille Maringe: Inequalities in Cancer Outcomes Network, London School of Hygiene and Tropical Medicine, London, UK.
2. Rassen JA, Blin P, Kloss S, Neugebauer RS, Platt RW, Pottegård A, Schneeweiss S, Toh S. High-dimensional propensity scores for empirical covariate selection in secondary database studies: Planning, implementation, and reporting. Pharmacoepidemiol Drug Saf 2023;32:93-106. PMID: 36349471. PMCID: PMC10099872. DOI: 10.1002/pds.5566.
Abstract
Real-world evidence used for regulatory, payer, and clinical decision-making requires principled epidemiology in design and analysis, applying methods to minimize confounding given the lack of randomization. One technique to deal with potential confounding is propensity score (PS) analysis, which allows for the adjustment for measured preexposure covariates. Since its first publication in 2009, the high-dimensional propensity score (hdPS) method has emerged as an approach that extends traditional PS covariate selection to include large numbers of covariates that may reduce confounding bias in the analysis of healthcare databases. hdPS is an automated, data-driven analytic approach for covariate selection that empirically identifies preexposure variables and proxies to include in the PS model. This article provides an overview of the hdPS approach and recommendations on the planning, implementation, and reporting of hdPS used for causal treatment-effect estimations in longitudinal healthcare databases. We supply a checklist with key considerations as a supportive decision tool to aid investigators in the implementation and transparent reporting of hdPS techniques, and to aid decision-makers unfamiliar with hdPS in the understanding and interpretation of studies employing this approach. This article is endorsed by the International Society for Pharmacoepidemiology.
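The covariate-prioritization step of hdPS can be sketched as follows: candidate covariates generated from the data are ranked by the Bross (1966) multiplicative bias formula, using each covariate's prevalence among exposed and unexposed patients and its apparent relative risk with the outcome. The candidate names, prevalences, and relative risks below are invented for illustration, and a full hdPS implementation also performs feature generation and diagnostic steps not shown here.

```python
import math

def bross_bias(p_exposed, p_unexposed, rr_outcome):
    """Apparent multiplicative confounding bias from a binary covariate
    (Bross 1966), as used by hdPS to rank candidate covariates.

    p_exposed / p_unexposed: covariate prevalence in each exposure group.
    rr_outcome: covariate-outcome relative risk; direction does not matter
    for ranking, so relative risks below 1 are inverted.
    """
    rr = max(rr_outcome, 1.0 / rr_outcome)
    bias = (p_exposed * (rr - 1) + 1) / (p_unexposed * (rr - 1) + 1)
    return abs(math.log(bias))

# Candidate empirical covariates:
# (name, prevalence in exposed, prevalence in unexposed, outcome RR)
candidates = [
    ("dx_hypertension",   0.40, 0.20, 2.0),
    ("rx_statin",         0.35, 0.30, 1.1),
    ("px_echocardiogram", 0.15, 0.05, 3.0),
]
# Keep the covariates with the largest potential confounding impact first.
ranked = sorted(candidates, key=lambda c: bross_bias(c[1], c[2], c[3]),
                reverse=True)
top = [name for name, *_ in ranked]
print(top)
```

In a real study, the top few hundred covariates by this ranking would then enter the propensity score model alongside investigator-specified variables.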
Affiliation(s)
- Patrick Blin: Bordeaux PharmacoEpi, Bordeaux University, INSERM CIC-P 1401, Bordeaux, France.
- Sebastian Kloss: EMEA Real-World Evidence & Value-Based Healthcare, Janssen, Berlin, Germany.
- Robert W. Platt: Departments of Pediatrics and of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada.
- Anton Pottegård: Clinical Pharmacology, Pharmacy and Environmental Medicine, Department of Public Health, University of Southern Denmark, Odense, Denmark.
- Sebastian Schneeweiss: Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
- Sengwee Toh: Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA.
3. Dorie V, Perrett G, Hill JL, Goodrich B. Stan and BART for Causal Inference: Estimating Heterogeneous Treatment Effects Using the Power of Stan and the Flexibility of Machine Learning. Entropy (Basel) 2022;24:1782. PMID: 36554187. PMCID: PMC9778579. DOI: 10.3390/e24121782.
Abstract
A wide range of machine-learning-based approaches have been developed in the past decade, increasing our ability to accurately model nonlinear and nonadditive response surfaces. This has improved performance for inferential tasks such as estimating average treatment effects in situations where standard parametric models may not fit the data well. These methods have also shown promise for the related task of identifying heterogeneous treatment effects. However, the estimation of both overall and heterogeneous treatment effects can be hampered when data are structured within groups if we fail to correctly model the dependence between observations. Most machine learning methods do not readily accommodate such structure. This paper introduces a new algorithm, stan4bart, that combines the flexibility of Bayesian Additive Regression Trees (BART) for fitting nonlinear response surfaces with the computational and statistical efficiencies of using Stan for the parametric components of the model. We demonstrate how stan4bart can be used to estimate average, subgroup, and individual-level treatment effects with stronger performance than other flexible approaches that ignore the multilevel structure of the data as well as multilevel approaches that have strict parametric forms.
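stan4bart itself is an R package, but the estimands it reports follow a generic plug-in recipe that can be sketched in a few lines: predict each unit's outcome under treatment and under control from a fitted response surface, difference the predictions, and average within whatever stratum is of interest. In this hedged sketch a simple linear-with-interaction predictor stands in for the BART plus multilevel fit, and all coefficients and data are illustrative.

```python
def predict(w, a, coef=(0.2, 0.5, 0.3)):
    """Stand-in outcome model f(W, A); in stan4bart this would be the
    posterior-mean BART + parametric multilevel fit. Coefficients are
    illustrative; the interaction makes the effect vary with w."""
    b0, bw, ba = coef
    return b0 + bw * w + ba * a + 0.2 * a * w

W = [0.0, 0.5, 1.0, 1.5]        # one covariate per unit
groups = ["A", "A", "B", "B"]   # grouping structure (e.g., sites)

# Individual treatment effects: difference of counterfactual predictions.
ite = [predict(w, 1) - predict(w, 0) for w in W]
# Subgroup and overall averages follow by averaging the ITEs.
sate = sum(ite) / len(ite)
cate_B = sum(t for t, grp in zip(ite, groups) if grp == "B") / 2
print(sate, cate_B)
```

With a Bayesian fit, repeating this computation over posterior draws of the response surface yields uncertainty intervals for the individual, subgroup, and average effects.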
Affiliation(s)
- George Perrett: Department of Applied Statistics, Social Science, and the Humanities, New York University, New York, NY 10003, USA.
- Jennifer L. Hill: Department of Applied Statistics, Social Science, and the Humanities, New York University, New York, NY 10003, USA.
- Benjamin Goodrich: Department of Political Science, Columbia University, New York, NY 10025, USA.
4. Wyss R, Schneeweiss S, Lin KJ, Miller DP, Kalilani L, Franklin JM. Synthetic Negative Controls: Using Simulation to Screen Large-scale Propensity Score Analyses. Epidemiology 2022;33:541-550. PMID: 35439779. PMCID: PMC9156547. DOI: 10.1097/ede.0000000000001482.
Abstract
The propensity score has become a standard tool to control for large numbers of variables in healthcare database studies. However, little has been written on the challenge of comparing large-scale propensity score analyses that use different methods for confounder selection and adjustment. In these settings, balance diagnostics are useful but neither tell researchers for which variables balance should be assessed nor quantify the impact of residual covariate imbalance on bias. Here, we propose a framework to supplement balance diagnostics when comparing large-scale propensity score analyses. Instead of focusing on results from any single analysis, we suggest conducting and reporting results for many analytic choices and using both balance diagnostics and synthetically generated control studies to screen out analyses that show signals of bias caused by measured confounding. To generate synthetic datasets, the framework does not require simulating the outcome-generating process. In healthcare database studies, outcome events are often rare, making it difficult to identify and model all predictors of the outcome so as to simulate a confounding structure closely resembling the given study. Therefore, the framework uses a model for treatment assignment to divide the comparator population into pseudo-treatment groups whose covariate differences resemble those in the study cohort. The partially simulated datasets have a confounding structure approximating that of the study population under the null (synthetic negative control studies). The framework is used to screen out analyses that likely violate partial exchangeability due to lack of control for measured confounding. We illustrate the framework using simulations and an empirical example.
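The pseudo-treatment construction at the core of the framework can be sketched as follows: within comparator (untreated) subjects only, "treatment" is assigned from a covariate-dependent model, so the pseudo-groups inherit realistic covariate imbalance while the true effect is exactly null by construction. The data and treatment model here are simulated stand-ins, not the authors' implementation; an analysis that still shows a nonzero "effect" in such a synthetic study is flagged as inadequately controlling measured confounding.

```python
import random

random.seed(0)

def pseudo_treatment_split(comparators, ps_model):
    """Divide untreated comparators into pseudo-'treated'/'untreated' groups
    using a fitted treatment model, so covariate differences resemble the
    real cohort but the true effect is null by construction."""
    pseudo_treated, pseudo_untreated = [], []
    for subj in comparators:
        if random.random() < ps_model(subj):
            pseudo_treated.append(subj)
        else:
            pseudo_untreated.append(subj)
    return pseudo_treated, pseudo_untreated

# Comparator subjects: a confounder x and an outcome y that x predicts.
comparators = [{"x": random.random(), "y": random.random()} for _ in range(2000)]
for s in comparators:
    s["y"] = s["y"] + s["x"]      # x predicts the outcome

def ps(s):
    return 0.2 + 0.6 * s["x"]     # x also predicts (pseudo-)treatment

treated, untreated = pseudo_treatment_split(comparators, ps)
crude = (sum(s["y"] for s in treated) / len(treated)
         - sum(s["y"] for s in untreated) / len(untreated))
# Under the null, any nonzero 'crude' difference is confounding by x; an
# adjusted analysis applied to this synthetic study should recover ~0.
print(round(crude, 3))
```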
Affiliation(s)
- Richard Wyss: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Sebastian Schneeweiss: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Kueiyu Joshua Lin: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Jessica M Franklin: Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
5. Wyss R, Yanover C, El-Hay T, Bennett D, Platt RW, Zullo AR, Sari G, Wen X, Ye Y, Yuan H, Gokhale M, Patorno E, Lin KJ. Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: an overview of the current literature. Pharmacoepidemiol Drug Saf 2022;31:932-943. PMID: 35729705. PMCID: PMC9541861. DOI: 10.1002/pds.5500.
Abstract
Controlling for large numbers of variables that collectively serve as 'proxies' for unmeasured factors can often improve confounding control in pharmacoepidemiologic studies utilizing administrative healthcare databases. There is a growing body of evidence showing that data-driven machine learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment. In this paper, we discuss the considerations underpinning three areas of data-driven high-dimensional proxy confounder adjustment: (1) feature generation: transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We survey current approaches and recent advancements within each area, including the most widely used approach to proxy confounder adjustment in healthcare database studies (the high-dimensional propensity score, or hdPS). We also discuss limitations of the hdPS and outline recent advancements that incorporate the principles of proxy adjustment with machine learning extensions to improve performance. We further discuss challenges and avenues of future development within each area. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).
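The feature-generation area can be sketched concretely: in the spirit of the hdPS literature, each raw code is expanded into binary covariates by recurrence thresholds (code observed at least once, at least as often as the median of its nonzero counts, and at least as often as its 75th percentile). The claims data, code names, and exact percentile conventions below are illustrative assumptions, not the procedure of any particular implementation.

```python
from collections import Counter

def generate_features(code_counts, patients):
    """Expand each raw code into binary 'once'/'sporadic'/'frequent'
    covariates based on the code's count distribution across patients
    (thresholds: >= 1, >= median of nonzero counts, >= 75th percentile
    of nonzero counts)."""
    features = {}
    codes = {c for counts in code_counts.values() for c in counts}
    for code in sorted(codes):
        nonzero = sorted(code_counts[p].get(code, 0) for p in patients
                         if code_counts[p].get(code, 0) > 0)
        median = nonzero[len(nonzero) // 2]
        q75 = nonzero[(3 * len(nonzero)) // 4]
        for p in patients:
            n = code_counts[p].get(code, 0)
            features.setdefault(p, {})[f"{code}_once"] = int(n >= 1)
            features[p][f"{code}_sporadic"] = int(n >= median)
            features[p][f"{code}_frequent"] = int(n >= q75)
    return features

# Toy claims data: counts of each diagnosis code per patient.
code_counts = {
    "p1": Counter({"dx_401": 5, "dx_250": 1}),
    "p2": Counter({"dx_401": 1}),
    "p3": Counter({"dx_401": 2, "dx_250": 3}),
}
patients = ["p1", "p2", "p3"]
feats = generate_features(code_counts, patients)
print(feats["p1"])
```

The resulting binary covariates would then feed the prioritization and selection step, where machine-learning extensions (e.g., lasso-based scores) can replace or augment the classic ranking.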
Affiliation(s)
- Richard Wyss: Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Tal El-Hay: KI Research Institute, Kfar Malal, Israel; IBM Research - Haifa Labs, Haifa, Israel.
- Dimitri Bennett: Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, MA, USA.
- Andrew R Zullo: Department of Health Services, Policy, and Practice, Brown University School of Public Health, and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, RI, USA.
- Grammati Sari: Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK.
- Xuerong Wen: Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, RI, USA.
- Yizhou Ye: Global Epidemiology, AbbVie Inc., North Chicago, IL, USA.
- Hongbo Yuan: Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada.
- Mugdha Gokhale: Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, PA, USA.
- Elisabetta Patorno: Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Kueiyu Joshua Lin: Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
6. Li H, Rosete S, Coyle J, Phillips RV, Hejazi NS, Malenica I, Arnold BF, Benjamin-Chung J, Mertens A, Colford JM, van der Laan MJ, Hubbard AE. Evaluating the robustness of targeted maximum likelihood estimators via realistic simulations in nutrition intervention trials. Stat Med 2022;41:2132-2165. PMID: 35172378. PMCID: PMC10362909. DOI: 10.1002/sim.9348.
Abstract
Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations, and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness by adding a layer of cross-validation (cross-validated targeted maximum likelihood estimation and double machine learning, as applied to substitution and estimating-equation approaches, respectively). While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across simulations based on real data has yet to be conducted. In this work, we benchmark multiple widely used methods for estimation of the average treatment effect using data from ten different nutrition intervention studies. A nonparametric regression method, the undersmoothed highly adaptive lasso, is used to generate the simulated distribution, which preserves important features of the observed data and reproduces a set of true target parameters. For each simulated dataset, we apply the methods above to estimate the average treatment effects as well as their standard errors and resulting confidence intervals. Based on the analytic results, a general recommendation is put forth for use of the cross-validated variants of both substitution and estimating-equation estimators. We conclude that the additional layer of cross-validation helps avoid unintentional over-fitting of nuisance parameter functionals and leads to more robust inference.
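The cross-validation (cross-fitting) layer discussed above can be sketched generically: split the data into folds, estimate nuisance quantities on the out-of-fold data, evaluate the estimating equation on the held-out fold, and average. For brevity the "machine learning" nuisance fits are replaced by simple arm-specific outcome means and an empirical treatment probability; the fold logic, not the nuisance model, is the point of this sketch.

```python
def cross_fit_ate(data, k=2):
    """Cross-fitted AIPW-style estimate of the average treatment effect.

    Each fold's nuisance estimates (here simple treatment-arm outcome
    means and the empirical treatment probability, standing in for ML
    fits) are computed on the *other* folds, so the efficient influence
    function is never evaluated on data used to fit the nuisances."""
    n = len(data)
    # Interleaved folds; data below is arranged so each fold has both arms.
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        # Nuisance "fits" on training folds only.
        y1 = [y for a, y in train if a == 1]
        y0 = [y for a, y in train if a == 0]
        q1, q0 = sum(y1) / len(y1), sum(y0) / len(y0)
        g = len(y1) / len(train)
        # Efficient influence function evaluated on the held-out fold.
        for a, y in folds[i]:
            scores.append(q1 - q0
                          + a * (y - q1) / g
                          - (1 - a) * (y - q0) / (1 - g))
    return sum(scores) / n

# Toy data: (treatment, outcome) pairs.
data = [(1, 1.0), (1, 0.8), (0, 0.2), (0, 0.4),
        (1, 0.9), (1, 0.7), (0, 0.3), (0, 0.1)]
ate = cross_fit_ate(data)
print(round(ate, 3))
```

The same fold structure underlies both CV-TMLE and double machine learning; the two differ in whether the update is a substitution (targeting) step or a direct estimating-equation solve.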
Affiliation(s)
- Haodong Li: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Sonali Rosete: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Jeremy Coyle: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Rachael V Phillips: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Nima S Hejazi: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Ivana Malenica: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Benjamin F Arnold: Proctor Foundation, University of California, San Francisco, San Francisco, California, USA.
- Jade Benjamin-Chung: Epidemiology & Population Health, Stanford University, Stanford, California, USA.
- Andrew Mertens: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- John M Colford: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Mark J van der Laan: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
- Alan E Hubbard: Divisions of Epidemiology & Biostatistics, University of California, Berkeley, Berkeley, California, USA.
7. Benasseur I, Talbot D, Durand M, Holbrook A, Matteau A, Potter BJ, Renoux C, Schnitzer ME, Tarride JÉ, Guertin JR. A Comparison of Confounder Selection and Adjustment Methods for Estimating Causal Effects Using Large Healthcare Databases. Pharmacoepidemiol Drug Saf 2021;31:424-433. PMID: 34953160. PMCID: PMC9304306. DOI: 10.1002/pds.5403.
Abstract
PURPOSE: Confounding adjustment is required to estimate the effect of an exposure on an outcome in observational studies. However, variable selection and unmeasured confounding are particularly challenging when analyzing large healthcare data. Machine learning methods may help address these challenges. The objective was to evaluate the capacity of such methods to select confounders and reduce unmeasured confounding bias. METHODS: A simulation study with known true effects was conducted. Completely synthetic and partially synthetic data incorporating real large healthcare data were generated. We compared Bayesian Adjustment for Confounding, Generalized Bayesian Causal Effect Estimation, Group Lasso and Doubly Robust Estimation, the high-dimensional propensity score, and scalable collaborative targeted maximum likelihood algorithms. For the high-dimensional propensity score, two adjustment approaches targeting the effect in the whole population were considered: full matching and inverse probability weighting. RESULTS: In scenarios without hidden confounders, most methods were essentially unbiased. The bias and variance of the high-dimensional propensity score varied considerably according to the number of variables selected by the algorithm. In scenarios with hidden confounders, substantial bias reduction was achieved by using machine learning methods to identify proxies, as compared to adjusting only for observed confounders. The high-dimensional propensity score and Group Lasso performed poorly in the partially synthetic simulation. Bayesian Adjustment for Confounding, Generalized Bayesian Causal Effect Estimation, and scalable collaborative targeted maximum likelihood algorithms performed particularly well. CONCLUSIONS: Machine learning methods can help to identify measured confounders in large healthcare databases. They can also capitalize on proxies of unmeasured confounders to substantially reduce residual confounding bias.
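Of the adjustment approaches compared, inverse probability weighting targeting the effect in the whole population is the simplest to sketch. This is a generic Hajek-style (normalized) IPW estimator with invented data and pre-computed propensity scores, not the authors' code.

```python
def ipw_ate(A, Y, g):
    """Hajek-style (normalized) inverse-probability-weighted ATE:
    weight treated subjects by 1/g and untreated by 1/(1-g), then take
    the difference of weighted outcome means over the whole population."""
    w1 = [a / gi for a, gi in zip(A, g)]
    w0 = [(1 - a) / (1 - gi) for a, gi in zip(A, g)]
    mu1 = sum(w * y for w, y in zip(w1, Y)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, Y)) / sum(w0)
    return mu1 - mu0

# Toy data: treatment indicators, outcomes, and (illustrative) propensity
# scores that would in practice come from a fitted treatment model.
A = [1, 1, 0, 0, 1, 0]
Y = [0.9, 0.7, 0.2, 0.4, 0.8, 0.3]
g = [0.6, 0.7, 0.4, 0.3, 0.5, 0.5]
ate = ipw_ate(A, Y, g)
print(round(ate, 3))
```

Normalizing the weights (dividing by their sum rather than by n) trades a little bias for stability when some propensity scores are close to 0 or 1, a relevant concern when the score is fit on hundreds of empirically selected covariates.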
Affiliation(s)
- Imane Benasseur: Département de mathématiques et de statistique, Université Laval, Québec, QC, Canada; Unité santé des populations et pratiques optimales en santé, CHU de Québec - Université Laval research center, Québec, QC, Canada.
- Denis Talbot: Unité santé des populations et pratiques optimales en santé, CHU de Québec - Université Laval research center, Québec, QC, Canada; Département de médecine sociale et préventive, Université Laval, Québec, QC, Canada.
- Madeleine Durand: Département de médecine, Université de Montréal, Montréal, QC, Canada; CHUM Research Center, Montréal, QC, Canada.
- Anne Holbrook: Division of Clinical Pharmacology & Toxicology, Department of Medicine, McMaster University, Hamilton, ON, Canada.
- Alexis Matteau: Département de médecine, Université de Montréal, Montréal, QC, Canada; CHUM Research Center, Montréal, QC, Canada.
- Brian J Potter: Département de médecine, Université de Montréal, Montréal, QC, Canada; CHUM Research Center, Montréal, QC, Canada.
- Christel Renoux: Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research - Jewish General Hospital, Montréal, QC, Canada; Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, QC, Canada; Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada.
- Mireille E Schnitzer: Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, QC, Canada; Faculty of Pharmacy, Université de Montréal, Montréal, QC, Canada; École de santé publique - Département de médecine sociale et préventive, Université de Montréal, Montréal, QC, Canada.
- Jean-Éric Tarride: Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada; Programs for Assessment of Technology in Health, The Research Institute of St. Joseph's, Hamilton, ON, Canada.
- Jason R Guertin: Unité santé des populations et pratiques optimales en santé, CHU de Québec - Université Laval research center, Québec, QC, Canada; Département de médecine sociale et préventive, Université Laval, Québec, QC, Canada.
8. Huang JY, Cai S, Huang Z, Tint MT, Yuan WL, Aris IM, Godfrey KM, Karnani N, Lee YS, Chan JKY, Chong YS, Eriksson JG, Chan SY. Analyses of child cardiometabolic phenotype following assisted reproductive technologies using a pragmatic trial emulation approach. Nat Commun 2021;12:5613. PMID: 34556649. PMCID: PMC8460697. DOI: 10.1038/s41467-021-25899-4.
Abstract
Assisted reproductive technologies (ART) are increasingly used, however little is known about the long-term health of ART-conceived offspring. Weak selection of comparison groups and poorly characterized mechanisms impede current understanding. In a prospective cohort (Growing Up in Singapore Towards healthy Outcomes; GUSTO; Clinical Trials ID: NCT01174875) including 83 ART-conceived and 1095 spontaneously-conceived singletons, we estimate effects of ART on anthropometry, blood pressure, serum metabolic biomarkers, and cord tissue DNA methylation by emulating a pragmatic trial supported by machine learning-based estimators. We find ART-conceived children to be shorter (-0.5 SD [95% CI: -0.7, -0.2]), lighter (-0.6 SD [-0.9, -0.3]) and have lower skinfold thicknesses (e.g. -14% [-24%, -3%] suprailiac), and blood pressure (-3 mmHg [-6, -0.5] systolic) at 6-6.5 years, with no strong differences in metabolic biomarkers. Differences are not explained by parental anthropometry or comorbidities, polygenic risk score, breastfeeding, or illnesses. Our simulations demonstrate ART is strongly associated with lower NECAB3 DNA methylation, with negative control analyses suggesting these estimates are unbiased. However, methylation changes do not appear to mediate observed differences in child phenotype.
Affiliation(s)
- Jonathan Yinhao Huang: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore.
- Shirong Cai: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Zhongwei Huang: Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore.
- Mya Thway Tint: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Wen Lun Yuan: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Université de Paris, CRESS, Inserm, Paris, France.
- Izzuddin M Aris: Division of Chronic Disease Research Across the Lifecourse, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA.
- Keith M Godfrey: MRC Lifecourse Epidemiology Centre and NIHR Southampton Biomedical Research Centre, University of Southampton and University Hospital Southampton, Southampton, UK.
- Neerja Karnani: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore.
- Yung Seng Lee: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Jerry Kok Yen Chan: Department of Reproductive Medicine, KK Women's and Children's Hospital, Singapore, Singapore; Academic Clinical Program in Obstetrics and Gynaecology, Duke-NUS Medical School, Singapore, Singapore.
- Yap Seng Chong: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Johan Gunnar Eriksson: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of General Practice and Primary Health Care, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; Folkhälsan Research Center, Helsinki, Finland.
- Shiao-Yng Chan: Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
9. The Society for Vascular Surgery Objective Performance Goals for Critical Limb Ischemia are attainable in select patients with ischemic wounds managed with wound care alone. Ann Vasc Surg 2021;78:28-35. PMID: 34543715. DOI: 10.1016/j.avsg.2021.06.034.
Abstract
BACKGROUND: To set therapeutic benchmarks, in 2009 the Society for Vascular Surgery defined objective performance goals (OPGs) for the treatment of patients with chronic limb-threatening ischemia (CLTI) with either open surgical bypass or endovascular intervention. The goal of these OPGs is to set standards of care from a revascularization standpoint and to provide performance benchmarks for 1-year patency rates for new endovascular therapies. While OPGs are useful in this regard, a critical decision point in the treatment of patients with CLTI is determining when revascularization is necessary. There is little guidance on the comprehensive treatment of this patient population, especially in the nonoperative cohort. Guidelines are needed for the CLTI patient population as a whole, not just those revascularized, and our aim was to assess whether CLTI OPGs could be attained with nonoperative management alone. METHODS: Our cohort included patients with an incident diagnosis of CLTI (by hemodynamic and symptomatic criteria) at our institution from 2013-2017. The primary outcome measured was mortality. Secondary outcomes were limb loss and failure of amputation-free survival. Descriptive statistics were used to define the two groups: patients undergoing primary revascularization and patients undergoing primary wound management. The risk difference in outcomes between the two groups was estimated using collaborative targeted maximum likelihood estimation. RESULTS: Our cohort included 349 incident CLTI patients: 60% male, 51% white, mean age 63 ± 13 years, 20% Rutherford 4, and 80% Rutherford 5. Most patients (277, 79%) underwent primary revascularization, and 72 (21%) were treated with wound care alone. Demographics and presenting characteristics were similar between groups. Although the revascularized patients were more likely to have femoropopliteal disease (72% vs. 36%), both groups had a high rate of infrapopliteal disease (62% vs. 57%). Not surprisingly, the patients in the revascularization group were less likely to have congestive heart failure (34% vs. 42%), complicated diabetes (52% vs. 79%), obesity (19% vs. 33%), and end-stage renal disease (14% vs. 28%). In the wound care group, 2-year outcomes were 65% survival, 51% amputation-free survival, 19% major limb amputation, and 17% major adverse cardiac events. The wound care cohort had a 13% greater risk of death at 2 years; however, the risk of limb loss at 2 years was 12% lower in the wound care cohort. CONCLUSIONS: A comprehensive set of treatment goals and expected amputation-free survival outcomes can guide revascularization and also help assure that appropriate outcomes are achieved for patients treated without revascularization. The 2-year outcomes achieved in this cohort provide an estimate of outcomes for nonrevascularized CLTI patients. Although multicenter or prospective studies are needed, we demonstrate that equal, or even improved, limb salvage rates are possible.
10
Franklin JM, Platt R, Dreyer NA, London AJ, Simon GE, Watanabe JH, Horberg M, Hernandez A, Califf RM. When Can Nonrandomized Studies Support Valid Inference Regarding Effectiveness or Safety of New Medical Treatments? Clin Pharmacol Ther 2021; 111:108-115. [PMID: 33826756 PMCID: PMC9291272 DOI: 10.1002/cpt.2255] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 03/25/2021] [Indexed: 12/28/2022]
Abstract
The randomized controlled trial (RCT) is the gold standard for evaluating the causal effects of medications. Limitations of RCTs have led to increasing interest in using real-world evidence (RWE) to augment RCT evidence and inform decision making on medications. Although RWE can be either randomized or nonrandomized, nonrandomized RWE can capitalize on the recent proliferation of large healthcare databases and can often answer questions that cannot be answered in randomized studies due to resource constraints. However, the results of nonrandomized studies are much more likely to be affected by confounding bias, and the existence of unmeasured confounders can never be completely ruled out. Furthermore, nonrandomized studies require more complex design considerations, which can sometimes result in design-related biases. We discuss questions that can help investigators or evidence consumers evaluate the potential impact of confounding or other biases on their findings: Does the design emulate a hypothetical randomized trial design? Is the comparator or control condition appropriate? Does the primary analysis adjust for measured confounders? Do sensitivity analyses quantify the potential impact of residual confounding? Are methods open to inspection and (if possible) replication? Designing a high-quality nonrandomized study of medications remains challenging and requires broad expertise across a range of disciplines, including relevant clinical areas, epidemiology, and biostatistics. The questions posed in this paper provide a guiding framework for assessing the credibility of nonrandomized RWE and could be applied across many clinical questions.
Affiliation(s)
- Jessica M Franklin
- Optum Epidemiology, Boston, Massachusetts, USA; Division of Pharmacoepidemiology & Pharmacoeconomics, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Richard Platt
- Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, Massachusetts, USA
- Nancy A Dreyer
- IQVIA Real World Solutions, Cambridge, Massachusetts, USA
- Alex John London
- Philosophy Department & Center for Ethics and Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Gregory E Simon
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Jonathan H Watanabe
- School of Pharmacy and Pharmaceutical Sciences, University of California Irvine, Irvine, California, USA
- Michael Horberg
- Kaiser Permanente Mid-Atlantic Permanente Research Institute and Mid-Atlantic Permanente Medical Group, Bethesda, Maryland, USA
- Robert M Califf
- Verily Life Sciences and Google Health, Cambridge, Massachusetts, USA
11
Benkeser D, Cai W, van der Laan MJ. A Nonparametric Super-Efficient Estimator of the Average Treatment Effect. Stat Sci 2020. [DOI: 10.1214/19-sts735] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
12
Schnitzer ME. Comment: Increasing Real World Usage of Targeted Minimum Loss-Based Estimators. Stat Sci 2020. [DOI: 10.1214/20-sts770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
13
Schnitzer ME, Sango J, Ferreira Guerra S, van der Laan MJ. Data-adaptive longitudinal model selection in causal inference with collaborative targeted minimum loss-based estimation. Biometrics 2019; 76:145-157. [PMID: 31397506 DOI: 10.1111/biom.13135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2017] [Accepted: 07/22/2019] [Indexed: 11/26/2022]
Abstract
Causal inference methods have been developed for longitudinal observational study designs where confounding is thought to occur over time. In particular, one may estimate and contrast the population mean counterfactual outcome under specific exposure patterns. In such contexts, confounders of the longitudinal treatment-outcome association are generally identified using domain-specific knowledge. However, this may leave an analyst with a large set of potential confounders that may hinder estimation. Previous approaches to data-adaptive model selection for this type of causal parameter were limited to the single time-point setting. We develop a longitudinal extension of a collaborative targeted minimum loss-based estimation (C-TMLE) algorithm that can be applied to perform variable selection in the models for the probability of treatment with the goal of improving the estimation of the population mean counterfactual outcome under a fixed exposure pattern. We investigate the properties of this method through a simulation study, comparing it to G-Computation and inverse probability of treatment weighting. We then apply the method in a real-data example to evaluate the safety of trimester-specific exposure to inhaled corticosteroids during pregnancy in women with mild asthma. The data for this study were obtained from the linkage of electronic health databases in the province of Quebec, Canada. The C-TMLE covariate selection approach allowed for a reduction of the set of potential confounders, which included baseline and longitudinal variables.
Affiliation(s)
- Joel Sango
- Statistics Canada, Ottawa, Ontario, Canada; Department of Mathematics and Statistics, Université de Montréal, Montréal, Québec, Canada
- Mark J van der Laan
- Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California
14
Ju C, Benkeser D, van der Laan MJ. Robust inference on the average treatment effect using the outcome highly adaptive lasso. Biometrics 2019; 76:109-118. [PMID: 31350906 DOI: 10.1111/biom.13121] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Accepted: 07/16/2019] [Indexed: 12/01/2022]
Abstract
Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques, such as semiparametric regression or machine learning, to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatment effect, particularly in settings with strong instrumental variables. A recent proposal addressed these issues via the outcome-adaptive lasso, a penalized regression technique for estimating the propensity score that seeks to minimize the impact of instrumental variables on treatment effect estimators. However, a notable limitation of this approach is that its application is restricted to parametric models. We propose a more flexible alternative that we call the outcome highly adaptive lasso. We discuss the large sample theory for this estimator and propose closed-form confidence intervals based on the proposed estimator. We show via simulation that our method offers benefits over several popular approaches.
Affiliation(s)
- Cheng Ju
- Division of Biostatistics, University of California, Berkeley, California
- David Benkeser
- Division of Biostatistics, Emory University, Atlanta, Georgia
15
Ju C, Wyss R, Franklin JM, Schneeweiss S, Häggström J, van der Laan MJ. Collaborative-controlled LASSO for constructing propensity score-based estimators in high-dimensional data. Stat Methods Med Res 2019; 28:1044-1063. [PMID: 29226777 PMCID: PMC6039292 DOI: 10.1177/0962280217744588] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Propensity score-based estimators are increasingly used for causal inference in observational studies. However, model selection for propensity score estimation in high-dimensional data has received little attention. In these settings, propensity score models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative minimum loss-based estimation is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a propensity score model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a propensity score model in order to minimize a bias-variance tradeoff in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for propensity score estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the propensity score model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the collaborative minimum loss-based estimation algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the propensity score model selected by collaborative minimum loss-based estimation could be applied to other propensity score-based estimators, which also resulted in substantive improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
Affiliation(s)
- Cheng Ju
- Division of Biostatistics, University of California, USA
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, USA
- Jessica M Franklin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, USA
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, USA
16
Ju C, Combs M, Lendle SD, Franklin JM, Wyss R, Schneeweiss S, van der Laan MJ. Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods. J Appl Stat 2019; 46:2216-2236. [PMID: 32843815 PMCID: PMC7444746 DOI: 10.1080/02664763.2019.1582614] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2018] [Accepted: 02/08/2019] [Indexed: 02/06/2023]
Abstract
The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in the large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied SL and evaluated its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
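The abstract above describes the core of Super Learner: fit each candidate in a library, score each by cross-validated loss, and let the data pick the winner (the full algorithm also builds an optimal weighted combination of the candidates). As a rough illustrative sketch only, not code from the paper, the discrete selection step for a propensity score model can be written in pure Python; the two toy "learners", the simulated data, and the loss bound are invented for this example.

```python
import math
import random

def neg_log_lik(p, a):
    """Negative log-likelihood of binary treatment a under predicted probability p."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0)
    return -(a * math.log(p) + (1 - a) * math.log(1 - p))

# Candidate "learners": each takes training pairs (w, a) and returns a
# predictor mapping the covariate w to an estimate of P(A=1 | W=w).
def marginal_learner(train):
    m = sum(a for _, a in train) / len(train)
    return lambda w: m  # ignores the covariate entirely

def stratified_learner(train):
    strata = {}
    for w, a in train:
        strata.setdefault(w, []).append(a)
    means = {w: sum(v) / len(v) for w, v in strata.items()}
    overall = sum(a for _, a in train) / len(train)
    return lambda w: means.get(w, overall)  # covariate-specific treatment rate

def cv_select(data, learners, k=5):
    """Discrete super-learner step: return the index of the learner with the
    lowest k-fold cross-validated loss, plus the per-learner losses."""
    folds = [data[i::k] for i in range(k)]
    losses = []
    for learner in learners:
        total, n = 0.0, 0
        for i in range(k):
            train = [x for j in range(k) if j != i for x in folds[j]]
            predict = learner(train)
            for w, a in folds[i]:
                total += neg_log_lik(predict(w), a)
                n += 1
        losses.append(total / n)
    return min(range(len(losses)), key=losses.__getitem__), losses

# Toy data: the treatment probability truly depends on the binary covariate w,
# so cross-validation should prefer the stratified learner (index 1).
random.seed(0)
data = []
for _ in range(400):
    w = random.randint(0, 1)
    a = 1 if random.random() < (0.2 + 0.6 * w) else 0
    data.append((w, a))

best, losses = cv_select(data, [marginal_learner, stratified_learner])
```

With this strong simulated signal, the held-out loss of the stratified learner is clearly smaller, so `best` selects it; with a different data-generating process the marginal model could win, which is exactly the adaptivity the abstract reports.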
Affiliation(s)
- Cheng Ju
- Division of Biostatistics, University of California, Berkeley
- Mary Combs
- Division of Biostatistics, University of California, Berkeley
- Samuel D Lendle
- Division of Biostatistics, University of California, Berkeley
- Jessica M Franklin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School
17
Gruber S, van der Laan MJ. Comment on “Automated Versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition”. Stat Sci 2019. [DOI: 10.1214/18-sts689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
18
Ju C, Schwab J, van der Laan MJ. On adaptive propensity score truncation in causal inference. Stat Methods Med Res 2018; 28:1741-1760. [PMID: 29991330 DOI: 10.1177/0962280218774817] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of practical violations of the positivity assumption is extreme values in the estimated propensity score (PS). A common practice to address this issue is truncating the PS estimate when constructing PS-based estimators. In this study, we propose a novel adaptive truncation method, Positivity-C-TMLE, based on the collaborative targeted maximum likelihood estimation (C-TMLE) methodology. We demonstrate the outstanding performance of our novel approach in a variety of simulations by comparing it with other commonly studied estimators. Results show that by adaptively truncating the estimated PS with a more targeted objective function, the Positivity-C-TMLE estimator achieves the best performance for both point estimation and confidence interval coverage among all estimators considered.
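To make the common (non-adaptive) truncation practice described above concrete, here is a hedged pure-Python sketch of bounding estimated propensity scores before computing a Hajek-type inverse-probability-weighted ATE. This illustrates the problem Positivity-C-TMLE addresses, not the adaptive method itself, which chooses the truncation level by a targeted criterion; the toy data and propensity values are invented.

```python
def truncate(p, delta):
    """Bound an estimated propensity score to the interval [delta, 1 - delta]."""
    return min(max(p, delta), 1 - delta)

def ipw_ate(data, ps, delta):
    """Hajek-type IPW estimate of the ATE with truncated propensity scores.
    data: list of (a, y) pairs with binary treatment a and outcome y;
    ps: estimated P(A=1 | W) for each unit, in the same order."""
    num1 = den1 = num0 = den0 = 0.0
    for (a, y), p in zip(data, ps):
        p = truncate(p, delta)
        if a == 1:
            num1 += y / p
            den1 += 1.0 / p
        else:
            num0 += y / (1.0 - p)
            den0 += 1.0 / (1.0 - p)
    # Weighted mean among treated minus weighted mean among controls.
    return num1 / den1 - num0 / den0

# Toy example of a practical positivity violation: one treated unit has a
# near-zero estimated PS, so its weight 1/0.001 = 1000 dominates the estimate.
data = [(1, 1.0), (1, 0.0), (1, 0.8), (0, 0.2), (0, 0.1), (0, 0.3)]
ps   = [0.9,      0.001,    0.7,      0.3,      0.2,      0.4]

loose   = ipw_ate(data, ps, delta=1e-6)  # essentially untruncated
bounded = ipw_ate(data, ps, delta=0.05)  # extreme weight capped at 1/0.05 = 20
```

In this toy dataset the single extreme weight drags the untruncated treated mean toward that one unit's outcome, while truncation at 0.05 stabilizes the estimate; the bias-variance cost of the truncation level is exactly what the paper's adaptive procedure trades off.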
Affiliation(s)
- Cheng Ju
- Division of Biostatistics, University of California, Berkeley, CA, USA
- Joshua Schwab
- Division of Biostatistics, University of California, Berkeley, CA, USA
19
Schneeweiss S. Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects. Clin Epidemiol 2018; 10:771-788. [PMID: 30013400 PMCID: PMC6039060 DOI: 10.2147/clep.s166545] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Decision makers in health care increasingly rely on nonrandomized database analyses to assess the effectiveness, safety, and value of medical products. Health care data scientists use data-adaptive approaches that automatically optimize confounding control to study causal treatment effects. This article summarizes relevant experiences and extensions. METHODS The literature was reviewed on the uses of high-dimensional propensity score (HDPS) and related approaches for health care database analyses, including methodological articles on their performance and improvement. Articles were grouped into applications, comparative performance studies, and statistical simulation experiments. RESULTS The HDPS algorithm has been referenced frequently with a variety of clinical applications and data sources from around the world. The appeal of HDPS for database research rests in 1) its superior performance in situations of unobserved confounding through proxy adjustment, 2) its predictable efficiency in extracting confounding information from a given data source, 3) its ability to automate estimation of causal treatment effects to the extent achievable in a given data source, and 4) its independence of data source and coding system. Extensions of the HDPS approach have focused on improving variable selection when exposure is sparse, using free text information and time-varying confounding adjustment. CONCLUSION Semiautomated and optimized confounding adjustment in health care database analyses has proven successful across a wide range of settings. Machine-learning extensions further automate its use in estimating causal treatment effects across a range of data scenarios.
Affiliation(s)
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA