1
Babaei H, Alemohammad S, Baraniuk RG. Covariate Balancing Methods for Randomized Controlled Trials Are Not Adversarially Robust. IEEE Trans Neural Netw Learn Syst 2024; 35:5014-5026. [PMID: 37104113 DOI: 10.1109/tnnls.2023.3266429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The first step toward investigating the effectiveness of a treatment via a randomized trial is to split the population into control and treatment groups, then compare the average response of the treatment group receiving the treatment to that of the control group receiving the placebo. To ensure that the difference between the two groups is caused only by the treatment, it is crucial that the control and the treatment groups have similar statistics. Indeed, the validity and reliability of a trial are determined by the similarity of the two groups' statistics. Covariate balancing methods increase the similarity between the distributions of the two groups' covariates. However, often in practice, there are not enough samples to accurately estimate the groups' covariate distributions. In this article, we empirically show that covariate balancing with the standardized means difference (SMD) covariate balancing measure, as well as Pocock and Simon's sequential treatment assignment method, are susceptible to worst-case treatment assignments. Worst-case treatment assignments are those admitted by the covariate balance measure but that result in the highest possible ATE estimation errors. We developed an adversarial attack to find an adversarial treatment assignment for any given trial. Then, we provide an index to measure how close the given trial is to the worst case. To this end, we provide an optimization-based algorithm, namely adversarial treatment assignment in treatment effect trials (ATASTREET), to find the adversarial treatment assignments.
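To make the balance criterion in this abstract concrete, the following is a minimal Python sketch of an SMD for a single covariate, together with a difference-in-means ATE estimate. The pooled-standard-deviation convention, the function names, and the toy data are all illustrative assumptions; this is not the authors' ATASTREET algorithm, only a small picture of how two assignments can look equally balanced yet give different estimates.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(covariate, assignment):
    """SMD of one covariate between treated (1) and control (0) units.

    Assumes the common convention of dividing by the square root of the
    average of the two sample variances; other conventions exist.
    """
    treated = [v for v, a in zip(covariate, assignment) if a == 1]
    control = [v for v, a in zip(covariate, assignment) if a == 0]
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def difference_in_means(outcomes, assignment):
    """Naive ATE estimate: mean treated outcome minus mean control outcome."""
    treated = [y for y, a in zip(outcomes, assignment) if a == 1]
    control = [y for y, a in zip(outcomes, assignment) if a == 0]
    return mean(treated) - mean(control)


# Invented toy data: the true treatment effect is zero for every unit,
# and outcomes vary with a factor the observed covariate does not capture.
covariate = [1, 1, 2, 2]
outcomes = [0, 4, 0, 4]

# Two hypothetical assignments that are both perfectly balanced on the covariate.
balanced_a = [1, 0, 1, 0]
balanced_b = [1, 0, 0, 1]

smd_a = standardized_mean_difference(covariate, balanced_a)
smd_b = standardized_mean_difference(covariate, balanced_b)
ate_a = difference_in_means(outcomes, balanced_a)
ate_b = difference_in_means(outcomes, balanced_b)
```

In this toy setting both assignments have an SMD of zero, yet the first yields an estimated effect of -4 and the second an estimate of 0, which is the kind of gap between "admitted by the balance measure" and "accurate" that the abstract describes.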
2
Dandl S, Bender A, Hothorn T. Heterogeneous treatment effect estimation for observational data using model-based forests. Stat Methods Med Res 2024; 33:392-413. [PMID: 38332489 PMCID: PMC10981193 DOI: 10.1177/09622802231224628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
The estimation of heterogeneous treatment effects has attracted considerable interest in many disciplines, most prominently in medicine and economics. Contemporary research has so far primarily focused on continuous and binary responses where heterogeneous treatment effects are traditionally estimated by a linear model, which allows the estimation of constant or heterogeneous effects even under certain model misspecifications. More complex models for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint estimation of treatment and prognostic effects. Model-based forests allow simultaneous estimation of covariate-dependent treatment and prognostic effects, but only for randomized trials. In this paper, we propose modifications to model-based forests to address the confounding issue in observational data. In particular, we evaluate an orthogonalization strategy originally proposed by Robinson (1988, Econometrica) in the context of model-based forests targeting heterogeneous treatment effect estimation in generalized linear models and transformation models. We found that this strategy reduces confounding effects in a simulated study with various outcome distributions. We demonstrate the practical aspects of heterogeneous treatment effect estimation for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect of Riluzole on the progress of Amyotrophic Lateral Sclerosis.
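The orthogonalization strategy attributed to Robinson (1988) can be sketched for a partially linear model: residualize both the outcome and the treatment on the covariates, then regress one residual on the other. The sketch below is a simplification under stated assumptions: a single covariate, straight-line least-squares fits standing in for the nuisance models, and a constant treatment effect. The function names are hypothetical, and the paper itself applies the idea within model-based forests rather than with these linear fits.

```python
from statistics import mean


def residualize(target, covariate):
    """Residuals of `target` after a least-squares straight-line fit on `covariate`."""
    t_bar, x_bar = mean(target), mean(covariate)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxt = sum((x - x_bar) * (t - t_bar) for x, t in zip(covariate, target))
    slope = sxt / sxx
    return [t - t_bar - slope * (x - x_bar) for x, t in zip(covariate, target)]


def robinson_effect(outcome, treatment, covariate):
    """Residual-on-residual regression coefficient (constant-effect case)."""
    y_res = residualize(outcome, covariate)
    w_res = residualize(treatment, covariate)
    return sum(w * y for w, y in zip(w_res, y_res)) / sum(w * w for w in w_res)


# Invented noiseless example: outcome = 2 * treatment + 3 * covariate + 1.
covariate = [float(i) for i in range(10)]
treatment = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
outcome = [2 * w + 3 * x + 1 for w, x in zip(treatment, covariate)]

estimate = robinson_effect(outcome, treatment, covariate)
```

Because the covariate's contribution is removed from both sides before the final regression, the estimate recovers the treatment coefficient (2 in this noiseless example) without modelling the prognostic part jointly.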
Affiliation(s)
- Susanne Dandl
- Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany
- Munich Center for Machine Learning (MCML), Germany
- Andreas Bender
- Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany
- Munich Center for Machine Learning (MCML), Germany
- Torsten Hothorn
- Institut für Epidemiologie, Biostatistik und Prävention, Universität Zürich, Zurich, Switzerland
3
Esposti R. Non-monetary motivations of the EU agri-environmental policy adoption. A causal forest approach. J Environ Manage 2024; 352:119992. [PMID: 38194870 DOI: 10.1016/j.jenvman.2023.119992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 01/11/2024]
Abstract
This paper investigates the non-monetary motivations of farmers' adoption of agri-environmental policies. Unlike the monetary (income) motivations, non-monetary drivers cannot be directly observed but can be identified from observational data within appropriate quasi-experimental designs. A theoretical justification of farmers' choices is first formulated and a consequent natural experiment setting is derived. The latter admits heterogeneous, i.e., individual, treatment effects (ITEs) that, in turn, can be interpreted in terms of more targeted and tailored policy expenditure. A Causal Forest (CF) approach is adopted to estimate these ITEs for both the treated and untreated units. The approach is applied to two balanced panel samples of Italian Farm Accountancy Data Network (FADN) farms observed over the 2008-2018 period and concerns agri-environmental policies delivered through the Common Agricultural Policy (CAP). Results show how heterogeneous the farmers' response and the associated non-monetary motivations can be, thus indicating room for a more efficient policy design.
Affiliation(s)
- Roberto Esposti
- Department of Economics and Social Sciences - Università Politecnica Delle Marche, Piazzale Martelli 8, 60121, Ancona, Italy.
4
Endo Y, Alaimo L, Moazzam Z, Woldesenbet S, Lima HA, Munir MM, Shaikh CF, Yang J, Azap L, Katayama E, Guglielmi A, Ruzzenente A, Aldrighetti L, Alexandrescu S, Kitago M, Poultsides G, Sasaki K, Aucejo F, Pawlik TM. Postoperative morbidity after simultaneous versus staged resection of synchronous colorectal liver metastases: Impact of hepatic tumor burden. Surgery 2024; 175:432-440. [PMID: 38001013 DOI: 10.1016/j.surg.2023.10.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/09/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023]
Abstract
BACKGROUND: We sought to characterize the risk of postoperative complications relative to the surgical approach and overall synchronous colorectal liver metastases tumor burden score.
METHODS: Patients with synchronous colorectal liver metastases who underwent curative-intent resection between 2000 and 2020 were identified from an international multi-institutional database. Propensity score matching was employed to control for heterogeneity between the 2 groups. A virtual twins analysis was performed to identify potential subgroups of patients who might benefit more from staged versus simultaneous resection.
RESULTS: Among 976 patients who underwent liver resection for synchronous colorectal liver metastases, 589 patients (60.3%) had a staged approach, whereas 387 (39.7%) patients underwent simultaneous resection of the primary tumor and synchronous colorectal liver metastases. After propensity score matching, 295 patients who underwent each surgical approach were analyzed. Overall, the incidence of postoperative complications was 34.1% (n = 201). Among patients with high tumor burden scores, the surgical approach was associated with a higher incidence of postoperative complications; in contrast, among patients with low or medium tumor burden scores, the likelihood of complications did not differ based on the surgical approach. Virtual twins analysis demonstrated that preoperative tumor burden score was important to identify which subgroup of patients benefited most from staged versus simultaneous resection. Simultaneous resection was associated with better outcomes among patients with a tumor burden score <9 and a node-negative right-sided primary tumor; in contrast, staged resection was associated with better outcomes among patients with node-positive left-sided primary tumors and higher tumor burden score.
CONCLUSION: Among patients with high tumor burden scores, simultaneous resection of the primary tumor and liver metastases was associated with an increased incidence of postoperative complications.
Affiliation(s)
- Yutaka Endo
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Laura Alaimo
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH; Department of Surgery, University of Verona, Italy
- Zorays Moazzam
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Selamawit Woldesenbet
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Henrique A Lima
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Muhammad Musaab Munir
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Chanza F Shaikh
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Jason Yang
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Lovette Azap
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Erryk Katayama
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
- Minoru Kitago
- Department of Surgery, Keio University, Tokyo, Japan
- Federico Aucejo
- Department of General Surgery, Cleveland Clinic Foundation, OH
- Timothy M Pawlik
- Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, OH
5
Post RAJ, Petkovic M, van den Heuvel IL, van den Heuvel ER. Flexible Machine Learning Estimation of Conditional Average Treatment Effects: A Blessing and a Curse. Epidemiology 2024; 35:32-40. [PMID: 37889951 DOI: 10.1097/ede.0000000000001684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2023]
Abstract
Causal inference from observational data requires untestable identification assumptions. If these assumptions apply, machine learning methods can be used to study complex forms of causal effect heterogeneity. Recently, several machine learning methods were developed to estimate the conditional average treatment effect (ATE). If the features at hand cannot explain all heterogeneity, the individual treatment effects can seriously deviate from the conditional ATE. In this work, we demonstrate how the distributions of the individual treatment effect and the conditional ATE can differ when a causal random forest is applied. We extend the causal random forest to estimate the difference in conditional variance between treated and controls. If the distribution of the individual treatment effect equals that of the conditional ATE, this estimated difference in variance should be small. If they differ, an additional causal assumption is necessary to quantify the heterogeneity not captured by the distribution of the conditional ATE. The conditional variance of the individual treatment effect can be identified when the individual effect is independent of the outcome under no treatment given the measured features. Then, in the cases where the individual treatment effect and conditional ATE distributions differ, the extended causal random forest can appropriately estimate the variance of the individual treatment effect distribution, whereas the causal random forest fails to do so.
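The identification step mentioned in this abstract can be written out as a short derivation. The notation is an assumption introduced here for illustration, not taken from the article: $Y^{0}$ and $Y^{1}$ are the potential outcomes without and with treatment, $\tau = Y^{1} - Y^{0}$ is the individual treatment effect, and $X$ denotes the measured features.

```latex
\begin{align*}
Y^{1} &= Y^{0} + \tau, \\
\operatorname{Var}(Y^{1} \mid X)
  &= \operatorname{Var}(Y^{0} \mid X) + \operatorname{Var}(\tau \mid X)
     + 2\operatorname{Cov}(Y^{0}, \tau \mid X), \\
\tau \perp\!\!\!\perp Y^{0} \mid X
  \;\Longrightarrow\;
  \operatorname{Var}(\tau \mid X)
  &= \operatorname{Var}(Y^{1} \mid X) - \operatorname{Var}(Y^{0} \mid X).
\end{align*}
```

Under the usual identification assumptions, the right-hand side corresponds to the difference in conditional outcome variance between treated and controls. If the individual effect coincides with the conditional ATE given $X$, then $\operatorname{Var}(\tau \mid X) = 0$ and this difference should be close to zero, which matches the abstract's statement that a small estimated difference is expected in that case.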
Affiliation(s)
- Richard A J Post
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Marko Petkovic
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Isabel L van den Heuvel
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Edwin R van den Heuvel
- Department of Mathematics and Computer Science, Eindhoven University of Technology, the Netherlands
- Department of Preventive Medicine and Epidemiology, School of Medicine, Boston University, Boston, MA
6
Ferrario PG, Gedrich K. Machine learning and personalized nutrition: a promising liaison? Eur J Clin Nutr 2024; 78:74-76. [PMID: 37833568 PMCID: PMC10774117 DOI: 10.1038/s41430-023-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 09/12/2023] [Accepted: 09/20/2023] [Indexed: 10/15/2023]
Affiliation(s)
- Paola G Ferrario
- Department of Physiology and Biochemistry of Nutrition, Max Rubner-Institut, Karlsruhe, Germany
- Kurt Gedrich
- Technical University of Munich, ZIEL - Institute for Food & Health, Research Group Public Health Nutrition, Freising, Germany
7
Hu L. A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection. Biom J 2024; 66:e2200178. [PMID: 38072661 PMCID: PMC10953775 DOI: 10.1002/bimj.202200178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/31/2023] [Accepted: 08/11/2023] [Indexed: 01/30/2024]
Abstract
We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about the population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe a way in which we can draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis to identify subpopulations who may experience treatment effects that differ from the population average effect. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach using the predictor's variable inclusion proportion supplied by the riAFT-BART model for variable selection. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and estimating the heterogeneous treatment effects of three COVID-specific medications.
The methods developed in this work are readily available in the R package riAFTBART.
Affiliation(s)
- Liangyuan Hu
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey 08854
8
Zhang J, Zhang P, Ma J, Shentu Y. Covariate-adjusted value-guided subgroup identification via boosting. J Biopharm Stat 2023:1-18. [PMID: 37955423 DOI: 10.1080/10543406.2023.2275757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 10/22/2023] [Indexed: 11/14/2023]
Abstract
It is widely recognized that treatment effects could differ across subgroups of patients. Subgroup analysis, which assesses such heterogeneity, provides valuable information in developing personalized therapies. There has been extensive research developing novel statistical methods for subgroup identification. The recent contribution is a value-guided subgroup identification method that directly maximizes treatment benefit at the subgroup level for survival outcome, rather than relying on individual treatment effect estimation. In this paper, we first completed this framework by illustrating its application to continuous and binary outcomes. More importantly, we extended the original framework to account for the prognostic effects and named this new method Covariate-Adjusted Value-guided subgroup identification via boosting (CAVboost). The original method directly used the outcome to formulate the value function for subgroup identification. Since the outcome can further be decomposed as prognostic effects and treatment effects, specifying the prognostic effects as the covariates of a model for the outcome can single out the treatment effects and improve the power to detect them across subgroups. Our proposed CAVboost was based on this key idea. It used a covariate-adjusted treatment effect estimator, instead of the outcome itself, to formulate the value function for subgroup identification. CAVboost estimates the treatment effect by using covariates to account for the prognostic effects, which mimics the idea of using covariates in an ANCOVA estimator. We showed that CAVboost could effectively improve the subgroup identification capability for both continuous and binary outcomes.
Affiliation(s)
- Pingye Zhang
- Gilead Sciences Inc, Foster City, California, USA
- Junshui Ma
- Merck & Co. MRL, BARDS, Rahway, New Jersey, USA
- Yue Shentu
- Merck & Co. MRL, BARDS, Rahway, New Jersey, USA
9
Johnson D, Lu W, Davidian M. A general framework for subgroup detection via one-step value difference estimation. Biometrics 2023; 79:2116-2126. [PMID: 35793474 PMCID: PMC10694635 DOI: 10.1111/biom.13711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 06/15/2022] [Indexed: 11/29/2022]
Abstract
Recent statistical methodology for precision medicine has focused on either identification of subgroups with enhanced treatment effects or estimating optimal treatment decision rules so that treatment is allocated in a way that maximizes, on average, predefined patient outcomes. Less attention has been given to subgroup testing, which involves evaluation of whether at least a subgroup of the population benefits from an investigative treatment, compared to some control or standard of care. In this work, we propose a general framework for testing for the existence of a subgroup with enhanced treatment effects based on the difference of the estimated value functions under an estimated optimal treatment regime and a fixed regime that assigns everyone to the same treatment. Our proposed test does not require specification of the parametric form of the subgroup and allows heterogeneous treatment effects within the subgroup. The test applies to cases when the outcome of interest is either a time-to-event or a (uncensored) scalar, and is valid at the exceptional law. To demonstrate the empirical performance of the proposed test, we study the type I error and power of the test statistics in simulations and also apply our test to data from a Phase III trial in patients with hematological malignancies.
Affiliation(s)
- Dana Johnson
- United Therapeutics Corp., Research Triangle Park, Durham, North Carolina, USA
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Marie Davidian
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
10
Baird A, Cheng Y, Xia Y. Determinants of outpatient substance use disorder treatment length-of-stay and completion: the case of a treatment program in the southeast U.S. Sci Rep 2023; 13:13961. [PMID: 37633996 PMCID: PMC10460408 DOI: 10.1038/s41598-023-41350-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/24/2023] [Indexed: 08/28/2023] Open
Abstract
Successful outcomes of outpatient substance use disorder treatment result from many factors for clients, including intersections between individual characteristics, choices made, and social determinants. However, prioritizing which of these, and in what combination, to address and provide support for remains an open and complex question. Therefore, we ask: What factors are associated with outpatient substance use disorder clients remaining in treatment for > 90 days and successfully completing treatment? To answer this question, we apply a virtual twins machine learning (ML) model to de-identified data for a census of clients who received outpatient substance use disorder treatment services from 2018 to 2021 from one treatment program in the Southeast U.S. We find that primary predictors of outcome success are: (1) attending self-help groups while in treatment, and (2) setting goals for treatment. Secondary predictors are: (1) being linked to a primary care provider (PCP) during treatment, (2) being linked to the Supplemental Nutrition Assistance Program (SNAP), and (3) attending 6 or more self-help group sessions during treatment. These findings can help treatment programs guide client choice making and help set priorities for social determinant support. Further, the ML method applied can explain intersections between individual and social predictors, as well as outcome heterogeneity associated with subgroup differences.
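The virtual twins idea named in this abstract is commonly described in two steps: predict each subject's outcome under both treatment options from a fitted outcome model, then model the difference between those two predictions to find subgroups. The sketch below illustrates that general recipe under strong simplifying assumptions that are ours, not the article's: one covariate, a separate straight-line fit per arm in place of a flexible learner such as a random forest, and a single threshold split in place of a tree. All names and data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares straight line; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


def twin_effects(x, treated, y):
    """Step 1: predicted outcome if treated minus predicted outcome if untreated."""
    fits = {}
    for arm in (0, 1):
        xs = [a for a, t in zip(x, treated) if t == arm]
        ys = [b for b, t in zip(y, treated) if t == arm]
        fits[arm] = fit_line(xs, ys)
    (b0, s0), (b1, s1) = fits[0], fits[1]
    return [(b1 + s1 * a) - (b0 + s0 * a) for a in x]


def best_split(x, effects):
    """Step 2 stand-in: the covariate threshold that best separates the twin effects."""
    best_sse, best_cut = None, None
    for cut in sorted(set(x))[1:]:
        left = [e for a, e in zip(x, effects) if a < cut]
        right = [e for a, e in zip(x, effects) if a >= cut]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((e - left_mean) ** 2 for e in left)
               + sum((e - right_mean) ** 2 for e in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_cut = sse, cut
    return best_cut


# Invented data: untreated outcome equals x, treated outcome equals 2 * x,
# so the true individual effect grows with the covariate.
x = [0, 1, 2, 3, 4, 5, 6, 7]
treated = [0, 1, 0, 1, 0, 1, 0, 1]
y = [2 * a if t == 1 else a for a, t in zip(x, treated)]

effects = twin_effects(x, treated, y)
cut = best_split(x, effects)
```

With these invented data the estimated twin effects track the covariate, and the split separates subjects with smaller predicted benefit from those with larger predicted benefit; a real analysis would use a flexible outcome model and many covariates.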
Affiliation(s)
- Aaron Baird
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
- Yichen Cheng
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
- Yusen Xia
- Institute for Insight, Robinson College of Business, Georgia State University, 55 Park Place, Atlanta, GA, 30303, USA
11
Raja S, Rice TW, Lu M, Semple ME, Blackstone EH, Murthy SC, Ahmad U, McNamara M, Toth AJ, Hemant I. Adjuvant Therapy After Neoadjuvant Therapy for Esophageal Cancer: Who Needs It? Ann Surg 2023; 278:e240-e249. [PMID: 35997269 PMCID: PMC10955553 DOI: 10.1097/sla.0000000000005679] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Abstract
OBJECTIVE: We hypothesized that, on average, patients do not benefit from additional adjuvant therapy after neoadjuvant therapy for locally advanced esophageal cancer, although subsets of patients might. Therefore, we sought to identify profiles of patients predicted to receive the most survival benefit or greatest detriment from adding adjuvant therapy.
BACKGROUND: Although neoadjuvant therapy has become the treatment of choice for locally advanced esophageal cancer, the value of adding adjuvant therapy is unknown.
METHODS: From 1970 to 2014, 22,123 patients were treated for esophageal cancer at 33 centers on 6 continents (Worldwide Esophageal Cancer Collaboration), of whom 7731 with adenocarcinoma or squamous cell carcinoma received neoadjuvant therapy; 1348 received additional adjuvant therapy. Random forests for survival and virtual-twin analyses were performed for all-cause mortality.
RESULTS: Patients received a small survival benefit from adjuvant therapy (3.2±10 months over the subsequent 10 years for adenocarcinoma, 1.8±11 for squamous cell carcinoma). Consistent benefit occurred in ypT3-4 patients without nodal involvement and those with ypN2-3 disease. The small subset of patients receiving most benefit had high nodal burden, ypT4, and positive margins. Patients with ypT1-2N0 cancers had either no benefit or a detriment in survival.
CONCLUSIONS: Adjuvant therapy after neoadjuvant therapy has value primarily for patients with more advanced esophageal cancer. Because the benefit is often small, patients considering adjuvant therapy should be counseled on benefits versus morbidity. In addition, given that the overall benefit was meaningful in a small number of patients, emerging modalities such as immunotherapy may hold more promise in the adjuvant setting.
Affiliation(s)
- Siva Raja
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Thomas W. Rice
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Min Lu
- Department of Public Health Sciences, Division of Biostatistics, University of Miami, Miami, Florida
- Marie E. Semple
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Eugene H. Blackstone
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Sudish C. Murthy
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Usman Ahmad
- Heart, Vascular, and Thoracic Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Michael McNamara
- Taussig Cancer Institute, Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, Ohio
- Andrew J. Toth
- Lerner Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Ishwaran Hemant
- Department of Public Health Sciences, Division of Biostatistics, University of Miami, Miami, Florida
12
Ghosh S, Feng Z, Bian J, Butler K, Prosperi M. DR-VIDAL - Doubly Robust Variational Information-theoretic Deep Adversarial Learning for Counterfactual Prediction and Treatment Effect Estimation on Real World Data. AMIA Annu Symp Proc 2023; 2022:485-494. [PMID: 37128454 PMCID: PMC10148269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Determining causal effects of interventions on outcomes from real-world, observational (non-randomized) data, e.g., treatment repurposing using electronic health records, is challenging due to underlying bias. Causal deep learning has improved over traditional techniques for estimating individualized treatment effects (ITE). We present the Doubly Robust Variational Information-theoretic Deep Adversarial Learning (DR-VIDAL), a novel generative framework that combines two joint models of treatment and outcome, ensuring an unbiased ITE estimation even when one of the two is misspecified. DR-VIDAL integrates: (i) a variational autoencoder (VAE) to factorize confounders into latent variables according to causal assumptions; (ii) an information-theoretic generative adversarial network (Info-GAN) to generate counterfactuals; (iii) a doubly robust block incorporating treatment propensities for outcome predictions. On synthetic and real-world datasets (Infant Health and Development Program, Twin Birth Registry, and National Supported Work Program), DR-VIDAL achieves better performance than other non-generative and generative methods. In conclusion, DR-VIDAL uniquely fuses causal assumptions, VAE, Info-GAN, and double robustness into a comprehensive, performant framework. Code is available at: https://github.com/Shantanu48114860/DR-VIDAL-AMIA-22 under MIT license.
13
Blette BS, Granholm A, Li F, Shankar-Hari M, Lange T, Munch MW, Møller MH, Perner A, Harhay MO. Causal Bayesian machine learning to assess treatment effect heterogeneity by dexamethasone dose for patients with COVID-19 and severe hypoxemia. Sci Rep 2023; 13:6570. [PMID: 37085591 PMCID: PMC10120498 DOI: 10.1038/s41598-023-33425-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/03/2023] [Accepted: 04/12/2023] [Indexed: 04/23/2023] Open
Abstract
The currently recommended dose of dexamethasone for patients with severe or critical COVID-19 is 6 mg per day (mg/d) regardless of patient features and variation. However, patients with severe or critical COVID-19 are heterogeneous in many ways (e.g., age, weight, comorbidities, disease severity, and immune features). Thus, it is conceivable that a standardized dosing protocol may not be optimal. We assessed treatment effect heterogeneity in the COVID STEROID 2 trial, which compared 6 mg/d to 12 mg/d, using a causal inference framework with Bayesian Additive Regression Trees, a flexible modeling method that detects interactive effects and nonlinear relationships among multiple patient characteristics simultaneously. We found that 12 mg/d of dexamethasone, relative to 6 mg/d, was probably associated with better long-term outcomes (days alive without life support and mortality after 90 days) among the entire trial population (i.e., no signals of harm), and probably more beneficial among those who did not have diabetes mellitus, were older, were not using IL-6 inhibitors at baseline, weighed less, or had higher-level respiratory support at baseline. This adds more evidence supporting the use of 12 mg/d in practice for most patients not receiving other immunosuppressants and suggests that additional study of dosing could potentially optimize clinical outcomes.
Affiliation(s)
- Bryan S Blette
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anders Granholm
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Fan Li
  - Department of Biostatistics, Yale University School of Public Health, New Haven, CT, USA
  - Center for Methods in Implementation and Prevention Science, Yale University School of Public Health, New Haven, CT, USA
- Manu Shankar-Hari
  - Centre for Inflammation Research, University of Edinburgh, Edinburgh, UK
- Theis Lange
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Marie Warrer Munch
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Morten Hylander Møller
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Anders Perner
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Michael O Harhay
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Division of Pulmonary and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 304 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104-6021, USA
|
14
|
Rekkas A, Rijnbeek PR, Kent DM, Steyerberg EW, van Klaveren D. Estimating individualized treatment effects from randomized controlled trials: a simulation study to compare risk-based approaches. BMC Med Res Methodol 2023; 23:74. [PMID: 36977990 PMCID: PMC10045909 DOI: 10.1186/s12874-023-01889-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 03/15/2023] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND Baseline outcome risk can be an important determinant of absolute treatment benefit and has been used in guidelines for "personalizing" medical decisions. We compared easily applicable risk-based methods for optimal prediction of individualized treatment effects. METHODS We simulated RCT data using diverse assumptions for the average treatment effect, a baseline prognostic index of risk, the shape of its interaction with treatment (none, linear, quadratic or non-monotonic), and the magnitude of treatment-related harms (none or constant independent of the prognostic index). We predicted absolute benefit using: models with a constant relative treatment effect; stratification in quarters of the prognostic index; models including a linear interaction of treatment with the prognostic index; models including an interaction of treatment with a restricted cubic spline transformation of the prognostic index; an adaptive approach using Akaike's Information Criterion. We evaluated predictive performance using root mean squared error and measures of discrimination and calibration for benefit. RESULTS The linear-interaction model displayed optimal or close-to-optimal performance across many simulation scenarios with moderate sample size (N = 4,250; ~ 785 events). The restricted cubic splines model was optimal for strong non-linear deviations from a constant treatment effect, particularly when sample size was larger (N = 17,000). The adaptive approach also required larger sample sizes. These findings were illustrated in the GUSTO-I trial. CONCLUSIONS An interaction between baseline risk and treatment assignment should be considered to improve treatment effect predictions.
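As a rough illustration of the linear-interaction approach named in the abstract above, the following sketch predicts absolute benefit as the difference between predicted risk without and with treatment. It assumes a logistic model on the prognostic index; the function and coefficient names (`predicted_benefit`, `b0`, `b_pi`, `b_trt`, `b_int`) are hypothetical and are not taken from the article.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_benefit(pi, b0, b_pi, b_trt, b_int):
    """Predicted absolute benefit for a patient with prognostic index `pi`.

    Assumed model (hypothetical coefficients, for illustration only):
        logit P(event) = b0 + b_pi * pi + treated * (b_trt + b_int * pi)
    Benefit is the predicted risk under control minus the risk under treatment.
    """
    risk_control = expit(b0 + b_pi * pi)
    risk_treated = expit(b0 + b_pi * pi + b_trt + b_int * pi)
    return risk_control - risk_treated


if __name__ == "__main__":
    # Constant relative effect (b_int = 0): benefit still varies with baseline risk.
    for pi in (-3.0, -1.0, 0.0):
        print(pi, round(predicted_benefit(pi, 0.0, 1.0, -0.5, 0.0), 4))
```

Setting `b_int` to zero recovers a constant relative treatment effect, under which absolute benefit still changes with baseline risk; a nonzero `b_int` lets the relative effect itself vary linearly with the prognostic index.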
Affiliation(s)
- Alexandros Rekkas
  - Department of Medical Informatics, Erasmus Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- Peter R Rijnbeek
  - Department of Medical Informatics, Erasmus Medical Center, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands
- David M Kent
  - Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- David van Klaveren
  - Department of Public Health, Erasmus Medical Center, Rotterdam, The Netherlands
|
15
|
Guo X, Wei W, Liu M, Cai T, Wu C, Wang J. Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data. J Am Stat Assoc 2023; 118:1488-1499. [PMID: 38223220 PMCID: PMC10786632 DOI: 10.1080/01621459.2022.2157727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 11/21/2022] [Indexed: 12/23/2022]
Abstract
There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with an increased risk of new-onset Type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether, and in which populations, patients are indeed vulnerable to developing T2D after taking statins. In this case study, leveraging the biobank and electronic health record data in the Partner Health System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure. Our proposed approach delivers asymptotically sharp confidence intervals and a debiased estimate for the treatment effect of the most vulnerable subgroup in the presence of high-dimensional covariates. With our proposed approach, we find that females with high T2D genetic risk are at the highest risk of developing T2D due to statin usage.
Affiliation(s)
- Xinzhou Guo
  - Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
- Waverly Wei
  - Division of Biostatistics, UC Berkeley, Berkeley, CA
- Molei Liu
  - Department of Biostatistics, Columbia Mailman School of Public Health, New York, NY
- Tianxi Cai
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Chong Wu
  - Department of Biostatistics, MD Anderson Cancer Center, Houston, TX
- Jingshen Wang
  - Division of Biostatistics, UC Berkeley, Berkeley, CA
|
16
|
Hapfelmeier A, On BI, Mühlau M, Kirschke JS, Berthele A, Gasperi C, Mansmann U, Wuschek A, Bussas M, Boeker M, Bayas A, Senel M, Havla J, Kowarik MC, Kuhn K, Gatz I, Spengler H, Wiestler B, Grundl L, Sepp D, Hemmer B. Retrospective cohort study to devise a treatment decision score predicting adverse 24-month radiological activity in early multiple sclerosis. Ther Adv Neurol Disord 2023; 16:17562864231161892. [PMID: 36993939 PMCID: PMC10041597 DOI: 10.1177/17562864231161892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 02/19/2023] [Indexed: 03/31/2023] Open
Abstract
Background Multiple sclerosis (MS) is a chronic neuroinflammatory disease affecting about 2.8 million people worldwide. Disease course after the most common diagnoses of relapsing-remitting multiple sclerosis (RRMS) and clinically isolated syndrome (CIS) is highly variable and cannot be reliably predicted. This impairs early personalized treatment decisions. Objectives The main objective of this study was to algorithmically support clinical decision-making regarding the options of early platform medication or no immediate treatment of patients with early RRMS and CIS. Design Retrospective monocentric cohort study within the Data Integration for Future Medicine (DIFUTURE) Consortium. Methods Multiple data sources of routine clinical, imaging and laboratory data derived from a large and deeply characterized cohort of patients with MS were integrated to conduct a retrospective study to create and internally validate a treatment decision score [Multiple Sclerosis Treatment Decision Score (MS-TDS)] through model-based random forests (RFs). The MS-TDS predicts the probability of no new or enlarging lesions in cerebral magnetic resonance images (cMRIs) between 6 and 24 months after the first cMRI. Results Data from 65 predictors collected for 475 patients between 2008 and 2017 were included. No medication and platform medication were administered to 277 (58.3%) and 198 (41.7%) patients. The MS-TDS predicted individual outcomes with a cross-validated area under the receiver operating characteristics curve (AUROC) of 0.624. The respective RF prediction model provides patient-specific MS-TDS and probabilities of treatment success. The latter may increase by 5-20% for half of the patients if the treatment considered superior by the MS-TDS is used. Conclusion Routine clinical data from multiple sources can be successfully integrated to build prediction models to support treatment decision-making. 
In this study, the resulting MS-TDS estimates individualized treatment success probabilities that can identify patients who benefit from early platform medication. External validation of the MS-TDS is required, and a prospective study is currently being conducted. In addition, the clinical relevance of the MS-TDS needs to be established.
Affiliation(s)
- Begum Irmak On
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität in Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Mark Mühlau
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Jan S. Kirschke
  - Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Achim Berthele
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Christiane Gasperi
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Ulrich Mansmann
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität in Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Alexander Wuschek
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Matthias Bussas
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Martin Boeker
  - Institute of AI and Informatics in Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Antonios Bayas
  - Department of Neurology, Medical Faculty, University of Augsburg, Augsburg, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Makbule Senel
  - Department of Neurology, Ulm University Hospital, Ulm, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Joachim Havla
  - Institute of Clinical Neuroimmunology, LMU Hospital, Ludwig-Maximilians-Universität in Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Markus C. Kowarik
  - Department of Neurology & Stroke and Hertie-Institute for Clinical Brain Research, Eberhard-Karls University of Tübingen, Tübingen, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Klaus Kuhn
  - Institute of AI and Informatics in Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Ingrid Gatz
  - Institute of AI and Informatics in Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Helmut Spengler
  - Institute of AI and Informatics in Medicine, School of Medicine, Technical University of Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
- Benedikt Wiestler
  - Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Lioba Grundl
  - Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Dominik Sepp
  - Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Bernhard Hemmer
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
  - Data Integration for Future Medicine (DIFUTURE) Consortium, Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
|
17
|
Xu J, Wei K, Wang C, Huang C, Xue Y, Zhang R, Qin G, Yu Y. Estimation of average treatment effect based on a multi-index propensity score. BMC Med Res Methodol 2022; 22:337. [PMID: 36577950 PMCID: PMC9795597 DOI: 10.1186/s12874-022-01822-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/16/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Estimating the average effect of a treatment, exposure, or intervention on health outcomes is a primary aim of many medical studies. However, unbalanced covariates between groups can lead to confounding bias when using observational data to estimate the average treatment effect (ATE). In this study, we proposed an estimator to correct confounding bias and provide multiple protection for estimation consistency. METHODS With reference to the kernel function-based double-index propensity score (Ker.DiPS) estimator, we proposed the artificial neural network-based multi-index propensity score (ANN.MiPS) estimator. The ANN.MiPS estimator employed the artificial neural network to estimate the MiPS that combines the information from multiple candidate models for propensity score and outcome regression. A Monte Carlo simulation study was designed to evaluate the performance of the proposed ANN.MiPS estimator. Furthermore, we applied our estimator to real data to discuss its practicability. RESULTS The simulation study showed the bias of the ANN.MiPS estimators is very small and the standard error is similar if any one of the candidate models is correctly specified under all evaluated sample sizes, treatment rates, and covariate types. Compared to the kernel function-based estimator, the ANN.MiPS estimator usually yields smaller standard error when the correct model is incorporated in the estimator. The empirical study indicated the point estimation for ATE and its bootstrap standard error of the ANN.MiPS estimator is stable under different model specifications. CONCLUSIONS The proposed estimator extended the combination of information from two models to multiple models and achieved multiply robust estimation for ATE. Extra efficiency was gained by our estimator compared to the kernel-based estimator. The proposed estimator provided a novel approach for estimating the causal effects in observational studies.
Affiliation(s)
- Jiaqin Xu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Kecheng Wei
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Ce Wang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Chen Huang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Yaxin Xue
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Rui Zhang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Guoyou Qin
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
- Yongfu Yu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
|
18
|
Hu L, Ji J, Liu H, Ennis R. A Flexible Approach for Assessing Heterogeneity of Causal Treatment Effects on Patient Survival Using Large Datasets with Clustered Observations. Int J Environ Res Public Health 2022; 19:14903. [PMID: 36429621 PMCID: PMC9690785 DOI: 10.3390/ijerph192214903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 06/16/2023]
Abstract
Personalized medicine requires an understanding of treatment effect heterogeneity. Evolving toward causal evidence for scenarios not studied in randomized trials necessitates a methodology using real-world evidence. Herein, we demonstrate a methodology that generates causal effects, assesses the heterogeneity of the effects and adjusts for the clustered nature of the data. This study uses a state-of-the-art machine learning survival model, riAFT-BART, to draw causal inferences about individual survival treatment effects, while accounting for the variability in institutional effects. Further, it proposes a data-driven approach to agnostically (as opposed to via a priori hypotheses) ascertain which subgroups exhibit an enhanced treatment effect from which intervention, relative to global evidence, that is, average treatment effects measured at the population level. Comprehensive simulations show the advantages of the proposed method in terms of bias, efficiency and precision in estimating heterogeneous causal effects. The empirically validated method was then used to analyze the National Cancer Database.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Jiayi Ji
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Hao Liu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
- Ronald Ennis
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
  - Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 07102, USA
|
19
|
Baird A, Cheng Y, Xia Y. Use of machine learning to examine disparities in completion of substance use disorder treatment. PLoS One 2022; 17:e0275054. [PMID: 36149868 PMCID: PMC9506659 DOI: 10.1371/journal.pone.0275054] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/01/2022] [Accepted: 09/11/2022] [Indexed: 11/19/2022] Open
Abstract
The objective of this work is to examine disparities in the completion of substance use disorder treatment in the U.S. Our data is from the Treatment Episode Dataset Discharge (TEDS-D) datasets from the U.S. Substance Abuse and Mental Health Services Administration (SAMHSA) for 2017–2019. We apply a two-stage virtual twins model (random forest + decision tree) where, in the first stage (random forest), we determine differences in treatment completion probability associated with race/ethnicity, income source, no co-occurrence of mental health disorders, gender (biological), no health insurance, veteran status, age, and primary substance (alcohol or opioid). In the second stage (decision tree), we identify subgroups associated with probability differences, where such subgroups are more or less likely to complete treatment. We find the subgroups most likely to complete substance use disorder treatment, when the subgroup represents more than 1% of the sample, are those with no mental health condition co-occurrence (4.8% more likely when discharged from an ambulatory outpatient treatment program, representing 62% of the sample; and 10% more likely for one of the more specifically defined subgroups representing 10% of the sample), an income source of job-related wages/salary (4.3% more likely when not having used in the 30 days prior to discharge and when primary substance is not alcohol only, representing 28% of the sample), and white non-Hispanics (2.7% more likely when discharged from residential long-term treatment, representing 9% of the sample). Important implications are that: 1) those without a co-occurring mental health condition are the most likely to complete treatment, 2) those with job-related wages or income are more likely to complete treatment, and 3) racial/ethnic disparities persist in favor of white non-Hispanic individuals seeking to complete treatment. Thus, additional resources may be needed to combat such disparities.
Affiliation(s)
- Aaron Baird
  - Institute of Health Administration, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
- Yichen Cheng
  - Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
- Yusen Xia
  - Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
|
20
|
Rostami M, Saarela O. Targeted L1-Regularization and Joint Modeling of Neural Networks for Causal Inference. Entropy (Basel) 2022; 24:1290. [PMID: 36141175 PMCID: PMC9497603 DOI: 10.3390/e24091290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/07/2022] [Accepted: 09/08/2022] [Indexed: 06/16/2023]
Abstract
The Augmented Inverse Probability Weighting (AIPW) estimator of the Average Treatment Effect (ATE) is calculated in two steps: in the first step, the treatment and outcome are modeled, and in the second step, the predictions are inserted into the AIPW estimator. The risk of model misspecification in the first step has led researchers to use Machine Learning (ML) algorithms instead of parametric algorithms. However, the existence of strong confounders and/or Instrumental Variables (IVs) can lead complex ML algorithms to provide perfect predictions for the treatment model, which can violate the positivity assumption and elevate the variance of AIPW estimators. Thus the complexity of ML algorithms must be controlled to avoid perfect predictions for the treatment model while still learning the relationship between the confounders and the treatment and outcome. We use two neural network (NN) architectures with an L1-regularization on specific NN parameters and investigate how certain of their hyperparameters should be tuned in the presence of confounders and IVs to achieve a favorable bias-variance tradeoff for ATE estimators such as the AIPW estimator. Through simulation results, we provide recommendations as to how NNs can be employed for ATE estimation.
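The second step described in the abstract above can be sketched as follows: a minimal illustration of the standard AIPW formula, assuming the first-step predictions are already available as plain lists. The function name `aipw_ate` and the `clip` parameter are assumptions introduced here; clipping the propensity scores is a common generic safeguard against positivity violations and is not the L1-regularization approach the article studies.

```python
from statistics import fmean


def aipw_ate(y, t, e_hat, m1_hat, m0_hat, clip=0.01):
    """AIPW estimate of the average treatment effect.

    y       observed outcomes
    t       binary treatment indicators (1 = treated, 0 = control)
    e_hat   first-step estimates of P(T = 1 | X)
    m1_hat  first-step outcome predictions under treatment
    m0_hat  first-step outcome predictions under control
    clip    assumption of this sketch: bound propensities to [clip, 1 - clip]
    """
    scores = []
    for yi, ti, ei, m1, m0 in zip(y, t, e_hat, m1_hat, m0_hat):
        # Near-perfect treatment predictions (ei close to 0 or 1) would make
        # the inverse-probability weights explode; clipping keeps them finite.
        ei = min(max(ei, clip), 1.0 - clip)
        treated_term = ti * (yi - m1) / ei
        control_term = (1 - ti) * (yi - m0) / (1.0 - ei)
        scores.append(m1 - m0 + treated_term - control_term)
    return fmean(scores)


if __name__ == "__main__":
    # Outcome models fit the observed arms exactly, so the residual
    # corrections vanish and the estimate is the mean of m1_hat - m0_hat.
    print(aipw_ate([3.0, 1.0], [1, 0], [0.5, 0.5], [3.0, 3.0], [1.0, 1.0]))
```

The last example in the test data (a control unit whose estimated propensity is 1.0) shows why perfect treatment predictions are a problem: even after clipping, that single unit receives a weight of about 100 and can dominate the estimate.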
|
21
|
Shi J, Norgeot B. Learning Causal Effects From Observational Data in Healthcare: A Review and Summary. Front Med (Lausanne) 2022; 9:864882. [PMID: 35872797 PMCID: PMC9300826 DOI: 10.3389/fmed.2022.864882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 06/17/2022] [Indexed: 11/29/2022] Open
Abstract
Causal inference is a broad field that seeks to build and apply models that learn the effect of interventions on outcomes using many data types. While the field has existed for decades, its potential to impact healthcare outcomes has increased dramatically recently due to both advancements in machine learning and the unprecedented amounts of observational data resulting from electronic capture of patient claims data by medical insurance companies and widespread adoption of electronic health records (EHR) worldwide. However, there are many different schools of learning causality coming from different fields of statistics, some of them strongly conflicting. While the recent advances in machine learning greatly enhanced causal inference from a modeling perspective, it further exacerbated the fractured state in this field. This fractured state has limited research at the intersection of causal inference, modern machine learning, and EHRs that could potentially transform healthcare. In this paper we unify the classical causal inference approaches with new machine learning developments into a straightforward framework based on whether the researcher is most interested in finding the best intervention for an individual, a group of similar people, or an entire population. Through this lens, we then provide a timely review of the applications of causal inference in healthcare from the literature. As expected, we found that applications of causal inference in medicine were mostly limited to just a few technique types and lag behind other domains. In light of this gap, we offer a helpful schematic to guide data scientists and healthcare stakeholders in selecting appropriate causal methods and reviewing the findings generated by them.
|
22
|
Cai H, Lu W, Marceau West R, Mehrotra DV, Huang L. CAPITAL: Optimal subgroup identification via constrained policy tree search. Stat Med 2022; 41:4227-4244. [PMID: 35799329 PMCID: PMC9544117 DOI: 10.1002/sim.9507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 05/04/2022] [Accepted: 06/06/2022] [Indexed: 11/10/2022]
Abstract
Personalized medicine, a paradigm of medicine tailored to a patient's characteristics, is an increasingly attractive field in health care. An important goal of personalized medicine is to identify a subgroup of patients, based on baseline covariates, that benefits more from the targeted treatment than other comparative treatments. Most of the current subgroup identification methods only focus on obtaining a subgroup with an enhanced treatment effect without paying attention to subgroup size. Yet, a clinically meaningful subgroup learning approach should identify the maximum number of patients who can benefit from the better treatment. In this article, we present an optimal subgroup selection rule (SSR) that maximizes the number of selected patients, and in the meantime, achieves the pre‐specified clinically meaningful mean outcome, such as the average treatment effect. We derive two equivalent theoretical forms of the optimal SSR based on the contrast function that describes the treatment‐covariates interaction in the outcome. We further propose a constrained policy tree search algorithm (CAPITAL) to find the optimal SSR within the interpretable decision tree class. The proposed method is flexible to handle multiple constraints that penalize the inclusion of patients with negative treatment effects, and to address time to event data using the restricted mean survival time as the clinically interesting mean outcome. Extensive simulations, comparison studies, and real data applications are conducted to demonstrate the validity and utility of our method.
Affiliation(s)
- Hengrui Cai
  - Department of Statistics, University of California Irvine, Irvine, California, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Rachel Marceau West
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, USA
- Devan V Mehrotra
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, USA
- Lingkang Huang
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
|
23
|
Xu J, Guo Y, Wang F, Xu H, Lucero R, Bian J, Prosperi M. Protocol for the development of a reporting guideline for causal and counterfactual prediction models in biomedicine. BMJ Open 2022; 12:e059715. [PMID: 35725267 PMCID: PMC9214357 DOI: 10.1136/bmjopen-2021-059715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
INTRODUCTION While there are guidelines for reporting on observational studies (eg, Strengthening the Reporting of Observational Studies in Epidemiology, Reporting of Studies Conducted Using Observational Routinely Collected Health Data Statement), estimation of causal effects from both observational data and randomised experiments (eg, A Guideline for Reporting Mediation Analyses of Randomised Trials and Observational Studies, Consolidated Standards of Reporting Trials, PATH) and on prediction modelling (eg, Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis), none is purposely made for deriving and validating models from observational data to predict counterfactuals for individuals on one or more possible interventions, on the basis of given (or inferred) causal structures. This paper describes methods and processes that will be used to develop a Reporting Guideline for Causal and Counterfactual Prediction Models (PRECOG). METHODS AND ANALYSIS PRECOG will be developed following published guidance from the Enhancing the Quality and Transparency of Health Research (EQUATOR) network and will comprise five stages. Stage 1 will be meetings of a working group every other week with rotating external advisors (active until stage 5). Stage 2 will comprise a systematic review of literature on counterfactual prediction modelling for biomedical sciences (registered in Prospective Register of Systematic Reviews). In stage 3, a computer-based, real-time Delphi survey will be performed to consolidate the PRECOG checklist, involving experts in causal inference, epidemiology, statistics, machine learning, informatics and protocols/standards. Stage 4 will involve the write-up of the PRECOG guideline based on the results from the prior stages. Stage 5 will seek the peer-reviewed publication of the guideline, the scoping/systematic review and dissemination. 
ETHICS AND DISSEMINATION The study will follow the principles of the Declaration of Helsinki. The study has been registered in EQUATOR and approved by the University of Florida's Institutional Review Board (#202200495). Informed consent will be obtained from the working groups and the Delphi survey participants. The dissemination of PRECOG and its products will be done through journal publications, conferences, websites and social media.
Affiliation(s)
- Jie Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Yi Guo
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York City, New York, USA
| | - Hua Xu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Robert Lucero
- School of Nursing, University of California - Los Angeles, Los Angeles, California, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Mattia Prosperi
- Department of Epidemiology, University of Florida, Gainesville, Florida, USA
|
24
|
Prosperi M, Boucher C, Bian J, Marini S. Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions. Artif Intell Med 2022; 130:102326. [PMID: 35809965 PMCID: PMC9425730 DOI: 10.1016/j.artmed.2022.102326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 05/11/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022]
Abstract
Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenience, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on antibiotic resistance prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with/without bias-handling. On 10,936 instances, we find evidence of species, location and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies but only moderately (selecting the top 20,000 out of 40+ million k-mers). The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n = 1085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures. In conclusion, we recommend using causally-informed prediction methods for modeling real-world AMR data; however, traditional adjustment or propensity-based methods may not provide advantage in all use cases and further methodological development should be sought.
|
25
|
Caron A, Baio G, Manolopoulou I. Shrinkage Bayesian Causal Forests for Heterogeneous Treatment Effects Estimation. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2067549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Alberto Caron
- Department of Statistical Science, University College London
| | - Gianluca Baio
- Department of Statistical Science, University College London
|
26
|
Zhou N, Brook RD, Dinov ID, Wang L. Optimal dynamic treatment regime estimation using information extraction from unstructured clinical text. Biom J 2022; 64:805-817. [PMID: 35112726 PMCID: PMC9185731 DOI: 10.1002/bimj.202100077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2021] [Revised: 10/18/2021] [Accepted: 10/21/2021] [Indexed: 11/10/2022]
Abstract
The wide-scale adoption of electronic health records (EHRs) provides extensive information to support precision medicine and personalized health care. In addition to structured EHRs, we leverage free-text clinical information extraction (IE) techniques to estimate optimal dynamic treatment regimes (DTRs), a sequence of decision rules that dictate how to individualize treatments to patients based on treatment and covariate history. The proposed IE of patient characteristics closely resembles "The clinical Text Analysis and Knowledge Extraction System" and employs named entity recognition, boundary detection, and negation annotation. It also utilizes regular expressions to extract numerical information. Combining the proposed IE with optimal DTR estimation, we extract derived patient characteristics and use tree-based reinforcement learning (T-RL) to estimate multistage optimal DTRs. IE significantly improved the estimation in counterfactual outcome models compared to using structured EHR data alone, which often include incomplete data, data entry errors, and other potentially unobserved risk factors. Moreover, including IE in optimal DTR estimation provides larger study cohorts and a broader pool of candidate tailoring variables. We demonstrate the performance of our proposed method via simulations and an application using clinical records to guide blood pressure control treatments among critically ill patients with severe acute hypertension. This joint estimation approach improves the accuracy of identifying the optimal treatment sequence by 14-24% compared to traditional inference without using IE, based on our simulations over various scenarios. In the blood pressure control application, we successfully extracted significant blood pressure predictors that are unobserved or partially missing from structured EHR.
Affiliation(s)
- Nina Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Statistics Online Computational Resource, University of Michigan, Ann Arbor, MI, USA
| | - Robert D Brook
- Division of Cardiovascular Diseases, School of Medicine, Wayne State University, Detroit, MI, USA
| | - Ivo D Dinov
- Statistics Online Computational Resource, University of Michigan, Ann Arbor, MI, USA
| | - Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
|
27
|
Zhou T, Ji Y. Incorporating external data into the analysis of clinical trials via Bayesian additive regression trees. Stat Med 2021; 40:6421-6442. [PMID: 34494288 DOI: 10.1002/sim.9191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Revised: 08/18/2021] [Accepted: 08/21/2021] [Indexed: 11/06/2022]
Abstract
Most clinical trials involve the comparison of a new treatment to a control arm (eg, the standard of care) and the estimation of a treatment effect. External data, including historical clinical trial data and real-world observational data, are commonly available for the control arm. With proper statistical adjustments, borrowing information from external data can potentially reduce the mean squared errors of treatment effect estimates and increase the power of detecting a meaningful treatment effect. In this article, we propose to use Bayesian additive regression trees (BART) for incorporating external data into the analysis of clinical trials, with a specific goal of estimating the conditional or population average treatment effect. BART naturally adjusts for patient-level covariates and captures potentially heterogeneous treatment effects across different data sources, achieving flexible borrowing. Simulation studies demonstrate that BART maintains desirable and robust performance across a variety of scenarios and compares favorably to alternatives. We illustrate the proposed method with an acupuncture trial and a colorectal cancer trial.
Affiliation(s)
- Tianjian Zhou
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
| | - Yuan Ji
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
|
28
|
Zhong Y, Kennedy EH, Bodnar LM, Naimi AI. AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects. Am J Epidemiol 2021; 190:2690-2699. [PMID: 34268567 PMCID: PMC8796813 DOI: 10.1093/aje/kwab207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/28/2020] [Revised: 07/09/2021] [Accepted: 07/13/2021] [Indexed: 12/26/2022]
Abstract
An increasing number of recent studies have suggested that doubly robust estimators with cross-fitting should be used when estimating causal effects with machine learning methods. However, not all existing programs that implement doubly robust estimators support machine learning methods and cross-fitting, or provide estimates on multiplicative scales. To address these needs, we developed AIPW, a software package implementing augmented inverse probability weighting (AIPW) estimation of average causal effects in R (R Foundation for Statistical Computing, Vienna, Austria). Key features of the AIPW package include cross-fitting and flexible covariate adjustment for observational studies and randomized controlled trials (RCTs). In this paper, we use a simulated RCT to illustrate implementation of the AIPW estimator. We also perform a simulation study to evaluate the performance of the AIPW package compared with other doubly robust implementations, including CausalGAM, npcausal, tmle, and tmle3. Our simulation showed that the AIPW package yields performance comparable to that of other programs. Furthermore, we also found that cross-fitting substantively decreases the bias and improves the confidence interval coverage for doubly robust estimators fitted with machine learning algorithms. Our findings suggest that the AIPW package can be a useful tool for estimating average causal effects with machine learning methods in RCTs and observational studies.
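The AIPW estimator named in this abstract can be illustrated with a minimal Python sketch. It is not the AIPW R package and does not reproduce its interface: the function name `aipw_ate`, the simulated trial, the ordinary least squares outcome models, and the known randomization probability of 0.5 are all assumptions made for illustration, and cross-fitting is omitted for brevity.

```python
import numpy as np

def aipw_ate(y, a, x, propensity):
    """Hypothetical helper: AIPW estimate of the average treatment effect.

    y: outcomes, a: 0/1 treatment indicator, x: covariate matrix,
    propensity: P(A = 1 | X), here a known constant as in an RCT.
    """
    design = np.column_stack([np.ones(len(y)), x])
    # Outcome regressions fitted separately in each arm (plain least squares).
    beta1, *_ = np.linalg.lstsq(design[a == 1], y[a == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)
    mu1, mu0 = design @ beta1, design @ beta0
    # Regression contrast plus inverse-probability-weighted residual corrections.
    scores = (mu1 - mu0
              + a * (y - mu1) / propensity
              - (1 - a) * (y - mu0) / (1 - propensity))
    estimate = scores.mean()
    std_error = scores.std(ddof=1) / np.sqrt(len(y))
    return estimate, std_error

# Simulated RCT (assumed data-generating process, true effect = 2.0).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
a = rng.binomial(1, 0.5, size=n)
y = 1.0 + 2.0 * a + x @ np.array([0.5, -0.3]) + rng.normal(size=n)

ate_hat, se_hat = aipw_ate(y, a, x, propensity=0.5)
print(f"AIPW ATE estimate: {ate_hat:.3f} (SE {se_hat:.3f})")
```

The estimator is called doubly robust because it remains consistent if either the outcome models or the propensity model is correctly specified; in a randomized trial the propensity is known by design, so the outcome models mainly serve to reduce variance.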
Affiliation(s)
| | | | | | - Ashley I Naimi
- Correspondence to Dr. Ashley I. Naimi, Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Road, Atlanta, GA 30322 (e-mail: )
|
29
|
Hoogland J, IntHout J, Belias M, Rovers MM, Riley RD, Harrell FE Jr, Moons KGM, Debray TPA, Reitsma JB. A tutorial on individualized treatment effect prediction from randomized trials with a binary endpoint. Stat Med 2021; 40:5961-5981. [PMID: 34402094 PMCID: PMC9291969 DOI: 10.1002/sim.9154] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/05/2019] [Revised: 06/08/2021] [Accepted: 07/19/2021] [Indexed: 12/23/2022]
Abstract
Randomized trials typically estimate average relative treatment effects, but decisions on the benefit of a treatment are possibly better informed by more individualized predictions of the absolute treatment effect. In case of a binary outcome, these predictions of absolute individualized treatment effect require knowledge of the individual's risk without treatment and incorporation of a possibly differential treatment effect (ie, varying with patient characteristics). In this article, we lay out the causal structure of individualized treatment effect in terms of potential outcomes and describe the required assumptions that underlie a causal interpretation of its prediction. Subsequently, we describe regression models and model estimation techniques that can be used to move from average to more individualized treatment effect predictions. We focus mainly on logistic regression-based methods that are both well-known and naturally provide the required probabilistic estimates. We incorporate key components from both causal inference and prediction research to arrive at individualized treatment effect predictions. While the separate components are well known, their successful amalgamation is very much an ongoing field of research. We cut the problem down to its essentials in the setting of a randomized trial, discuss the importance of a clear definition of the estimand of interest, provide insight into the required assumptions, and give guidance with respect to modeling and estimation options. Simulated data illustrate the potential of different modeling options across scenarios that vary both average treatment effect and treatment effect heterogeneity. Two applied examples illustrate individualized treatment effect prediction in randomized trial data.
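The distinction between a relative and an absolute treatment effect for a binary outcome can be made concrete with a small Python sketch. It assumes a logistic model with a constant treatment log odds ratio and no treatment-covariate interaction; the function names and coefficient values are hypothetical and are not taken from the tutorial.

```python
import math

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def absolute_ite(baseline_logit, log_odds_ratio):
    """Predicted risk difference (treated minus control) for one patient.

    baseline_logit: linear predictor of the patient's risk without treatment.
    log_odds_ratio: treatment effect on the log-odds scale (assumed constant).
    """
    risk_control = expit(baseline_logit)
    risk_treated = expit(baseline_logit + log_odds_ratio)
    return risk_treated - risk_control

LOG_OR = -0.5  # hypothetical treatment effect, identical for every patient

ite_high_risk = absolute_ite(baseline_logit=0.0, log_odds_ratio=LOG_OR)   # 50% baseline risk
ite_low_risk = absolute_ite(baseline_logit=-3.0, log_odds_ratio=LOG_OR)   # about 5% baseline risk

print(f"High-risk patient: {ite_high_risk:+.3f}")
print(f"Low-risk patient:  {ite_low_risk:+.3f}")
```

Even with the same odds ratio for both patients, the absolute risk reduction is several times larger for the high-risk patient, which is why predicting absolute individualized effects requires the individual's risk without treatment.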
Affiliation(s)
- Jeroen Hoogland
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Joanna IntHout
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
| | - Michail Belias
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
| | - Maroeska M. Rovers
- Radboud Institute for Health Sciences (RIHS), Radboud University Medical Center, Nijmegen, the Netherlands
| | | | - Frank E. Harrell Jr
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Karel G. M. Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Thomas P. A. Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Johannes B. Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
- Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands
|
30
|
Hatch SG, Lobaina D, Doss BD. Optimizing Coaching During Web-Based Relationship Education for Low-Income Couples: Protocol for Precision Medicine Research. JMIR Res Protoc 2021; 10:e33047. [PMID: 34734838 PMCID: PMC8603166 DOI: 10.2196/33047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Accepted: 09/03/2021] [Indexed: 11/29/2022]
Abstract
Background In-person relationship education classes funded by the federal government tend to experience relatively high attrition rates and have only a limited effect on relationships. In contrast, low-income couples tend to report meaningful gains from web-based relationship education when provided with individualized coach contact. However, little is known about the method and intensity of practitioner contact that a couple requires to complete the web-based program and receive the intended benefit. Objective The aim of this study is to use within-group models to create an algorithm to assign future couples to different programs and levels of coach contact, identify the most powerful predictors of treatment adherence and gains in relationship satisfaction within 3 different levels of coaching, and examine the most powerful predictors of treatment adherence and gains in relationship satisfaction among the 3 levels of coach contact. Methods To accomplish these goals, this project intends to use data from a web-based Sequential Multiple Assignment Randomized Trial of the OurRelationship and web-based Prevention and Relationship Enhancement programs, in which the method and type of coach contact were randomly varied across 1248 couples (2496 individuals), with the hope of advancing theory in this area and generating accurate predictions. This study was funded by the US Department of Health and Human Services, Administration for Children and Families (grant number 90PD0309). Results Data collection from the Sequential Multiple Assignment Randomized Trial of the OurRelationship and web-based Prevention and Relationship Enhancement Program was completed in October of 2020. Conclusions Some of the direct benefits of this study include benefits to social services program administrators, tailoring of more effective relationship education, and effective delivery of evidence- and web-based relationship health interventions. 
International Registered Report Identifier (IRRID) DERR1-10.2196/33047
Affiliation(s)
- S Gabe Hatch
- Department of Psychology, University of Miami, Coral Gables, FL, United States
| | - Diana Lobaina
- Department of Psychology, University of Miami, Coral Gables, FL, United States
| | - Brian D Doss
- Department of Psychology, University of Miami, Coral Gables, FL, United States
|
31
|
Raja S, Rice TW, Murthy SC, Ahmad U, Semple ME, Blackstone EH, Ishwaran H. Value of Lymphadenectomy in Patients Receiving Neoadjuvant Therapy for Esophageal Adenocarcinoma. Ann Surg 2021; 274:e320-e327. [PMID: 31850981 PMCID: PMC7295683 DOI: 10.1097/sla.0000000000003598] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022]
Abstract
OBJECTIVE The aim of this study was to assess the effect on survival of extent of lymphadenectomy during esophagectomy for patients undergoing multimodality (neoadjuvant) therapy for adenocarcinoma of the esophagus and esophagogastric junction using Worldwide Esophageal Cancer Collaboration data. SUMMARY BACKGROUND DATA Previous worldwide data demonstrated that optimum lymphadenectomy during esophagectomy alone for esophageal cancer provides accurate staging and maximum survival. However, for patients undergoing neoadjuvant therapy for locally advanced adenocarcinoma, its value is unclear, leading to wide practice variability. METHODS A total of 3859 patients with adenocarcinoma of the esophagus or esophagogastric junction received neoadjuvant therapy. The endpoint was all-cause mortality, reported as gain or loss of lifetime within 10 years. Lifetime predicted for each regional lymph node resected used quantile survival random forest methodology. RESULTS Across all post-neoadjuvant ypTNM cancer categories, some degree of lymphadenectomy was associated with longer lifetime, but in a nonlinear fashion. For patients with ypN0 cancers, there was a modest gain in lifetime up to 25 lymph nodes resected and an incremental loss in lifetime as >25 were resected. For patients with ypN+ cancers, there was a robust gain in lifetime up to 30 lymph nodes resected and then an incremental loss in lifetime. CONCLUSIONS Worldwide data for adenocarcinoma of the esophagus and esophagogastric junction demonstrate that lymphadenectomy during esophagectomy is a valuable component of neoadjuvant therapy. Survival is maximized when an optimum range of nodes is resected.
Affiliation(s)
- Siva Raja
- Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
| | - Thomas W. Rice
- Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
| | - Sudish C. Murthy
- Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
| | - Usman Ahmad
- Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
| | - Marie E. Semple
- Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
| | - Eugene H. Blackstone
- Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Research Institute, Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
|
32
|
Zhang Y, Sabbaghi A. The Designed Bootstrap for Causal Inference in Big Observational Data. J Stat Theory Pract 2021. [DOI: 10.1007/s42519-021-00213-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
33
|
Hu L, Ji J, Li F. Estimating heterogeneous survival treatment effect in observational data using machine learning. Stat Med 2021; 40:4691-4713. [PMID: 34114252 PMCID: PMC9827499 DOI: 10.1002/sim.9090] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/22/2020] [Revised: 05/16/2021] [Accepted: 05/19/2021] [Indexed: 01/12/2023]
Abstract
Methods for estimating heterogeneous treatment effect in observational data have largely focused on continuous or binary outcomes, and have been relatively less vetted with survival outcomes. Using flexible machine learning methods in the counterfactual framework is a promising approach to address challenges due to complex individual characteristics, to which treatments need to be tailored. To evaluate the operating characteristics of recent survival machine learning methods for the estimation of treatment effect heterogeneity and inform better practice, we carry out a comprehensive simulation study presenting a wide range of settings describing confounded heterogeneous survival treatment effects and varying degrees of covariate overlap. Our results suggest that the nonparametric Bayesian Additive Regression Trees within the framework of accelerated failure time model (AFT-BART-NP) consistently yields the best performance, in terms of bias, precision, and expected regret. Moreover, the credible interval estimators from AFT-BART-NP provide close to nominal frequentist coverage for the individual survival treatment effect when the covariate overlap is at least moderate. Including a nonparametrically estimated propensity score as an additional fixed covariate in the AFT-BART-NP model formulation can further improve its efficiency and frequentist coverage. Finally, we demonstrate the application of flexible causal machine learning estimators through a comprehensive case study examining the heterogeneous survival effects of two radiotherapy approaches for localized high-risk prostate cancer.
Affiliation(s)
- Liangyuan Hu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, NJ
| | - Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY
| | - Fan Li
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut
- Center for Methods in Implementation and Prevention Science, Yale University School of Public Health, New Haven, Connecticut
|
34
|
Sun LZ, Wu C, Li X, Chen C, Schmidt EV. Independent action models and prediction of combination treatment effects for response rate, duration of response and tumor size change in oncology drug development. Contemp Clin Trials 2021; 106:106434. [PMID: 34004341 DOI: 10.1016/j.cct.2021.106434] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Revised: 03/05/2021] [Accepted: 05/10/2021] [Indexed: 11/16/2022]
Abstract
An unprecedented number of new cancer targets are in development, and most are being developed in combination therapies. Early oncology development is strategically challenged in choosing the best combinations to move forward to late stage development. The most common early endpoints to be assessed in such decision-making include objective response rate, duration of response and tumor size change. In this paper, using independent-drug-action and Bliss-drug-independence concepts as a foundation, we introduce simple models to predict combination therapy efficacy for duration of response and tumor size change. These models complement previous publications using the independent action models (Palmer 2017, Schmidt 2020) to predict progression-free survival and objective response rate and serve as new predictive models to understand drug combinations for early endpoints. The models can be applied to predict the combination treatment effect for early endpoints given monotherapy data, or to estimate the possible effect of one monotherapy in the combination if data are available from the combination therapy and the other monotherapy. Such quantitative work facilitates strategic planning and decision making in early stage oncology drug development.
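The Bliss independence concept that this abstract builds on can be sketched for a response-rate endpoint as follows. This shows only the standard Bliss formula for the probability that at least one of two independently acting drugs produces a response; it does not reproduce the paper's models for duration of response or tumor size change, and the function names and example rates are hypothetical.

```python
def bliss_combination_rate(p_a, p_b):
    """Expected combination response rate under Bliss independence.

    A patient responds to the combination if they respond to drug A or to
    drug B, with the two responses treated as independent events.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("response rates must lie in [0, 1]")
    return p_a + p_b - p_a * p_b

def implied_monotherapy_rate(p_combo, p_known):
    """Back out the response rate of the second drug from combination data.

    Solves p_combo = p_known + p_other - p_known * p_other for p_other.
    """
    if p_known >= 1.0:
        raise ValueError("p_known must be below 1 to identify the other rate")
    return (p_combo - p_known) / (1.0 - p_known)

# Hypothetical monotherapy response rates of 20% and 30%.
combo = bliss_combination_rate(0.20, 0.30)
other = implied_monotherapy_rate(combo, 0.20)
print(f"Predicted combination response rate: {combo:.2f}")
print(f"Implied rate of the second drug:     {other:.2f}")
```

The second function mirrors the use case the abstract mentions, where the combination and one monotherapy have been observed and the contribution of the other monotherapy is estimated.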
Affiliation(s)
- Linda Z Sun
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA.
| | - Cai Wu
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
| | - Xiaoyun Li
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
| | - Cong Chen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA
| | - Emmett V Schmidt
- Oncology Early Development, Merck & Co., Inc., Kenilworth, NJ 07033, USA
|
35
|
Prosperi M, Salemi M, Ghosh S, Lyu T, Bian J, Chen Z, Zhao J. Causal AI with Real World Data: Do Statins Protect from Alzheimer's Disease Onset? ICMHI 2021 (2021) 2021; 2021:296-303. [PMID: 37954527 PMCID: PMC10636706 DOI: 10.1145/3472813.3473206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2023]
Abstract
Causal artificial intelligence aims at developing bias-robust models that can be used to intervene on, rather than just predict, risks or outcomes. However, learning interventional models from observational data, including electronic health records (EHR), is challenging due to inherent bias, e.g., protopathic, confounding, and collider bias. When estimating the effects of treatment interventions, classical approaches like propensity score matching are often used, but they pose limitations with large feature sets, nonlinear/nonparallel treatment group assignments, and collider bias. In this work, we used data from a large EHR consortium (OneFlorida) and evaluated causal statistical/machine learning methods for determining the effect of statin treatment on the risk of Alzheimer's disease, a debated clinical research question. We introduced a combination of directed acyclic graph (DAG) learning and comparison with expert's design, with calculation of the generalized adjustment criterion (GAC), to find an optimal set of covariates for estimation of treatment effects, ameliorating collider bias. The DAG/GAC approach was assessed together with traditional propensity score matching, inverse probability weighting, virtual-twin/counterfactual random forests, and deep counterfactual networks. We showed large heterogeneity in effect estimates upon different model configurations. Our results did not exclude a protective effect of statins, where the DAG/GAC point estimate aligned with the maximum credibility estimate, although the 95% credibility interval included a null effect, warranting further studies and replication.
Affiliation(s)
| | | | | | - Tianchen Lyu
- Department of Health Outcomes and Biomedical Informatics, University of Florida
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida
| | - Zhaoyi Chen
- Department of Health Outcomes and Biomedical Informatics, University of Florida
| | - Jinying Zhao
- Department of Epidemiology, University of Florida
|
36
|
Qi W, Abu-Hanna A, van Esch TEM, de Beurs D, Liu Y, Flinterman LE, Schut MC. Explaining heterogeneity of individual treatment causal effects by subgroup discovery: An observational case study in antibiotics treatment of acute rhino-sinusitis. Artif Intell Med 2021; 116:102080. [PMID: 34020753 DOI: 10.1016/j.artmed.2021.102080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Revised: 03/09/2021] [Accepted: 04/20/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVES Individuals may respond differently to the same treatment, and there is a need to understand such heterogeneity of causal individual treatment effects. We propose and evaluate a modelling approach to better understand this heterogeneity from observational studies by identifying patient subgroups with a markedly deviating response to treatment. We illustrate this approach in a primary care case-study of antibiotic (AB) prescription on recovery from acute rhino-sinusitis (ARS). METHODS Our approach consists of four stages and is applied to a large primary care dataset of 24,392 patients suspected of suffering from ARS. We first identify pre-treatment variables that either confound the relationship between treatment and outcome or are risk factors of the outcome. Second, based on the pre-treatment variables we create Synthetic Random Forest (SRF) models to compute the potential outcomes and subsequently the causal individual treatment effect (ITE) estimates. Third, we perform subgroup discovery using the ITE estimates as outcomes to identify positive and negative responders. Fourth, we evaluate the predictive performance of the identified subgroups for predicting the outcome in two ways: the likelihood ratio test, and whether the subgroups are selected via the Akaike Information Criterion (AIC) using backward stepwise variable selection. We validate the whole modelling strategy by means of 10-fold cross-validation. RESULTS Based on 20 pre-treatment variables, four subgroups (three for positive responders and one for negative responders) were identified. The log likelihood ratio tests showed that the subgroups were significant. Variable selection using the AIC kept two of the four subgroups, one for positive responders and one for negative responders. 
As for the validation of the whole modelling strategy, all reported measures (the number of pre-treatment variables associated with the outcome, number of subgroups, number of subgroups surviving variable selection and coverage) showed little variation. CONCLUSIONS With the proposed approach, we identified subgroups of positive and negative responders to treatment that markedly deviate from the mean response. The subgroups showed added predictive value for the outcome. The modelling strategy was shown to be robust on this dataset. Our approach was thus able to discover understandable subgroups from observational data that have predictive value and which may be considered by the clinical users to get insight into who responds positively or negatively to a proposed treatment.
Affiliation(s)
- W Qi
- Department of Medical Informatics, Amsterdam University Medical Centers, Location AMC, Amsterdam, the Netherlands; Tianjin Institute of Cardiology, Second Hospital of Tianjin Medical University, Tianjin, China; School of Health Policy and Management, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - A Abu-Hanna
- Department of Medical Informatics, Amsterdam University Medical Centers, Location AMC, Amsterdam, the Netherlands.
| | - T E M van Esch
- NIVEL, Netherlands Institute for Health Services Research, Utrecht, the Netherlands
| | - D de Beurs
- Department of Epidemiology, Netherlands Institute of Mental Health and Addiction (Trimbos Institute), Utrecht, the Netherlands
| | - Y Liu
- School of Health Policy and Management, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - L E Flinterman
- NIVEL, Netherlands Institute for Health Services Research, Utrecht, the Netherlands
| | - M C Schut
- Department of Medical Informatics, Amsterdam University Medical Centers, Location AMC, Amsterdam, the Netherlands
|
37
|
Prosperi M, Guo Y, Bian J. Bagged random causal networks for interventional queries on observational biomedical datasets. J Biomed Inform 2021; 115:103689. [PMID: 33548542 DOI: 10.1016/j.jbi.2021.103689] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2020] [Revised: 12/30/2020] [Accepted: 01/23/2021] [Indexed: 11/30/2022]
Abstract
Learning causal effects from observational data, e.g. estimating the effect of a treatment on survival by data-mining electronic health records (EHRs), can be biased due to unmeasured confounders, mediators, and colliders. When the causal dependencies among features/covariates are expressed in the form of a directed acyclic graph, using do-calculus it is possible to identify one or more adjustment sets for eliminating the bias on a given causal query under certain assumptions. However, prior knowledge of the causal structure might be only partial; algorithms for causal structure discovery often provide ambiguous solutions, and their computational complexity becomes practically intractable when the feature sets grow large. We hypothesize that the estimation of the true causal effect of a causal query on an outcome can be approximated as an ensemble of lower complexity estimators, namely bagged random causal networks. A bagged random causal network is an ensemble of subnetworks constructed by sampling the feature subspaces (with the query, the outcome, and a random number of other features), drawing conditional dependencies among the features, and inferring the corresponding adjustment sets. The causal effect can then be estimated by any regression function of the outcome by the query paired with the adjustment sets. Through simulations and a real-world clinical dataset (class III malocclusion data), we show that the bagged estimator is, in most cases, consistent with the true causal effect if the structure is known, has a good variance/bias trade-off when the structure is unknown (estimated using heuristics), has lower computational complexity than learning a full network, and outperforms boosted regression. In conclusion, the bagged random causal network is well-suited to estimate query-target causal effects from observational studies on EHR and other high-dimensional biomedical databases.
Affiliation(s)
- Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, FL, USA.
| | - Yi Guo
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, FL, USA; Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, FL, USA.
| | - Jiang Bian
- Cancer Informatics Shared Resource, University of Florida Health Cancer Center, FL, USA; Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, FL, USA.
|
38
|
Adil SM, Elahi C, Gramer R, Spears CA, Fuller AT, Haglund MM, Dunn TW. Predicting the Individual Treatment Effect of Neurosurgery for Patients with Traumatic Brain Injury in the Low-Resource Setting: A Machine Learning Approach in Uganda. J Neurotrauma 2020; 38:928-939. [PMID: 33054545 DOI: 10.1089/neu.2020.7262] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Traumatic brain injury (TBI) disproportionately affects low- and middle-income countries (LMICs). In these low-resource settings, effective triage of patients with TBI, including the decision of whether or not to perform neurosurgery, is critical in optimizing patient outcomes and healthcare resource utilization. Machine learning may allow for effective predictions of patient outcomes both with and without surgery. Data from patients with TBI were collected prospectively at Mulago National Referral Hospital in Kampala, Uganda, from 2016 to 2019. One linear and six non-linear machine learning models were designed to predict good versus poor outcome near hospital discharge and internally validated using nested five-fold cross-validation. The 13 predictors included clinical variables easily acquired on admission and whether or not the patient received surgery. Using an elastic-net regularized logistic regression model (GLMnet), with predictions calibrated using Platt scaling, the probability of poor outcome was calculated for each patient both with and without surgery (with the difference quantifying the "individual treatment effect," ITE). Relative ITE represents the percent reduction in chance of poor outcome, equaling this ITE divided by the probability of poor outcome with no surgery. Ultimately, 1766 patients were included. Areas under the receiver operating characteristic curve (AUROCs) ranged from 83.1% (single C5.0 ruleset) to 88.5% (random forest), with the GLMnet at 87.5%. The two variables promoting good outcomes in the GLMnet model were high Glasgow Coma Scale score and receiving surgery. For the subgroup not receiving surgery, the median relative ITE was 42.9% (interquartile range [IQR], 32.7% to 53.5%); similarly, in those receiving surgery, it was 43.2% (IQR, 32.9% to 54.3%).
We provide the first machine learning-based model to predict TBI outcomes with and without surgery in LMICs, thus enabling more effective surgical decision making in the resource-limited setting. Predicted ITE similarity between surgical and non-surgical groups suggests that, currently, patients are not being chosen optimally for neurosurgical intervention. Our clinical decision aid has the potential to improve outcomes.
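The ITE and relative ITE defined in this abstract reduce to a short calculation. The following is a minimal Python sketch, assuming the ITE is taken as the probability of poor outcome without surgery minus the probability with surgery (the sign convention that makes it a reduction); the function name and the example probabilities are hypothetical and are not taken from the study.

```python
def relative_ite(p_poor_no_surgery: float, p_poor_surgery: float) -> float:
    """Relative ITE: fractional reduction in the chance of a poor outcome.

    Assumed sign convention: ITE = P(poor | no surgery) - P(poor | surgery).
    """
    if not 0.0 < p_poor_no_surgery <= 1.0:
        raise ValueError("p_poor_no_surgery must be in (0, 1]")
    if not 0.0 <= p_poor_surgery <= 1.0:
        raise ValueError("p_poor_surgery must be in [0, 1]")
    ite = p_poor_no_surgery - p_poor_surgery
    # Dividing by the no-surgery risk expresses the ITE as a share of baseline risk.
    return ite / p_poor_no_surgery


# Hypothetical patient: 70% predicted risk without surgery, 40% with surgery.
example = relative_ite(0.70, 0.40)
print(f"relative ITE = {example:.1%}")
```

A relative ITE near 43%, as reported for both groups, would mean surgery is predicted to remove a little under half of the baseline risk of a poor outcome.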
Affiliation(s)
- Syed M Adil
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA
| | - Cyrus Elahi
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA
| | - Robert Gramer
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA
| | - Charis A Spears
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA
| | - Anthony T Fuller
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA; Department of Neurosurgery, Duke University Medical Center, Durham, North Carolina, USA; Duke Global Health Institute, Duke University, Durham, North Carolina, USA
| | - Michael M Haglund
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA; Department of Neurosurgery, Duke University Medical Center, Durham, North Carolina, USA; Duke Global Health Institute, Duke University, Durham, North Carolina, USA
| | - Timothy W Dunn
- Division of Global Neurosurgery and Neurology, Duke University Medical Center, Durham, North Carolina, USA; Department of Neurosurgery, Duke University Medical Center, Durham, North Carolina, USA; Department of Statistical Science, Duke University Medical Center, Durham, North Carolina, USA
|
39
|
Zhang P, Ma J, Chen X, Shentu Y. A nonparametric method for value function guided subgroup identification via gradient tree boosting for censored survival data. Stat Med 2020; 39:4133-4146. [PMID: 32786155 DOI: 10.1002/sim.8714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2020] [Revised: 06/08/2020] [Accepted: 07/09/2020] [Indexed: 11/07/2022]
Abstract
In randomized clinical trials with survival outcome, there has been an increasing interest in subgroup identification based on baseline genomic, proteomic markers, or clinical characteristics. Some of the existing methods identify subgroups that benefit substantially from the experimental treatment by directly modeling outcomes or treatment effect. When the goal is to find an optimal treatment for a given patient rather than finding the right patient for a given treatment, methods under the individualized treatment regime framework estimate an individualized treatment rule that would lead to the best expected clinical outcome as measured by a value function. Connecting the concept of value function to subgroup identification, we propose a nonparametric method that searches for subgroup membership scores by maximizing a value function that directly reflects the subgroup-treatment interaction effect based on restricted mean survival time. A gradient tree boosting algorithm is proposed to search for the individual subgroup membership scores. We conduct simulation studies to evaluate the performance of the proposed method and an application to an AIDS clinical trial is performed for illustration.
Affiliation(s)
- Pingye Zhang
- Biostatistics and Research Decision Sciences, MRL, Merck & Co., Inc., Rahway, New Jersey, USA
| | - Junshui Ma
- Biostatistics and Research Decision Sciences, MRL, Merck & Co., Inc., Rahway, New Jersey, USA
| | - Xinqun Chen
- Biostatistics and Research Decision Sciences, MRL, Merck & Co., Inc., Rahway, New Jersey, USA
| | - Yue Shentu
- Biostatistics and Research Decision Sciences, MRL, Merck & Co., Inc., Rahway, New Jersey, USA
|
40
|
Hu L, Li L, Ji J. Machine learning to identify and understand key factors for provider-patient discussions about smoking. Prev Med Rep 2020; 20:101238. [PMID: 33224719 PMCID: PMC7666379 DOI: 10.1016/j.pmedr.2020.101238] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/07/2020] [Accepted: 10/20/2020] [Indexed: 12/15/2022] Open
Abstract
We sought to identify key determinants of the likelihood of provider-patient discussions about smoking and to understand the effects of these determinants. We used data from the 2017 National Health Interview Survey on 3666 self-reported current smokers who talked to a health professional within a year of the time the survey was conducted. We included wide-ranging information on 43 potential covariates across four domains: demographic and socio-economic status, behavior, health status, and healthcare utilization. We exploited a principled nonparametric permutation-based approach using Bayesian machine learning to identify and rank important determinants of discussions about smoking between health providers and patients. In order of importance, frequency of doctor office visits, intensity of cigarette use, length of smoking history, chronic obstructive pulmonary disease, emphysema, and marital status were major determinants of disparities in provider-patient discussions about smoking. There was a distinct interaction between intensity of cigarette use and length of smoking history. Our analysis may provide some insights into strategies for promoting discussions on smoking and facilitating smoking cessation. Health care resource usage, smoking intensity and duration, and smoking-related conditions were key drivers. The "usual suspects" (age, gender, race, and ethnicity) were less important, and gender, in particular, had little effect.
Affiliation(s)
- Liangyuan Hu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
| | - Lihua Li
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
| | - Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA; The Institute for Healthcare Delivery, Mount Sinai Health System, New York, NY, USA
|
41
|
Affiliation(s)
- Muxuan Liang
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Menggang Yu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
|
42
|
Garcia-Montemayor V, Martin-Malo A, Barbieri C, Bellocchio F, Soriano S, Pendon-Ruiz de Mier V, Molina IR, Aljama P, Rodriguez M. Predicting mortality in hemodialysis patients using machine learning analysis. Clin Kidney J 2020; 14:1388-1395. [PMID: 34221370 PMCID: PMC8247746 DOI: 10.1093/ckj/sfaa126] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022] Open
Abstract
Background Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.
Affiliation(s)
| | - Alejandro Martin-Malo
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain; Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, University of Cordoba, Spain; RETICs-REDinREN (National Institute of Health Carlos III), Madrid, Spain
| | - Carlo Barbieri
- Fresenius Medical Care Italia, Vaiano Cremasco, Cremona, Italy
| | | | - Sagrario Soriano
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain
| | - Victoria Pendon-Ruiz de Mier
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain; Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, University of Cordoba, Spain
| | - Ignacio R Molina
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain
| | - Pedro Aljama
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain; Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, University of Cordoba, Spain
| | - Mariano Rodriguez
- Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain; Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, University of Cordoba, Spain; RETICs-REDinREN (National Institute of Health Carlos III), Madrid, Spain
|
43
|
Prosperi M, Guo Y, Sperrin M, Koopman JS, Min JS, He X, Rich S, Wang M, Buchan IE, Bian J. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat Mach Intell 2020; 2:369-75. [DOI: 10.1038/s42256-020-0197-y] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
44
|
Sun M, Cai Y, Zhang K, Zhao X, Chen Z. A method to analyze the sensitivity ranking of various abiotic factors to acoustic densities of fishery resources in the surface mixed layer and bottom cold water layer of the coastal area of low latitude: a case study in the northern South China Sea. Sci Rep 2020; 10:11128. [PMID: 32636512 DOI: 10.1038/s41598-020-67387-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2020] [Accepted: 06/04/2020] [Indexed: 11/09/2022] Open
Abstract
This is an exploratory analysis combining artificial intelligence algorithms, fishery acoustics technology, and a variety of abiotic factors in low-latitude coastal waters. This approach can be used to analyze the sensitivity level between the acoustic density of fishery resources and various abiotic factors in the surface mixed layer (the water layer above the constant thermocline) and the bottom cold water layer (the water layer below the constant thermocline). The fishery acoustic technology is used to obtain the acoustic density of fishery resources in each water layer, which is characterized by Nautical Area Scattering Coefficient (NASC) values. The artificial intelligence algorithm is used to rank the sensitivity of various abiotic factors and NASC values of the two water layers, and the grades are classified according to the cumulative contribution percentage. We found that stratified or multidimensional analysis of the sensitivity of abiotic factors is necessary. One factor, such as temperature, nitrite, water depth, or salinity, could have different levels of sensitivity in different water layers. In addition, eXtreme Gradient Boosting and random forest models performed better than the linear regression model, with R² values 0.2 to 0.4 higher. The performance of the models fluctuated less with a larger sample size.
|
45
|
Yadlowsky S, Pellegrini F, Lionetto F, Braune S, Tian L. Estimation and Validation of Ratio-based Conditional Average Treatment Effects Using Observational Data. J Am Stat Assoc 2020; 116:335-352. [PMID: 33767517 PMCID: PMC7985957 DOI: 10.1080/01621459.2020.1772080] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2019] [Revised: 04/20/2020] [Accepted: 05/16/2020] [Indexed: 10/24/2022]
Abstract
While sample sizes in randomized clinical trials are large enough to estimate the average treatment effect well, they are often insufficient for estimation of treatment-covariate interactions critical to studying data-driven precision medicine. Observational data from real world practice may play an important role in alleviating this problem. One common approach in trials is to predict the outcome of interest with separate regression models in each treatment arm, and estimate the treatment effect based on the contrast of the predictions. Unfortunately, this simple approach may induce spurious treatment-covariate interaction in observational studies when the regression model is misspecified. Motivated by the need to model the number of relapses in multiple sclerosis patients, where the ratio of relapse rates is a natural choice of the treatment effect, we propose to estimate the conditional average treatment effect (CATE) as the ratio of expected potential outcomes, and derive a doubly robust estimator of this CATE in a semiparametric model of treatment-covariate interactions. We also provide a validation procedure to check the quality of the estimator on an independent sample. We conduct simulations to demonstrate the finite sample performance of the proposed methods, and illustrate their advantages on real data by examining the treatment effect of dimethyl fumarate compared to teriflunomide in multiple sclerosis patients.
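The ratio-based CATE described in this abstract can be written out explicitly. The notation below is an assumption for illustration, not taken from the paper: $Y^{(1)}$ and $Y^{(0)}$ denote the potential outcomes (for instance relapse counts) under the two treatments, and $X$ the covariates.

```latex
\[
  \tau(x) \;=\; \frac{\mathbb{E}\!\left[\,Y^{(1)} \mid X = x\,\right]}
                     {\mathbb{E}\!\left[\,Y^{(0)} \mid X = x\,\right]}
\]
```

Under this form, $\tau(x) = 1$ corresponds to no treatment effect for covariate profile $x$, whereas the more common difference-based CATE, $\mathbb{E}[Y^{(1)} - Y^{(0)} \mid X = x]$, takes the value $0$ in that case.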
Affiliation(s)
- Steve Yadlowsky
- Stanford University, Electrical Engineering, 1265 Welch Rd, Stanford, 94305-6104 United States
| | | | | | - Stefan Braune
- NeuroTransData, Neurology, Neuburg an der Donau, Germany
| | - Lu Tian
- Stanford University, Department of Biomedical Data Science, Stanford, 94305-6104 United States
|
46
|
Sugasawa S, Noma H. Efficient screening of predictive biomarkers for individual treatment selection. Biometrics 2020; 77:249-257. [PMID: 32294246 DOI: 10.1111/biom.13279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2019] [Revised: 03/27/2020] [Accepted: 03/30/2020] [Indexed: 01/18/2023]
Abstract
The development of molecular diagnostic tools to achieve individualized medicine requires identifying predictive biomarkers associated with subgroups of individuals who might receive beneficial or harmful effects from different available treatments. However, due to the large number of candidate biomarkers in large-scale genetic and molecular studies, and the complex relationships among clinical outcome, biomarkers, and treatments, ordinary statistical tests for the interactions between treatments and covariates suffer from limited statistical power. In this paper, we propose an efficient method for detecting predictive biomarkers. We employ the weighted loss functions of Chen et al. to directly estimate individual treatment scores and propose synthetic posterior inference for effect sizes of biomarkers. We develop an empirical Bayes approach; namely, we estimate unknown hyperparameters in the prior distribution based on data. We then provide efficient screening methods for the candidate biomarkers via an optimal discovery procedure with adequate control of the false discovery rate. The proposed method is demonstrated in simulation studies and an application to a breast cancer clinical study, in which the proposed method was shown to detect a much larger number of significant biomarkers than existing standard methods.
Affiliation(s)
- Shonosuke Sugasawa
- Center for Spatial Information Science, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Hisashi Noma
- Department of Data Science, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
|
47
|
Wongvibulsin S, Wu KC, Zeger SL. Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis. BMC Med Res Methodol 2019; 20:1. [PMID: 31888507 PMCID: PMC6937754 DOI: 10.1186/s12874-019-0863-0] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2019] [Accepted: 11/08/2019] [Indexed: 12/23/2022] Open
Abstract
Background Clinical research and medical practice can be advanced through the prediction of an individual’s health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that are static, rather than dynamic. To benefit from the increasing emergence of large, heterogeneous data sets, such as electronic health records (EHRs), novel tools to support improved clinical decision making through methods for individual-level risk prediction that can handle multiple variables, their interactions, and time-varying values are necessary. Methods We introduce a novel dynamic approach to clinical risk prediction for survival, longitudinal, and multivariate (SLAM) outcomes, called random forest for SLAM data analysis (RF-SLAM). RF-SLAM is a continuous-time, random forest method for survival analysis that combines the strengths of existing statistical and machine learning methods to produce individualized Bayes estimates of piecewise-constant hazard rates. We also present a method-agnostic approach for time-varying evaluation of model performance. Results We derive and illustrate the method by predicting sudden cardiac arrest (SCA) in the Left Ventricular Structural (LV) Predictors of Sudden Cardiac Death (SCD) Registry. We demonstrate superior performance relative to standard random forest methods for survival data. We illustrate the importance of the number of preceding heart failure hospitalizations as a time-dependent predictor in SCA risk assessment. Conclusions RF-SLAM is a novel statistical and machine learning method that improves risk prediction by incorporating time-varying information and accommodating a large number of predictors, their interactions, and missing values. 
RF-SLAM is designed to easily extend to simultaneous predictions of multiple, possibly competing, events and/or repeated measurements of discrete or continuous variables over time. Trial registration: LV Structural Predictors of SCD Registry (clinicaltrials.gov, NCT01076660), retrospectively registered 25 February 2010.
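To make the "piecewise-constant hazard rates" in this abstract concrete, the sketch below uses the standard survival-analysis identity that survival probability is the exponential of the negative cumulative hazard. It is a minimal Python illustration only: the function name and the hazard values are hypothetical, and it does not reproduce RF-SLAM or its Bayes estimates.

```python
import math


def survival_probability(hazards, interval_lengths):
    """Survival to the end of the last interval under piecewise-constant hazards.

    hazards[k] is the (constant) hazard rate on interval k and
    interval_lengths[k] is that interval's duration in the same time unit.
    """
    if len(hazards) != len(interval_lengths):
        raise ValueError("hazards and interval_lengths must have equal length")
    # Cumulative hazard is the sum of rate * duration over the intervals.
    cumulative_hazard = sum(h * d for h, d in zip(hazards, interval_lengths))
    return math.exp(-cumulative_hazard)


# Hypothetical patient: hazard 0.1 per year for 1 year, then 0.2 per year for 2 years.
s = survival_probability([0.1, 0.2], [1.0, 2.0])
print(f"3-year survival = {s:.3f}")
```

Because the hazard is allowed to change between intervals, a time-varying predictor such as the number of preceding hospitalizations can raise or lower the rate from one interval to the next.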
Affiliation(s)
- Shannon Wongvibulsin
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, USA.
| | - Katherine C Wu
- Department of Medicine, Division of Cardiology, Johns Hopkins School of Medicine, Baltimore, USA
| | - Scott L Zeger
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
|
48
|
Żegleń M, Marini E, Cabras S, Kryst Ł, Das R, Chakraborty A, Dasgupta P. The relationship among the age at menarche, anthropometric characteristics, and socio-economic factors in Bengali girls from Kolkata, India. Am J Hum Biol 2019; 32:e23380. [PMID: 31875347 DOI: 10.1002/ajhb.23380] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Revised: 12/05/2019] [Accepted: 12/09/2019] [Indexed: 11/12/2022] Open
Abstract
OBJECTIVES The aim of the study was to measure the causal effect of selected socio-economic factors and anthropometric characteristics on the occurrence of menarche. METHODS The sample consisted of 2195 Bengali girls (aged 7-21) from middle-class families, from Kolkata city, India. The age at menarche was recorded using retrospective data and the status quo method. The causal effect of anthropometric and socio-economic variables on menarche occurrence was estimated by nonparametric analysis of survival probability (survival random forest). RESULTS In the examined cohort, menarche occurred, on average, at 11.8 years of age. The probability of menarche occurrence increased with increasing values of factors such as body mass index, height-for-age z-scores, number of family members, household rooms, and toilets, but decreased when expenditures increased. The relation maintained a similar pattern of causal effect with girls' age. CONCLUSIONS A complex pattern of relationship among sexual development, physique, and socio-economic characteristics was defined. The tendency toward early menarche, along with the observed causal relationships, indicates that the analyzed sample is nearing the characteristics and standards of living noted in other middle and even high-income countries in the world.
Affiliation(s)
- Magdalena Żegleń
- Department of Anthropology, Faculty of Physical Education, University of Physical Education in Kraków, Kraków, Poland
| | - Elisabetta Marini
- Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy
| | - Stefano Cabras
- Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy
| | - Łukasz Kryst
- Department of Anthropology, Faculty of Physical Education, University of Physical Education in Kraków, Kraków, Poland
| | - Rituparna Das
- Biological Anthropology Unit, Indian Statistical Institute, Kolkata, India
| | | | - Parasmani Dasgupta
- Biological Anthropology Unit, Indian Statistical Institute, Kolkata, India
|
49
|
Rice TW, Lu M, Ishwaran H, Blackstone EH. Precision Surgical Therapy for Adenocarcinoma of the Esophagus and Esophagogastric Junction. J Thorac Oncol 2019; 14:2164-2175. [PMID: 31442498 PMCID: PMC6876319 DOI: 10.1016/j.jtho.2019.08.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 07/31/2019] [Accepted: 08/05/2019] [Indexed: 12/12/2022]
Abstract
INTRODUCTION To facilitate the initial clinical decision regarding whether to use esophagectomy alone or neoadjuvant therapy in surgical care for individual patients with adenocarcinoma of the esophagus and esophagogastric junction (information not available from randomized trials), a machine-learning analysis was performed using worldwide real-world data on patients undergoing different therapies for this rare adenocarcinoma. METHODS Using random forest technology in a sequential analysis, we (1) identified eligibility for each of four therapies among 13,365 patients: esophagectomy alone (n = 6649), neoadjuvant therapy (n = 4706), esophagectomy and adjuvant therapy (n = 998), and neoadjuvant and adjuvant therapy (n = 1022); (2) performed survival analyses incorporating interactions of patient and cancer characteristics with therapy; (3) determined optimal therapy as that predicted to maximize lifetime within 10 years (restricted mean survival time; RMST) for each patient; and (4) compared lifetime gained from optimal versus actual therapies. RESULTS Actual therapy was optimal in 61% of those receiving esophagectomy alone; neoadjuvant therapy was optimal for 36% receiving neoadjuvant therapy. Many patients were predicted to benefit from postoperative adjuvant therapy. Total RMST for actual therapy received was 58,825 years. Had patients received optimal therapy, total RMST was predicted to be 62,982 years, a 7% gain. CONCLUSIONS Average treatment effect for adenocarcinoma of the esophagus yields only crude evidence-based therapy guidelines. However, patient response to therapy is widely variable, and survival after data-driven predicted optimal therapy often differs from actual therapy received. Therapy must address an individual patient's cancer and clinical characteristics to provide precision surgical therapy for adenocarcinoma of the esophagus and esophagogastric junction.
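Steps (3) and (4) of this abstract, choosing the therapy with the largest predicted RMST and comparing totals, can be sketched in a few lines of Python. Only the two totals (58,825 and 62,982 years) come from the abstract; the per-patient predictions and the function names are hypothetical, and the random forest survival models that would produce such predictions are not shown.

```python
def optimal_therapy(predicted_rmst):
    """Return the therapy whose predicted RMST (in years) is largest."""
    return max(predicted_rmst, key=predicted_rmst.get)


def relative_gain(total_actual, total_optimal):
    """Fractional gain in total RMST if every patient received optimal therapy."""
    return (total_optimal - total_actual) / total_actual


# Hypothetical predictions for one patient (years of life within 10 years).
patient = {
    "esophagectomy alone": 6.1,
    "neoadjuvant therapy": 5.4,
    "esophagectomy and adjuvant therapy": 5.9,
    "neoadjuvant and adjuvant therapy": 5.2,
}
best = optimal_therapy(patient)

# Totals reported in the abstract.
gain = relative_gain(58_825, 62_982)
print(best, f"{gain:.1%}")
```

The computed gain, about 7.1%, agrees with the "7% gain" stated in the results.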
Affiliation(s)
- Thomas W Rice
  - Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
- Min Lu
  - Department of Public Health Sciences, Division of Biostatistics, University of Miami, Coral Gables, Florida
- Hemant Ishwaran
  - Department of Public Health Sciences, Division of Biostatistics, University of Miami, Coral Gables, Florida
- Eugene H Blackstone
  - Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio; Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
|
50
|
Grigoryan H, Schiffman C, Gunter MJ, Naccarati A, Polidoro S, Dagnino S, Dudoit S, Vineis P, Rappaport SM. Cys34 Adductomics Links Colorectal Cancer with the Gut Microbiota and Redox Biology. Cancer Res 2019; 79:6024-6031. [PMID: 31641032 PMCID: PMC6891211 DOI: 10.1158/0008-5472.can-19-1529] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/15/2019] [Revised: 08/21/2019] [Accepted: 10/11/2019] [Indexed: 12/12/2022]
Abstract
Chronic inflammation is an established risk factor for colorectal cancer. To study reactive products of gut inflammation and redox signaling on colorectal cancer development, we used untargeted adductomics to detect adduct features in prediagnostic serum from the EPIC Italy cohort. We focused on modifications to Cys34 in human serum albumin, which is responsible for scavenging small reactive electrophiles that might initiate cancers. Employing a combination of statistical methods, we selected seven Cys34 adducts associated with colorectal cancer, as well as body mass index (BMI; a well-known risk factor). Five adducts were more abundant in colorectal cancer cases than controls and clustered with each other, suggesting a common pathway. Because two of these adducts were Cys34 modifications by methanethiol, a microbial-human cometabolite, and crotonaldehyde, a product of lipid peroxidation, these findings further implicate infiltration of gut microbes into the intestinal mucosa and the corresponding inflammatory response as causes of colorectal cancer. The other two associated adducts were Cys34 disulfides of homocysteine that were less abundant in colorectal cancer cases than controls and may implicate homocysteine metabolism as another causal pathway. The selected adducts and BMI ranked higher as potentially causal factors than variables previously associated with colorectal cancer (smoking, alcohol consumption, physical activity, and total meat consumption). Regressions of case-control differences in adduct levels on days to diagnosis showed no statistical evidence that disease progression, rather than causal factors at recruitment, contributed to the observed differences. These findings support the hypothesis that infiltration of gut microbes into the intestinal mucosa and the resulting inflammation are causal factors for colorectal cancer. 
SIGNIFICANCE: Infiltration of gut microbes into the intestinal mucosa and the resulting inflammation are causal factors for colorectal cancer.
Affiliation(s)
- Hasmik Grigoryan
  - School of Public Health, University of California, Berkeley, California
- Marc J Gunter
  - Section of Nutrition and Metabolism, International Agency for Research on Cancer, Lyon, France
- Silvia Polidoro
  - Italian Institute for Genomic Medicine (IIGM), Torino, Italy
- Sonia Dagnino
  - MRC-PHE Centre for Environment & Health, Imperial College, London, United Kingdom
- Sandrine Dudoit
  - School of Public Health, University of California, Berkeley, California
  - Department of Statistics, University of California, Berkeley, California
- Paolo Vineis
  - Italian Institute for Genomic Medicine (IIGM), Torino, Italy
  - MRC-PHE Centre for Environment & Health, Imperial College, London, United Kingdom
|