1
Liang CJ, Luo C, Kranzler HR, Bian J, Chen Y. Communication-efficient federated learning of temporal effects on opioid use disorder with data from distributed research networks. J Am Med Inform Assoc 2025; 32:656-664. [PMID: 39864407 PMCID: PMC12005629 DOI: 10.1093/jamia/ocae313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 11/30/2024] [Accepted: 12/20/2024] [Indexed: 01/28/2025] Open
Abstract
OBJECTIVE To develop a distributed algorithm to fit multi-center Cox regression models with time-varying coefficients to facilitate privacy-preserving data integration across multiple health systems. MATERIALS AND METHODS The Cox model with time-varying coefficients relaxes the proportional hazards assumption of the usual Cox model and is particularly useful for modeling time-to-event outcomes. We proposed a One-shot Distributed Algorithm to fit multi-center Cox regression models with Time-varying coefficients (ODACT). This algorithm constructed a surrogate likelihood function to approximate the Cox partial likelihood function, using patient-level data from a lead site and aggregated data from other sites. The performance of ODACT was demonstrated by simulation and a real-world study of opioid use disorder (OUD) using decentralized data from a large clinical research network across 5 sites with 69 163 subjects. RESULTS The ODACT method precisely estimated the time-varying effects. In the simulation study, ODACT always achieved estimates close to those of the pooled analysis, while the meta-estimator showed a considerable amount of bias. In the OUD study, the bias of the hazard ratios estimated by ODACT was smaller than that of the meta-estimator for all 7 risk factors at almost all of the time points from 0 to 2.5 years. The greatest bias of the meta-estimator was for the effects of age ≥65 years and smoking. CONCLUSION ODACT is a privacy-preserving and communication-efficient method for analyzing multi-center time-to-event data that allows the covariates' effects to be time-varying. ODACT provides estimates close to the pooled estimator and substantially outperforms the meta-analysis estimator. DISCUSSION The proposed ODACT is a privacy-preserving distributed algorithm for fitting Cox models with time-varying coefficients. The limitations of ODACT include that preserving privacy via aggregate data relies on a relatively large amount of data at each individual site, and that rigorous quantification of the risk of privacy leaks requires further investigation.
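For orientation, the model class named in this abstract can be written out. The display below is the standard textbook form, not notation taken from the paper; the symbols \lambda_0, x, \beta and \beta(t) are assumptions introduced here for illustration.

```latex
\begin{align}
  \lambda(t \mid x) &= \lambda_0(t)\,\exp\{x^\top \beta\}
    && \text{(usual Cox model, proportional hazards)} \\
  \lambda(t \mid x) &= \lambda_0(t)\,\exp\{x^\top \beta(t)\}
    && \text{(Cox model with time-varying coefficients)} \\
  \mathrm{HR}_j(t) &= \exp\{\beta_j(t)\}
    && \text{(hazard ratio for a one-unit increase in covariate } j \text{ at time } t\text{)}
\end{align}
```

Under the first line the hazard ratio for covariate j is constant in time; under the second it is a function of t, which is what allows estimated hazard ratios to be compared at different time points as in the OUD study.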
Affiliation(s)
- C Jason Liang
  - Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, United States
- Chongliang Luo
  - Division of Public Health Sciences, Washington University School of Medicine, St Louis, MO 63110, United States
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Henry R Kranzler
  - Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, United States
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32610, United States
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Health AI and Synthesis of Evidence, University of Pennsylvania, Philadelphia, PA 19104, United States
2
Martinez-Morata I, Schilling K, Glabonjat RA, Domingo-Relloso A, Mayer M, McGraw K, Fernandez MG, Sanchez T, Nigra AE, Kaufman J, Vaidya D, Jones MR, Bancks MP, Barr R, Shimbo D, Post WS, Valeri L, Shea S, Navas-Acien A. Association of Urinary Metals With Cardiovascular Disease Incidence and All-Cause Mortality in the Multi-Ethnic Study of Atherosclerosis (MESA). Circulation 2024; 150:758-769. [PMID: 39087344 PMCID: PMC11371385 DOI: 10.1161/circulationaha.124.069414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Accepted: 06/24/2024] [Indexed: 08/02/2024]
Abstract
BACKGROUND Exposure to metals has been associated with cardiovascular disease (CVD) end points and mortality, yet prospective evidence is limited beyond arsenic, cadmium, and lead. In this study, we assessed the prospective association of urinary metals with incident CVD and all-cause mortality in a racially diverse population of US adults from MESA (the Multi-Ethnic Study of Atherosclerosis). METHODS We included 6599 participants (mean [SD] age, 62.1 [10.2] years; 53% female) with urinary metals available at baseline (2000 to 2001) and followed through December 2019. We used Cox proportional hazards models to estimate the adjusted hazard ratio and 95% CI of CVD and all-cause mortality by baseline urinary levels of cadmium, tungsten, and uranium (nonessential metals), and cobalt, copper, and zinc (essential metals). The joint association of the 6 metals as a mixture and the corresponding 10-year survival probability was calculated using Cox Elastic-Net. RESULTS During follow-up, 1162 participants developed CVD, and 1844 participants died. In models adjusted by behavioral and clinical indicators, the hazard ratios (95% CI) for incident CVD and all-cause mortality comparing the highest with the lowest quartile were, respectively: 1.25 (1.03, 1.53) and 1.68 (1.43, 1.96) for cadmium; 1.20 (1.01, 1.42) and 1.16 (1.01, 1.33) for tungsten; 1.32 (1.08, 1.62) and 1.32 (1.12, 1.56) for uranium; 1.24 (1.03, 1.48) and 1.37 (1.19, 1.58) for cobalt; 1.42 (1.18, 1.70) and 1.50 (1.29, 1.74) for copper; and 1.21 (1.01, 1.45) and 1.38 (1.20, 1.59) for zinc. A positive linear dose-response was identified for cadmium and copper with both end points. The adjusted hazard ratios (95% CI) for an interquartile range (IQR) increase in the mixture of these 6 urinary metals and the corresponding 10-year survival probability difference (95% CI) were 1.29 (1.11, 1.56) and -1.1% (-2.0, -0.05) for incident CVD and 1.66 (1.47, 1.91) and -2.0% (-2.6, -1.5) for all-cause mortality. 
CONCLUSIONS This epidemiological study in US adults indicates that urinary metal levels are associated with increased CVD risk and mortality. These findings can inform the development of novel preventive strategies to improve cardiovascular health.
Affiliation(s)
- Irene Martinez-Morata
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Kathrin Schilling
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Ronald A. Glabonjat
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Arce Domingo-Relloso
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
- Melanie Mayer
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
- Katlyn McGraw
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Marta Galvez Fernandez
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Tiffany Sanchez
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Anne E. Nigra
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
- Joel Kaufman
  - Department of Epidemiology, University of Washington, Seattle, WA
- Miranda R. Jones
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Michael P. Bancks
  - Department of Epidemiology and Prevention, Wake Forest University School of Medicine, Winston-Salem, NC
- R. Graham Barr
  - Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Daichi Shimbo
  - Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Wendy S. Post
  - Department of Medicine, Johns Hopkins University, Baltimore, MD
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Linda Valeri
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY
- Steven Shea
  - Department of Medicine, Columbia University Irving Medical Center, New York, NY
- Ana Navas-Acien
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY
3
Li R, Romano JD, Chen Y, Moore JH. Centralized and Federated Models for the Analysis of Clinical Data. Annu Rev Biomed Data Sci 2024; 7:179-199. [PMID: 38723657 PMCID: PMC11571052 DOI: 10.1146/annurev-biodatasci-122220-115746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.
Affiliation(s)
- Ruowang Li
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Joseph D Romano
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
4
Huang C, Wei K, Wang C, Yu Y, Qin G. Covariate balance-related propensity score weighting in estimating overall hazard ratio with distributed survival data. BMC Med Res Methodol 2023; 23:233. [PMID: 37833641 PMCID: PMC10576397 DOI: 10.1186/s12874-023-02055-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 10/01/2023] [Indexed: 10/15/2023] Open
Abstract
BACKGROUND When data are distributed across multiple sites, sharing information at the individual level among sites may be difficult. In these multi-site studies, a propensity score model can be fitted with data from within each site or with data from all sites when using inverse probability-weighted Cox regression to estimate the overall hazard ratio. However, when there is unknown heterogeneity of covariates across sites, either approach may lead to bias or reduced efficiency. In this study, we proposed a method to estimate the propensity score based on a covariate balance-related criterion and to estimate the overall hazard ratio while overcoming data sharing constraints across sites. METHODS The proposed propensity score was generated by choosing between the global and local propensity scores based on a covariate balance-related criterion, combining the global propensity score fitted in the entire population and the local propensity score fitted within each site. We used this proposed propensity score to estimate the overall hazard ratio of distributed survival data with multiple sites, while requiring only summary-level information across sites. We conducted simulation studies to evaluate the performance of the proposed method. In addition, we applied the proposed method to real-world data to examine the effect of radiation therapy on time to death among breast cancer patients. RESULTS The simulation studies showed that the proposed method improved the performance in estimating the overall hazard ratio compared with the global and local propensity score methods, regardless of the number of sites and the sample size in each site. Similar results were observed under both homogeneous and heterogeneous settings. In addition, the proposed method yielded identical results to the pooled individual-level data analysis. 
The real-world data analysis indicated that the proposed method was more likely to find a significant effect of radiation therapy on mortality compared to the global propensity score method and local propensity score method. CONCLUSIONS The proposed covariate balance-related propensity score in multi-site distributed survival data outperformed the global propensity score estimated using data from the entire population or the local propensity score estimated within each site in estimating the overall hazard ratio. The proposed approach can be performed without individual-level data transfer between sites and would yield the same results as the corresponding pooled individual-level data analysis.
Affiliation(s)
- Chen Huang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Kecheng Wei
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Ce Wang
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Yongfu Yu
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
- Guoyou Qin
  - Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
  - Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
5
Li Z, Shen Y, Ning J. Accommodating time-varying heterogeneity in risk estimation under the Cox model: a transfer learning approach. J Am Stat Assoc 2023; 118:2276-2287. [PMID: 38505403 PMCID: PMC10950074 DOI: 10.1080/01621459.2023.2210336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 04/26/2023] [Indexed: 03/21/2024]
Abstract
Transfer learning has attracted increasing attention in recent years for adaptively borrowing information across different data cohorts in various settings. Cancer registries have been widely used in clinical research because of their easy accessibility and large sample size. Our method is motivated by the question of how to utilize cancer registry data as a complement to improve the estimation precision of individual risks of death for inflammatory breast cancer (IBC) patients at The University of Texas MD Anderson Cancer Center. When transferring information for risk estimation based on the cancer registries (i.e., source cohort) to a single cancer center (i.e., target cohort), time-varying population heterogeneity needs to be appropriately acknowledged. However, there is no literature on how to adaptively transfer knowledge on risk estimation with time-to-event data from the source cohort to the target cohort while adjusting for time-varying differences in event risks between the two sources. Our goal is to address this statistical challenge by developing a transfer learning approach under the Cox proportional hazards model. To allow data-adaptive levels of information borrowing, we impose Lasso penalties on the discrepancies in regression coefficients and baseline hazard functions between the two cohorts, which are jointly solved in the proposed transfer learning algorithm. As shown in the extensive simulation studies, the proposed method yields more precise individualized risk estimation than using the target cohort alone. Meanwhile, our method demonstrates satisfactory robustness against cohort differences compared with the method that directly combines the target and source data in the Cox model. We develop a more accurate risk estimation model for the MD Anderson IBC cohort given various treatment and baseline covariates, while adaptively borrowing information from the National Cancer Database to improve risk assessment.
Affiliation(s)
- Ziyi Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yu Shen
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jing Ning
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
6
Rønn Hansen C, Price G, Field M, Sarup N, Zukauskaite R, Johansen J, Eriksen JG, Aly F, McPartlin A, Holloway L, Thwaites D, Brink C. Larynx cancer survival model developed through open-source federated learning. Radiother Oncol 2022; 176:179-186. [PMID: 36208652 DOI: 10.1016/j.radonc.2022.09.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/30/2022] [Revised: 08/12/2022] [Accepted: 09/28/2022] [Indexed: 12/14/2022]
Abstract
INTRODUCTION Federated learning has the potential to perform analysis on decentralised data; however, there are some obstacles to survival analyses as there is a risk of data leakage. This study demonstrates how to perform a stratified Cox regression survival analysis specifically designed to avoid data leakage using federated learning on larynx cancer patients from centres in three different countries. METHODS Data were obtained from 1821 larynx cancer patients treated with radiotherapy in three centres. Tumour volume was available for all 786 of the included patients. Parameter selection among eleven clinical and radiotherapy parameters was performed using best subset selection and cross-validation through the federated learning system, AusCAT. After parameter selection, β regression coefficients were estimated using bootstrap. Calibration plots were generated at 2- and 5-year survival, and inner and outer risk groups' Kaplan-Meier curves were compared to the Cox model prediction. RESULTS The best performing Cox model included log(GTV), performance status, age, smoking, haemoglobin and N-classification; however, the simplest model with similar statistical prediction power included log(GTV) and performance status only. The Harrell C-indices for the simplest model were 0.75 [0.71-0.78], 0.65 [0.59-0.71], and 0.69 [0.59-0.77] for Odense, Christie and Liverpool, respectively. The values were slightly higher for the full model, with C-indices of 0.77 [0.74-0.80], 0.67 [0.62-0.73] and 0.71 [0.61-0.80], respectively. Smoking during treatment carries the same hazard as being a nonsmoking patient ten years older. CONCLUSION Without any patient-specific data leaving the hospitals, a stratified Cox regression model based on data from centres in three countries was developed without data leakage risks. The overall survival model is primarily driven by tumour volume and performance status.
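The idea of a Cox model stratified by centre, fitted without moving patient rows, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AusCAT implementation or the authors' protocol, neither of which is described in the abstract: the function names `site_summary` and `federated_fit`, the plain gradient-ascent update, the step size, and the assumption of no tied event times are all hypothetical choices made here. Each centre forms risk sets from its own patients only (one stratum per centre) and returns just a log partial likelihood and a gradient.

```python
import math


def site_summary(beta, rows):
    """Local log partial likelihood and gradient for one centre.

    rows: list of (time, event, x), with x a list of covariates.
    Assumption: no tied event times. Only the two aggregates
    returned here would leave the centre.
    """
    rows = sorted(rows, key=lambda r: -r[0])  # latest time first
    p = len(beta)
    s0 = 0.0            # running sum of exp(x'beta) over the risk set
    s1 = [0.0] * p      # running sum of x * exp(x'beta) over the risk set
    loglik = 0.0
    grad = [0.0] * p
    for _time, event, x in rows:
        eta = sum(b * v for b, v in zip(beta, x))
        w = math.exp(eta)
        s0 += w
        for k in range(p):
            s1[k] += w * x[k]
        if event:
            # Risk set = this centre's patients still under observation.
            loglik += eta - math.log(s0)
            for k in range(p):
                grad[k] += x[k] - s1[k] / s0
    return loglik, grad


def federated_fit(sites, n_features, lr=0.5, n_iter=500):
    """Coordinator loop: sum centre gradients, take an ascent step.

    sites: list of per-centre row lists (kept separate, never pooled).
    lr and n_iter are illustrative defaults, not tuned values.
    """
    n_events = sum(1 for rows in sites for _, event, _ in rows if event)
    beta = [0.0] * n_features
    for _ in range(n_iter):
        total = [0.0] * n_features
        for rows in sites:
            _, grad = site_summary(beta, rows)
            total = [a + g for a, g in zip(total, grad)]
        beta = [b + lr * g / n_events for b, g in zip(beta, total)]
    return beta
```

Because the stratified partial likelihood is a sum of per-centre terms, summing the centre-level gradients gives the same update a pooled stratified fit would use; each centre keeps its own baseline hazard, so no cross-centre risk sets are needed. Whether such aggregates are themselves safe to share depends on centre size and is outside the scope of this sketch.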
Affiliation(s)
- Christian Rønn Hansen
  - Laboratory of Radiation Physics, Odense University Hospital, Odense, Denmark; Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Danish Centre for Particle Therapy, Aarhus University Hospital, Denmark; Institute of Medical Physics, School of Physics, University of Sydney, Sydney, Australia
- Gareth Price
  - Radiotherapy department, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Matthew Field
  - Ingham Institute for Applied Medical Research, Sydney, Australia
- Nis Sarup
  - Laboratory of Radiation Physics, Odense University Hospital, Odense, Denmark
- Ruta Zukauskaite
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark; Department of Oncology, Odense University Hospital, Odense, Denmark
- Jørgen Johansen
  - Department of Oncology, Odense University Hospital, Odense, Denmark
- Jesper Grau Eriksen
  - Department of Oncology, Odense University Hospital, Odense, Denmark; Department of Experimental Clinical Oncology, Aarhus University Hospital, Denmark; Department of Oncology, Aarhus University Hospital, Denmark
- Farhannah Aly
  - Ingham Institute for Applied Medical Research, Sydney, Australia; Southwest Sydney Clinical Campus, University of New South Wales, Sydney, Australia; Liverpool and Macarthur Cancer Therapy Centres, Sydney, Australia
- Andrew McPartlin
  - Radiotherapy department, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Lois Holloway
  - Institute of Medical Physics, School of Physics, University of Sydney, Sydney, Australia; Ingham Institute for Applied Medical Research, Sydney, Australia; Southwest Sydney Clinical Campus, University of New South Wales, Sydney, Australia; Liverpool and Macarthur Cancer Therapy Centres, Sydney, Australia
- David Thwaites
  - Institute of Medical Physics, School of Physics, University of Sydney, Sydney, Australia
- Carsten Brink
  - Laboratory of Radiation Physics, Odense University Hospital, Odense, Denmark; Department of Clinical Research, University of Southern Denmark, Odense, Denmark