1. Tong J, Shen Y, Xu A, He X, Luo C, Edmondson M, Zhang D, Lu Y, Yan C, Li R, Siegel L, Sun L, Shenkman EA, Morton SC, Malin BA, Bian J, Asch DA, Chen Y. Evaluating site-of-care-related racial disparities in kidney graft failure using a novel federated learning framework. J Am Med Inform Assoc 2024; 31:1303-1312. [PMID: 38713006 PMCID: PMC11105132 DOI: 10.1093/jamia/ocae075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 01/09/2024] [Accepted: 03/26/2024] [Indexed: 05/08/2024]
Abstract
OBJECTIVES: Racial disparities in kidney transplant access and posttransplant outcomes exist between non-Hispanic Black (NHB) and non-Hispanic White (NHW) patients in the United States, with the site of care being a key contributor. When multi-site data are used to examine the effect of site of care on racial disparities, the key challenge is that patient-level data are difficult to share because of regulations protecting patients' privacy. MATERIALS AND METHODS: We developed a federated learning framework, named dGEM-disparity (decentralized algorithm for Generalized linear mixed Effect Model for disparity quantification). Consisting of 2 modules, dGEM-disparity first provides accurately estimated common effects and calibrated hospital-specific effects while requiring only aggregated data from each center, and then adopts a counterfactual modeling approach to assess whether graft failure rates would differ if NHB patients had been admitted to transplant centers in the same distribution as NHW patients. RESULTS: Using United States Renal Data System data from 39,043 adult patients across 73 transplant centers over 10 years, we found that if NHB patients had followed the admission distribution of NHW patients, there would be, on average, 38 fewer deaths or graft failures per 10,000 NHB patients (95% CI, 35-40) within 1 year of receiving a kidney transplant. DISCUSSION: The proposed framework facilitates efficient collaborations in clinical research networks. Additionally, by using counterfactual modeling to calculate the event rate, the framework allows us to investigate contributions to racial disparities that may occur at the level of site of care. CONCLUSIONS: Our framework is broadly applicable to other decentralized datasets and to disparities research related to differential access to care. Ultimately, our proposed framework will advance equity in human health by identifying and addressing hospital-level racial disparities.
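The counterfactual comparison described in this abstract can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the dGEM-disparity implementation: the site labels, event rates, and admission shares below are invented, and the sketch assumes that site-specific event rates for NHB patients have already been estimated.

```python
def expected_rate(site_rates, site_shares):
    """Average event rate when patients are spread over sites per site_shares."""
    total = sum(site_shares.values())
    return sum(site_rates[s] * site_shares[s] / total for s in site_rates)

# Hypothetical 1-year event rates for NHB patients at three sites.
nhb_rate_by_site = {"A": 0.08, "B": 0.05, "C": 0.04}
# Hypothetical shares of NHB and NHW admissions across the same sites.
nhb_share = {"A": 0.5, "B": 0.3, "C": 0.2}
nhw_share = {"A": 0.2, "B": 0.3, "C": 0.5}

# Observed rate: NHB patients at the sites where they were actually admitted.
observed = expected_rate(nhb_rate_by_site, nhb_share)
# Counterfactual rate: same NHB site-level rates, NHW admission distribution.
counterfactual = expected_rate(nhb_rate_by_site, nhw_share)
difference_per_10k = (observed - counterfactual) * 10_000
```

A positive `difference_per_10k` means fewer events would be expected under the counterfactual admission distribution, which is the direction of the result the abstract reports.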
Affiliation(s)
- Jiayi Tong
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Yishan Shen
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Applied Mathematics and Computational Science, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Alice Xu
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Washington University in St. Louis, St. Louis, MO 63130, United States
- Xing He
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Chongliang Luo
- Division of Public Health Sciences, Department of Surgery, Washington University in St. Louis, St. Louis, MO 63110, United States
- Dazheng Zhang
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Yiwen Lu
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Ruowang Li
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Lianne Siegel
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Lichao Sun
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015, United States
- Elizabeth A Shenkman
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Sally C Morton
- School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85287, United States
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- David A Asch
- Division of General Internal Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Leonard Davis Institute of Health Economics, Philadelphia, PA 19104, United States
- Yong Chen
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Applied Mathematics and Computational Science, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Leonard Davis Institute of Health Economics, Philadelphia, PA 19104, United States
2. Kim C, Yu DH, Baek H, Cho J, You SC, Park RW. Data Resource Profile: Health Insurance Review and Assessment Service Covid-19 Observational Medical Outcomes Partnership (HIRA Covid-19 OMOP) database in South Korea. Int J Epidemiol 2024; 53:dyae062. [PMID: 38658170 DOI: 10.1093/ije/dyae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 04/08/2024] [Indexed: 04/26/2024]
Affiliation(s)
- Chungsoo Kim
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Dong Han Yu
- Big Data Department, Health Insurance Assessment and Review Services, Wonju, Republic of Korea
- Hyeran Baek
- Big Data Department, Health Insurance Assessment and Review Services, Wonju, Republic of Korea
- Jaehyeong Cho
- Department of Research, Keimyung University Dongsan Medical Center, Daegu, Republic of Korea
- Seng Chan You
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea
- Rae Woong Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea
3. Jing N, Liu X, Wu Q, Rao S, Mejias A, Maltenfort M, Schuchard J, Lorman V, Razzaghi H, Webb R, Zhou C, Jhaveri R, Lee GM, Pajor NM, Thacker D, Bailey LC, Forrest CB, Chen Y. Development and validation of a federated learning framework for detection of subphenotypes of multisystem inflammatory syndrome in children. medRxiv 2024:2024.01.26.24301827. [PMID: 38343837 PMCID: PMC10854314 DOI: 10.1101/2024.01.26.24301827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Background: Multisystem inflammatory syndrome in children (MIS-C) is a severe post-acute sequela of SARS-CoV-2 infection. The highly diverse clinical features of MIS-C necessitate characterizing the syndrome by subphenotypes for improved recognition and treatment. However, jointly identifying subphenotypes in multi-site settings can be challenging. We propose a distributed multi-site latent class analysis (dMLCA) approach to jointly learn MIS-C subphenotypes using data across multiple institutions. Methods: We used data from the electronic health record (EHR) systems of nine U.S. children's hospitals. Among the 3,549,894 patients, we extracted 864 patients < 21 years of age who had received a diagnosis of MIS-C during an inpatient stay or up to one day before admission. Using MIS-C conditions, laboratory results, and procedure information as input features for the patients, we applied our dMLCA algorithm and identified three MIS-C subphenotypes. As validation, we characterized and compared more granular features across subphenotypes. To evaluate the specificity of the identified subphenotypes, we further compared them with the general subphenotypes identified in COVID-19-infected patients. Findings: Subphenotype 1 (46.1%) represents patients with a mild manifestation of MIS-C not requiring intensive care, with minimal cardiac involvement. Subphenotype 2 (25.3%) is associated with a high risk of shock, cardiac and renal involvement, and an intermediate risk of respiratory symptoms. Subphenotype 3 (28.6%) represents patients requiring intensive care, with a high risk of shock and cardiac involvement, accompanied by a high risk of more than four organ systems being affected. Importantly, for hospital-specific clinical decision-making, our algorithm also revealed substantial heterogeneity in the relative proportions of these three subtypes across hospitals. Properly accounting for such heterogeneity can lead to accurate characterization of the subphenotypes at the patient level. Interpretation: Our three identified MIS-C subphenotypes have profound implications for personalized treatment strategies, potentially influencing clinical outcomes. Further, the proposed algorithm facilitates federated subphenotyping while accounting for the heterogeneity across hospitals.
Affiliation(s)
- Naimin Jing
- Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA
- Current affiliation: Biostatistics and Research Decision Sciences, Merck & Co., Inc, Kenilworth, NJ
- Xiaokang Liu
- Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA
- Qiong Wu
- Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA
- Suchitra Rao
- Department of Pediatrics, University of Colorado School of Medicine and Children’s Hospital Colorado, Aurora, CO
- Asuncion Mejias
- Division of Infectious Diseases, Department of Pediatrics, Nationwide Children’s Hospital and The Ohio State University, Columbus, OH
- Mitchell Maltenfort
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Julia Schuchard
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Vitaly Lorman
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Hanieh Razzaghi
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Ryan Webb
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Chuan Zhou
- Center for Child Health, Behavior and Development, Seattle Children’s Hospital, Seattle, WA
- Ravi Jhaveri
- Division of Infectious Diseases, Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL
- Grace M. Lee
- Department of Pediatrics (Infectious Diseases), Stanford University School of Medicine, Stanford, CA
- Nathan M. Pajor
- Division of Pulmonary Medicine, Cincinnati Children’s Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, OH
- Deepika Thacker
- Division of Cardiology, Nemours Children’s Health, Wilmington, DE
- L. Charles Bailey
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Christopher B. Forrest
- Applied Clinical Research Center, Children’s Hospital of Philadelphia, Philadelphia, PA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Yong Chen
- Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA
4. Li S, Ning Y, Ong MEH, Chakraborty B, Hong C, Xie F, Yuan H, Liu M, Buckland DM, Chen Y, Liu N. FedScore: A privacy-preserving framework for federated scoring system development. J Biomed Inform 2023; 146:104485. [PMID: 37660960 DOI: 10.1016/j.jbi.2023.104485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 08/08/2023] [Accepted: 08/31/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVE: We propose FedScore, a privacy-preserving federated learning framework for scoring system generation across multiple sites to facilitate cross-institutional collaborations. MATERIALS AND METHODS: The FedScore framework includes five modules: federated variable ranking, federated variable transformation, federated score derivation, federated model selection, and federated model evaluation. To illustrate usage and assess FedScore's performance, we built a hypothetical global scoring system for mortality prediction within 30 days after a visit to an emergency department, using 10 simulated sites divided from a tertiary hospital in Singapore. We employed a pre-existing score generator to construct 10 local scoring systems independently at each site, and we also developed a scoring system using centralized data for comparison. RESULTS: We compared the performance of the resulting FedScore model with that of other scoring models using receiver operating characteristic (ROC) analysis. The FedScore model achieved an average area under the curve (AUC) value of 0.763 across all sites, with a standard deviation (SD) of 0.020. We also calculated the average AUC values and SDs for each local model. The FedScore model showed promising accuracy and stability: its average AUC was the closest to that of the pooled model, and its SD was lower than that of most local models. CONCLUSION: This study demonstrates that FedScore is a privacy-preserving scoring system generator with potentially good generalizability.
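The evaluation summary in this abstract (an average AUC across sites with its SD) can be sketched as follows. This is not FedScore code: the site names, outcome labels, and scores are invented, and the AUC is computed with the plain pairwise (Mann-Whitney) definition rather than any particular library.

```python
from statistics import mean, stdev

def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-site data: (outcome labels, risk scores from one shared model).
sites = {
    "site_1": ([0, 0, 1, 1], [1, 2, 3, 4]),
    "site_2": ([0, 1, 0, 1], [1, 2, 3, 4]),
    "site_3": ([1, 0, 0, 1], [1, 2, 3, 4]),
}

# Evaluate the same model at every site, then summarize across sites.
site_aucs = {name: auc(y, s) for name, (y, s) in sites.items()}
auc_values = list(site_aucs.values())
avg_auc = mean(auc_values)  # accuracy summary across sites
sd_auc = stdev(auc_values)  # stability summary (sample SD) across sites
```

A lower `sd_auc` indicates that the model performs more evenly across sites, which is the sense in which the abstract uses SD as a stability measure.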
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Health Services Research Centre, Singapore Health Services, Singapore, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
- Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Feng Xie
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Mingxuan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Daniel M Buckland
- Department of Emergency Medicine, Duke University School of Medicine, Durham, NC, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore, Singapore.
5. Keloth VK, Banda JM, Gurley M, Heider PM, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, Patterson OV, Peng Y, Raja K, Reeves RM, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei WQ, Williams AE, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. J Biomed Inform 2023; 142:104343. [PMID: 36935011 PMCID: PMC10428170 DOI: 10.1016/j.jbi.2023.104343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 01/21/2023] [Accepted: 03/13/2023] [Indexed: 03/19/2023]
Abstract
Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.
Affiliation(s)
- Vipina K Keloth
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Juan M Banda
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Michael Gurley
- Lurie Cancer Center, Northwestern University, Chicago, Illinois, USA
- Paul M Heider
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Georgina Kennedy
- Ingham Institute for Applied Medical Research, Sydney, Australia
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Timothy Miller
- Computational Health Informatics Program, Boston Children's Hospital, and Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Karthik Natarajan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Olga V Patterson
- VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Verily Life Sciences, Mountain View, CA, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Kalpana Raja
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Ruth M Reeves
- TN Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Masoud Rouhizadeh
- Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA; Biomedical Informatics and Data Science, Johns Hopkins University, Baltimore, MD, USA
- Jianlin Shi
- VA Informatics and Computing Infrastructure, Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, Utah, USA; Division of Epidemiology, Department of Internal Medicine, School of Medicine, University of Utah, Salt Lake City, Utah, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
- Xiaoyan Wang
- Sema4 Mount Sinai Genomics Incorporation, Stamford, CT, USA
- Yanshan Wang
- Department of Health Information Management, Department of Biomedical Informatics, and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Rui Zhang
- Institute for Health Informatics, and Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, MN, USA
- Clair Blacketer
- Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Patrick Ryan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA; Janssen Pharmaceutical Research and Development LLC, Titusville, NJ, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Noémie Elhadad
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
- Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.
6. Liu X, Duan R, Luo C, Ogdie A, Moore JH, Kranzler HR, Bian J, Chen Y. Multisite learning of high-dimensional heterogeneous data with applications to opioid use disorder study of 15,000 patients across 5 clinical sites. Sci Rep 2022; 12:11073. [PMID: 35773438 PMCID: PMC9245877 DOI: 10.1038/s41598-022-14029-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 05/31/2022] [Indexed: 11/17/2022]
Abstract
Integrating data across institutions can improve learning efficiency. To integrate data efficiently while protecting privacy, we propose A one-shot, summary-statistics-based, Distributed Algorithm for fitting Penalized (ADAP) regression models across multiple datasets. ADAP utilizes patient-level data from a lead site and incorporates the first-order (ADAP1) and second-order (ADAP2) gradients of the objective function from collaborating sites to construct a surrogate objective function at the lead site, where model fitting is then completed with proper regularization applied. We evaluate the performance of the proposed method using both simulation and a real-world application that studies risk factors for opioid use disorder (OUD) using data from 15,000 patients in the OneFlorida Clinical Research Consortium. Our results show that ADAP performs nearly the same as the pooled estimator but achieves higher estimation accuracy and better variable selection than the local and average estimators. Moreover, ADAP2 successfully handles heterogeneity in covariate distributions.
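The idea of building a surrogate objective at a lead site from other sites' gradients can be sketched for the simplest possible case. The example below is a generic first-order surrogate-loss construction for a one-coefficient least-squares model with no penalty term; it is not the authors' ADAP code, and all data values and variable names are invented for illustration.

```python
def grad(xs, ys, beta):
    """Gradient of the loss (1 / 2n) * sum((y - x * beta) ** 2) at beta."""
    return -sum(x * (y - x * beta) for x, y in zip(xs, ys)) / len(xs)

# Invented data. The lead site holds patient-level rows; the other sites
# would share only the gradient of their local loss, never the rows.
lead_x, lead_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
other_sites = [([1.0, 2.0, 4.0], [1.8, 4.1, 8.3]), ([2.0, 3.0], [4.2, 5.7])]

# Step 1: initial estimate from the lead site alone.
lead_xx = sum(x * x for x in lead_x)
beta_init = sum(x * y for x, y in zip(lead_x, lead_y)) / lead_xx

# Step 2: size-weighted average of every site's gradient at beta_init.
all_sites = [(lead_x, lead_y)] + other_sites
n_total = sum(len(xs) for xs, _ in all_sites)
global_grad = sum(len(xs) * grad(xs, ys, beta_init) for xs, ys in all_sites) / n_total

# Step 3: minimize the surrogate
#   L_lead(beta) + (global_grad - grad_lead(beta_init)) * beta.
# For squared-error loss the minimizer has a closed form that uses the
# lead-site curvature mean(x^2).
lead_curvature = lead_xx / len(lead_x)
beta_surrogate = beta_init - global_grad / lead_curvature

# Pooled estimate, shown only for comparison; it needs all patient-level rows.
pooled_xy = sum(x * y for xs, ys in all_sites for x, y in zip(xs, ys))
pooled_xx = sum(x * x for xs, _ in all_sites for x in xs)
beta_pooled = pooled_xy / pooled_xx
```

In this toy setting the surrogate estimate lands closer to the pooled estimate than the lead-site estimate does, using one round of shared gradients. A penalized version would add a regularization term to the surrogate before minimizing it.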
Affiliation(s)
- Xiaokang Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, 423 Guardian Drive, Philadelphia, PA, 19104, USA
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Chongliang Luo
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, 423 Guardian Drive, Philadelphia, PA, 19104, USA
- Division of Public Health Sciences, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Alexis Ogdie
- Department of Medicine, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90096, USA
- Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania Perelman School of Medicine and the VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, 423 Guardian Drive, Philadelphia, PA, 19104, USA.
7. Distributed learning for heterogeneous clinical data with application to integrating COVID-19 data across 230 sites. NPJ Digit Med 2022; 5:76. [PMID: 35701668 PMCID: PMC9198031 DOI: 10.1038/s41746-022-00615-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/09/2021] [Accepted: 05/19/2022] [Indexed: 11/09/2022]
Abstract
Integrating real-world data (RWD) from several clinical sites offers great opportunities to improve estimation with a more general population compared to analyses based on a single clinical site. However, sharing patient-level data across sites is practically challenging due to concerns about maintaining patient privacy. We develop a distributed algorithm to integrate heterogeneous RWD from multiple clinical sites without sharing patient-level data. The proposed distributed conditional logistic regression (dCLR) algorithm can effectively account for between-site heterogeneity and requires only one round of communication. Our simulation study and data application with the data of 14,215 COVID-19 patients from 230 clinical sites in the UnitedHealth Group Clinical Research Database demonstrate that the proposed distributed algorithm provides an estimator that is robust to heterogeneity in event rates when efficiently integrating data from multiple clinical sites. Our algorithm is therefore a practical alternative to both meta-analysis and existing distributed algorithms for modeling heterogeneous multi-site binary outcomes.
8. Luo C, Duan R, Naj AC, Kranzler HR, Bian J, Chen Y. ODACH: a one-shot distributed algorithm for Cox model with heterogeneous multi-center data. Sci Rep 2022; 12:6627. [PMID: 35459767 PMCID: PMC9033863 DOI: 10.1038/s41598-022-09069-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Accepted: 02/28/2022] [Indexed: 11/08/2022]
Abstract
We developed a One-shot Distributed Algorithm for the Cox proportional-hazards model to analyze Heterogeneous multi-center time-to-event data (ODACH), circumventing the need to share patient-level information across sites. This algorithm implements a surrogate likelihood function to approximate the Cox log-partial likelihood function that is stratified by site, using patient-level data from a lead site and aggregated information from other sites, and allowing the baseline hazard functions and the distribution of covariates to vary across sites. Simulation studies and application to a real-world opioid use disorder study showed that ODACH provides estimates close to the pooled estimator, which analyzes patient-level data directly from all sites via a stratified Cox model. Compared to the meta-analysis estimator, that is, the inverse variance-weighted average of the site-specific estimates, the ODACH estimator is less susceptible to bias, especially when the event is rare. ODACH is thus a valuable privacy-preserving and communication-efficient method for analyzing multi-center time-to-event data.
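The meta-analysis comparator named in this abstract, the inverse variance-weighted average of site-specific estimates, can be written in a few lines. The sketch below shows that comparator only, not ODACH itself; the site-level log hazard ratios and standard errors are invented for illustration.

```python
import math

def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect meta-analysis: weight each site estimate by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical site-specific log hazard ratios and their standard errors.
site_log_hr = [0.30, 0.10, 0.20]
site_se = [0.10, 0.20, 0.10]

pooled_log_hr, pooled_se = inverse_variance_pool(site_log_hr, site_se)
```

Because each weight depends on a site-level standard error, sites with few events contribute noisy estimates and unstable weights, which is one reason such an average can be biased when the event is rare.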
Affiliation(s)
- Chongliang Luo
- Division of Public Health Sciences, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Adam C Naj
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Henry R Kranzler
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania and the VISN 4 MIRECC, Crescenz VAMC, Philadelphia, PA, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.