2
Bakken S, Sang E, de Brujin B. Returning value to communities from the All of Us Research Program through innovative approaches for data use, analysis, dissemination, and research capacity building. J Am Med Inform Assoc 2024; 31:2773-2780. [PMID: 39657747 PMCID: PMC11631058 DOI: 10.1093/jamia/ocae276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 10/21/2024] [Indexed: 12/12/2024]
Affiliation(s)
- Suzanne Bakken
- School of Nursing, Columbia University, New York, NY 10032, United States
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Data Science Institute, Columbia University, New York, NY 10032, United States
- Elaine Sang
- NewCourtland Center for Transitions and Health, School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
- Leonard Davis Institute of Health Economics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, United States
- Berry de Brujin
- Digital Technologies Research Centre, National Research Council Canada, Ottawa, ON K1A 0R6, Canada
3
Tong J, Shen Y, Xu A, He X, Luo C, Edmondson M, Zhang D, Lu Y, Yan C, Li R, Siegel L, Sun L, Shenkman EA, Morton SC, Malin BA, Bian J, Asch DA, Chen Y. Evaluating site-of-care-related racial disparities in kidney graft failure using a novel federated learning framework. J Am Med Inform Assoc 2024; 31:1303-1312. [PMID: 38713006 PMCID: PMC11105132 DOI: 10.1093/jamia/ocae075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 01/09/2024] [Accepted: 03/26/2024] [Indexed: 05/08/2024]
Abstract
OBJECTIVES Racial disparities in kidney transplant access and posttransplant outcomes exist between non-Hispanic Black (NHB) and non-Hispanic White (NHW) patients in the United States, with the site of care being a key contributor. Using multi-site data to examine the effect of site of care on racial disparities, the key challenge is the dilemma in sharing patient-level data due to regulations for protecting patients' privacy.
MATERIALS AND METHODS We developed a federated learning framework, named dGEM-disparity (decentralized algorithm for Generalized linear mixed Effect Model for disparity quantification). Consisting of 2 modules, dGEM-disparity first provides accurately estimated common effects and calibrated hospital-specific effects by requiring only aggregated data from each center and then adopts a counterfactual modeling approach to assess whether the graft failure rates differ if NHB patients had been admitted at transplant centers in the same distribution as NHW patients were admitted.
RESULTS Utilizing United States Renal Data System data from 39 043 adult patients across 73 transplant centers over 10 years, we found that if NHB patients had followed the distribution of NHW patients in admissions, there would be 38 fewer deaths or graft failures per 10 000 NHB patients (95% CI, 35-40) within 1 year of receiving a kidney transplant on average.
DISCUSSION The proposed framework facilitates efficient collaborations in clinical research networks. Additionally, the framework, by using counterfactual modeling to calculate the event rate, allows us to investigate contributions to racial disparities that may occur at the level of site of care.
CONCLUSIONS Our framework is broadly applicable to other decentralized datasets and disparities research related to differential access to care. Ultimately, our proposed framework will advance equity in human health by identifying and addressing hospital-level racial disparities.
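The counterfactual comparison described in the abstract can be illustrated with a simplified reweighting sketch. This is not the dGEM-disparity algorithm itself: it assumes that site-specific event rates for NHB patients are already available, and it only shows how an overall rate changes when those rates are averaged under a different distribution of patients across sites. The function name `standardized_rate` and all numbers are hypothetical and do not come from the study.

```python
from typing import Sequence


def standardized_rate(site_rates: Sequence[float], site_counts: Sequence[int]) -> float:
    """Average site-specific event rates, weighted by patient counts per site."""
    total = sum(site_counts)
    if total <= 0:
        raise ValueError("site_counts must contain at least one patient")
    return sum(rate * n for rate, n in zip(site_rates, site_counts)) / total


# Hypothetical toy inputs for three transplant centers (NOT study data).
nhb_rates = [0.04, 0.06, 0.10]   # assumed 1-year event rate for NHB patients at each site
nhb_counts = [100, 300, 600]     # assumed number of NHB patients admitted at each site
nhw_counts = [500, 300, 200]     # assumed number of NHW patients admitted at each site

# Observed rate: NHB site-specific rates under the NHB admission distribution.
observed = standardized_rate(nhb_rates, nhb_counts)

# Counterfactual rate: same site-specific rates, but NHW admission distribution.
counterfactual = standardized_rate(nhb_rates, nhw_counts)

# Difference expressed per 10 000 patients, the unit used in the abstract.
diff_per_10k = (observed - counterfactual) * 10_000
print(f"observed={observed:.3f} counterfactual={counterfactual:.3f} "
      f"difference per 10 000={diff_per_10k:.0f}")
```

In the actual framework the site-specific effects are estimated from aggregated data with a generalized linear mixed effect model; the sketch replaces that step with fixed illustrative rates to isolate the reweighting idea.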
Affiliation(s)
- Jiayi Tong
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Yishan Shen
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Applied Mathematics and Computational Science, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Alice Xu
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Washington University in St. Louis, St. Louis, MO 63130, United States
- Xing He
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Chongliang Luo
- Division of Public Health Sciences, Department of Surgery, Washington University in St. Louis, St. Louis, MO 63110, United States
- Dazheng Zhang
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Yiwen Lu
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Chao Yan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Ruowang Li
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
- Lianne Siegel
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Lichao Sun
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015, United States
- Elizabeth A Shenkman
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- Sally C Morton
- School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85287, United States
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
- Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32611, United States
- David A Asch
- Division of General Internal Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Leonard Davis Institute of Health Economics, Philadelphia, PA 19104, United States
- Yong Chen
- The Center for Health AI and Synthesis of Evidence (CHASE), Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Applied Mathematics and Computational Science, The University of Pennsylvania, Philadelphia, PA 19104, United States
- Leonard Davis Institute of Health Economics, Philadelphia, PA 19104, United States
5
Pilgram L, Meurers T, Malin B, Schaeffner E, Eckardt KU, Prasser F. The Costs of Anonymization: Case Study Using Clinical Data. J Med Internet Res 2024; 26:e49445. [PMID: 38657232 PMCID: PMC11079766 DOI: 10.2196/49445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/14/2024] [Accepted: 02/13/2024] [Indexed: 04/26/2024]
Abstract
BACKGROUND Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice.
OBJECTIVE The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study.
METHODS The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results.
RESULTS Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy.
CONCLUSIONS Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data.
TRIAL REGISTRATION German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1093/ndt/gfr456.
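The reproducibility metric mentioned in the abstract, overlap of 95% CIs between original and anonymized results, can be sketched as follows. The abstract does not give the exact formula, so this example assumes one common definition: the shared interval length divided by each interval's own length, averaged over the two intervals, with nonoverlapping intervals scored as 0. The function name `ci_overlap` and the example intervals are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def ci_overlap(original: Interval, anonymized: Interval) -> float:
    """Average fraction of each interval covered by the shared region (assumed definition)."""
    orig_width = original[1] - original[0]
    anon_width = anonymized[1] - anonymized[0]
    if orig_width <= 0 or anon_width <= 0:
        raise ValueError("each interval must have positive width")
    # Length of the region common to both intervals; 0 if they do not intersect.
    shared = max(0.0, min(original[1], anonymized[1]) - max(original[0], anonymized[0]))
    return 0.5 * (shared / orig_width + shared / anon_width)


# Hypothetical intervals (NOT study results).
print(ci_overlap((1.0, 2.0), (1.0, 2.0)))  # identical intervals
print(ci_overlap((0.0, 2.0), (1.0, 3.0)))  # partially overlapping intervals
print(ci_overlap((0.0, 1.0), (2.0, 3.0)))  # nonoverlapping intervals
```

Under this assumed definition a value of 1 means the anonymized analysis reproduces the original interval exactly, and a value of 0 corresponds to the nonoverlapping case reported for 6 estimates.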
Affiliation(s)
- Lisa Pilgram
- Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
- Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Bradley Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Elke Schaeffner
- Institute of Public Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Kai-Uwe Eckardt
- Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Department of Nephrology and Hypertension, Universitätsklinikum Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Fabian Prasser
- Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany