1
Meurers T, Otte K, Abu Attieh H, Briki F, Despraz J, Halilovic M, Kaabachi B, Milicevic V, Müller A, Papapostolou G, Wirth FN, Raisaro JL, Prasser F. A quantitative analysis of the use of anonymization in biomedical research. NPJ Digit Med 2025; 8:279. [PMID: 40369095 PMCID: PMC12078711 DOI: 10.1038/s41746-025-01644-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 04/16/2025] [Indexed: 05/16/2025] [Open Access]
Abstract
Anonymized biomedical data sharing faces several challenges. This systematic review analyzes 1084 PubMed-indexed studies (2018-2022) using anonymized biomedical data to quantify usage trends across geographic, regulatory, and cultural regions to identify effective approaches and inform implementation agendas. We identified a significant yearly increase in such studies with a slope of 2.16 articles per 100,000 when normalized against the total number of PubMed-indexed articles (p = 0.021). Most studies used data from the US, UK, and Australia (78.2%). This trend remained when normalized by country-specific research output. Cross-border sharing was rare (10.5% of studies). We identified twelve common data sources, primarily in the US (seven) and UK (three), including commercial (seven) and public entities (five). The prevalence of anonymization in the US, UK, and Australia suggests their practices could guide broader adoption. Rare cross-border anonymized data sharing and differences between countries with comparable regulations underscore the need for global standards.
Affiliation(s)
- Thierry Meurers
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Karen Otte
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Hammam Abu Attieh
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Farah Briki
- Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Jérémie Despraz
- Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Mehmed Halilovic
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bayrem Kaabachi
- Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Vladimir Milicevic
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Armin Müller
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Grigorios Papapostolou
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Nikolaus Wirth
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jean Louis Raisaro
- Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Fabian Prasser
- Health Data Science Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
2
Liu GS, Fereydooni S, Lee MC, Polkampally S, Huynh J, Kuchibhotla S, Shah MM, Ayoub NF, Capasso R, Chang MT, Doyle PC, Holsinger FC, Patel ZM, Pepper JP, Sung CK, Creighton FX, Blevins NH, Stankovic KM. Scoping review of deep learning research illuminates artificial intelligence chasm in otolaryngology-head and neck surgery. NPJ Digit Med 2025; 8:265. [PMID: 40346307 PMCID: PMC12064819 DOI: 10.1038/s41746-025-01693-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2024] [Accepted: 04/30/2025] [Indexed: 05/11/2025] [Open Access]
Abstract
Clinical validation studies are important to translate artificial intelligence (AI) technology in healthcare but may be underperformed in Otolaryngology - Head & Neck Surgery (OHNS). This scoping review examined deep learning publications in OHNS between 1996 and 2023. Searches on MEDLINE, EMBASE, and Web of Science databases identified 3236 articles of which 444 met inclusion criteria. Publications increased exponentially from 2012-2022 across 48 countries and were most concentrated in otology and neurotology (28%), most targeted extending health care provider capabilities (56%), and most used image input data (55%) and convolutional neural network models (63%). Strikingly, nearly all studies (99.3%) were in silico, proof of concept early-stage studies. Three (0.7%) studies conducted offline validation and zero (0%) clinical validation, illuminating the "AI chasm" in OHNS. Recommendations to cross this chasm include focusing on low complexity and low risk tasks, adhering to reporting guidelines, and prioritizing clinical translation studies.
Affiliation(s)
- George S Liu
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
- Soraya Fereydooni
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Melissa Chaehyun Lee
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Srinidhi Polkampally
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Jeffrey Huynh
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Sravya Kuchibhotla
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Mihir M Shah
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Noel F Ayoub
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Robson Capasso
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Michael T Chang
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Philip C Doyle
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Zara M Patel
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Jon-Paul Pepper
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- C Kwang Sung
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
- Francis X Creighton
- Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
- Nikolas H Blevins
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, CA, USA
3
Min S, Asif H, Wang X, Vaidya J. Cafe: Improved Federated Data Imputation by Leveraging Missing Data Heterogeneity. IEEE Transactions on Knowledge and Data Engineering 2025; 37:2266-2281. [PMID: 40322292 PMCID: PMC12048026 DOI: 10.1109/tkde.2025.3537403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]
Abstract
Federated learning (FL), a decentralized machine learning approach, offers great performance while alleviating autonomy and confidentiality concerns. Despite FL's popularity, how to deal with missing values in a federated manner is not well understood. In this work, we initiate a study of federated imputation of missing values, particularly in complex scenarios, where missing data heterogeneity exists and the state-of-the-art (SOTA) approaches for federated imputation suffer from significant loss in imputation quality. We propose Cafe, a personalized FL approach for missing data imputation. Cafe is inspired by the observation that heterogeneity can induce differences in observable and missing data distribution across clients, and that these differences can be leveraged to improve the imputation quality. Cafe computes personalized weights that are automatically calibrated for the level of heterogeneity, which can remain unknown, to develop personalized imputation models for each client. An extensive empirical evaluation over a variety of settings demonstrates that Cafe matches the performance of SOTA baselines in homogeneous settings while significantly outperforming the baselines in heterogeneous settings.
Affiliation(s)
- Hafiz Asif
- Hofstra University, Long Island, NY, USA
4
Nakayama T, Kawamata Y, Toyoda A, Imakura A, Kagawa R, Sanuki M, Tsunoda R, Yamagata K, Sakurai T, Okada Y. Data collaboration for causal inference from limited medical testing and medication data. Sci Rep 2025; 15:9827. [PMID: 40118898 PMCID: PMC11928589 DOI: 10.1038/s41598-025-93509-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Accepted: 03/07/2025] [Indexed: 03/24/2025] [Open Access]
Abstract
Observational studies enable causal inferences when randomized controlled trials (RCTs) are not feasible. However, integrating sensitive medical data across multiple institutions introduces significant privacy challenges. The data collaboration quasi-experiment (DC-QE) framework addresses these concerns by sharing "intermediate representations" (dimensionality-reduced data derived from raw data) instead of the raw data. Although DC-QE can estimate treatment effects, its application to medical data remains unexplored. The aim of this study was to apply the DC-QE framework to medical data from a single institution to simulate distributed data environments under independent and identically distributed (IID) and non-IID conditions. We propose a method for generating intermediate representations within the DC-QE framework. Experimental results show that DC-QE consistently outperformed individual analyses across various accuracy metrics, closely approximating the performance of centralized analysis. The proposed method further improved performance, particularly under non-IID conditions. These outcomes highlight the potential of the DC-QE framework as a robust approach for privacy-preserving causal inferences in healthcare. Broader adoption of this framework and increased use of intermediate representations could grant researchers access to larger, more diverse datasets while safeguarding patient confidentiality. This approach may ultimately aid in identifying previously unrecognized causal relationships, support drug repurposing efforts, and enhance therapeutic interventions for rare diseases.
Affiliation(s)
- Tomoru Nakayama
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Yuji Kawamata
- Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
- Akihiro Toyoda
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Akira Imakura
- Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
- Rina Kagawa
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
- Masaru Sanuki
- Faculty of Medicine, Department of Clinical Medicine, University of Tsukuba, Tsukuba, Japan
- Ryoya Tsunoda
- Faculty of Medicine, Department of Nephrology, University of Tsukuba, Tsukuba, Japan
- Kunihiro Yamagata
- Faculty of Medicine, Department of Nephrology, University of Tsukuba, Tsukuba, Japan
- Tetsuya Sakurai
- Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
- Yukihiko Okada
- Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
5
Noonan VK, Humphreys S, Biering-Sørensen F, Charlifue S, Chen Y, Guest JD, Jones LAT, French J, Widerström-Noga E, Lemmon VP, Heinemann AW, Schwab JM, Phillips AA, Rizi MM, Kramer JLK, Jutzeler CR, Torres-Espin A. Enhancing data standards to advance translation in spinal cord injury. Exp Neurol 2025; 384:115048. [PMID: 39522801 DOI: 10.1016/j.expneurol.2024.115048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 11/03/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
Data standards are available for spinal cord injury (SCI). The International SCI Data Sets were created in 2002 and there are currently 27 freely available. In 2014 the National Institute of Neurological Disorders and Stroke developed clinical common data elements to promote clinical data sharing in SCI. The objective of this paper is to provide an overview of SCI data standards, describe learnings from the traumatic brain injury (TBI) field using data to enhance research and care, and discuss future opportunities in SCI. Given the complexity of SCI, frameworks such as a systems medicine approach and Big Data perspective have been advanced. Implementation of these frameworks requires multi-modal data and a shift towards open science and principles such as requiring data to be FAIR (Findable, Accessible, Interoperable and Reusable). Advanced analytics such as artificial intelligence require data to be interoperable so data can be exchanged among different technology systems and software applications. The TBI field has multiple ongoing initiatives to promote sharing and data reuse for both pre-clinical and clinical studies, which is an opportunity for the SCI field given these injuries can often occur concomitantly. The adoption of interoperable standards, data sharing, open science, and the use of advanced analytics in SCI is needed to facilitate translation in research and care. It is critical that people with lived experience are engaged to ensure data are relevant and enhance quality of life.
Affiliation(s)
- Vanessa K Noonan
- Praxis Spinal Cord Institute, Vancouver, BC, Canada; International Collaboration on Repair Discoveries (ICORD), University of British Columbia, Vancouver, BC, Canada
- Fin Biering-Sørensen
- Department of Clinical Medicine, University of Copenhagen and Department of Brain and Spinal Cord Injuries, Bodil Eskesen Center, Rigshospitalet, Glostrup, Denmark
- Yuying Chen
- University of Alabama at Birmingham, Birmingham, AL, USA
- James D Guest
- Department of Neurosurgery and the Miami Project to Cure Paralysis, the Miller School of Medicine, University of Miami, Miami, FL, USA
- Linda A T Jones
- Center for Outcomes and Measurement, Jefferson College of Rehabilitation Sciences, Thomas Jefferson University, Philadelphia, PA, USA
- Jennifer French
- Neurotech Network, Saint Petersburg, FL, USA; North American Spinal Cord Injury Consortium, Niagara Falls, NY, USA
- Eva Widerström-Noga
- Miami Project to Cure Paralysis, University of Miami, Miami, FL, USA; Neuroscience Graduate Program, University of Miami, Miami, FL, USA
- Vance P Lemmon
- Miami Project to Cure Paralysis, University of Miami, Miami, FL, USA; Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Allen W Heinemann
- Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Evanston, IL, USA; Center for Rehabilitation Outcomes Research, Shirley Ryan AbilityLab, Chicago, IL, USA
- Jan M Schwab
- Department of Neurology and Experimental Neurology, Clinical and Experimental Spinal Cord Injury Research, Charité - Universitätsmedizin Berlin, Berlin, Germany; Department of Neurology, Spinal Cord Injury Division, The Ohio State University, Wexner Medical Center, Columbus, OH, USA; Belford Center for Spinal Cord Injury, Departments of Physical Medicine and Rehabilitation and Neuroscience, The Ohio State University, Wexner Medical Center, Columbus, OH, USA
- Aaron A Phillips
- Departments of Physiology and Pharmacology, Clinical Neurosciences, Cardiac Sciences, Biomedical Engineering, Hotchkiss Brain Institute, Libin Cardiovascular Institute, Restore Network, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- John L K Kramer
- International Collaboration on Repair Discoveries (ICORD), University of British Columbia, Vancouver, BC, Canada; Department of Anesthesiology, Pharmacology, & Therapeutics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Catherine R Jutzeler
- Department of Health Sciences and Technology (D-HEST), ETH Zurich, Universitätstrasse 2, 8092 Zürich, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Abel Torres-Espin
- School of Public Health Sciences, University of Waterloo, Waterloo, ON, Canada; Department of Physical Therapy, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, AB, Canada; Department of Neurosurgery, Brain and Spinal Injury Center, Weill Institutes for Neurosciences, University of California San Francisco, San Francisco, CA, USA
6
Limpoco MAA, Faes C, Hens N. Linear Mixed Modeling of Federated Data When Only the Mean, Covariance, and Sample Size Are Available. Stat Med 2025; 44:e10300. [PMID: 39663139 DOI: 10.1002/sim.10300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 10/08/2024] [Accepted: 11/21/2024] [Indexed: 12/13/2024]
Abstract
In medical research, individual-level patient data provide invaluable information, but the patients' right to confidentiality remains of utmost priority. This poses a huge challenge when estimating statistical models such as a linear mixed model, which is an extension of linear regression models that can account for potential heterogeneity whenever data come from different data providers. Federated learning tackles this hurdle by estimating parameters without retrieving individual-level data. Instead, iterative communication of parameter estimate updates between the data providers and analysts is required. In this article, we propose an alternative framework to federated learning for fitting linear mixed models. Specifically, our approach only requires the mean, covariance, and sample size of multiple covariates from different data providers once. Using the principle of statistical sufficiency within the likelihood framework as theoretical support, this proposed strategy achieves estimates identical to those derived from actual individual-level data. We demonstrate this approach through real data on 15 068 patient records from 70 clinics at the Children's Hospital of Pennsylvania. Assuming that each clinic only shares summary statistics once, we model the COVID-19 polymerase chain reaction test cycle threshold as a function of patient information. Simplicity, communication efficiency, generalisability, and wider scope of implementation in any statistical software distinguish our approach from existing strategies in the literature.
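As a rough illustration of why one-time summary statistics can suffice, the sketch below fits an ordinary simple linear regression (a deliberate simplification, not the linear mixed model proposed in the article) from each provider's sample size, means, variance, and covariance. The names `SiteSummary`, `summarize`, `pooled_ols`, and `direct_ols`, and the toy numbers, are hypothetical and do not come from the paper. The sketch only shows that the pooled estimates coincide with those computed from the individual-level records.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SiteSummary:
    """What one data provider shares, once (hypothetical structure)."""
    n: int
    mean_x: float
    mean_y: float
    var_x: float   # sample variance, n - 1 denominator
    cov_xy: float  # sample covariance, n - 1 denominator


def summarize(xs, ys):
    """Computed locally at a provider; raw records never leave the site."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return SiteSummary(n, mx, my, var_x, cov_xy)


def pooled_ols(summaries):
    """Recover the pooled intercept and slope from summaries alone."""
    n_total = sum(s.n for s in summaries)
    # Rebuild the raw sums and cross-products from each site's statistics.
    sum_x = sum(s.n * s.mean_x for s in summaries)
    sum_y = sum(s.n * s.mean_y for s in summaries)
    sum_xx = sum((s.n - 1) * s.var_x + s.n * s.mean_x ** 2 for s in summaries)
    sum_xy = sum((s.n - 1) * s.cov_xy + s.n * s.mean_x * s.mean_y
                 for s in summaries)
    # Centered sums of squares and cross-products for the pooled data.
    sxx = sum_xx - sum_x ** 2 / n_total
    sxy = sum_xy - sum_x * sum_y / n_total
    slope = sxy / sxx
    intercept = (sum_y - slope * sum_x) / n_total
    return intercept, slope


def direct_ols(xs, ys):
    """Reference fit on the individual-level data, for comparison only."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Toy data for two hypothetical providers.
    site_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
    site_b = ([4.0, 5.0, 6.0, 7.0], [8.1, 9.8, 12.2, 13.9])
    print(pooled_ols([summarize(*site_a), summarize(*site_b)]))
    print(direct_ols(site_a[0] + site_b[0], site_a[1] + site_b[1]))
```

The two printed pairs should agree up to floating-point rounding, because the sums and cross-products that determine the least-squares fit can be rebuilt exactly from each site's sample size, means, variance, and covariance.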
Affiliation(s)
- Marie Analiz April Limpoco
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Data Science Institute (DSI), Hasselt University, Hasselt, Belgium
- Christel Faes
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Data Science Institute (DSI), Hasselt University, Hasselt, Belgium
- Niel Hens
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Data Science Institute (DSI), Hasselt University, Hasselt, Belgium
- Centre for Health Economic Research and Modelling Infectious Diseases (CHERMID), Vaccine & Infectious Disease Institute, Antwerp University, Antwerp, Belgium
7
Calvino G, Peconi C, Strafella C, Trastulli G, Megalizzi D, Andreucci S, Cascella R, Caltagirone C, Zampatti S, Giardina E. Federated Learning: Breaking Down Barriers in Global Genomic Research. Genes (Basel) 2024; 15:1650. [PMID: 39766917 PMCID: PMC11728131 DOI: 10.3390/genes15121650] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/02/2024] [Revised: 12/15/2024] [Accepted: 12/20/2024] [Indexed: 01/11/2025] [Open Access]
Abstract
Recent advancements in Next-Generation Sequencing (NGS) technologies have revolutionized genomic research, presenting unprecedented opportunities for personalized medicine and population genetics. However, issues such as data silos, privacy concerns, and regulatory challenges hinder large-scale data integration and collaboration. Federated Learning (FL) has emerged as a transformative solution, enabling decentralized data analysis while preserving privacy and complying with regulations such as the General Data Protection Regulation (GDPR). This review explores the potential use of FL in genomics, detailing its methodology, including local model training, secure aggregation, and iterative improvement. Key challenges, such as heterogeneous data integration and cybersecurity risks, are examined alongside regulations like GDPR. Successful implementations of FL in global and national initiatives demonstrate its scalability and role in supporting collaborative research. Finally, we discuss future directions, including AI integration and the necessity of education and training, to fully harness the potential of FL in advancing precision medicine and global health initiatives.
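The loop of local training, aggregation, and iterative improvement mentioned in the abstract can be sketched roughly as follows. This is a minimal federated-averaging-style illustration on a one-parameter linear model, not the review's own implementation: the function names, learning rate, and toy data are all hypothetical, and a plain sample-size-weighted average stands in for secure aggregation, which is not implemented here.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """A client refines the shared weight on its own (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(client_datasets, rounds=50, w0=0.0):
    """The server repeatedly averages client weights; raw data stays local."""
    w = w0
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        # Each client starts from the current global weight.
        local_ws = [local_update(w, d) for d in client_datasets]
        # Plain weighted average by sample size (no cryptographic protection).
        w = sum(len(d) * lw for d, lw in zip(client_datasets, local_ws)) / total
    return w


if __name__ == "__main__":
    # Two hypothetical clients whose data follow y = 2 * x exactly.
    clients = [
        [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
        [(0.5, 1.0), (1.5, 3.0)],
    ]
    print(federated_average(clients))
```

Only the scalar weight is exchanged in each round. In a real deployment the averaging step would typically be protected, for example by secure aggregation as the review describes, so that the server does not see individual client updates.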
Affiliation(s)
- Giulia Calvino
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Science, Roma Tre University, 00146 Rome, Italy
- Cristina Peconi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Claudia Strafella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Giulia Trastulli
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Systems Medicine, Tor Vergata University, 00133 Rome, Italy
- Domenica Megalizzi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
- Sarah Andreucci
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Raffaella Cascella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1010 Tirana, Albania
- Carlo Caltagirone
- Department of Clinical and Behavioral Neurology, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Stefania Zampatti
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Emiliano Giardina
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
8
Li S, Miao D, Wu Q, Hong C, D’Agostino D, Li X, Ning Y, Shang Y, Wang Z, Liu M, Fu H, Ong MEH, Haddadi H, Liu N. Federated Learning in Healthcare: A Benchmark Comparison of Engineering and Statistical Approaches for Structured Data Analysis. Health Data Science 2024; 4:0196. [PMID: 39635226 PMCID: PMC11615161 DOI: 10.34133/hds.0196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 09/12/2024] [Accepted: 09/13/2024] [Indexed: 12/07/2024]
Abstract
Background: Federated learning (FL) holds promise for safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also developed privacy-preserving algorithms, though these are less recognized. Our goal was to bridge this gap with the first comprehensive comparison of FL frameworks from both domains. Methods: We assessed 7 FL frameworks, encompassing both engineering-based and statistical FL algorithms, and compared them against local and centralized modeling of logistic regression and least absolute shrinkage and selection operator (Lasso). Our evaluation utilized both simulated data and real-world emergency department data, focusing on comparing both estimated model coefficients and the performance of model predictions. Results: The findings reveal that statistical FL algorithms produce much less biased estimates of model coefficients. Conversely, engineering-based methods can yield models with slightly better prediction performance, occasionally outperforming both centralized and statistical FL models. Conclusion: This study underscores the relative strengths and weaknesses of both types of methods, providing recommendations for their selection based on distinct study characteristics. Furthermore, we emphasize the critical need to raise awareness of and integrate these methods into future applications of FL within the healthcare domain.
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Di Miao
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Qiming Wu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Danny D’Agostino
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Xin Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Yuqing Shang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Ziwen Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Molei Liu
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY, USA
- Huazhu Fu
- Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore, Singapore
- Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Health Services Research Centre, Singapore Health Services, Singapore, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
- Hamed Haddadi
- Department of Computing, Imperial College London, London, England, UK
- Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
9
Tajabadi M, Martin R, Heider D. Privacy-preserving decentralized learning methods for biomedical applications. Comput Struct Biotechnol J 2024; 23:3281-3287. [PMID: 39296807 PMCID: PMC11408144 DOI: 10.1016/j.csbj.2024.08.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 08/26/2024] [Accepted: 08/26/2024] [Indexed: 09/21/2024] [Open Access]
Abstract
In recent years, decentralized machine learning has emerged as a significant advancement in biomedical applications, offering robust solutions for data privacy, security, and collaboration across diverse healthcare environments. In this review, we examine various decentralized learning methodologies, including federated learning, split learning, swarm learning, gossip learning, edge learning, and some of their applications in the biomedical field. We delve into the underlying principles, network topologies, and communication strategies of each approach, highlighting their advantages and limitations. Ultimately, the selection of a suitable method should be based on specific needs, infrastructures, and computational capabilities.
Affiliation(s)
- Mohammad Tajabadi
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Roman Martin
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Dominik Heider
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
10
Zwiers LC, Grobbee DE, Uijl A, Ong DSY. Federated learning as a smart tool for research on infectious diseases. BMC Infect Dis 2024; 24:1327. [PMID: 39573994 PMCID: PMC11580691 DOI: 10.1186/s12879-024-10230-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 11/14/2024] [Indexed: 11/25/2024] [Open Access]
Abstract
Background: The use of real-world data has become increasingly popular, also in the field of infectious disease (ID), particularly since the COVID-19 pandemic emerged. While much useful data for research is being collected, these data are generally stored across different sources. Privacy concerns limit the possibility to store the data centrally, thereby also limiting the possibility of fully leveraging the potential power of combined data. Federated learning (FL) has been suggested to overcome privacy issues by making it possible to perform research on data from various sources without those data leaving local servers. In this review, we discuss existing applications of FL in ID research, as well as the most relevant opportunities and challenges of this method. Methods: References for this review were identified through searches of MEDLINE/PubMed, Google Scholar, Embase and Scopus until July 2023. We searched for studies using FL in different applications related to ID. Results: Thirty references were included and divided into four sub-topics: disease screening, prediction of clinical outcomes, infection epidemiology, and vaccine research. Most research was related to COVID-19. In all studies, FL achieved good accuracy when predicting diseases and outcomes, also in comparison to non-federated methods. However, most studies did not make use of real-world federated data, but rather showed the potential of FL by using data that was manually partitioned. Conclusions: FL is a promising methodology which allows using data from several sources, potentially generating stronger and more generalisable results. However, further exploration of FL application possibilities in ID research is needed.
Affiliation(s)
- Laura C Zwiers
- Julius Global Health, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
- Julius Clinical, Zeist, The Netherlands.
| | - Diederick E Grobbee
- Julius Global Health, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Julius Clinical, Zeist, The Netherlands
| | - Alicia Uijl
- Julius Global Health, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Department of Cardiology, Amsterdam University Medical Centers, Amsterdam Cardiovascular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Division of Cardiology, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - David S Y Ong
- Julius Global Health, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Julius Clinical, Zeist, The Netherlands
- Department of Medical Microbiology and Infection Control, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
|
11
|
Alves CL, Martinelli T, Sallum LF, Rodrigues FA, Toutain TGLDO, Porto JAM, Thielemann C, Aguiar PMDC, Moeckel M. Multiclass classification of Autism Spectrum Disorder, attention deficit hyperactivity disorder, and typically developed individuals using fMRI functional connectivity analysis. PLoS One 2024; 19:e0305630. [PMID: 39418298 PMCID: PMC11486369 DOI: 10.1371/journal.pone.0305630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 06/03/2024] [Indexed: 10/19/2024] Open
Abstract
Neurodevelopmental conditions, such as Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD), present unique challenges due to overlapping symptoms, making an accurate diagnosis and targeted intervention difficult. Our study employs advanced machine learning techniques to analyze functional magnetic resonance imaging (fMRI) data from individuals with ASD, ADHD, and typically developed (TD) controls, totaling 120 subjects in the study. Leveraging multiclass classification (ML) algorithms, we achieve superior accuracy in distinguishing between ASD, ADHD, and TD groups, surpassing existing benchmarks with an area under the ROC curve near 98%. Our analysis reveals distinct neural signatures associated with ASD and ADHD: individuals with ADHD exhibit altered connectivity patterns of regions involved in attention and impulse control, whereas those with ASD show disruptions in brain regions critical for social and cognitive functions. The observed connectivity patterns, on which the ML classification rests, agree with established diagnostic approaches based on clinical symptoms. Furthermore, complex network analyses highlight differences in brain network integration and segregation among the three groups. Our findings pave the way for refined, ML-enhanced diagnostics in accordance with established practices, offering a promising avenue for developing trustworthy clinical decision-support systems.
Affiliation(s)
- Caroline L. Alves
- Laboratory for Hybrid Modeling, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
| | - Tiago Martinelli
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
| | - Loriz Francisco Sallum
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
| | | | | | - Joel Augusto Moura Porto
- Institute of Physics of São Carlos (IFSC), University of São Paulo (USP), São Carlos, São Paulo, Brazil
- Institute of Biological Information Processing, Heinrich Heine University Düsseldorf, Düsseldorf, North Rhine–Westphalia Land, Germany
| | - Christiane Thielemann
- BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
| | - Patrícia Maria de Carvalho Aguiar
- Hospital Israelita Albert Einstein, São Paulo, São Paulo, Brazil
- Department of Neurology and Neurosurgery, Federal University of São Paulo, São Paulo, São Paulo, Brazil
| | - Michael Moeckel
- Laboratory for Hybrid Modeling, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
|
12
|
Pollard S, Ehman M, Hermansen A, Weymann D, Krebs E, Ho C, Lim HJ, Jones S, Bombard Y, Hanna TP, Hessels C, Longstaff H, Cook-Deegan R, Bubela T, Regier DA. "I Just Assumed This Was Already Being Done": Canadian Patient Preferences for Enhanced Data Sharing for Precision Oncology. JCO Precis Oncol 2024; 8:e2400184. [PMID: 39116357 PMCID: PMC11371116 DOI: 10.1200/po.24.00184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/02/2024] [Accepted: 06/11/2024] [Indexed: 08/10/2024] Open
Abstract
PURPOSE In Canada, health data are siloed, slowing bioinnovation and evidence generation for personalized cancer care. Secured data-sharing platforms (SDSPs) can enable data analysis across silos through rapid concatenation across trial and real-world settings and timely researcher access. To motivate patient participation and trust in research, it is critical to ensure that SDSP design and oversight align with patients' values and address their concerns. We sought to qualitatively characterize patient preferences for the design of a pan-Canadian SDSP. METHODS Between January 2022 and July 2023, we conducted pan-Canadian virtual focus groups with individuals who had a personal history of cancer. Following each focus group, participants were invited to provide feedback on early-phase analysis results via a member-checking survey. Three trained qualitative researchers analyzed data using thematic analysis. RESULTS Twenty-eight individuals participated across five focus groups. Four focus groups were conducted in English and one in French. Thematic analysis generated two major and five minor themes. Analytic themes spanned personal and population implications of data sharing and willingness to manage perceived risks. Participants were supportive of increasing access to health data for precision oncology research, while voicing concerns about unintended data use, reidentification, and inequitable access to costly therapeutics. To mitigate perceived risks, participants highlighted the value of data access oversight and governance and informational transparency. CONCLUSION Strategies for secured data sharing should anticipate and mitigate the risks that patients perceive. Participants supported enhancing timely research capability while ensuring safeguards to protect patient autonomy and privacy. Our study informs the development of data-governance and data-sharing frameworks that integrate real-world and trial data, informed by evidence from direct patient input.
Affiliation(s)
- Samantha Pollard
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
| | - Morgan Ehman
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
| | - Anna Hermansen
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
- School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Deirdre Weymann
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
| | - Emanuel Krebs
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
| | - Cheryl Ho
- Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Medicine, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Howard J. Lim
- Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
- Department of Medicine, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Steven Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Yvonne Bombard
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
- Genomics Health Services Research Program, Li Ka Shing Knowledge Institute of St Michael's Hospital, Unity Health Toronto, Toronto, ON, Canada
| | - Timothy P. Hanna
- Department of Oncology, Queen's University, Kingston, ON, Canada
- Department of Public Health Science, Queen's University, Kingston, ON, Canada
| | - Chiquita Hessels
- Li-Fraumeni Syndrome Association Canada, British Columbia, Canada
| | | | | | - Tania Bubela
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
| | - Dean A. Regier
- Cancer Control Research, BC Cancer Research Institute, Vancouver, BC, Canada
- School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
|
13
|
Pirmani A, Oldenhof M, Peeters LM, De Brouwer E, Moreau Y. Accessible Ecosystem for Clinical Research (Federated Learning for Everyone): Development and Usability Study. JMIR Form Res 2024; 8:e55496. [PMID: 39018557 PMCID: PMC11292148 DOI: 10.2196/55496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 04/25/2024] [Accepted: 05/15/2024] [Indexed: 07/19/2024] Open
Abstract
BACKGROUND The integrity and reliability of clinical research outcomes rely heavily on access to vast amounts of data. However, the fragmented distribution of these data across multiple institutions, along with ethical and regulatory barriers, presents significant challenges to accessing relevant data. While federated learning offers a promising solution to leverage insights from fragmented data sets, its adoption faces hurdles due to implementation complexities, scalability issues, and inclusivity challenges. OBJECTIVE This paper introduces Federated Learning for Everyone (FL4E), an accessible framework facilitating multistakeholder collaboration in clinical research. It focuses on simplifying federated learning through an innovative ecosystem-based approach. METHODS The "degree of federation" is a fundamental concept of FL4E, allowing for flexible integration of federated and centralized learning models. This feature provides a customizable solution by enabling users to choose the level of data decentralization based on specific health care settings or project needs, making federated learning more adaptable and efficient. By using an ecosystem-based collaborative learning strategy, FL4E encourages a comprehensive platform for managing real-world data, enhancing collaboration and knowledge sharing among its stakeholders. RESULTS Evaluating FL4E's effectiveness using real-world health care data sets has highlighted its ecosystem-oriented and inclusive design. By applying hybrid models to 2 distinct analytical tasks (classification and survival analysis) within real-world settings, we have effectively measured the "degree of federation" across various contexts. These evaluations show that FL4E's hybrid models not only match the performance of fully federated models but also avoid the substantial overhead usually linked with these models.
Achieving this balance greatly enhances collaborative initiatives and broadens the scope of analytical possibilities within the ecosystem. CONCLUSIONS FL4E represents a significant step forward in collaborative clinical research by merging the benefits of centralized and federated learning. Its modular ecosystem-based design and the "degree of federation" feature make it an inclusive, customizable framework suitable for a wide array of clinical research scenarios, promising to revolutionize the field through improved collaboration and data use. Detailed implementation and analyses are available on the associated GitHub repository.
Affiliation(s)
- Ashkan Pirmani
- ESAT-STADIUS, KU Leuven, Leuven, Belgium
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
| | | | - Liesbet M Peeters
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
|
14
|
Zhang F, Kreuter D, Chen Y, Dittmer S, Tull S, Shadbahr T, Preller J, Rudd JH, Aston JA, Schönlieb CB, Gleadall N, Roberts M. Recent methodological advances in federated learning for healthcare. Patterns (New York, N.Y.) 2024; 5:101006. [PMID: 39005485 PMCID: PMC11240178 DOI: 10.1016/j.patter.2024.101006] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 07/16/2024]
Abstract
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
Affiliation(s)
- Fan Zhang
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Daniel Kreuter
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Yichen Chen
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Sören Dittmer
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- ZeTeM, University of Bremen, Bremen, Germany
| | - Samuel Tull
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | - Tolou Shadbahr
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Jacobus Preller
- Addenbrooke’s Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
| | - James H.F. Rudd
- Department of Medicine, University of Cambridge, Cambridge, UK
| | - John A.D. Aston
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Carola-Bibiane Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
| | | | - Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
|
15
|
D'Amico S, Dall’Olio L, Rollo C, Alonso P, Prada-Luengo I, Dall’Olio D, Sala C, Sauta E, Asti G, Lanino L, Maggioni G, Campagna A, Zazzetti E, Delleani M, Bicchieri ME, Morandini P, Savevski V, Arroyo B, Parras J, Zhao LP, Platzbecker U, Diez-Campelo M, Santini V, Fenaux P, Haferlach T, Krogh A, Zazo S, Fariselli P, Sanavia T, Della Porta MG, Castellani G. MOSAIC: An Artificial Intelligence-Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers. JCO Clin Cancer Inform 2024; 8:e2400008. [PMID: 38875514 PMCID: PMC11371092 DOI: 10.1200/cci.24.00008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/14/2024] [Accepted: 04/15/2024] [Indexed: 06/16/2024] Open
Abstract
PURPOSE Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)-based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities. METHODS We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods, compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into distributed infrastructure. RESULTS UMAP + HDBSCAN clustering obtained a more granular patient stratification, achieving a higher average silhouette coefficient (0.16) with respect to HDP (0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% and 85.8% ± 0.8%). AI methods for survival prediction outperform conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stands in the internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02). 
SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models. CONCLUSION MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection.
Affiliation(s)
- Saverio D'Amico
- Humanitas Clinical and Research Center—IRCCS, Milan, Italy
- Train s.r.l., Milan, Italy
| | | | - Cesare Rollo
- Computational Biomedicine Unit, Department of Medical Sciences, University of Turin, Turin, Italy
| | - Patricia Alonso
- Department of Signals, Systems and Radiocommunications, Polytechnic University of Madrid, Madrid, Spain
| | | | | | - Claudia Sala
- Experimental, Diagnostic and Specialty Medicine—DIMES, Bologna, Italy
| | | | - Gianluca Asti
- Humanitas Clinical and Research Center—IRCCS, Milan, Italy
| | - Luca Lanino
- Humanitas Clinical and Research Center—IRCCS, Milan, Italy
| | | | | | - Elena Zazzetti
- Humanitas Clinical and Research Center—IRCCS, Milan, Italy
| | | | | | | | | | - Borja Arroyo
- Department of Signals, Systems and Radiocommunications, Polytechnic University of Madrid, Madrid, Spain
| | - Juan Parras
- Department of Signals, Systems and Radiocommunications, Polytechnic University of Madrid, Madrid, Spain
| | - Lin Pierre Zhao
- Hematology and Bone Marrow Transplantation, Hôpital Saint-Louis/University Paris 7, Paris, France
| | - Uwe Platzbecker
- Medical Clinic and Policlinic 1, Hematology and Cellular Therapy, University Hospital Leipzig, Leipzig, Germany
| | - Maria Diez-Campelo
- Hematology Department, Hospital Universitario de Salamanca, Salamanca, Spain
| | - Valeria Santini
- Hematology, Azienda Ospedaliero-Universitaria Careggi & University of Florence, Florence, Italy
| | - Pierre Fenaux
- Hematology and Bone Marrow Transplantation, Hôpital Saint-Louis/University Paris 7, Paris, France
| | | | | | - Santiago Zazo
- Department of Signals, Systems and Radiocommunications, Polytechnic University of Madrid, Madrid, Spain
| | - Piero Fariselli
- Computational Biomedicine Unit, Department of Medical Sciences, University of Turin, Turin, Italy
| | - Tiziana Sanavia
- Computational Biomedicine Unit, Department of Medical Sciences, University of Turin, Turin, Italy
| | - Matteo Giovanni Della Porta
- Humanitas Clinical and Research Center—IRCCS, Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
| | - Gastone Castellani
- Department of Physics and Astronomy (DIFA), Bologna, Italy
- Experimental, Diagnostic and Specialty Medicine—DIMES, Bologna, Italy
|
16
|
Acharya N, Natarajan K. Development and Validation of an Individual Socioeconomic Deprivation Index (ISDI) in the NIH's All of Us Data Network. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:36-45. [PMID: 38827060 PMCID: PMC11141807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Many of the existing composite social determinant of health (SDOH) indices, such as the Area Deprivation Index, are constrained by their reliance on geographic approximations and American Community Survey data. This study builds on the body of literature around deprivation indices to construct an individual socioeconomic deprivation index (ISDI) within the NIH's All of Us Data Network by using weighted multiple correspondence analysis on SDOH data elements collected at the participant level. In this study, the correlation between ISDI and another area-approximated index is assessed to the extent possible, along with the changes in an AI model's performance due to stratified sampling based on ISDI quintiles. Individual-level deprivation indices may have a wide range of utility, particularly in the context of precision medicine in both centralized and distributed data networks.
Affiliation(s)
- Nripendra Acharya
- Columbia University Medical Center, Department of Biomedical Informatics, New York, New York
| | - Karthik Natarajan
- Columbia University Medical Center, Department of Biomedical Informatics, New York, New York
|
17
|
Su C, Wei J, Lei Y, Xuan H, Li J. Empowering precise advertising with Fed-GANCC: A novel federated learning approach leveraging Generative Adversarial Networks and group clustering. PLoS One 2024; 19:e0298261. [PMID: 38598458 PMCID: PMC11006173 DOI: 10.1371/journal.pone.0298261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/22/2024] [Indexed: 04/12/2024] Open
Abstract
In the realm of targeted advertising, the demand for precision is paramount, and the traditional centralized machine learning paradigm fails to address this necessity effectively. Two critical challenges persist in the current advertising ecosystem: the data privacy concerns leading to isolated data islands and the complexity in handling non-Independent and Identically Distributed (non-IID) data and concept drift due to the specificity and diversity in user behavior data. Current federated learning frameworks struggle to overcome these hurdles satisfactorily. This paper introduces Fed-GANCC, an innovative federated learning framework that synergizes Generative Adversarial Networks (GANs) and Group Clustering. The framework incorporates a user data augmentation algorithm predicated on adversarial generative networks to enrich user behavior data, curtail the impact of non-uniform data distribution, and enhance the applicability of the global machine learning model. Unlike traditional approaches, our framework offers user data augmentation algorithms based on adversarial generative networks, which not only enriches user behavior data but also reduces the challenges posed by non-uniform data distribution, thereby enhancing the applicability of the global machine learning (ML) model. The effectiveness of Fed-GANCC is distinctly showcased through experimental results, outperforming contemporary methods like FED-AVG and FED-SGD in terms of accuracy, loss value, and receiver operating characteristic (ROC) indicators within the same computing time. Experimental results vindicate the effectiveness of Fed-GANCC, revealing substantial enhancements in accuracy, loss value, and receiver operating characteristic (ROC) metrics compared to FED-AVG and FED-SGD given the same computational time. These outcomes underline Fed-GANCC's exceptional prowess in mitigating issues such as isolated data islands, non-IID data, and concept drift. 
With its novel approach to addressing the prevailing challenges in targeted advertising such as isolated data islands, non-IID data, and concept drift, the Fed-GANCC framework stands as a benchmark, paving the way for future advancements in federated learning solutions tailored for the advertising domain. The Fed-GANCC framework promises to offer pivotal insights for the future development of efficient and advanced federated learning solutions for targeted advertising.
Affiliation(s)
- Caiyu Su
- Guangxi Vocational & Technical Institute of Industry, Nanning, Guangxi, China
| | - Jinri Wei
- Guangxi Vocational & Technical Institute of Industry, Nanning, Guangxi, China
| | - Yuan Lei
- Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
| | - Hongkun Xuan
- Guangxi University of Foreign Languages, Nanning, Guangxi, China
| | - Jiahui Li
- Guangxi University of Foreign Languages, Nanning, Guangxi, China
|
18
|
Teo ZL, Jin L, Li S, Miao D, Zhang X, Ng WY, Tan TF, Lee DM, Chua KJ, Heng J, Liu Y, Goh RSM, Ting DSW. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Rep Med 2024; 5:101419. [PMID: 38340728 PMCID: PMC10897620 DOI: 10.1016/j.xcrm.2024.101419] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/08/2023] [Revised: 11/17/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
Federated learning (FL) is a distributed machine learning framework that is gaining traction in view of increasing health data privacy protection needs. By conducting a systematic review of FL applications in healthcare, we identify relevant articles in scientific, engineering, and medical journals in English up to August 31st, 2023. Out of a total of 22,693 articles under review, 612 articles are included in the final analysis. The majority of articles are proof-of-concept studies, and only 5.2% are studies with real-life application of FL. Radiology and internal medicine are the most common specialties involved in FL. FL is robust to a variety of machine learning models and data types, with neural networks and medical imaging being the most common, respectively. We highlight the need to address the barriers to clinical translation and to assess its real-world impact in this new digital data-driven healthcare scene.
Affiliation(s)
- Zhen Ling Teo
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore
| | - Liyuan Jin
- Singapore Eye Research Institute, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore
| | - Siqi Li
- Singapore Eye Research Institute, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore
| | - Di Miao
- Singapore Eye Research Institute, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore
| | - Xiaoman Zhang
- Singapore Eye Research Institute, Singapore, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Wei Yan Ng
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore
| | - Ting Fang Tan
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore
| | - Deborah Meixuan Lee
- Singapore Eye Research Institute, Singapore, Singapore; Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore, Singapore
| | - Kai Jie Chua
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore
| | - John Heng
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore
| | - Yong Liu
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Rick Siow Mong Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Daniel Shu Wei Ting
- Singapore National Eye Centre, Singapore, Singapore; Singapore Eye Research Institute, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore.
|
19
|
Li A, Mullin S, Elkin PL. Improving Prediction of Survival for Extremely Premature Infants Born at 23 to 29 Weeks Gestational Age in the Neonatal Intensive Care Unit: Development and Evaluation of Machine Learning Models. JMIR Med Inform 2024; 12:e42271. [PMID: 38354033 PMCID: PMC10902770 DOI: 10.2196/42271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 02/02/2023] [Accepted: 12/28/2023] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND Infants born at extremely preterm gestational ages are typically admitted to the neonatal intensive care unit (NICU) after initial resuscitation. The subsequent hospital course can be highly variable, and despite counseling aided by available risk calculators, there are significant challenges with shared decision-making regarding life support and transition to end-of-life care. Improving predictive models can help providers and families navigate these unique challenges. OBJECTIVE Machine learning methods have previously demonstrated added predictive value for determining intensive care unit outcomes, and their use allows consideration of a greater number of factors that potentially influence newborn outcomes, such as maternal characteristics. Machine learning-based models were analyzed for their ability to predict the survival of extremely preterm neonates at initial admission. METHODS Maternal and newborn information was extracted from the health records of infants born between 23 and 29 weeks of gestation in the Medical Information Mart for Intensive Care III (MIMIC-III) critical care database. Applicable machine learning models predicting survival during the initial NICU admission were developed and compared. The same type of model was also examined using only features that would be available prepartum for the purpose of survival prediction prior to an anticipated preterm birth. Features most correlated with the predicted outcome were determined when possible for each model. RESULTS Of included patients, 37 of 459 (8.1%) expired. The resulting random forest model showed higher predictive performance than the frequently used Score for Neonatal Acute Physiology With Perinatal Extension II (SNAPPE-II) NICU model when considering extremely preterm infants of very low birth weight. Several other machine learning models were found to have good performance but did not show a statistically significant difference from previously available models in this study. 
Feature importance varied by model, and those of greater importance included gestational age; birth weight; initial oxygenation level; elements of the APGAR (appearance, pulse, grimace, activity, and respiration) score; and amount of blood pressure support. Important prepartum features also included maternal age, steroid administration, and the presence of pregnancy complications. CONCLUSIONS Machine learning methods have the potential to provide robust prediction of survival in the context of extremely preterm births and allow for consideration of additional factors such as maternal clinical and socioeconomic information. Evaluation of larger, more diverse data sets may provide additional clarity on comparative performance.
Affiliation(s)
- Angie Li
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
| | - Sarah Mullin
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
| | - Peter L Elkin
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, United States
|
20
|
Soltan AAS, Thakur A, Yang J, Chauhan A, D'Cruz LG, Dickson P, Soltan MA, Thickett DR, Eyre DW, Zhu T, Clifton DA. A scalable federated learning solution for secondary care using low-cost microcomputing: privacy-preserving development and evaluation of a COVID-19 screening test in UK hospitals. Lancet Digit Health 2024; 6:e93-e104. [PMID: 38278619 DOI: 10.1016/s2589-7500(23)00226-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/04/2023] [Revised: 10/17/2023] [Accepted: 10/30/2023] [Indexed: 01/28/2024]
Abstract
BACKGROUND Multicentre training could reduce biases in medical artificial intelligence (AI); however, ethical, legal, and technical considerations can constrain the ability of hospitals to share data. Federated learning enables institutions to participate in algorithm development while retaining custody of their data, but uptake in hospitals has been limited, possibly because deployment requires specialist software and technical expertise at each site. We previously developed an artificial intelligence-driven screening test for COVID-19 in emergency departments, known as CURIAL-Lab, which uses vital signs and blood tests that are routinely available within 1 h of a patient's arrival. Here we aimed to federate our COVID-19 screening test by developing an easy-to-use embedded system, which we introduce as full-stack federated learning, to train and evaluate machine learning models across four UK hospital groups without centralising patient data. METHODS We supplied a Raspberry Pi 4 Model B preloaded with our federated learning software pipeline to four National Health Service (NHS) hospital groups in the UK: Oxford University Hospitals NHS Foundation Trust (OUH; through the locally linked research University, University of Oxford), University Hospitals Birmingham NHS Foundation Trust (UHB), Bedfordshire Hospitals NHS Foundation Trust (BH), and Portsmouth Hospitals University NHS Trust (PUH). OUH, PUH, and UHB participated in federated training, training a deep neural network and logistic regressor over 150 rounds to form and calibrate a global model to predict COVID-19 status, using clinical data from patients admitted before the pandemic (COVID-19-negative) and testing positive for COVID-19 during the first wave of the pandemic. We conducted a federated evaluation of the global model for admissions during the second wave of the pandemic at OUH, PUH, and externally at BH. 
For OUH and PUH, we additionally performed local fine-tuning of the global model using the sites' individual training data, forming a site-tuned model, and evaluated the resultant model for admissions during the second wave of the pandemic. This study included data collected between Dec 1, 2018, and March 1, 2021; the exact date ranges used varied by site. The primary outcome was overall model performance, measured as the area under the receiver operating characteristic curve (AUROC). Removable micro secure digital (microSD) storage was destroyed on study completion. FINDINGS Clinical data from 130 941 patients (1772 COVID-19-positive), routinely collected across three hospital groups (OUH, PUH, and UHB), were included in federated training. The evaluation step included data from 32 986 patients (3549 COVID-19-positive) attending OUH, PUH, or BH during the second wave of the pandemic. Federated training of a global deep neural network classifier improved upon performance of models trained locally in terms of AUROC by a mean of 27·6% (SD 2·2): AUROC increased from 0·574 (95% CI 0·560-0·589) at OUH and 0·622 (0·608-0·637) at PUH using the locally trained models to 0·872 (0·862-0·882) at OUH and 0·876 (0·865-0·886) at PUH using the federated global model. Performance improvement was smaller for a logistic regression model, with a mean increase in AUROC of 13·9% (0·5%). During federated external evaluation at BH, AUROC for the global deep neural network model was 0·917 (0·893-0·942), with 89·7% sensitivity (83·6-93·6) and 76·6% specificity (73·9-79·1). Site-specific tuning of the global model did not significantly improve performance (change in AUROC <0·01). INTERPRETATION We developed an embedded system for federated learning, using microcomputing to optimise for ease of deployment. We deployed full-stack federated learning across four UK hospital groups to develop a COVID-19 screening test without centralising patient data. 
Federation improved model performance, and the resultant global models were generalisable. Full-stack federated learning could enable hospitals to contribute to AI development at low cost and without specialist technical expertise at each site. FUNDING The Wellcome Trust, University of Oxford Medical and Life Sciences Translational Fund.
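The round-based training described in this abstract, where each site trains on its own data and only model parameters are combined into a global model, can be illustrated with a minimal sketch. It assumes plain federated averaging (a sample-size-weighted mean of site parameters) and a one-parameter linear model; the abstract does not state which aggregation rule the authors used, and the names `local_update`, `federated_average`, `train_federated`, and the toy site data are hypothetical.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]  # (feature, label) pairs that never leave the site


def local_update(weight: float, data: Site, lr: float = 0.1, epochs: int = 5) -> float:
    """Run a few gradient steps of least-squares regression y ~ weight * x on local data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(weights: List[float], sizes: List[int]) -> float:
    """Combine site weights into one global weight, weighting each site by its sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total


def train_federated(sites: List[Site], rounds: int = 20) -> float:
    """Each round: share the global weight, train locally at every site, average the results."""
    global_weight = 0.0
    for _ in range(rounds):
        local_weights = [local_update(global_weight, data) for data in sites]
        global_weight = federated_average(local_weights, [len(d) for d in sites])
    return global_weight


# Toy data (assumption): three hypothetical sites whose labels follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
    [(1.0, 2.0)],
]
global_weight = train_federated(sites)
print(round(global_weight, 3))
```

Only the scalar `weight` crosses site boundaries in this sketch; the raw `(feature, label)` pairs are read only inside `local_update`, which is the property that lets sites retain custody of their data.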
Affiliation(s)
- Andrew A S Soltan
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK; Department of Oncology, University of Oxford, Oxford, UK; Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK; Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK; Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK.
| | - Anshul Thakur
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Jenny Yang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - Anoop Chauhan
- Portsmouth Hospitals University NHS Trust, Portsmouth, UK
| | - Leon G D'Cruz
- Portsmouth Hospitals University NHS Trust, Portsmouth, UK
| | | | - Marina A Soltan
- The Queen Elizabeth Hospital, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
| | - David R Thickett
- The Queen Elizabeth Hospital, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
| | - David W Eyre
- Oxford University Hospitals NHS Foundation Trust, Oxford, UK; Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK; NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford and Public Health England, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford, UK
| | - Tingting Zhu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
| | - David A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford, UK; Oxford-Suzhou Centre for Advanced Research, Suzhou, China
|
21
|
Pan W, Xu Z, Rajendran S, Wang F. An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals. Patterns (New York, N.Y.) 2024; 5:100898. [PMID: 38264713 PMCID: PMC10801228 DOI: 10.1016/j.patter.2023.100898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 09/06/2023] [Accepted: 11/21/2023] [Indexed: 01/25/2024]
Abstract
Clinical risk prediction with electronic health records (EHR) using machine learning has attracted considerable attention in recent years, and one of the key challenges is how to protect data privacy. Federated learning (FL) provides a promising framework for building predictive models by leveraging data from multiple institutions without sharing them. However, data distribution drift across institutions greatly impacts the performance of FL. In this paper, an adaptive FL framework is proposed to address this challenge. Our framework separates the input features into stable, domain-specific, and conditional-irrelevant parts according to their relationships to clinical outcomes. We evaluate this framework on the tasks of predicting the onset risk of sepsis and acute kidney injury (AKI) for patients in the intensive care unit (ICU) from multiple clinical institutions. The results show that our framework achieves better prediction performance than existing FL baselines and provides reasonable feature interpretations.
Affiliation(s)
- Weishen Pan
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
| | - Zhenxing Xu
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
| | - Suraj Rajendran
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Institute of Artificial Intelligence for Digital Health, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
|
22
|
Saeedi M, Gorji HT, Vasefi F, Tavakolian K. Federated Versus Central Machine Learning on Diabetic Foot Ulcer Images: Comparative Simulations. IEEE Access 2024; 12:58960-58971. [DOI: 10.1109/access.2024.3392916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
Affiliation(s)
- Mahdi Saeedi
- Department of Biomedical Engineering, University of North Dakota, Grand Forks, ND, USA
| | - Hamed Taheri Gorji
- Department of Biomedical Engineering, University of North Dakota, Grand Forks, ND, USA
| | | | - Kouhyar Tavakolian
- Department of Biomedical Engineering, University of North Dakota, Grand Forks, ND, USA
|
23
|
Choi G, Cha WC, Lee SU, Shin SY. Survey of Medical Applications of Federated Learning. Healthc Inform Res 2024; 30:3-15. [PMID: 38359845 PMCID: PMC10879826 DOI: 10.4258/hir.2024.30.1.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 01/23/2024] [Accepted: 01/24/2024] [Indexed: 02/17/2024] Open
Abstract
OBJECTIVES Medical artificial intelligence (AI) has recently attracted considerable attention. However, training medical AI models is challenging due to privacy-protection regulations. Among the proposed solutions, federated learning (FL) stands out. FL involves transmitting only model parameters without sharing the original data, making it particularly suitable for the medical field, where data privacy is paramount. This study reviews the application of FL in the medical domain. METHODS We conducted a literature search using the keywords "federated learning" in combination with "medical," "healthcare," or "clinical" on Google Scholar and PubMed. After reviewing titles and abstracts, 58 papers were selected for analysis. These FL studies were categorized based on the types of data used, the target disease, the use of open datasets, the local model of FL, and the neural network model. We also examined issues related to heterogeneity and security. RESULTS In the investigated FL studies, the most commonly used data type was image data, and the most studied target diseases were cancer and COVID-19. The majority of studies utilized open datasets. Furthermore, 72% of the FL articles addressed heterogeneity issues, while 50% discussed security concerns. CONCLUSIONS FL in the medical domain appears to be in its early stages, with most research using open data and focusing on specific data types and diseases for performance verification purposes. Nonetheless, medical FL research is anticipated to be increasingly applied and to become a vital component of multi-institutional research.
Affiliation(s)
- Geunho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Korea
| | - Won Chul Cha
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Korea
- Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Se Uk Lee
- Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
| | - Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Korea
|
24
|
Salluh JIF, Quintairos A, Dongelmans DA, Aryal D, Bagshaw S, Beane A, Burghi G, López MDPA, Finazzi S, Guidet B, Hashimoto S, Ichihara N, Litton E, Lone NI, Pari V, Sendagire C, Vijayaraghavan BKT, Haniffa R, Pisani L, Pilcher D. National ICU Registries as Enablers of Clinical Research and Quality Improvement. Crit Care Med 2024; 52:125-135. [PMID: 37698452 DOI: 10.1097/ccm.0000000000006050] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 09/13/2023]
Abstract
OBJECTIVES Clinical quality registries (CQRs) have been implemented worldwide by several medical specialties aiming to generate a better characterization of epidemiology, treatments, and outcomes of patients. National ICU registries were created almost 3 decades ago to improve the understanding of case-mix, resource use, and outcomes of critically ill patients. This narrative review describes the challenges, proposed solutions, and evidence generated by National ICU registries as facilitators for research and quality improvement. DATA SOURCES English language articles were identified in PubMed using phrases related to ICU registries, CQRs, outcomes, and case-mix. STUDY SELECTION Original research, review articles, letters, and commentaries, were considered. DATA EXTRACTION Data from relevant literature were identified, reviewed, and integrated into a concise narrative review. DATA SYNTHESIS CQRs have been implemented worldwide by several medical specialties aiming to generate a better characterization of epidemiology, treatments, and outcomes of patients. National ICU registries were created almost 3 decades ago to improve the understanding of case-mix, resource use, and outcomes of critically ill patients. The initial experience in European countries and in Oceania ensured that through locally generated data, ICUs could assess their performances by using risk-adjusted measures and compare their results through fair and validated benchmarking metrics with other ICUs contributing to the CQR. The accomplishment of these initiatives, coupled with the increasing adoption of information technology, resulted in a broad geographic expansion of CQRs as well as their use in quality improvement studies, clinical trials as well as international comparisons, and benchmarking for ICUs. 
CONCLUSIONS ICU registries have provided increased knowledge of case-mix and outcomes of ICU patients based on real-world data and contributed to improving care delivery through quality improvement initiatives and trials. Recent increases in adoption of new technologies (i.e., cloud-based structures, artificial intelligence, machine learning) will ensure a broader and better use of data for epidemiology, healthcare policies, quality improvement, and clinical trials.
Affiliation(s)
- Jorge I F Salluh
- D'Or Institute for Research and Education, Rio de Janeiro, Brazil
- Post-Graduation Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Amanda Quintairos
- D'Or Institute for Research and Education, Rio de Janeiro, Brazil
- Department of Critical and Intensive Care Medicine, Academic Hospital Fundación Santa Fe de Bogota, Bogota, Colombia
| | - Dave A Dongelmans
- Amsterdam UMC location University of Amsterdam, Department of Intensive Care Medicine, Amsterdam, The Netherlands
- National Intensive Care Evaluation (NICE) Foundation, Amsterdam, The Netherlands
| | - Diptesh Aryal
- National Coordinator, Nepal Intensive Care Research Foundation, Kathmandu, Nepal
| | - Sean Bagshaw
- Department of Medicine, Faculty of Medicine and Dentistry (Ling, Bagshaw), University of Alberta and Alberta Health Services, Edmonton, AB, Canada
- Division of Internal Medicine (Villeneuve), Department of Critical Care Medicine, Faculty of Medicine and Dentistry and School of Public Health, University of Alberta and Grey Nuns Hospitals, Edmonton, AB, Canada
| | - Abigail Beane
- Critical Care, Mahidol Oxford Tropical Medicine Research Unit, Bangkok, Thailand
- Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom
| | | | - Maria Del Pilar Arias López
- Argentine Society of Intensive Care (SATI). SATI-Q Program, Buenos Aires, Argentina
- Intermediate Care Unit, Hospital de Niños Ricardo Gutierrez, Buenos Aires, Argentina
| | - Stefano Finazzi
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Ranica, Italy
- Associazione GiViTI, c/o Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Bertrand Guidet
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Saint-Antoine, service de réanimation, Paris, France
| | - Satoru Hashimoto
- Division of Intensive Care, Department of Anesthesiology and Intensive Care Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan
| | - Nao Ichihara
- Department of Healthcare Quality Assessment, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Edward Litton
- Fiona Stanley Hospital, Perth, WA
- The University of Western Australia, Perth, WA
| | - Nazir I Lone
- Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Scottish Intensive Care Society Audit Group, United Kingdom
| | - Vrindha Pari
- Chennai Critical Care Consultants, Pvt Ltd, Chennai, India
| | - Cornelius Sendagire
- D'Or Institute for Research and Education, Rio de Janeiro, Brazil
- Anesthesia and Critical Care, Makerere University College of Health Sciences, Kampala, Uganda
| | | | - Rashan Haniffa
- Critical Care, Mahidol Oxford Tropical Medicine Research Unit, Bangkok, Thailand
- Crit Care Asia, Network for Improving Critical Care Systems and Training, Colombo, Sri Lanka
- Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, United Kingdom
| | - Luigi Pisani
- Critical Care, Mahidol Oxford Tropical Medicine Research Unit, Bangkok, Thailand
| | - David Pilcher
- University College Hospital, London, United Kingdom
- Department of Intensive Care, Alfred Health, Prahran, VIC, Australia
- The Australian and New Zealand Intensive Care Society (ANZICS) Centre for Outcome and Resource Evaluation, Camberwell, Australia
|
25
|
Guckenberger M, Andratschke N, Chung C, Fuller D, Tanadini-Lang S, Jaffray DA. The Future of MR-Guided Radiation Therapy. Semin Radiat Oncol 2024; 34:135-144. [PMID: 38105088 DOI: 10.1016/j.semradonc.2023.10.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 12/19/2023]
Abstract
Magnetic resonance image-guided radiation therapy (MRIgRT) is a relatively new technology that has already shown outcome benefits but has not yet reached its clinical potential. The improved soft-tissue contrast provided by MR, coupled with the immediacy of image acquisition with respect to the treatment, enables expansion of on-table adaptive protocols, currently at the cost of increased treatment complexity, greater use of human resources, and longer treatment slot times, which translate to decreased throughput. Many approaches are being investigated to meet these challenges, including the development of artificial intelligence (AI) algorithms to accelerate and automate much of the workflow, improved technology that parallelizes workflow tasks, and improvements in image acquisition speed and quality. This article summarizes the limitations of currently available integrated MRIgRT systems and gives an outlook on scientific developments to further expand the use of MRIgRT.
Affiliation(s)
- Matthias Guckenberger
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
| | - Nicolaus Andratschke
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Caroline Chung
- Division of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX
| | - Dave Fuller
- Division of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX
| | - Stephanie Tanadini-Lang
- Department of Radiation Oncology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - David A Jaffray
- Division of Radiation Oncology, University of Texas MD Anderson Cancer Center, Houston, TX
|
26
|
Sharma S, Guleria K. A comprehensive review on federated learning based models for healthcare applications. Artif Intell Med 2023; 146:102691. [PMID: 38042608 DOI: 10.1016/j.artmed.2023.102691] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/23/2023] [Revised: 10/22/2023] [Accepted: 10/22/2023] [Indexed: 12/04/2023]
Abstract
A disease is an abnormal condition that negatively impacts the functioning of the human body. Pathology determines the causes behind the disease and identifies its development mechanism and functional consequences. Each disease has different identification methods, including X-ray scans for pneumonia, COVID-19, and lung cancer, whereas biopsy and CT scans can identify the presence of skin cancer and Alzheimer's disease, respectively. Early disease detection leads to effective treatment and avoids lasting complications. Deep learning has provided a vast number of applications in medical sectors, resulting in accurate and reliable early disease predictions. These models are utilized in the healthcare industry to provide supplementary assistance to doctors in identifying the presence of diseases. They are mostly trained on secondary data sources because healthcare institutions refrain from sharing patients' private data to ensure confidentiality; this limits the effectiveness of deep learning models, which require extensive training datasets to achieve optimal results. Federated learning handles the data in a way that does not compromise the privacy of a patient's data. In this work, a wide variety of disease detection models trained through federated learning have been rigorously reviewed. This meta-analysis provides an in-depth review of the federated learning architectures, federated learning types, hyperparameters, dataset utilization details, aggregation techniques, performance measures, and augmentation methods applied in the existing models during the development phase. The review also highlights various open challenges associated with the disease detection models trained through federated learning for future research.
Affiliation(s)
- Shagun Sharma
- Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura 140401, Punjab, India
| | - Kalpna Guleria
- Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura 140401, Punjab, India.
|
27
|
Li S, Liu P, Nascimento GG, Wang X, Leite FRM, Chakraborty B, Hong C, Ning Y, Xie F, Teo ZL, Ting DSW, Haddadi H, Ong MEH, Peres MA, Liu N. Federated and distributed learning applications for electronic health records and structured medical data: a scoping review. J Am Med Inform Assoc 2023; 30:2041-2049. [PMID: 37639629 PMCID: PMC10654866 DOI: 10.1093/jamia/ocad170] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/19/2023] [Revised: 07/19/2023] [Indexed: 08/31/2023] Open
Abstract
OBJECTIVES Federated learning (FL) has gained popularity in clinical research in recent years to facilitate privacy-preserving collaboration. Structured data, one of the most prevalent forms of clinical data, has experienced significant growth in volume concurrently, notably with the widespread adoption of electronic health records in clinical practice. This review examines FL applications on structured medical data, identifies contemporary limitations, and discusses potential innovations. MATERIALS AND METHODS We searched 5 databases, SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL, to identify articles that applied FL to structured medical data and reported results following the PRISMA guidelines. Each selected publication was evaluated from 3 primary perspectives, including data quality, modeling strategies, and FL frameworks. RESULTS Out of the 1193 papers screened, 34 met the inclusion criteria, with each article consisting of one or more studies that used FL to handle structured clinical/medical data. Of these, 24 utilized data acquired from electronic health records, with clinical predictions and association studies being the most common clinical research tasks that FL was applied to. Only one article exclusively explored the vertical FL setting, while the remaining 33 explored the horizontal FL setting, with only 14 discussing comparisons between single-site (local) and FL (global) analysis. CONCLUSIONS The existing FL applications on structured medical data lack sufficient evaluations of clinically meaningful benefits, particularly when compared to single-site analyses. Therefore, it is crucial for future FL applications to prioritize clinical motivations and develop designs and methodologies that can effectively support and aid clinical practice and research.
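The horizontal and vertical FL settings distinguished in this abstract differ in how structured records are partitioned across sites: in the horizontal setting each site holds different patients described by the same variables, while in the vertical setting sites hold different variables for the same patients. A minimal sketch of the two partitions follows; the records, column names, and helper functions are hypothetical and are not taken from any reviewed study.

```python
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical structured records, one row per patient.
records: List[Row] = [
    {"patient_id": 1, "age": 64, "creatinine": 1.1, "outcome": 0},
    {"patient_id": 2, "age": 71, "creatinine": 2.3, "outcome": 1},
    {"patient_id": 3, "age": 58, "creatinine": 0.9, "outcome": 0},
    {"patient_id": 4, "age": 80, "creatinine": 1.8, "outcome": 1},
]


def horizontal_split(rows: List[Row], n_sites: int) -> List[List[Row]]:
    """Each site gets a disjoint subset of patients, with every column present."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_split(rows: List[Row], column_groups: List[List[str]],
                   key: str = "patient_id") -> List[List[Row]]:
    """Each site gets every patient, but only its own columns plus the linking key."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


horizontal_sites = horizontal_split(records, n_sites=2)
vertical_sites = vertical_split(records, [["age"], ["creatinine", "outcome"]])

for name, parts in (("horizontal", horizontal_sites), ("vertical", vertical_sites)):
    for i, part in enumerate(parts):
        print(name, "site", i, "patients:", len(part), "columns:", sorted(part[0]))
```

In the vertical case the sites must share a linking key (here `patient_id`) to align rows, which is one reason that setting tends to be harder to deploy than the horizontal one.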
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Gustavo G Nascimento
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Xinru Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Fabio Renato Manzolli Leite
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Feng Xie
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Zhen Ling Teo
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Daniel Shu Wei Ting
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Hamed Haddadi
- Department of Computing, Imperial College London, London SW7 2AZ, England, United Kingdom
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore 169608, Singapore
| | - Marco Aurélio Peres
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
|
28
|
Maniar KM, Lassarén P, Rana A, Yao Y, Tewarie IA, Gerstl JVE, Recio Blanco CM, Power LH, Mammi M, Mattie H, Smith TR, Mekary RA. Traditional Machine Learning Methods versus Deep Learning for Meningioma Classification, Grading, Outcome Prediction, and Segmentation: A Systematic Review and Meta-Analysis. World Neurosurg 2023; 179:e119-e134. [PMID: 37574189 DOI: 10.1016/j.wneu.2023.08.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 08/06/2023] [Indexed: 08/15/2023]
Abstract
BACKGROUND Meningiomas are common intracranial tumors. Machine learning (ML) algorithms are emerging to improve accuracy in 4 primary domains: classification, grading, outcome prediction, and segmentation. Such algorithms include both traditional approaches that rely on hand-crafted features and deep learning (DL) techniques that utilize automatic feature extraction. The aim of this study was to evaluate the performance of published traditional ML versus DL algorithms in classification, grading, outcome prediction, and segmentation of meningiomas. METHODS A systematic review and meta-analysis were conducted. Major databases were searched through September 2021 for publications evaluating traditional ML versus DL models on meningioma management. Performance measures including pooled sensitivity, specificity, F1-score, area under the receiver-operating characteristic curve, positive and negative likelihood ratios (LR+, LR-) along with their respective 95% confidence intervals (95% CIs) were derived using random-effects models. RESULTS Five hundred thirty-four records were screened, and 43 articles were included, regarding classification (3 articles), grading (29), outcome prediction (7), and segmentation (6) of meningiomas. Of the 29 studies that reported on grading, 10 could be meta-analyzed with 2 DL models (sensitivity 0.89, 95% CI: 0.74-0.96; specificity 0.91, 95% CI: 0.45-0.99; LR+ 10.1, 95% CI: 1.33-137; LR- 0.12, 95% CI: 0.04-0.59) and 8 traditional ML (sensitivity 0.74, 95% CI: 0.62-0.83; specificity 0.93, 95% CI: 0.79-0.98; LR+ 10.5, 95% CI: 2.91-39.5; and LR- 0.28, 95% CI: 0.17-0.49). The insufficient performance metrics reported precluded further statistical analysis of other performance metrics. CONCLUSIONS ML on meningiomas is mostly carried out with traditional methods. For meningioma grading, traditional ML methods generally had a higher LR+, while DL models a lower LR-.
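The likelihood ratios reported in this abstract are related to sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below applies those definitions to the pooled point estimates quoted above. It is an illustration only: the abstract's pooled LR values come from random-effects models, so they need not match values recomputed from pooled sensitivity and specificity, and the function name `likelihood_ratios` is hypothetical.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity, both in the open interval (0, 1)."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Pooled point estimates for meningioma grading, as quoted in the abstract.
dl_pos, dl_neg = likelihood_ratios(sensitivity=0.89, specificity=0.91)
ml_pos, ml_neg = likelihood_ratios(sensitivity=0.74, specificity=0.93)

print(f"DL: LR+ = {dl_pos:.2f}, LR- = {dl_neg:.2f}")
print(f"Traditional ML: LR+ = {ml_pos:.2f}, LR- = {ml_neg:.2f}")
```

The recomputed values preserve the ordering stated in the conclusions: the traditional ML estimates give the higher LR+, and the DL estimates give the lower LR-.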
Affiliation(s)
- Krish M Maniar
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States
| | - Philipp Lassarén
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
| | - Aakanksha Rana
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Boston, Massachusetts, United States
| | - Yuxin Yao
- Department of Pharmaceutical Business and Administrative Sciences, School of Pharmacy, Massachusetts College of Pharmacy and Health Sciences University, Boston, Massachusetts, United States
| | - Ishaan A Tewarie
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; Department of Neurosurgery, Haaglanden Medical Center, The Hague, The Netherlands; Faculty of Medicine, Erasmus University Rotterdam/Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
| | - Jakob V E Gerstl
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States
| | - Camila M Recio Blanco
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; Northeast National University, Corrientes, Argentina; Prisma Salud, Puerto San Julian, Santa Cruz, Argentina
| | - Liam H Power
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; School of Medicine, Tufts University, Boston, Massachusetts, United States
| | - Marco Mammi
- Neurosurgery Unit, S. Croce e Carle Hospital, Cuneo, Italy
| | - Heather Mattie
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States
| | - Timothy R Smith
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; Department of Neurosurgery, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, United States
| | - Rania A Mekary
- Department of Neurosurgery, Computational Neurosciences Outcomes Center (CNOC), Harvard Medical School, Brigham and Women's Hospital, Boston, Massachusetts, United States; Department of Pharmaceutical Business and Administrative Sciences, School of Pharmacy, Massachusetts College of Pharmacy and Health Sciences University, Boston, Massachusetts, United States.
|
29
|
Sandhu SS, Gorji HT, Tavakolian P, Tavakolian K, Akhbardeh A. Medical Imaging Applications of Federated Learning. Diagnostics (Basel) 2023; 13:3140. [PMID: 37835883 PMCID: PMC10572559 DOI: 10.3390/diagnostics13193140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2023] [Revised: 10/03/2023] [Accepted: 10/03/2023] [Indexed: 10/15/2023] Open
Abstract
Since the introduction of Federated Learning (FL) in 2016, researchers have applied the idea to several domains ranging from edge computing to banking. The technique's inherent security benefits, privacy-preserving capabilities, ease of scalability, and ability to transcend data biases have motivated researchers to use this tool on healthcare datasets. While several reviews exist detailing FL and its applications, this review focuses solely on the different applications of FL to medical imaging datasets, grouping applications by diseases, modality, and/or part of the body. This systematic literature review was conducted by querying and consolidating results from ArXiv, IEEE Xplore, and PubMed. Furthermore, we provide a detailed description of FL architecture, models, descriptions of the performance achieved by FL models, and how results compare with traditional Machine Learning (ML) models. Additionally, we discuss the security benefits, highlighting two primary forms of privacy-preserving techniques, including homomorphic encryption and differential privacy. Finally, we provide some background information and context regarding where the contributions lie. The background information is organized into the following categories: architecture/setup type, data-related topics, security, and learning types. While progress has been made within the field of FL and medical imaging, much room for improvement and understanding remains, with an emphasis on security and data issues remaining the primary concerns for researchers. Therefore, improvements are constantly pushing the field forward. Finally, we highlighted the challenges in deploying FL in medical imaging applications and provided recommendations for future directions.
Affiliation(s)
- Sukhveer Singh Sandhu
- Biomedical Engineering Program, University of North Dakota, Grand Forks, ND 58202, USA
| | - Hamed Taheri Gorji
- Biomedical Engineering Program, University of North Dakota, Grand Forks, ND 58202, USA
- SafetySpect Inc., 4200 James Ray Dr., Grand Forks, ND 58202, USA
| | - Pantea Tavakolian
- Biomedical Engineering Program, University of North Dakota, Grand Forks, ND 58202, USA
| | - Kouhyar Tavakolian
- Biomedical Engineering Program, University of North Dakota, Grand Forks, ND 58202, USA
|
30
|
Rehman MHU, Hugo Lopez Pinaya W, Nachev P, Teo JT, Ourselin S, Cardoso MJ. Federated learning for medical imaging radiology. Br J Radiol 2023; 96:20220890. [PMID: 38011227 PMCID: PMC10546441 DOI: 10.1259/bjr.20220890] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/16/2022] [Revised: 07/31/2023] [Accepted: 08/02/2023] [Indexed: 11/29/2023] Open
Abstract
Federated learning (FL) is gaining wide acceptance across the medical AI domains. FL promises to provide a fairly acceptable clinical-grade accuracy, privacy, and generalisability of machine learning models across multiple institutions. However, the research on FL for medical imaging AI is still in its early stages. This paper presents a review of recent research to outline the difference between state-of-the-art [SOTA] (published literature) and state-of-the-practice [SOTP] (applied research in realistic clinical environments). Furthermore, the review outlines the future research directions considering various factors such as data, learning models, system design, governance, and human-in-loop to translate the SOTA into SOTP and effectively collaborate across multiple institutions.
Affiliation(s)
| | | | - Parashkev Nachev
- Institute of Neurology, University College London, London, United Kingdom
| | - James T. Teo
- King’s College Hospital, NHS Foundation Trust, London, United Kingdom
|
31
|
Li S, Ning Y, Ong MEH, Chakraborty B, Hong C, Xie F, Yuan H, Liu M, Buckland DM, Chen Y, Liu N. FedScore: A privacy-preserving framework for federated scoring system development. J Biomed Inform 2023; 146:104485. [PMID: 37660960 DOI: 10.1016/j.jbi.2023.104485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 08/08/2023] [Accepted: 08/31/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVE We propose FedScore, a privacy-preserving federated learning framework for scoring system generation across multiple sites to facilitate cross-institutional collaborations. MATERIALS AND METHODS The FedScore framework includes five modules: federated variable ranking, federated variable transformation, federated score derivation, federated model selection and federated model evaluation. To illustrate usage and assess FedScore's performance, we built a hypothetical global scoring system for mortality prediction within 30 days after a visit to an emergency department using 10 simulated sites divided from a tertiary hospital in Singapore. We employed a pre-existing score generator to construct 10 local scoring systems independently at each site and we also developed a scoring system using centralized data for comparison. RESULTS We compared the acquired FedScore model's performance with that of other scoring models using the receiver operating characteristic (ROC) analysis. The FedScore model achieved an average area under the curve (AUC) value of 0.763 across all sites, with a standard deviation (SD) of 0.020. We also calculated the average AUC values and SDs for each local model, and the FedScore model showed promising accuracy and stability with a high average AUC value which was closest to the one of the pooled model and SD which was lower than that of most local models. CONCLUSION This study demonstrates that FedScore is a privacy-preserving scoring system generator with potentially good generalizability.
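The evaluation described in this abstract (one model scored at each site, then summarized as a mean AUC and its standard deviation) can be sketched as follows. This is an illustrative sketch only: the function `auc`, the dictionary `site_data`, and all scores and labels in it are hypothetical and are not taken from the FedScore study or its software.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-site risk scores and outcomes (1 = event, 0 = no event).
site_data = {
    "site_A": ([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]),
    "site_B": ([0.20, 0.30, 0.70, 0.90, 0.60], [0, 0, 1, 1, 0]),
    "site_C": ([0.15, 0.55, 0.45, 0.85, 0.25], [0, 1, 0, 1, 0]),
}

site_aucs = {name: auc(scores, labels) for name, (scores, labels) in site_data.items()}
for name, value in site_aucs.items():
    print(f"{name}: AUC = {value:.3f}")

# Summary in the style reported by the abstract: mean AUC and SD across sites.
print(f"mean AUC = {mean(site_aucs.values()):.3f}, SD = {stdev(site_aucs.values()):.3f}")
```

A low SD across sites is what the abstract refers to as stability: the same scoring system performs similarly wherever it is applied.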
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Health Services Research Centre, Singapore Health Services, Singapore, Singapore; Department of Emergency Medicine, Singapore General Hospital, Singapore, Singapore
| | - Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Feng Xie
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
| | - Han Yuan
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Mingxuan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Daniel M Buckland
- Department of Emergency Medicine, Duke University School of Medicine, Durham, NC, USA
| | - Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore; Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore; Institute of Data Science, National University of Singapore, Singapore, Singapore.
|
32
|
Li W, Kim M, Zhang K, Chen H, Jiang X, Harmanci A. COLLAGENE enables privacy-aware federated and collaborative genomic data analysis. Genome Biol 2023; 24:204. [PMID: 37697426 PMCID: PMC10496350 DOI: 10.1186/s13059-023-03039-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 08/16/2023] [Indexed: 09/13/2023] Open
Abstract
Growing regulatory requirements set barriers around genetic data sharing and collaborations. Moreover, existing privacy-aware paradigms are challenging to deploy in collaborative settings. We present COLLAGENE, a tool base for building secure collaborative genomic data analysis methods. COLLAGENE protects data using shared-key homomorphic encryption and combines encryption with multiparty strategies for efficient privacy-aware collaborative method development. COLLAGENE provides ready-to-run tools for encryption/decryption, matrix processing, and network transfers, which can be immediately integrated into existing pipelines. We demonstrate the usage of COLLAGENE by building a practical federated GWAS protocol for binary phenotypes and a secure meta-analysis protocol. COLLAGENE is available at https://zenodo.org/record/8125935 .
Affiliation(s)
- Wentao Li
- Center for Secure Artificial Intelligence For hEalthcare (SAFE), D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Miran Kim
- Department of Mathematics, Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Research Institute for Convergence of Basic Science, Hanyang University, Seoul, 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, 04763, Republic of Korea
| | - Kai Zhang
- Center for Secure Artificial Intelligence For hEalthcare (SAFE), D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Center for Precision Health, D. Bradley McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Xiaoqian Jiang
- Center for Secure Artificial Intelligence For hEalthcare (SAFE), D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Arif Harmanci
- Center for Secure Artificial Intelligence For hEalthcare (SAFE), D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.
- Center for Precision Health, D. Bradley McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
|
33
|
Diniz JM, Vasconcelos H, Souza J, Rb-Silva R, Ameijeiras-Rodriguez C, Freitas A. Comparing Decentralized Learning Methods for Health Data Models to Nondecentralized Alternatives: Protocol for a Systematic Review. JMIR Res Protoc 2023; 12:e45823. [PMID: 37335606 PMCID: PMC10337426 DOI: 10.2196/45823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/27/2023] [Accepted: 04/28/2023] [Indexed: 06/21/2023] Open
Abstract
BACKGROUND Considering the soaring health-related costs directed toward a growing, aging, and comorbid population, the health sector needs effective data-driven interventions while managing rising care costs. While health interventions using data mining have become more robust and adopted, they often demand high-quality big data. However, growing privacy concerns have hindered large-scale data sharing. In parallel, recently introduced legal instruments require complex implementations, especially when it comes to biomedical data. New privacy-preserving technologies, such as decentralized learning, make it possible to create health models without mobilizing data sets by using distributed computation principles. Several multinational partnerships, including a recent agreement between the United States and the European Union, are adopting these techniques for next-generation data science. While these approaches are promising, there is no clear and robust evidence synthesis of health care applications. OBJECTIVE The main aim is to compare the performance among health data models (eg, automated diagnosis and mortality prediction) developed using decentralized learning approaches (eg, federated and blockchain) to those using centralized or local methods. Secondary aims are comparing the privacy compromise and resource use among model architectures. METHODS We will conduct a systematic review using the first-ever registered research protocol for this topic following a robust search methodology, including several biomedical and computational databases. This work will compare health data models differing in development architecture, grouping them according to their clinical applications. For reporting purposes, a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram will be presented. 
CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies)-based forms will be used for data extraction and to assess the risk of bias, alongside PROBAST (Prediction Model Risk of Bias Assessment Tool). All effect measures in the original studies will be reported. RESULTS The queries and data extractions are expected to start on February 28, 2023, and end by July 31, 2023. The research protocol was registered with PROSPERO, under the number 393126, on February 3, 2023. With this protocol, we detail how we will conduct the systematic review. With that study, we aim to summarize the progress and findings from state-of-the-art decentralized learning models in health care in comparison to their local and centralized counterparts. Results are expected to clarify the consensuses and heterogeneities reported and help guide the research and development of new robust and sustainable applications to address the health data privacy problem, with applicability in real-world settings. CONCLUSIONS We expect to clearly present the status quo of these privacy-preserving technologies in health care. With this robust synthesis of the currently available scientific evidence, the review will inform health technology assessment and evidence-based decisions, from health professionals, data scientists, and policy makers alike. Importantly, it should also guide the development and application of new tools in service of patients' privacy and future research. TRIAL REGISTRATION PROSPERO 393126; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=393126. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/45823.
Affiliation(s)
- José Miguel Diniz
- CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
- PhD Program in Health Data Science, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Henrique Vasconcelos
- CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Júlio Souza
- CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
- MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Rita Rb-Silva
- MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Carolina Ameijeiras-Rodriguez
- MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
| | - Alberto Freitas
- CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
- MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
|
34
|
Tsai HF, Podder S, Chen PY. Microsystem Advances through Integration with Artificial Intelligence. Micromachines 2023; 14:826. [PMID: 37421059 PMCID: PMC10141994 DOI: 10.3390/mi14040826] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/09/2023] [Revised: 04/04/2023] [Accepted: 04/06/2023] [Indexed: 07/09/2023]
Abstract
Microfluidics is a rapidly growing discipline that involves studying and manipulating fluids at reduced length scale and volume, typically on the scale of micro- or nanoliters. Under the reduced length scale and larger surface-to-volume ratio, advantages of low reagent consumption, faster reaction kinetics, and more compact systems are evident in microfluidics. However, miniaturization of microfluidic chips and systems introduces challenges of stricter tolerances in designing and controlling them for interdisciplinary applications. Recent advances in artificial intelligence (AI) have brought innovation to microfluidics from design, simulation, automation, and optimization to bioanalysis and data analytics. In microfluidics, the Navier-Stokes equations, which are partial differential equations describing viscous fluid motion that in complete form are known to not have a general analytical solution, can be simplified and have fair performance through numerical approximation due to low inertia and laminar flow. Approximation using neural networks trained by rules of physical knowledge introduces a new possibility to predict the physicochemical nature. The combination of microfluidics and automation can produce large amounts of data, where features and patterns that are difficult to discern by a human can be extracted by machine learning. Therefore, integration with AI introduces the potential to revolutionize the microfluidic workflow by enabling the precision control and automation of data analysis. Deployment of smart microfluidics may be tremendously beneficial in various applications in the future, including high-throughput drug discovery, rapid point-of-care-testing (POCT), and personalized medicine. In this review, we summarize key microfluidic advances integrated with AI and discuss the outlook and possibilities of combining AI and microfluidics.
Affiliation(s)
- Hsieh-Fu Tsai
- Department of Biomedical Engineering, Chang Gung University, Taoyuan City 333, Taiwan
- Department of Neurosurgery, Chang Gung Memorial Hospital, Keelung, Keelung City 204, Taiwan
- Center for Biomedical Engineering, Chang Gung University, Taoyuan City 333, Taiwan
| | - Soumyajit Podder
- Department of Biomedical Engineering, Chang Gung University, Taoyuan City 333, Taiwan
| | - Pin-Yuan Chen
- Department of Biomedical Engineering, Chang Gung University, Taoyuan City 333, Taiwan
- Department of Neurosurgery, Chang Gung Memorial Hospital, Keelung, Keelung City 204, Taiwan
|
35
|
Rajendran S, Xu Z, Pan W, Ghosh A, Wang F. Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care. PLOS Digit Health 2023; 2:e0000117. [PMID: 36920974 PMCID: PMC10016691 DOI: 10.1371/journal.pdig.0000117] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/01/2022] [Accepted: 02/10/2023] [Indexed: 03/16/2023]
Abstract
With the wider availability of healthcare data such as Electronic Health Records (EHR), a growing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims at building computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, concerns about the privacy of healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity across local participating sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), the early prediction of these conditions based on AI is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in ICU as two examples to explore the impact of data heterogeneity in the FL framework as well as compare performances across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data across multiple hospitals. The local framework only used data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites' data. A model was updated locally, and its parameters were shared with a central aggregator, which updated the federated model's parameters and then shared them with each site. We found models built within a FL framework outperformed local counterparts. Then, we analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of the heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
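The FL loop described in this abstract (local update, parameters sent to a central aggregator, aggregated parameters returned to every site) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the abstract does not name the aggregation rule or model, so sample-size-weighted averaging (FedAvg-style) and logistic regression are assumed here, and every name (`local_update`, `aggregate`, `federated_round`) and all data are hypothetical and synthetic.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, rows, labels, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of logistic regression on one site's data."""
    w = list(weights)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def aggregate(site_weights, site_sizes):
    """Central aggregator: average site parameters, weighted by sample count (assumed rule)."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def federated_round(global_w, sites):
    """One round: each site trains locally; only parameters leave the site."""
    updates = [local_update(global_w, rows, labels) for rows, labels in sites]
    sizes = [len(labels) for _, labels in sites]
    return aggregate(updates, sizes)


# Synthetic data for three hypothetical sites of different sizes.
rng = random.Random(0)
true_w = [1.5, -2.0, 0.5]
sites = []
for n in (200, 500, 300):
    rows = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(n)]
    labels = [
        1.0 if rng.random() < sigmoid(sum(t * x for t, x in zip(true_w, row))) else 0.0
        for row in rows
    ]
    sites.append((rows, labels))

global_w = [0.0] * len(true_w)
for _ in range(50):
    global_w = federated_round(global_w, sites)
print("federated weights:", [round(w, 2) for w in global_w])
```

Raw records never leave `local_update`; the aggregator only sees parameter vectors and sample counts. A "local" baseline in the abstract's sense would call `local_update` on a single site without aggregation, and a "pooled" baseline would concatenate all sites' rows before training.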
Affiliation(s)
- Suraj Rajendran
- Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, New York, United States of America
| | - Zhenxing Xu
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America
| | - Weishen Pan
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America
| | - Arnab Ghosh
- Departments of Medicine, Weill Cornell Medical College, Cornell University, New York, New York, United States of America
| | - Fei Wang
- Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America
|
36
|
Federated machine learning in data-protection-compliant research. Nat Mach Intell 2023. [DOI: 10.1038/s42256-022-00601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
|
37
|
Walker SB, Badke CM, Carroll MS, Honegger KS, Fawcett A, Weese-Mayer DE, Sanchez-Pinto LN. Novel approaches to capturing and using continuous cardiorespiratory physiological data in hospitalized children. Pediatr Res 2023; 93:396-404. [PMID: 36329224 DOI: 10.1038/s41390-022-02359-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 08/16/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]
Abstract
Continuous cardiorespiratory physiological monitoring is a cornerstone of care in hospitalized children. The data generated by monitoring devices coupled with machine learning could transform the way we provide care. This scoping review summarizes existing evidence on novel approaches to continuous cardiorespiratory monitoring in hospitalized children. We aimed to identify opportunities for the development of monitoring technology and the use of machine learning to analyze continuous physiological data to improve the outcomes of hospitalized children. We included original research articles published on or after January 1, 2001, involving novel approaches to collect and use continuous cardiorespiratory physiological data in hospitalized children. OVID Medline, PubMed, and Embase databases were searched. We screened 2909 articles and performed full-text extraction of 105 articles. We identified 58 articles describing novel devices or approaches, which were generally small and single-center. In addition, we identified 47 articles that described the use of continuous physiological data in prediction models, but only 7 integrated multidimensional data (e.g., demographics, laboratory results). We identified three areas for development: (1) further validation of promising novel devices; (2) more studies of models integrating multidimensional data with continuous cardiorespiratory data; and (3) further dissemination, implementation, and validation of prediction models using continuous cardiorespiratory data. IMPACT: We performed a comprehensive scoping review of novel approaches to capture and use continuous cardiorespiratory physiological data for monitoring, diagnosis, providing care, and predicting events in hospitalized infants and children, from novel devices to machine learning-based prediction models. 
We identified three key areas for future development: (1) further validation of promising novel devices; (2) more studies of models integrating multidimensional data with continuous cardiorespiratory data; and (3) further dissemination, implementation, and validation of prediction models using cardiorespiratory data.
Affiliation(s)
- Sarah B Walker
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA.
| | - Colleen M Badke
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - Michael S Carroll
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - Kyle S Honegger
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - Andrea Fawcett
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - Debra E Weese-Mayer
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - L Nelson Sanchez-Pinto
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
|
38
|
Seastedt KP, Schwab P, O’Brien Z, Wakida E, Herrera K, Marcelo PGF, Agha-Mir-Salim L, Frigola XB, Ndulue EB, Marcelo A, Celi LA. Global healthcare fairness: We should be sharing more, not less, data. PLOS Digit Health 2022; 1:e0000102. [PMID: 36812599 PMCID: PMC9931202 DOI: 10.1371/journal.pdig.0000102] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 04/16/2023]
Abstract
The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost of slowing ML progress (measured in terms of access to future medical innovations and clinical software) is too great to limit sharing data through large publicly available databases for concerns of imperfect data anonymization. This cost is especially great for developing countries where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence's progress towards precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns of potential patient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur, for the benefit of a global medical knowledge system.
Affiliation(s)
- Kenneth P. Seastedt
- Beth Israel Deaconess Medical Center, Department of Surgery, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Patrick Schwab
- GlaxoSmithKline, Artificial Intelligence & Machine Learning, Zug, Switzerland
| | - Zach O’Brien
- Australian and New Zealand Intensive Care Research Centre (ANZIC-RC), Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
| | - Edith Wakida
- Mbarara University of Science and Technology, Mbarara, Uganda
| | - Karen Herrera
- Quality and Patient Safety, Hospital Militar, Managua, Nicaragua
| | - Portia Grace F. Marcelo
- Department of Family & Community Medicine, University of the Philippines, Manila, Philippines
| | - Louis Agha-Mir-Salim
- Institute of Medical Informatics, Charité—Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany
- Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
| | - Xavier Borrat Frigola
- Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
- Anesthesiology and Critical Care Department, Hospital Clinic de Barcelona, Barcelona, Spain
| | - Emily Boardman Ndulue
- Department of Journalism, Northeastern University, Boston, Massachusetts, United States of America
| | - Alvin Marcelo
- Department of Surgery, University of the Philippines, Manila, Philippines
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Harvard-MIT Division of Health Sciences & Technology, Cambridge, Massachusetts, United States of America
- Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| |
|
39
|
Gottlieb ER, Samuel M, Bonventre JV, Celi LA, Mattie H. Machine Learning for Acute Kidney Injury Prediction in the Intensive Care Unit. Adv Chronic Kidney Dis 2022; 29:431-438. [PMID: 36253026 PMCID: PMC9586459 DOI: 10.1053/j.ackd.2022.06.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/14/2022] [Revised: 06/01/2022] [Accepted: 06/22/2022] [Indexed: 01/25/2023]
Abstract
Machine learning is the field of artificial intelligence in which computers are trained to make predictions or to identify patterns in data through complex mathematical algorithms. It has great potential in critical care to predict outcomes, such as acute kidney injury, and can be used for prognosis and to suggest management strategies. Machine learning can also be used as a research tool to advance our clinical and biochemical understanding of acute kidney injury. In this review, we introduce basic concepts in machine learning and review recent research in each of these domains.
Affiliation(s)
- Eric R Gottlieb
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA.
| | | | - Joseph V Bonventre
- Renal Section, Brigham and Women's Hospital, Boston, MA; Harvard Medical School, Boston, MA
| | - Leo A Celi
- Harvard Medical School, Boston, MA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA; MIT Critical Data, Cambridge, MA; Harvard T.H. Chan School of Public Health, Boston, MA; Beth Israel Deaconess Medical Center, Boston, MA
| | | |
|