1
Borazjani K, Khosravan N, Ying L, Hosseinalipour S. Multi-Modal Federated Learning for Cancer Staging Over Non-IID Datasets With Unbalanced Modalities. IEEE Trans Med Imaging 2025; 44:556-573. [PMID: 39196746 DOI: 10.1109/tmi.2024.3450855] [Indexed: 08/30/2024]
Abstract
The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information. Our results further unveil the impact and severity of class-based vs type-based heterogeneity across institutions on the model performance, which widens the perspective to the notion of data heterogeneity in multi-modal FL literature.
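The abstract does not give the equations for distributed gradient blending or proximity-aware client weighting. The sketch below therefore shows only a simpler baseline that such methods refine: modality-wise federated averaging, in which each modality-specific encoder is averaged only across the institutions that actually hold that modality, weighted by local sample counts. The function name, the dict-of-lists parameter layout, and the example values are hypothetical; this is not the authors' implementation.

```python
# Illustrative baseline only (hypothetical names), not the paper's method.
def aggregate_by_modality(client_params, client_sizes):
    """Average each modality's parameters over only the clients that hold it.

    client_params: list of dicts mapping a modality name to a flat list of floats
    client_sizes:  local sample counts, used as averaging weights
    """
    sums, totals = {}, {}
    for params, n in zip(client_params, client_sizes):
        for modality, weights in params.items():
            acc = sums.setdefault(modality, [0.0] * len(weights))
            for j, w in enumerate(weights):
                acc[j] += n * w
            totals[modality] = totals.get(modality, 0) + n
    # Clients missing a modality contribute nothing to that modality's average.
    return {m: [s / totals[m] for s in sums[m]] for m in sums}


# Hypothetical institutions with unbalanced modality coverage.
clients = [
    {"mrna": [1.0, 1.0], "clinical": [0.0]},  # mRNA + clinical
    {"mrna": [3.0, 3.0]},                     # mRNA only
    {"image": [5.0], "clinical": [2.0]},      # histopathology + clinical
]
sizes = [10, 30, 10]
global_params = aggregate_by_modality(clients, sizes)
print(global_params)  # {'mrna': [2.5, 2.5], 'clinical': [1.0], 'image': [5.0]}
```

Because each modality is averaged over a different subset of sites, modalities can end up training at different effective speeds, which is consistent with the convergence differences the abstract describes.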
2
Tajabadi M, Martin R, Heider D. Privacy-preserving decentralized learning methods for biomedical applications. Comput Struct Biotechnol J 2024; 23:3281-3287. [PMID: 39296807 PMCID: PMC11408144 DOI: 10.1016/j.csbj.2024.08.024] [Received: 07/24/2024] [Revised: 08/26/2024] [Accepted: 08/26/2024] [Indexed: 09/21/2024]
Abstract
In recent years, decentralized machine learning has emerged as a significant advancement in biomedical applications, offering robust solutions for data privacy, security, and collaboration across diverse healthcare environments. In this review, we examine various decentralized learning methodologies, including federated learning, split learning, swarm learning, gossip learning, edge learning, and some of their applications in the biomedical field. We delve into the underlying principles, network topologies, and communication strategies of each approach, highlighting their advantages and limitations. Ultimately, the selection of a suitable method should be based on specific needs, infrastructures, and computational capabilities.
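Gossip learning is one of the topologies the review covers. As a hedged illustration of its communication pattern (not code from the review), the sketch below runs randomized pairwise gossip on a ring: there is no central server, and in each exchange one node averages its value with a random neighbour. A single scalar per node stands in for model parameters; the function name and values are hypothetical.

```python
import random


def gossip_average(values, neighbors, n_exchanges, seed=0):
    """Randomized pairwise gossip: no server, each exchange touches two neighbours.

    values:    one scalar per node (a stand-in for a model parameter)
    neighbors: dict node -> list of adjacent nodes (the network topology)
    """
    rng = random.Random(seed)
    x = list(values)
    nodes = list(range(len(x)))
    for _ in range(n_exchanges):
        i = rng.choice(nodes)           # a node wakes up...
        j = rng.choice(neighbors[i])    # ...and contacts one random neighbour
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg               # both keep the pairwise average
    return x


n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # ring topology
local_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = gossip_average(local_values, ring, n_exchanges=500)
print([round(v, 3) for v in result])  # every node should end close to the global mean 3.5
```

Each pairwise exchange preserves the sum of the values, so the nodes converge to the global mean without any central coordinator.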
Affiliation(s)
- Mohammad Tajabadi
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Roman Martin
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Dominik Heider
- Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
3
Johnvictor AC, Poonkodi M, Prem Sankar N, VS T. TinyML-Based Lightweight AI Healthcare Mobile Chatbot Deployment. J Multidiscip Healthc 2024; 17:5091-5104. [PMID: 39539515 PMCID: PMC11559246 DOI: 10.2147/jmdh.s483247] [Received: 06/18/2024] [Accepted: 10/03/2024] [Indexed: 11/16/2024]
Abstract
Introduction: In healthcare applications, AI-driven innovations are set to revolutionise patient interactions and care, with the aim of improving patient satisfaction. Recent advances in artificial intelligence have significantly affected nursing, assistive management, medical diagnosis, and other critical medical procedures.
Purpose: Many artificial intelligence (AI) solutions operate online, posing potential risks to patient data security. To address these security concerns and ensure swift operation, this study developed a chatbot tailored for hospital environments that runs on a local server and uses TinyML to process patient data.
Patients and Methods: Edge computing enables secure on-site data processing. The implementation includes patient identification using Histogram of Oriented Gradients (HOG)-based classification, followed by basic patient care tasks such as temperature measurement and demographic recording.
Results: The classification accuracy of patient detection was 95.8%. An autonomous temperature-sensing unit equipped with a medical-grade infrared temperature scanner detected and recorded patient temperature. Following the temperature assessment, the TinyML-powered chatbot engaged patients with a series of doctor-customised questions that were used to train the model for diagnostic scenarios. Patients' "yes" or "no" responses were stored and printed on their case sheets. The accuracy of the TinyML model is 95.3% and its on-device processing time is 217 ms. The implemented TinyML model uses only 8.8 Kb of RAM and 50.3 Kb of flash memory, with a latency of only 4 ms.
Conclusion: Each patient was assigned a unique ID, and their data were securely stored for further consultation and diagnosis via hospital management. This research demonstrates faster patient data recording and increased security compared with existing AI-based healthcare solutions, as all processes occur within the local host.
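The abstract names HOG-based classification for patient identification but gives no details. The sketch below computes a simplified HOG descriptor under stated assumptions: unsigned gradient orientations, 9 bins over 0 to 180 degrees, 8x8-pixel cells (common choices), a single-bin vote per pixel, and no block normalization. The flattened vector is what a downstream classifier would consume; the classifier itself, and the function name, are not from the paper.

```python
import math


def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientations.

    image: 2-D list of grayscale intensities. Border pixels are skipped,
    votes go to a single bin (no interpolation), and no block normalization
    is applied, unlike the full Dalal-Triggs descriptor.
    """
    h, w = len(image), len(image[0])
    n_cy, n_cx = h // cell_size, w // cell_size
    bin_width = 180.0 / n_bins
    hist = [[[0.0] * n_bins for _ in range(n_cx)] for _ in range(n_cy)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = min(int(angle // bin_width), n_bins - 1)
            cy, cx = y // cell_size, x // cell_size
            if cy < n_cy and cx < n_cx:
                hist[cy][cx][b] += mag  # magnitude-weighted vote
    return hist


# Toy 16x16 image with a vertical edge: left half dark, right half bright.
image = [[1.0 if x >= 8 else 0.0 for x in range(16)] for _ in range(16)]
hist = hog_cell_histograms(image)
features = [v for row in hist for cell in row for v in cell]  # flattened descriptor
print(len(features))  # 36 = 2 x 2 cells x 9 bins
```

The vertical edge produces purely horizontal gradients, so every cell's votes land in the 0-degree bin.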
Affiliation(s)
- M Poonkodi
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- N Prem Sankar
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
- Thinesh VS
- Arista Networks Pvt Ltd, Bangalore, India
4
Hu G, Fang X. FLCMC: Federated Learning Approach for Chinese Medicinal Text Classification. Entropy (Basel) 2024; 26:871. [PMID: 39451948 PMCID: PMC11507499 DOI: 10.3390/e26100871] [Received: 06/01/2024] [Revised: 09/19/2024] [Accepted: 10/03/2024] [Indexed: 10/26/2024]
Abstract
To address privacy protection and data-sharing issues in Chinese medical texts, this paper introduces FLCMC, a federated learning approach to Chinese medical text classification. The paper first discusses data heterogeneity in federated language modeling. It then proposes two perturbed federated learning algorithms based on the self-attention mechanism, FedPA and FedPAP. In these algorithms, self-attention is incorporated into the model aggregation module, while a perturbation term that measures the difference between client and server is added to the local update module together with a customized PAdam optimizer. In addition, to enable a fair performance comparison, existing federated algorithms are improved by integrating a customized Adam optimizer. Experiments on synthetic datasets first analyze hyperparameters, data heterogeneity, and validity, showing that the proposed algorithms have significant advantages in classification performance and convergence stability on heterogeneous data. The algorithms are then applied to Chinese medical text datasets to verify their effectiveness on real data, and a comparative analysis of performance and communication efficiency shows that they generalize well on deep learning models for Chinese medical texts. On the synthetic datasets, compared with the baseline algorithms FedAvg, FedProx, and FedAtt and their improved versions, both FedPA and FedPAP show significantly more accurate and stable convergence for data with general heterogeneity. On IMCS-V2, a real Chinese medical dataset of doctor-patient conversations, with logistic regression and a long short-term memory network as training models, FedPA and FedPAP both achieve higher accuracy than the three baselines and their improved versions and display significantly more stable and accurate convergence, indicating that the method classifies Chinese medical texts more effectively.
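The abstract says FedPA and FedPAP place self-attention in the aggregation module and add a client-server perturbation term to the local update, and it compares them against FedAtt. The sketch below shows two generic ingredients in their commonly published forms, not the authors' code: FedAtt-style layer-wise attentive aggregation, where each client's weight is a softmax over its parameter distance to the server, and a FedProx-style proximal gradient term mu * (w_local - w_server), used here only as a stand-in for the perturbation term. Layer names, step size, and mu are illustrative assumptions, and the PAdam optimizer is not reproduced.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attentive_aggregate(server, clients, step_size=1.0):
    """FedAtt-style layer-wise attentive aggregation (illustrative sketch).

    server:  dict layer_name -> flat list of floats
    clients: list of dicts with the same layers
    For each layer, attention is a softmax over client-server distances, and
    the server moves against the attention-weighted difference.
    """
    new_server = {}
    for layer, w_s in server.items():
        dists = [l2_distance(w_s, c[layer]) for c in clients]
        m = max(dists)
        exps = [math.exp(d - m) for d in dists]  # numerically stable softmax
        total = sum(exps)
        att = [e / total for e in exps]
        delta = [sum(a * (ws - c[layer][j]) for a, c in zip(att, clients))
                 for j, ws in enumerate(w_s)]
        new_server[layer] = [ws - step_size * d for ws, d in zip(w_s, delta)]
    return new_server


def proximal_gradient(grad, w_local, w_server, mu):
    """Gradient of loss + (mu / 2) * ||w_local - w_server||^2.

    The extra term penalizes client drift away from the server model.
    """
    return [g + mu * (wl - ws) for g, wl, ws in zip(grad, w_local, w_server)]


server = {"fc": [0.0, 0.0]}  # hypothetical single layer
clients = [{"fc": [1.0, 0.5]}, {"fc": [-1.0, 0.5]}]
print(attentive_aggregate(server, clients))
print(proximal_gradient([0.1, -0.2], [1.0, 1.0], [0.5, 1.5], mu=0.01))
```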
Affiliation(s)
- Guang Hu
- School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
- School of Computer Science, Fudan University, Shanghai 200438, China
- Shanghai Key Laboratory of Data Science, Shanghai 200438, China
- Xin Fang
- School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
5
Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024; 179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]
Abstract
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about applying AI to drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces limitations that must be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the lack of adequate benchmarks; intellectual property rights (IPR)-related hurdles to data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods for representing input molecular structures that would avoid pre-processing of input molecules (feature engineering). Input data are the major component of AI infrastructure: besides a few other factors, most of the successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of the data used to train and test AI algorithms. Moreover, data-hungry DL approaches may fail to live up to their promise without sufficient data. Current literature suggests a few methods that, to a certain extent, effectively handle low-data settings and improve the output of AI models in drug discovery: transfer learning (TL), active learning (AL), single- or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). Another method, federated learning (FL), lets proprietary data from multiple owners contribute to training ML models on a common platform without compromising data privacy. In this review, we compare and discuss these methods, their recent applications, and their limitations in modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
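Active learning (AL) is one of the low-data strategies listed above. A common query strategy, shown here as an illustrative sketch rather than anything specific to the review, is uncertainty sampling: the model scores an unlabeled pool, and the compounds whose predicted class probabilities have the highest entropy are sent for labeling (for example, assaying) first. The function names and probability values are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows, k):
    """Uncertainty sampling: indices of the k pool items with the highest entropy."""
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: predictive_entropy(prob_rows[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical active/inactive probabilities for three unlabeled compounds.
pool = [[0.99, 0.01], [0.50, 0.50], [0.80, 0.20]]
print(select_most_uncertain(pool, 2))  # [1, 2]
```

After the selected compounds are labeled, the model is retrained and the loop repeats, so labeling effort concentrates where the model is least certain.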
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India
- Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia
6
Stripelis D, Gupta U, Saleem H, Dhinagar N, Ghai T, Anastasiou C, Sánchez R, Steeg GV, Ravi S, Naveed M, Thompson PM, Ambite JL. A federated learning architecture for secure and private neuroimaging analysis. Patterns (N Y) 2024; 5:101031. [PMID: 39233693 PMCID: PMC11368680 DOI: 10.1016/j.patter.2024.101031] [Received: 10/10/2023] [Revised: 04/04/2024] [Accepted: 06/06/2024] [Indexed: 09/06/2024]
Abstract
The amount of biomedical data continues to grow rapidly. However, collecting data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. To overcome this challenge, we use federated learning, which enables distributed training of neural network models over multiple data sources without sharing data. Each site trains the neural network over its private data for some time and then shares the neural network parameters (i.e., weights and/or gradients) with a federation controller, which in turn aggregates the local models and sends the resulting community model back to each site, and the process repeats. Our federated learning architecture, MetisFL, provides strong security and privacy. First, sample data never leave a site. Second, neural network parameters are encrypted before transmission and the global neural model is computed under fully homomorphic encryption. Finally, we use information-theoretic methods to limit information leakage from the neural model to prevent a "curious" site from performing model inversion or membership attacks. We present a thorough evaluation of the performance of secure, private federated learning in neuroimaging tasks, including for predicting Alzheimer's disease and for brain age gap estimation (BrainAGE) from magnetic resonance imaging (MRI) studies in challenging, heterogeneous federated environments where sites have different amounts of data and statistical distributions.
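The train, share, aggregate, and redistribute loop described above can be sketched with federated averaging (FedAvg), which weights each site's parameters by its number of samples. This is a plain-Python illustration on a synthetic linear-regression task, assuming simple weighted averaging as the aggregation rule; it omits MetisFL's fully homomorphic encryption and leakage-limiting steps, and all function names are hypothetical rather than MetisFL's API.

```python
import random


def local_train(w, data, lr=0.1, epochs=5):
    """Full-batch gradient descent on mean squared error, run privately at one site."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fedavg_round(w_global, sites, lr=0.1, epochs=5):
    """One round: every site trains locally; the controller takes a size-weighted average."""
    total = sum(len(d) for d in sites)
    new_w = [0.0] * len(w_global)
    for data in sites:
        w_site = local_train(w_global, data, lr, epochs)  # only parameters leave the site
        for j, wj in enumerate(w_site):
            new_w[j] += wj * len(data) / total
    return new_w


rng = random.Random(0)
true_w = [2.0, -1.0]
sites = []
for n in (20, 50, 80):  # sites with different amounts of data
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    sites.append([(x, sum(t * xi for t, xi in zip(true_w, x))) for x in xs])

w = [0.0, 0.0]
for _ in range(50):  # the community model is sent back and the process repeats
    w = fedavg_round(w, sites)
print([round(v, 3) for v in w])  # expected to be close to [2.0, -1.0]
```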
Affiliation(s)
- Dimitris Stripelis
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Umang Gupta
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Hamza Saleem
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Nikhil Dhinagar
- University of Southern California, Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, Marina del Rey, CA 90292, USA
- Tanmay Ghai
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Rafael Sánchez
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Greg Ver Steeg
- University of California, Riverside, Riverside, CA 92521, USA
- Srivatsan Ravi
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Muhammad Naveed
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
- Paul M. Thompson
- University of Southern California, Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, Marina del Rey, CA 90292, USA
- José Luis Ambite
- University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292, USA
- University of Southern California, Computer Science Department, Los Angeles, CA 90089, USA
7
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With artificial intelligence (AI) increasingly permeating many aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a deep learning architecture initially developed to solve general-purpose natural language processing (NLP) tasks and subsequently adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured electronic health records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the Transformer architecture to generate surgical instructions and to predict adverse outcomes after surgery under the umbrella of critical care. Across diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we discuss the benefits and limitations of using Transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
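Since the review centers on the Transformer architecture, the sketch below shows its core operation, scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in dependency-free Python. It is a minimal single-head illustration with Q = K = V set to a toy input (no learned projections, masking, or multi-head splitting), not code from any reviewed model; the helper names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one head; rows are tokens."""
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three toy token embeddings of size 2; Q = K = V = x (no learned projections).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
for row in attn:
    print([round(a, 3) for a in row])  # each row of attention weights sums to 1
```

Each output row is a weighted mix of all value rows, which is how every token can draw on context from the whole sequence in one step.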
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
8
Chai H, Huang Y, Xu L, Song X, He M, Wang Q. A decentralized federated learning-based cancer survival prediction method with privacy protection. Heliyon 2024; 10:e31873. [PMID: 38845954 PMCID: PMC11153246 DOI: 10.1016/j.heliyon.2024.e31873] [Received: 07/18/2023] [Revised: 05/18/2024] [Accepted: 05/23/2024] [Indexed: 06/09/2024]
Abstract
Background: Survival prediction is one of the crucial goals of precision medicine, as accurate survival assessment can help physicians select appropriate treatment for individual patients. To achieve this, extensive data must be used to train the prediction model and prevent overfitting. However, collecting patient data for disease prediction is challenging because data sources may vary across institutions and because data sharing raises privacy and ownership concerns. To integrate cancer data from different institutions without violating privacy laws, we developed AdFed, a federated learning-based data integration framework that evaluates patients' survival while protecting privacy by combining decentralized federated learning with a regularization method.
Results: AdFed was tested on different cancer datasets containing patient information from different institutions. The experimental results show that AdFed, using distributed data, achieves better cancer survival prediction performance (AUC = 0.605) than the compared federated learning-based methods (average AUC = 0.554). Additionally, to assess the biological interpretability of our method, the case study lists 10 liver cancer-related genes selected by AdFed, 5 of which are supported by the literature.
Conclusions: The results indicate that AdFed outperforms other federated learning-based methods and that its interpretable algorithm can select biologically significant genes and pathways while ensuring the confidentiality and integrity of data.
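The abstract names decentralized federated learning and a regularization method but not the exact update, so the following is an assumption-labeled sketch rather than AdFed itself: a decentralized proximal-gradient step in which each institution (1) averages coefficients with its neighbours through a doubly stochastic mixing matrix instead of a central server, (2) takes a local gradient step, and (3) applies L1 soft-thresholding. L1 is chosen here only because it drives uninformative gene coefficients exactly to zero, which is one way a model could select a small set of genes; the mixing weights, names, and values are hypothetical.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each coefficient toward zero by t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in w]


def decentralized_prox_step(params, grads, mixing, lr, lam):
    """One serverless update for every institution (illustrative, not AdFed).

    params[i], grads[i]: node i's coefficient vector and its local loss gradient
    mixing[i][j]:        weight node i gives node j (0 if not neighbours);
                         rows and columns each sum to 1 (doubly stochastic)
    """
    n, dim = len(params), len(params[0])
    new_params = []
    for i in range(n):
        # 1) average with neighbours instead of reporting to a central server
        mixed = [sum(mixing[i][j] * params[j][d] for j in range(n)) for d in range(dim)]
        # 2) local gradient step on the node's own data
        stepped = [m - lr * g for m, g in zip(mixed, grads[i])]
        # 3) L1 proximal step: small coefficients (e.g. uninformative genes) become exactly 0
        new_params.append(soft_threshold(stepped, lr * lam))
    return new_params


mixing = [[0.50, 0.25, 0.25],
          [0.25, 0.50, 0.25],
          [0.25, 0.25, 0.50]]
params = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
zero_grads = [[0.0, 0.0]] * 3
mixed_only = decentralized_prox_step(params, zero_grads, mixing, lr=0.1, lam=0.0)
print(mixed_only)  # [[1.0, 0.75], [0.75, 1.0], [1.25, 1.25]]
```

With zero gradients and no penalty, the step only mixes, and a doubly stochastic matrix preserves the network-wide average of each coefficient.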
Affiliation(s)
- Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Yiqian Huang
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Lekai Xu
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Xinpeng Song
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Qingyong Wang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Hefei, 230036, China
9
Zhang F, Kreuter D, Chen Y, Dittmer S, Tull S, Shadbahr T, Preller J, Rudd JH, Aston JA, Schönlieb CB, Gleadall N, Roberts M. Recent methodological advances in federated learning for healthcare. Patterns (N Y) 2024; 5:101006. [PMID: 39005485 PMCID: PMC11240178 DOI: 10.1016/j.patter.2024.101006] [Indexed: 07/16/2024]
Abstract
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
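Class imbalance and distribution shift across sites, two of the challenges listed above, are often simulated when benchmarking federated methods by splitting a pooled dataset with a Dirichlet label partition: for each class, the site shares are drawn from Dir(alpha), and a smaller alpha yields more skewed sites. This is a common benchmarking convention, not a procedure taken from the review; the function name, labels, and alpha value are illustrative.

```python
import random


def dirichlet_label_partition(labels, n_sites, alpha, seed=0):
    """Split sample indices across sites with per-class shares drawn from Dir(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    sites = [[] for _ in range(n_sites)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # A Dirichlet draw is a vector of independent Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_sites)]
        total = sum(gammas) or 1.0  # guard against an all-zero draw
        start = 0
        for s in range(n_sites):
            if s == n_sites - 1:
                end = len(idxs)  # last site takes the remainder
            else:
                end = min(start + round(gammas[s] / total * len(idxs)), len(idxs))
            sites[s].extend(idxs[start:end])
            start = end
    return [sorted(s) for s in sites]


labels = [0] * 50 + [1] * 50  # hypothetical binary outcome
parts = dirichlet_label_partition(labels, n_sites=4, alpha=0.5)
for s, part in enumerate(parts):
    positives = sum(labels[i] for i in part)
    print(f"site {s}: {len(part)} samples, {positives} positive")
```

Every sample lands at exactly one site, so methods can be compared on the same pooled data under controlled levels of heterogeneity.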
Affiliation(s)
- Fan Zhang
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Daniel Kreuter
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Yichen Chen
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Sören Dittmer
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- ZeTeM, University of Bremen, Bremen, Germany
- Samuel Tull
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Tolou Shadbahr
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jacobus Preller
- Addenbrooke’s Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
- James H.F. Rudd
- Department of Medicine, University of Cambridge, Cambridge, UK
- John A.D. Aston
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Carola-Bibiane Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
10
Ullah F, Srivastava G, Xiao H, Ullah S, Lin JCW, Zhao Y. A Scalable Federated Learning Approach for Collaborative Smart Healthcare Systems With Intermittent Clients Using Medical Imaging. IEEE J Biomed Health Inform 2024; 28:3293-3304. [PMID: 37279135 DOI: 10.1109/jbhi.2023.3282955] [Indexed: 06/08/2023]
Abstract
The healthcare industry is one of the most vulnerable to cybercrime and privacy violations because health data are highly sensitive and dispersed across many locations. Recent confidentiality trends and a rising number of privacy breaches in different sectors make it crucial to implement new methods that protect data privacy while maintaining accuracy and sustainability. Moreover, intermittent remote clients with imbalanced datasets pose a significant obstacle for decentralized healthcare systems. Federated learning (FL) is a decentralized, privacy-preserving approach to training deep learning and machine learning models. In this article, we implement a scalable FL framework for interactive smart healthcare systems with intermittent clients using chest X-ray images. Remote hospitals may have imbalanced datasets, with intermittent clients communicating with the FL global server. Data augmentation is used to balance the datasets for local model training. In practice, some clients may leave the training process while others join because of technical or connectivity issues. The proposed method is tested with five to eighteen clients and different test-data sizes to evaluate performance in various situations. The experiments show that the proposed FL approach produces competitive results on two distinct problems: intermittent clients and imbalanced data. These findings could encourage medical institutions to collaborate and use rich private data to quickly develop a powerful patient diagnostic model.
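One simple way to tolerate clients that leave and rejoin, sketched here as an assumption rather than the authors' exact rule, is to aggregate each round only over the clients that actually reported back, reweighting by their sample counts, and to keep the previous global model if no client reported. The function name, availability probability, and update values are hypothetical.

```python
import random


def aggregate_available(global_w, updates, sizes, available):
    """Sample-size-weighted average over only the clients that reported this round."""
    chosen = [i for i, ok in enumerate(available) if ok]
    if not chosen:
        return list(global_w)  # nobody reported: keep the previous global model
    total = sum(sizes[i] for i in chosen)
    return [sum(updates[i][j] * sizes[i] for i in chosen) / total
            for j in range(len(global_w))]


rng = random.Random(1)
global_w = [0.0, 0.0]
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical local results
sizes = [100, 300, 50]
for rnd in range(3):
    available = [rng.random() > 0.3 for _ in updates]  # each client drops out ~30% of rounds
    global_w = aggregate_available(global_w, updates, sizes, available)
    print(rnd, available, global_w)
```

Renormalizing over the clients present keeps the aggregate a proper weighted average no matter how many clients drop out in a given round.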
11
Zhang L, Xu J, Sivaraman A, Deborah Lazarus J, Sharma PK, Pandi V. A Two-Stage Differential Privacy Scheme for Federated Learning Based on Edge Intelligence. IEEE J Biomed Health Inform 2024; 28:3349-3360. [PMID: 37594867 DOI: 10.1109/jbhi.2023.3306425] [Indexed: 08/20/2023]
Abstract
Data privacy protection must be considered in distributed federated learning (FL) to ensure that sensitive information is not leaked. In this article, we propose a two-stage differential privacy (DP) framework for FL based on edge intelligence, which can provide different levels of privacy preservation according to the degree of data sensitivity. In the first stage, the user terminal perturbs the original feature data with the randomized response mechanism for data desensitization, and users can self-regulate their level of privacy preservation. In the second stage, the edge server adds noise to the local models to further guarantee model privacy. Finally, the model updates are aggregated in the cloud. To evaluate the training accuracy and convergence of the proposed end-edge-cloud FL framework, extensive experiments are conducted on a real electrocardiogram (ECG) signal dataset, with a bidirectional long short-term memory (BiLSTM) neural network used to train the classification model. The effect of different combinations of feature perturbation and noise addition on model accuracy is analyzed under different privacy budgets and parameters. The experimental results demonstrate that the proposed framework provides good accuracy and convergence while ensuring privacy.
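The two stages can be illustrated with textbook mechanisms. Stage one: binary randomized response, in which each bit is reported truthfully with probability e^epsilon / (1 + e^epsilon), which satisfies epsilon-local differential privacy while still letting aggregate rates be debiased. Stage two: the Laplace mechanism, adding noise with scale sensitivity / epsilon to model weights. The paper perturbs ECG feature data and the abstract does not name its noise distribution, so the binary features, the Laplace choice, and all parameter values here are assumptions.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting a bit truthfully under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Stage 1 (user terminal, assumed binary features): flip each bit with probability 1 - p."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_true_rate(reported, epsilon):
    """Unbiased estimate of the true fraction of 1s from perturbed reports."""
    p = keep_probability(epsilon)
    observed = sum(reported) / len(reported)
    return (observed + p - 1.0) / (2.0 * p - 1.0)


def perturb_weights(weights, sensitivity, epsilon, rng):
    """Stage 2 (edge server, assumed Laplace): noise scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return [w + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for w in weights]


rng = random.Random(0)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(200_000)]
reported = randomized_response(true_bits, epsilon=1.0, rng=rng)
true_rate = sum(true_bits) / len(true_bits)
estimate = estimate_true_rate(reported, epsilon=1.0)
print(true_rate, estimate)  # the debiased estimate should be close to the true rate
noisy = perturb_weights([0.2, -0.1, 0.05], sensitivity=0.1, epsilon=1.0, rng=rng)
```

Lowering epsilon in either stage strengthens privacy at the cost of noisier features or weights, which is the accuracy trade-off the authors analyze across privacy budgets.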
12
Fu S, Jia H, Vassilaki M, Keloth VK, Dang Y, Zhou Y, Garg M, Petersen RC, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. J Biomed Inform 2024; 152:104623. [PMID: 38458578 PMCID: PMC11005095 DOI: 10.1016/j.jbi.2024.104623] [Received: 10/12/2023] [Revised: 01/12/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
INTRODUCTION Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Depite their usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and open-source Flower and PyTorch library for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites. 
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-to-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework for functional status extraction and impairment classification in EHRs, exemplifying the importance of large-scale, multi-institutional collaborative development.
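The abstract states that FedFSA's federated BERT components were built on Flower and PyTorch but does not say how client updates are combined. As an illustration only, the sketch below assumes sample-weighted federated averaging (FedAvg), which Flower provides as a built-in strategy; the function name `fedavg` and the flattened list-of-floats parameters are hypothetical simplifications of real BERT weight tensors.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-weighted average of flattened client parameter vectors (FedAvg-style).

    Hypothetical helper for illustration; not FedFSA's actual code.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must send parameters of the same shape")
        for i, value in enumerate(params):
            # Each site contributes in proportion to its share of the training samples.
            averaged[i] += value * n / total
    return averaged


# Hypothetical round: three sites, the third holding twice as many annotated notes.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [100, 100, 200])
print(global_params)  # [3.5, 4.5]
```

Weighting by sample count keeps a small site from pulling the global model as strongly as a large one; plain unweighted averaging is the simpler alternative when sites hold similar amounts of data.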
Affiliation(s)
- Sunyang Fu
- Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States.
- Heling Jia
- Mayo Clinic, Rochester, MN, United States.
- Yifang Dang
- University of Texas Health Science Center, Houston, TX, United States.
- Yujia Zhou
- University of Texas Health Science Center, Houston, TX, United States.
- Liwei Wang
- Mayo Clinic, Rochester, MN, United States.
- Andrew Wen
- University of Texas Health Science Center, Houston, TX, United States.
- Fang Li
- University of Texas Health Science Center, Houston, TX, United States.
- Hua Xu
- Yale University, New Haven, CT, United States.
- Cui Tao
- University of Texas Health Science Center, Houston, TX, United States.
- Hongfang Liu
- Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States.
13
Rogers MP, Janjua HM, Walczak S, Baker M, Read M, Cios K, Velanovich V, Pietrobon R, Kuo PC. Artificial Intelligence in Surgical Research: Accomplishments and Future Directions. Am J Surg 2024; 230:82-90. [PMID: 37981516 DOI: 10.1016/j.amjsurg.2023.10.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 10/22/2023] [Indexed: 11/21/2023]
Abstract
MINI-ABSTRACT The study introduces various conventional ML methods and their implementation in surgical areas, and discusses the need to move beyond these traditional approaches given the advent of big data. OBJECTIVE To investigate the current understanding and future directions of machine learning applications, such as risk stratification, clinical data analytics, and decision support, in surgical practice. SUMMARY BACKGROUND DATA The advent of the electronic health record, near-unlimited computing, and open-source computational packages have created an environment for applying artificial intelligence, machine learning, and predictive analytic techniques to healthcare. The "hype" phase has passed, and algorithmic approaches are being developed for surgery patients through all stages of care, involving preoperative, intraoperative, and postoperative components. Surgeons must understand and critically evaluate the strengths and weaknesses of these methodologies. METHODS The current body of AI literature was reviewed, emphasizing contemporary approaches important in the surgical realm. RESULTS AND CONCLUSIONS The unrealized impacts of AI on clinical surgery and its subspecialties are immense. As this technology continues to pervade surgical literature and clinical applications, knowledge of its inner workings and shortcomings is paramount in determining its appropriate implementation.
Affiliation(s)
- Michael P Rogers
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
- Haroon M Janjua
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
- Steven Walczak
- School of Information & Florida Center for Cybersecurity, University of South Florida, Tampa, FL, USA
- Marshall Baker
- Department of Surgery, Loyola University Medical Center, Maywood, IL, USA
- Meagan Read
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
- Konrad Cios
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
- Vic Velanovich
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA
- Paul C Kuo
- Department of Surgery, University of South Florida Morsani College of Medicine, Tampa, FL, USA.
14
Darzi E, Sijtsema NM, van Ooijen PMA. A comparative study of federated learning methods for COVID-19 detection. Sci Rep 2024; 14:3944. [PMID: 38365940 PMCID: PMC10873416 DOI: 10.1038/s41598-024-54323-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 02/11/2024] [Indexed: 02/18/2024]
Abstract
Deep learning has proven to be highly effective in diagnosing COVID-19; however, its efficacy is contingent upon the availability of extensive data for model training. The data sharing among hospitals, which is crucial for training robust models, is often restricted by privacy regulations. Federated learning (FL) emerges as a solution by enabling model training across multiple hospitals while preserving data privacy. However, the deployment of FL can be resource-intensive, necessitating efficient utilization of computational and network resources. In this study, we evaluate the performance and resource efficiency of five FL algorithms in the context of COVID-19 detection using Convolutional Neural Networks (CNNs) in a decentralized setting. The evaluation involves varying the number of participating entities, the number of federated rounds, and the selection algorithms. Our findings indicate that the Cyclic Weight Transfer algorithm exhibits superior performance, particularly when the number of participating hospitals is limited. These insights hold practical implications for the deployment of FL algorithms in COVID-19 detection and broader medical image analysis.
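Cyclic Weight Transfer, the best-performing algorithm in this comparison when few hospitals participate, passes a single model from site to site rather than averaging parallel updates. The following is a minimal sketch of that general idea under stated assumptions: a toy 1-D linear model trained with plain SGD, a fixed site order, and hypothetical function names (`local_train`, `cyclic_weight_transfer`). It is not the CNN setup or the code used in the study.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]  # (x, y) pairs held privately by one hospital


def local_train(w: float, b: float, data: Site,
                lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Plain SGD on squared error for a toy 1-D linear model y ~ w*x + b."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def cyclic_weight_transfer(sites: List[Site], cycles: int = 5) -> Tuple[float, float]:
    """Pass one model sequentially through every site, repeating for several cycles.

    Only the parameters (w, b) travel between sites; the raw (x, y) data stay local.
    """
    w, b = 0.0, 0.0
    for _ in range(cycles):
        for data in sites:
            w, b = local_train(w, b, data)
    return w, b


# Hypothetical toy data: three "hospitals" whose samples all follow y = 2x + 1.
hospitals = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0)],
    [(1.5, 4.0), (-1.0, -1.0)],
]
w, b = cyclic_weight_transfer(hospitals)
print(round(w, 2), round(b, 2))  # close to the true slope 2 and intercept 1
```

Because training is sequential, only one site is active at a time; this avoids the server-side averaging step but makes wall-clock time grow with the number of participants.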
Affiliation(s)
- Erfan Darzi
- Harvard Medical School, Harvard University, 300 Longwood Avenue, Boston, United States.
- Nanna M Sijtsema
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Hanzeplein 1, Groningen, The Netherlands
- Machine Learning Lab, Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen, Hanzeplein 1, Groningen, The Netherlands
- P M A van Ooijen
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Hanzeplein 1, Groningen, The Netherlands
- Machine Learning Lab, Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen, Hanzeplein 1, Groningen, The Netherlands
15
Song J, Song Z, Zhang J, Gong Y. Privacy-Preserving Identification of Cancer Subtype-Specific Driver Genes Based on Multigenomics Data with Privatedriver. J Comput Biol 2024; 31:99-116. [PMID: 38271572 DOI: 10.1089/cmb.2023.0115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Identifying cancer subtype-specific driver genes among a large number of irrelevant passenger genes is crucial for targeted cancer therapy. Recently, the rapid accumulation of large-scale cancer genomics data from multiple institutions has presented remarkable opportunities for identifying cancer subtype-specific driver genes. However, insufficient subtype samples, privacy issues, and the heterogeneity of aberration events pose great challenges to precisely identifying these genes. To address this, we introduce privatedriver, the first model for identifying subtype-specific driver genes that integrates genomics data from multiple institutions in a privacy-preserving collaborative manner. Identifying subtype-specific cancer driver genes with privatedriver involves two steps: genomics data integration and collaborative training. In the integration step, the aberration events from multiple genomics data sources are combined within each institution using the forward and backward propagation method of NetICS. In the collaborative training step, each institution uses the federated learning framework to upload encrypted model parameters, instead of raw data, to train a global model with the non-negative matrix factorization algorithm. We applied privatedriver to head and neck squamous cell carcinoma and colon cancer data from The Cancer Genome Atlas and evaluated it on two benchmarks using the macro F-score. The comparison analysis demonstrates that privatedriver achieves results comparable to centralized learning models and outperforms most other non-privacy-preserving models, while ensuring the confidentiality of patient information. We also demonstrate that, for varying predicted driver gene distributions within subtypes, our model fully considers subtype heterogeneity and identifies subtype-specific driver genes corresponding to the given prognosis and therapeutic effect.
The success of privatedriver demonstrates the feasibility and effectiveness of identifying cancer subtype-specific driver genes while protecting patient data, providing new insights for future privacy-preserving driver gene identification studies.
Affiliation(s)
- Junrong Song
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- Zhiming Song
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- Jinpeng Zhang
- School of Information; Kunming, P.R. China
- Yunnan Key Laboratory of Service Computing; Yunnan University of Finance and Economics, Kunming, P.R. China
- The School of Computer Science and Engineering, Yunnan University, Kunming, P.R. China
16
Jiang S, Li Y, Firouzi F, Chakrabarty K. Federated clustered multi-domain learning for health monitoring. Sci Rep 2024; 14:903. [PMID: 38195834 PMCID: PMC10776721 DOI: 10.1038/s41598-024-51344-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 01/03/2024] [Indexed: 01/11/2024]
Abstract
Wearable Internet of Things (WIoT) and Artificial Intelligence (AI) are rapidly emerging technologies for healthcare. These technologies enable seamless data collection and precise analysis toward fast, resource-abundant, and personalized patient care. However, the conventional machine learning workflow requires data to be transferred to a remote cloud server, which raises significant privacy concerns. To tackle this problem, researchers have proposed federated learning, in which end-point users collaboratively learn a shared model without sharing local data. However, data heterogeneity, i.e., variations in data distributions within a client (intra-client) or across clients (inter-client), degrades the performance of federated learning. Existing state-of-the-art methods mainly consider inter-client data heterogeneity, whereas intra-client variations have received little attention. To address intra-client variations in federated learning, we propose a federated clustered multi-domain learning algorithm based on ClusterGAN, multi-domain learning, and graph neural networks. We applied the proposed algorithm to a case study on stress-level prediction, in which it outperforms two state-of-the-art methods by 4.4% in accuracy and 0.06 in F1 score. In addition, we demonstrate the effectiveness of the proposed algorithm by investigating variants of its different modules.
Affiliation(s)
- Shiyi Jiang
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA.
- Yuan Li
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
- Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, 215316, Jiangsu, China
- Farshad Firouzi
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85281, USA
- Krishnendu Chakrabarty
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85281, USA
17
Shin H, Ryu K, Kim JY, Lee S. Application of privacy protection technology to healthcare big data. Digit Health 2024; 10:20552076241282242. [PMID: 39502481 PMCID: PMC11536567 DOI: 10.1177/20552076241282242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 08/23/2024] [Indexed: 11/08/2024]
Abstract
With the advent of the big data era, data security issues are becoming more common. Healthcare organizations have more data available for analysis, but they lose money every year due to their inability to prevent data leakage. To overcome these challenges, research on the use of data protection technologies in healthcare is actively underway, particularly on state-of-the-art technologies such as federated learning, introduced by Google, and blockchain technology, which has recently attracted attention. To learn about these research efforts, we explored the research, methods, and limitations of the most widely used privacy technologies. After investigating related papers published between 2017 and 2023 and identifying the latest technology trends, we selected relevant papers and reviewed the associated technologies. In the process, four technologies became the focus of this study: blockchain, federated learning, homomorphic encryption, and differential privacy. Overall, our analysis provides researchers with insight into privacy technology research by identifying the limitations of current privacy technologies and suggesting future research directions.
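Of the four technologies reviewed, differential privacy is the easiest to show in a few lines. The sketch below uses the classic Laplace mechanism for a counting query, assuming a sensitivity of 1 (adding or removing one patient changes a count by at most 1). The record format, the `private_count` and `laplace_noise` helpers, and the epsilon value are hypothetical and are not taken from the review.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    while u == -0.5:  # avoid log(0) in the vanishingly rare case random() == 0.0
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1, so noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: how many patients are 65 or older? (true answer: 3)
rng = random.Random(0)
patients = [{"age": a} for a in (34, 71, 66, 58, 80, 45)]
noisy = private_count(patients, lambda p: p["age"] >= 65, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # 3 plus random Laplace noise with scale 2
```

Smaller values of epsilon add more noise and give a stronger privacy guarantee; the noise has zero mean, so the released count is unbiased.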
Affiliation(s)
- Hyunah Shin
- Department of Healthcare Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Kyeongmin Ryu
- Department of Healthcare Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Jong-Yeup Kim
- Department of Healthcare Data Science Center, Konyang University Hospital, Daejeon, Republic of Korea
- Department of Otorhinolaryngology—Head and Neck Surgery, Konyang University College of Medicine, Daejeon, Republic of Korea
- Department of Biomedical Informatics, Konyang University College of Medicine, Daejeon, Republic of Korea
- Suehyun Lee
- College of IT Convergence, Gachon University, Seongnam, Republic of Korea
18
Petti M, Farina L. Network medicine for patients' stratification: From single-layer to multi-omics. WIREs Mech Dis 2023; 15:e1623. [PMID: 37323106 DOI: 10.1002/wsbm.1623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 03/08/2023] [Accepted: 05/30/2023] [Indexed: 06/17/2023]
Abstract
Precision medicine research increasingly relies on the integrated analysis of multiple types of omics. In the era of big data, the wide availability of different health-related information represents a great, but as yet untapped, opportunity with a potentially fundamental role in the prevention, diagnosis, and prognosis of diseases. Computational methods are needed to combine these data into a comprehensive view of a given disease. Network science can model biomedical data in terms of relationships among molecular players of different natures and has been successfully proposed as a new paradigm for studying human diseases. Patient stratification is an open challenge aimed at identifying subtypes with different disease manifestations, severity, and expected survival time. Several stratification approaches based on high-throughput gene expression measurements have been successfully applied. However, few attempts have been made to exploit the integration of various genotypic and phenotypic data to discover novel subtypes or improve the detection of known groupings. This article is categorized under: Cancer > Biomedical Engineering; Cancer > Computational Models; Cancer > Genetics/Genomics/Epigenetics.
Affiliation(s)
- Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Lorenzo Farina
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
19
Wang Q, He M, Guo L, Chai H. AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023; 24:bbad269. [PMID: 37497720 DOI: 10.1093/bib/bbad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023]
Abstract
Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in healthcare, cancer prognosis prediction, and other industries where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging because they are produced in various medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI (adaptive optimized vertical federated learning for heterogeneous multi-omics data integration), a vertical federated learning framework that integrates multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model that learns more about cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that, by utilizing encrypted multi-omics data from different institutions, AFEI achieves higher prediction accuracy (6.5% on average) than using single-omics data, and it performs almost as well as prognosis prediction that directly integrates the multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.
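In vertical federated learning, institutions hold different feature sets (here, different omics) for the same patients, rather than different patients with the same features. A minimal sketch of the forward pass for a linear model, assuming each party computes a partial score on its own columns and a coordinator sums them, is shown below. Party names, weights, and values are hypothetical, and the encryption that AFEI applies to shared features is deliberately omitted, so this illustrates only the vertical data partitioning, not AFEI's method.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = patients, columns = this party's features


def local_score(features: Matrix, weights: List[float]) -> List[float]:
    """Party-side step: linear contribution of this party's own features per patient."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def joint_scores(party_features: Dict[str, Matrix],
                 party_weights: Dict[str, List[float]],
                 bias: float = 0.0) -> List[float]:
    """Coordinator step: add up the partial scores reported by each party.

    Only per-patient partial scores leave a party; its raw feature columns stay local.
    Encryption of these partial results (as AFEI's abstract describes) is omitted here.
    """
    partials = [local_score(party_features[name], party_weights[name])
                for name in party_features]
    if not partials:
        raise ValueError("need at least one party")
    n_patients = len(partials[0])
    if any(len(p) != n_patients for p in partials):
        raise ValueError("parties must be aligned on the same patients, in the same order")
    return [bias + sum(p[i] for p in partials) for i in range(n_patients)]


# Hypothetical example: two institutions hold different omics for the same two patients.
features = {
    "mrna_site": [[1.0, 2.0], [0.0, 1.0]],  # two expression features per patient
    "methylation_site": [[4.0], [2.0]],     # one methylation feature per patient
}
weights = {"mrna_site": [0.5, 0.25], "methylation_site": [0.5]}
scores = joint_scores(features, weights, bias=-1.0)
print(scores)  # [2.0, 0.25]
```

The sum of partial scores equals the score a centralized linear model would compute on the concatenated features, which is why a vertically partitioned model can approach the accuracy of direct integration.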
Affiliation(s)
- Qingyong Wang
- School of Information and Computer, Anhui Agricultural University, Hefei 230000, China
- Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
- Longyi Guo
- Guangdong Provincial Hospital of Traditional Chinese Medical, Guangzhou 510000, China
- Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
20
Asaadi S, Martins KN, Lee MM, Pantoja JL. Artificial intelligence for the vascular surgeon. Semin Vasc Surg 2023; 36:394-400. [PMID: 37863611 DOI: 10.1053/j.semvascsurg.2023.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 04/22/2023] [Accepted: 05/24/2023] [Indexed: 10/22/2023]
Abstract
In recent years, artificial intelligence (AI) has permeated different aspects of vascular surgery to solve challenges in clinical practice. Although AI in vascular surgery is still in its early stages, there have been promising developments in its applications to vascular diagnosis, risk stratification, and outcome prediction. By establishing a baseline knowledge of AI, vascular surgeons are better equipped to use and interpret the data from these types of projects. This review aims to provide an overview of the fundamentals of AI and highlight its role in helping vascular surgeons overcome the challenges of clinical practice. In addition, we discuss the limitations of AI and how they affect AI applications.
Affiliation(s)
- Sina Asaadi
- Veterans Administration Loma Linda Healthcare System, 11201 Benton Street, Mail Code 112, Loma Linda, CA 92357
- Mary M Lee
- Veterans Administration Loma Linda Healthcare System, 11201 Benton Street, Mail Code 112, Loma Linda, CA 92357
- Joe Luis Pantoja
- Veterans Administration Loma Linda Healthcare System, 11201 Benton Street, Mail Code 112, Loma Linda, CA 92357.
21
Xu X, Qi Z, Han X, Xu A, Geng Z, He X, Ren Y, Duo Z. Predicting anticancer drug sensitivity on distributed data sources using federated deep learning. Heliyon 2023; 9:e18615. [PMID: 37593639 PMCID: PMC10427996 DOI: 10.1016/j.heliyon.2023.e18615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/12/2023] [Accepted: 07/24/2023] [Indexed: 08/19/2023]
Abstract
Drug sensitivity prediction plays a crucial role in precision cancer therapy. Collaboration among medical institutions can lead to better performance in drug sensitivity prediction. However, patient privacy and data protection regulations remain a severe impediment to centralized prediction studies. We propose, for the first time, a federated drug sensitivity prediction model with high generalizability that combines distributed data sources while protecting private data. Cell lines are first classified into three categories using the waterfall method. Focal loss, which addresses class imbalance, is then embedded into the horizontal federated deep learning framework, yielding HFDL-fl. Applying HFDL-fl to homogeneous and heterogeneous data, we obtained HFDL-Cross and HFDL-Within. Our comprehensive experiments demonstrated that (i) collaboration through HFDL-fl outperforms private models trained on local data, (ii) the focal loss function can effectively improve model performance in classifying cell lines into sensitive and resistant categories, and (iii) HFDL-fl is not significantly affected by data heterogeneity. In summary, HFDL-fl provides a valuable solution for breaking down the barriers between medical institutions in privacy-preserving drug sensitivity prediction and therefore facilitates the development of cancer precision medicine and other privacy-related biomedical research.
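HFDL-fl embeds focal loss into horizontal federated training to handle class imbalance. For reference, a minimal sketch of the standard focal loss for one sample, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), is given below. The default γ = 2 and the optional per-class α weights are common choices assumed here, since the abstract does not report the values HFDL-fl uses; the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs:  predicted class probabilities (e.g. a softmax output) summing to 1.
    target: index of the true class.
    gamma:  focusing parameter; gamma = 0 with no alpha reduces to cross-entropy.
    alpha:  optional per-class weights (assumed values) for class imbalance.
    """
    eps = 1e-12  # guard against log(0)
    p_t = min(max(probs[target], eps), 1.0)
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical three-class outputs (e.g. sensitive / intermediate / resistant).
easy = focal_loss([0.9, 0.05, 0.05], target=0)  # confident, correct prediction
hard = focal_loss([0.3, 0.4, 0.3], target=0)    # poorly classified sample
print(f"{easy:.4f} {hard:.4f}")
```

With γ = 2, the confidently classified sample keeps only about 1% of its cross-entropy ((1 - 0.9)^2 = 0.01), while the hard sample keeps about half ((1 - 0.3)^2 = 0.49), so the training signal is dominated by hard examples, which in imbalanced data are typically from the minority class.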
Affiliation(s)
- Xiaolu Xu
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Zitong Qi
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Xiumei Han
- College of Artificial Intelligence, Dalian Maritime University, Dalian 116026, China
- Aiguo Xu
- Department of Oncology, The Second People's Hospital of Lianyungang, Lianyungang 222023, China
- Zhaohong Geng
- Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
- Xinyu He
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Yonggong Ren
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
- Zhaojun Duo
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
22
Khalid N, Qayyum A, Bilal M, Al-Fuqaha A, Qadir J. Privacy-preserving artificial intelligence in healthcare: Techniques and applications. Comput Biol Med 2023; 158:106848. [PMID: 37044052 DOI: 10.1016/j.compbiomed.2023.106848] [Citation(s) in RCA: 84] [Impact Index Per Article: 42.0] [Received: 12/06/2022] [Revised: 03/21/2023] [Accepted: 03/30/2023] [Indexed: 04/14/2023]
Abstract
There has been increasing interest in translating artificial intelligence (AI) research into clinically validated applications to improve the performance, capacity, and efficacy of healthcare services. Despite substantial research worldwide, very few AI-based applications have successfully made it to clinics. Key barriers to the widespread adoption of clinically validated AI applications include non-standardized medical records, limited availability of curated datasets, and stringent legal/ethical requirements to preserve patients' privacy. Therefore, there is a pressing need to devise new data-sharing methods in the age of AI that preserve patient privacy while developing AI-based healthcare applications. In the literature, significant attention has been devoted to developing privacy-preserving techniques and overcoming the issues hampering AI adoption in actual clinical environments. To this end, this study summarizes the state-of-the-art approaches for preserving privacy in AI-based healthcare applications. Prominent privacy-preserving techniques such as Federated Learning and Hybrid Techniques are elaborated along with potential privacy attacks, security challenges, and future directions.
Affiliation(s)
- Nazish Khalid
- Information Technology University, Lahore, Pakistan.
- Adnan Qayyum
- Information Technology University, Lahore, Pakistan.
- Muhammad Bilal
- Big Data Enterprise and Artificial Intelligence Lab (Big-DEAL), University of the West of England, Bristol, United Kingdom.
23
Dasaradharami Reddy K, Gadekallu TR. A Comprehensive Survey on Federated Learning Techniques for Healthcare Informatics. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:8393990. [PMID: 36909974 PMCID: PMC9995203 DOI: 10.1155/2023/8393990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Revised: 04/18/2022] [Accepted: 05/18/2022] [Indexed: 03/06/2023]
Abstract
Healthcare is widely regarded as crucial to promoting the general physical and mental health and well-being of people around the world. The amount of data generated by healthcare systems is enormous, making it challenging to manage. Many machine learning (ML) approaches have been implemented to develop dependable and robust solutions for handling these data. However, ML cannot fully utilize such data because of privacy concerns, particularly in the case of medical data. Owing to a lack of precise clinical data, applying ML in this setting is challenging and may not yield the desired results. Federated learning (FL), a recent development in ML in which computation is offloaded to the source of the data, appears to be a promising solution to this problem. In this study, we present a detailed survey of applications of FL for healthcare informatics. We begin with a discussion of the need for FL in the healthcare domain, followed by a review of recent review papers. We focus on the fundamentals of FL and the major motivations behind FL for healthcare applications. We then present the applications of FL along with the recent state of the art in several verticals of healthcare. Lessons learned, open issues, and challenges that are yet to be solved are also highlighted. Finally, we outline future directions for prospective researchers in this domain.
Affiliation(s)
- K. Dasaradharami Reddy
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Thippa Reddy Gadekallu
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
24
Ogier du Terrail J, Leopold A, Joly C, Béguier C, Andreux M, Maussion C, Schmauch B, Tramel EW, Bendjebbar E, Zaslavskiy M, Wainrib G, Milder M, Gervasoni J, Guerin J, Durand T, Livartowski A, Moutet K, Gautier C, Djafar I, Moisson AL, Marini C, Galtier M, Balazard F, Dubois R, Moreira J, Simon A, Drubay D, Lacroix-Triki M, Franchet C, Bataillon G, Heudel PE. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat Med 2023; 29:135-146. [PMID: 36658418 DOI: 10.1038/s41591-022-02155-w] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 05/20/2021] [Accepted: 11/23/2022] [Indexed: 01/21/2023]
Abstract
Triple-negative breast cancer (TNBC) is a rare cancer, characterized by high metastatic potential and poor prognosis, with limited treatment options. The current standard of care in nonmetastatic settings is neoadjuvant chemotherapy (NACT), but treatment efficacy varies substantially across patients. This heterogeneity is still poorly understood, partly due to the paucity of curated TNBC data. Here we investigate the use of machine learning (ML) leveraging whole-slide images and clinical information to predict, at diagnosis, the histological response to NACT in women with early TNBC. To overcome the biases of small-scale studies while respecting data privacy, we conducted a multicentric TNBC study using federated learning, in which patient data remain secured behind hospitals' firewalls. We show that local ML models relying on whole-slide images can predict response to NACT but that collaborative training of ML models further improves performance, on par with the best current approaches in which ML models are trained using time-consuming expert annotations. Our ML model is interpretable and is sensitive to specific histological patterns. This proof-of-concept study, in which federated learning is applied to real-world datasets, paves the way for future biomarker discovery using unprecedentedly large datasets.
Affiliation(s)
- Camille Franchet
- Institut Universitaire du Cancer de Toulouse (IUCT) Oncopole, Toulouse, France
25
Li Q, Wei X, Lin H, Liu Y, Chen T, Ma X. Inspecting the Running Process of Horizontal Federated Learning via Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4085-4100. [PMID: 33872152 DOI: 10.1109/tvcg.2021.3074010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
As a decentralized training approach, horizontal federated learning (HFL) enables distributed clients to collaboratively learn a machine learning model while keeping personal/private information on local devices. Despite the enhanced performance and efficiency of HFL over local training, clues for inspecting the behaviors of the participating clients and the federated model are usually lacking due to the privacy-preserving nature of HFL. Consequently, the users can only conduct a shallow-level analysis of potential abnormal behaviors and have limited means to assess the contributions of individual clients and implement the necessary intervention. Visualization techniques have been introduced to facilitate the HFL process inspection, usually by providing model metrics and evaluation results as a dashboard representation. Although the existing visualization methods allow a simple examination of the HFL model performance, they cannot support the intensive exploration of the HFL process. In this article, strictly following the HFL privacy-preserving protocol, we design an exploratory visual analytics system for the HFL process termed HFLens, which supports comparative visual interpretation at the overview, communication round, and client instance levels. Specifically, the proposed system facilitates the investigation of the overall process involving all clients, the correlation analysis of clients' information in one or different communication round(s), the identification of potential anomalies, and the contribution assessment of each HFL client. Two case studies confirm the efficacy of our system. Experts' feedback suggests that our approach indeed helps in understanding and diagnosing the HFL process better.
26
Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 2022; 6:1330-1345. [PMID: 35788685 PMCID: PMC12063568 DOI: 10.1038/s41551-022-00898-y] [Citation(s) in RCA: 123] [Impact Index Per Article: 41.0] [Received: 01/24/2021] [Accepted: 05/03/2022] [Indexed: 01/14/2023]
Abstract
In the past decade, the application of machine learning (ML) to healthcare has helped drive the automation of physician tasks as well as enhancements in clinical capabilities and access to care. This progress has emphasized that, from model development to model deployment, data play central roles. In this Review, we provide a data-centric view of the innovations and challenges that are defining ML for healthcare. We discuss deep generative models and federated learning as strategies to augment datasets for improved model performance, as well as the use of the more recent transformer models for handling larger datasets and enhancing the modelling of clinical text. We also discuss data-focused problems in the deployment of ML, emphasizing the need to efficiently deliver data to ML models for timely clinical predictions and to account for natural data shifts that can deteriorate model performance.
Affiliation(s)
- Angela Zhang
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA.
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA.
- Greenstone Biosciences, Palo Alto, CA, USA.
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Lei Xing
- Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
- James Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Informatics, School of Medicine, Stanford University, Stanford, CA, USA
- Joseph C Wu
- Stanford Cardiovascular Institute, School of Medicine, Stanford University, Stanford, CA, USA.
- Greenstone Biosciences, Palo Alto, CA, USA.
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA.
- Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA.
27
Abstract
In the big data era, vast volumes of data are generated daily as the foundation of data-driven scientific discovery. Thanks to the recent open data movement, much of these data are being made available to the public, significantly advancing scientific research and accelerating socio-technical development. However, not all data are suitable for opening or sharing because of concerns over privacy, ownership, trust, and incentive. Therefore, data sharing remains a challenge for specific data types and holders, creating a bottleneck for further unleashing the potential of these "closed data." To address this challenge, in this perspective, we conceptualize the current practices and technologies in data collaboration in a data-sharing-free manner and propose the concept of a model-sharing strategy for using closed data without sharing them. Supported by emerging advances in artificial intelligence, this strategy will unleash the large potential in closed data. Moreover, we show the advantages of the model-sharing strategy and explain how it will lead to a new paradigm of big data governance and collaboration.
28
Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS. Federated learning-based AI approaches in smart healthcare: concepts, taxonomies, challenges and open issues. CLUSTER COMPUTING 2022; 26:1-41. [PMID: 35996680 PMCID: PMC9385101 DOI: 10.1007/s10586-022-03658-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/02/2022] [Revised: 05/10/2022] [Accepted: 06/17/2022] [Indexed: 06/15/2023]
Abstract
Federated Learning (FL), Artificial Intelligence (AI), and Explainable Artificial Intelligence (XAI) are among the most trending and exciting technologies in the intelligent healthcare field. Traditionally, the healthcare system works based on centralized agents sharing their raw data; therefore, huge vulnerabilities and challenges still exist in this system. Integrated with AI, however, the system would consist of multiple agent collaborators capable of communicating efficiently with their desired host. FL is another interesting feature: it works in a decentralized manner, maintaining communication based on a model in the preferred system without transferring the raw data. The combination of FL, AI, and XAI techniques can minimize several limitations and challenges in the healthcare system. This paper presents a complete analysis of FL using AI for smart healthcare applications. Initially, we discuss contemporary concepts of emerging technologies such as FL, AI, XAI, and the healthcare system. We integrate and classify the FL-AI with healthcare technologies in different domains. Further, we address the existing problems, including security, privacy, stability, and reliability in the healthcare field. In addition, we guide readers through strategies for solving healthcare problems using FL and AI. Finally, we address extensive research areas as well as potential future prospects regarding FL-based AI research in the healthcare management system.
Affiliation(s)
- Anichur Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Md. Sazzad Hossain
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Ghulam Muhammad
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Dipanjali Kundu
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
- Tanoy Debnath
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Muaz Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
- Md. Saikat Islam Khan
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
- Shahab S. Band
- Future Technology Research Center, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, 64002 Taiwan
29
Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated Learning in Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems. J Am Coll Radiol 2022; 19:969-974. [PMID: 35483439 DOI: 10.1016/j.jacr.2022.03.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/01/2021] [Revised: 03/28/2022] [Accepted: 03/29/2022] [Indexed: 11/28/2022]
Abstract
With recent developments in medical imaging facilities, extensive medical imaging data are produced every day. This increasing amount of data provides an opportunity for researchers to develop data-driven methods and deliver better health care. However, data-driven models require a large amount of data to be adequately trained. Furthermore, there is always a limited amount of data available in each data center. Hence, deep learning models trained on local data centers might not reach their total performance capacity. One solution could be to accumulate all data from different centers into one center. However, data privacy regulations do not allow medical institutions to easily combine their data, and this becomes increasingly difficult when institutions from multiple countries are involved. Another solution is to use privacy-preserving algorithms, which can make use of all the data available in multiple centers while keeping the sensitive data private. Federated learning (FL) is such a mechanism that enables deploying large-scale machine learning models trained on different data centers without sharing sensitive data. In FL, instead of transferring data, a general model is trained on local data sets and transferred between data centers. FL has been identified as a promising field of research, with extensive possible uses in medical research and practice. This article introduces FL, with a comprehensive look into its concepts and recent research trends in medical imaging.
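The mechanism summarized in this abstract (a general model trained on local data sets and transferred between centers instead of the data) is commonly realized as federated averaging. The following is a minimal Python sketch under stated assumptions, not the method of the cited article: it assumes a linear model fitted by least squares, full-batch gradient descent at each center, and FedAvg-style aggregation weighted by local sample counts. The function names (`local_update`, `fed_avg`, `train_federated`) and the toy data are hypothetical.

```python
from typing import List, Tuple

# One center's private data: (feature vector, target) pairs. Hypothetical layout.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> List[float]:
    """Full-batch gradient descent on mean squared error, using only this center's data."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average the client models, weighting each by its number of local samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(dim)]


def train_federated(centers: List[Dataset], dim: int, rounds: int = 30) -> List[float]:
    """Alternate local training and server-side averaging; only weights are exchanged."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Each center starts from the current global model; raw data never leave the center.
        local_ws = [local_update(global_w, data) for data in centers]
        global_w = fed_avg(local_ws, [len(d) for d in centers])
    return global_w


if __name__ == "__main__":
    # Toy data generated from y = 2*x + 1; the second feature is a constant 1.0 (bias term).
    center_a = [([x, 1.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0)]
    center_b = [([x, 1.0], 2.0 * x + 1.0) for x in (1.5, 2.0)]
    w = train_federated([center_a, center_b], dim=2)
    print([round(v, 2) for v in w])
```

With these toy centers the averaged model approaches the weights `[2.0, 1.0]` used to generate the data. Weighting by sample count is the usual FedAvg choice; an unweighted mean is a simpler alternative when centers hold similar amounts of data.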
Affiliation(s)
- Erfan Darzidehkalani
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health, University Medical Center Groningen, University of Groningen, the Netherlands.
- Mohammad Ghasemi-Rad
- Assistant Professor of Radiology, Department of Interventional Radiology, Baylor College of Medicine, Houston, Texas
- P M A van Ooijen
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Coordinator Machine Learning Lab, Data Science Center in Health, University Medical Center Groningen, University of Groningen, the Netherlands
30
Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022; 61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Accepted: 09/18/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND Artificial intelligence (AI) has been successfully applied in numerous scientific domains. In biomedicine, AI has already shown tremendous potential, e.g., in the interpretation of next-generation sequencing data and in the design of clinical decision support systems. OBJECTIVES However, training an AI model on sensitive data raises concerns about the privacy of individual participants. For example, summary statistics of a genome-wide association study can be used to determine the presence or absence of an individual in a given dataset. This considerable privacy risk has led to restrictions in accessing genomic and other biomedical data, which is detrimental for collaborative research and impedes scientific progress. Hence, there has been a substantial effort to develop AI methods that can learn from sensitive data while protecting individuals' privacy. METHOD This paper provides a structured overview of recent advances in privacy-preserving AI techniques in biomedicine. It places the most important state-of-the-art approaches within a unified taxonomy and discusses their strengths, limitations, and open problems. CONCLUSION As the most promising direction, we suggest combining federated machine learning as a more scalable approach with other additional privacy-preserving techniques. This would allow merging their advantages to provide privacy guarantees in a distributed way for biomedical applications. Nonetheless, more research is necessary as hybrid approaches pose new challenges such as additional network or computation overhead.
Affiliation(s)
- Reihaneh Torkzadehmahani
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany
- Reza Nasirigerdeh
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany
- Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
- David B. Blumenthal
- Department of Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
- Tim Kacprowski
- Division of Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Medical School Hannover, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, Technical University of Munich, Munich, Germany
- Julian Matschinske
- E.U. Horizon2020 FeatureCloud Project Consortium
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Julian Spaeth
- E.U. Horizon2020 FeatureCloud Project Consortium
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Nina Kerstin Wenke
- E.U. Horizon2020 FeatureCloud Project Consortium
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- E.U. Horizon2020 FeatureCloud Project Consortium
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
31
Bharati S, Mondal MRH, Podder P, Prasath VS. Federated learning: Applications, challenges and future directions. INTERNATIONAL JOURNAL OF HYBRID INTELLIGENT SYSTEMS 2022; 18:19-35. [DOI: 10.3233/his-220006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
Federated learning (FL) refers to a system in which a central aggregator coordinates the efforts of several clients to solve machine learning problems. This setting allows the training data to remain dispersed in order to protect the privacy of each device. This paper provides an overview of federated learning systems, with a focus on healthcare. FL is reviewed in terms of its frameworks, architectures and applications. It is shown here that FL solves the preceding issues with a shared global deep learning (DL) model via a central aggregator server. Inspired by the rapid growth of FL research, this paper examines recent developments and provides a comprehensive list of unresolved issues. Several privacy methods including secure multiparty computation, homomorphic encryption, differential privacy and stochastic gradient descent are described in the context of FL. Moreover, a review is provided for different classes of FL such as horizontal and vertical FL and federated transfer learning. FL has applications in wireless communication, service recommendation, intelligent medical diagnosis systems and healthcare, which we review in this paper. We also present a comprehensive review of existing FL challenges, such as privacy protection, communication cost, systems heterogeneity, and unreliable model upload, followed by future research directions.
Affiliation(s)
- Subrato Bharati
- Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- M. Rubaiyat Hossain Mondal
- Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Prajoy Podder
- Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- V.B. Surya Prasath
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Department of Biomedical Informatics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, USA
32
A Review on Federated Learning and Machine Learning Approaches: Categorization, Application Areas, and Blockchain Technology. INFORMATION 2022. [DOI: 10.3390/info13050263] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
Federated learning (FL) is a scheme in which several clients work collectively to solve machine learning (ML) problems, with a central aggregator coordinating the procedure. This arrangement allows the training data to remain distributed, ensuring that each individual device's data stay isolated. The paper systematically reviews the available literature following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The study presents a systematic review of applicable ML approaches for FL, reviews the categorization of FL, discusses FL application areas, presents the relationship between FL and Blockchain Technology (BT), and discusses existing literature that has used FL and ML approaches. The study also examined applicable machine learning models for federated learning. The inclusion criteria were papers (i) published between 2017 and 2021, (ii) written in English, (iii) published in a peer-reviewed scientific journal, and (iv) published as preprints. Excluded from the review were (i) unpublished studies, theses, and dissertations, (ii) conference papers, (iii) papers not in English, and (iv) papers that did not use artificial intelligence models and blockchain technology. In total, 84 eligible papers were examined in this study. In recent years, the amount of research on ML using FL has increased. Accuracy equivalent to standard feature-based techniques has been attained, and ensembles of many algorithms may yield even better results. We found that the best results were obtained from a hybrid design of an ML ensemble employing expert features. However, some additional difficulties and issues need to be overcome, such as efficiency, complexity, and smaller datasets. In addition, novel FL applications should be investigated from the standpoint of the datasets and methodologies.
33
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or deal with high dimension datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- *Correspondence: David Edwards,
34
Che S, Kong Z, Peng H, Sun L, Leow A, Chen Y, He L. Federated Multi-View Learning for Private Medical Data Integration and Analysis. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3501816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Along with the rapid expansion of information technology and digitalization of health data, there is increasing concern about maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified: Firstly, medical data are naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Secondly, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this paper, we present a generic Federated Multi-View Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-View Learning (V-FedMV) and Horizontal Federated Multi-View Learning (H-FedMV). We experimented with real-world keyboard data collected from the BiAffect study. Our results demonstrated that the proposed approach can make full use of multi-view data in a privacy-preserving way, and both V-FedMV and H-FedMV perform better than their single-view and pairwise counterparts. Besides, the framework can be easily adapted to deal with multi-view sequential data. We have developed a sequential model (S-FedMV) that takes sequences of multi-view data as input and demonstrated it experimentally. To the best of our knowledge, this framework is the first to consider both vertical and horizontal diversification in the multi-view setting, as well as their sequential federated learning.
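The vertical/horizontal distinction in this abstract concerns how data are partitioned across sites. As a hedged illustration only (it does not implement FedMV), the Python sketch below splits a hypothetical patient table both ways: horizontally, each site keeps every feature for its own patients; vertically, each site keeps one view (a feature subset) for all patients. The patient IDs, site names, feature names, and the helpers `horizontal_split` and `vertical_split` are invented for this example.

```python
from typing import Dict, List

# Hypothetical layout: feature name -> value for one patient.
Record = Dict[str, float]
Table = Dict[str, Record]  # patient id -> record


def horizontal_split(table: Table, site_of: Dict[str, str]) -> Dict[str, Table]:
    """Horizontal partition: each site holds all features for a disjoint subset of patients."""
    sites: Dict[str, Table] = {}
    for pid, record in table.items():
        sites.setdefault(site_of[pid], {})[pid] = dict(record)
    return sites


def vertical_split(table: Table, views: Dict[str, List[str]]) -> Dict[str, Table]:
    """Vertical partition: each site holds one view (a feature subset) for every patient."""
    return {
        site: {pid: {f: record[f] for f in features} for pid, record in table.items()}
        for site, features in views.items()
    }


if __name__ == "__main__":
    # Invented toy data; in a real federation no single party would hold this full table.
    table: Table = {
        "p1": {"typing_speed": 4.1, "motion_var": 0.20},
        "p2": {"typing_speed": 3.7, "motion_var": 0.35},
        "p3": {"typing_speed": 5.0, "motion_var": 0.12},
    }
    by_patients = horizontal_split(table, {"p1": "site_A", "p2": "site_A", "p3": "site_B"})
    by_features = vertical_split(table, {"keyboard": ["typing_speed"], "sensor": ["motion_var"]})
    print(sorted(by_patients["site_A"]), sorted(by_features["sensor"]["p2"]))
```

The script prints `['p1', 'p2'] ['motion_var']`: the horizontal split gives site_A two complete patient records, while the vertical split gives the sensor site a single feature for every patient. In the vertical case a federated model must also align samples across sites, which is one reason the two settings call for different protocols.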
Affiliation(s)
- Alex Leow
- University of Illinois at Chicago, USA
35
Navaz AN, El-Kassabi HT, Serhani MA, Oulhaj A, Khalil K. A Novel Patient Similarity Network (PSN) Framework Based on Multi-Model Deep Learning for Precision Medicine. J Pers Med 2022; 12:768. [PMID: 35629190 PMCID: PMC9144142 DOI: 10.3390/jpm12050768] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/02/2022] [Accepted: 05/02/2022] [Indexed: 02/05/2023]
Abstract
Precision medicine can be defined as the comparison of a new patient with existing patients that have similar characteristics and can be referred to as patient similarity. Several deep learning models have been used to build and apply patient similarity networks (PSNs). However, the challenges related to data heterogeneity and dimensionality make it difficult to use a single model to reduce data dimensionality and capture the features of diverse data types. In this paper, we propose a multi-model PSN that considers heterogeneous static and dynamic data. The combination of deep learning models and PSN allows ample clinical evidence and information extraction against which similar patients can be compared. We use the bidirectional encoder representations from transformers (BERT) to analyze the contextual data and generate word embedding, where semantic features are captured using a convolutional neural network (CNN). Dynamic data are analyzed using a long-short-term-memory (LSTM)-based autoencoder, which reduces data dimensionality and preserves the temporal features of the data. We propose a data fusion approach combining temporal and clinical narrative data to estimate patient similarity. The experiments we conducted proved that our model provides a higher classification accuracy in determining various patient health outcomes when compared with other traditional classification algorithms.
Affiliation(s)
- Alramzana Nujum Navaz
- Department of Information Systems and Security, College of Information Technology, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- Hadeel T. El-Kassabi
- Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
- Mohamed Adel Serhani
- Department of Information Systems and Security, College of Information Technology, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- Abderrahim Oulhaj
- Department of Epidemiology and Public Health, College of Medicine and Health Sciences, Khalifa University, Abu Dhabi P.O. Box 17666, United Arab Emirates
- Institute of Public Health, College of Medicine and Health Sciences, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
- Khaled Khalil
- Faculty of Applied Science and Engineering, University of Toronto, Toronto, ON M5S 1A4, Canada
36
Dang TK, Lan X, Weng J, Feng M. Federated Learning for Electronic Health Records. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3514500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
In data-driven medical research, multi-center studies have long been preferred over single-center ones because a single institute sometimes lacks enough data to obtain sufficient statistical power for certain hypothesis tests as well as predictive and subgroup studies. The wide adoption of electronic health records (EHRs) has made multi-institutional collaboration much more feasible. However, concerns over infrastructures, regulations, privacy and data standardization present a challenge to data sharing across healthcare institutions. Federated Learning (FL), which allows multiple sites to collaboratively train a global model without directly sharing data, has become a promising paradigm to break the data isolation. In this study, we surveyed existing works on FL applications in EHRs and evaluated the performance of current state-of-the-art FL algorithms on two EHR machine learning tasks of significant clinical importance on a real-world multi-center EHR dataset.
Affiliation(s)
- Xiang Lan
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Mengling Feng
- Institute of Data Science & Saw Swee Hock School of Public Health, National University of Singapore, Singapore
37
Palihawadana C, Wiratunga N, Wijekoon A, Kalutarage H. FedSim: Similarity guided model aggregation for Federated Learning. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.141] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
38
Jordan S, Fontaine C, Hendricks-Sturrup R. Selecting Privacy-Enhancing Technologies for Managing Health Data Use. Front Public Health 2022; 10:814163. [PMID: 35372185 PMCID: PMC8967420 DOI: 10.3389/fpubh.2022.814163] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/12/2021] [Accepted: 02/14/2022] [Indexed: 11/29/2022]
Abstract
Privacy protection for health data is more than simply stripping datasets of specific identifiers. Privacy protection increasingly means the application of privacy-enhancing technologies (PETs), also known as privacy engineering. Demands for the application of PETs are not yet met with ease of use or even understanding. This paper provides a scope of the current peer-reviewed evidence regarding the practical use or adoption of various PETs for managing health data privacy. We describe the state of knowledge of PETs for the use and exchange of health data specifically and build a practical perspective on the steps needed to improve the standardization of the application of PETs for diverse uses of health data.
Affiliation(s)
- Sara Jordan
- Future of Privacy Forum, Washington, DC, United States
- Clara Fontaine
- Centre for Quantum Technologies at the National University of Singapore, Singapore, Singapore
39
Naz S, Phan KT, Chen YP. A comprehensive review of federated learning for COVID-19 detection. INT J INTELL SYST 2022; 37:2371-2392. [PMID: 37520859 PMCID: PMC9015599 DOI: 10.1002/int.22777] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/17/2021] [Revised: 10/31/2021] [Accepted: 11/16/2021] [Indexed: 11/09/2022]
Abstract
Coronavirus disease 2019 (COVID-19) was declared a global pandemic by the World Health Organization in March 2020. Effective testing is crucial to slow the spread of the pandemic. Artificial intelligence and machine learning techniques can help COVID-19 detection using various clinical symptom data. While the deep learning (DL) approach requiring centralized data is susceptible to a high risk of data privacy breaches, the federated learning (FL) approach resting on decentralized data can preserve data privacy, a critical factor in the health domain. This paper reviews recent advances in applying DL and FL techniques for COVID-19 detection with a focus on the latter. A model FL implementation use case in health systems with COVID-19 detection using chest X-ray image data sets is studied. We have also reviewed applications of previously published FL experiments for COVID-19 research to demonstrate the applicability of FL in tackling health research issues. Last, several challenges in FL implementation in the healthcare domain are discussed in terms of potential future work.
Affiliation(s)
- Sadaf Naz
- Department of Computer Science and Information Technology, School of Engineering and Mathematical Sciences, La Trobe University, Bundoora, Victoria, Australia
- Khoa T. Phan
- Department of Computer Science and Information Technology, School of Engineering and Mathematical Sciences, La Trobe University, Bundoora, Victoria, Australia
- Yi‐Ping Phoebe Chen
- Department of Computer Science and Information Technology, School of Engineering and Mathematical Sciences, La Trobe University, Bundoora, Victoria, Australia
40
Antunes RS, da Costa CA, Küderle A, Yari IA, Eskofier B. Federated Learning for Healthcare: Systematic Review and Architecture Proposal. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3501813] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/28/2022]
Abstract
The use of machine learning (ML) with electronic health records (EHR) is growing in popularity as a means to extract knowledge that can improve decision-making in healthcare. Such methods require training high-quality models on diverse and comprehensive datasets, which are hard to obtain because of the sensitive nature of patients' medical data. In this context, federated learning (FL) is a methodology that enables distributed training of machine learning models on remotely hosted datasets without the need to accumulate the data in one place and thereby compromise it. FL is a promising way to improve ML-based systems, aligning them better with regulatory requirements and improving trustworthiness and data sovereignty. However, many open questions must be addressed before the use of FL becomes widespread. This article presents a systematic literature review of current research on FL in the context of EHR data for healthcare applications. Our analysis highlights the main research topics, proposed solutions, case studies, and respective ML methods. Furthermore, the article discusses a general architecture for FL applied to healthcare data, based on the main insights obtained from the literature review. The collected literature corpus indicates that there is extensive research on the privacy and confidentiality aspects of training data and model sharing, which is expected given the sensitive nature of medical data. Studies also explore improvements to the aggregation mechanisms required to generate the learning model from distributed contributions, as well as case studies with different types of medical data.
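The abstract notes that studies explore improvements to the aggregation mechanisms that build the global model from distributed contributions. As one illustration, and an assumption rather than a method taken from the review, the Python sketch below contrasts plain averaging with a coordinate-wise median, a commonly discussed robust alternative. The client update vectors and function names are hypothetical.

```python
from statistics import median

def mean_aggregate(updates):
    """Unweighted mean of each parameter across clients (plain averaging)."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median: each parameter is the median of the clients' values,
    so a single extreme client cannot pull the result arbitrarily far."""
    return [median(column) for column in zip(*updates)]

# Hypothetical parameter vectors from four sites; the last site sends an outlying update.
client_updates = [
    [0.10, -0.20, 0.30],
    [0.12, -0.18, 0.28],
    [0.09, -0.21, 0.31],
    [5.00, 4.00, -6.00],
]

print("mean:  ", mean_aggregate(client_updates))
print("median:", median_aggregate(client_updates))
```

Here the outlier drags the first mean coordinate above 1.0, while the median stays near 0.11. The trade-off is that the median ignores sample-size weighting and discards some information from well-behaved clients.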
Affiliation(s)
- Björn Eskofier
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
41
Allam A, Feuerriegel S, Rebhan M, Krauthammer M. Analyzing Patient Trajectories With Artificial Intelligence. J Med Internet Res 2021; 23:e29812. [PMID: 34870606 PMCID: PMC8686456 DOI: 10.2196/29812] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/21/2021] [Revised: 07/26/2021] [Accepted: 10/29/2021] [Indexed: 01/16/2023]
Abstract
In digital medicine, patient data typically record health events over time (eg, through electronic health records, wearables, or other sensing technologies) and thus form unique patient trajectories. Patient trajectories are highly predictive of the future course of diseases and therefore facilitate effective care. However, digital medicine often uses only limited patient data, consisting of health events from only one or a small number of time points, while ignoring additional information encoded in patient trajectories. To analyze such rich longitudinal data, new artificial intelligence (AI) solutions are needed. In this paper, we provide an overview of recent efforts to develop trajectory-aware AI solutions and offer suggestions for future directions. Specifically, we examine the implications for developing disease models from patient trajectories along the typical AI workflow: problem definition, data processing, modeling, evaluation, and interpretation. We conclude with a discussion of how such AI solutions will allow the field to build robust models for personalized risk scoring, subtyping, and disease pathway discovery.
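The abstract contrasts single-time-point views with patient trajectories and lists data processing as a workflow step. The Python sketch below is a hypothetical illustration of that step, not code from the paper: it groups flat event records (an assumed `(patient_id, date, event_code)` format with made-up codes) into per-patient, time-ordered trajectories. It also includes a helper that keeps only events before a prediction cutoff, a common precaution against look-ahead leakage.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flat event records: (patient_id, date, event_code).
events = [
    ("p1", date(2020, 3, 1), "dx:diabetes"),
    ("p2", date(2020, 1, 5), "lab:hba1c_high"),
    ("p1", date(2019, 11, 20), "lab:hba1c_high"),
    ("p1", date(2021, 6, 2), "dx:neuropathy"),
    ("p2", date(2020, 2, 9), "rx:metformin"),
]

def build_trajectories(records):
    """Group events by patient and sort each patient's events chronologically."""
    by_patient = defaultdict(list)
    for patient_id, when, code in records:
        by_patient[patient_id].append((when, code))
    return {pid: sorted(evts) for pid, evts in by_patient.items()}

def history_before(trajectory, cutoff):
    """Return event codes observed strictly before the cutoff date, so a model
    predicting at the cutoff only sees information available at that time."""
    return [code for when, code in trajectory if when < cutoff]

trajectories = build_trajectories(events)
print(trajectories["p1"])
print(history_before(trajectories["p1"], date(2021, 1, 1)))
```

Real EHR extracts add complications this sketch ignores, such as irregular sampling, same-day ties, and missing timestamps.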
Affiliation(s)
- Ahmed Allam
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital of Zurich, Zurich, Switzerland
- Stefan Feuerriegel
- Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- ETH Artificial Intelligence Center, ETH Zurich, Zurich, Switzerland
- Ludwig Maximilian University of Munich, Munich, Germany
- Michael Rebhan
- Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Michael Krauthammer
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital of Zurich, Zurich, Switzerland
- Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States
42
Topaloglu MY, Morrell EM, Rajendran S, Topaloglu U. In the Pursuit of Privacy: The Promises and Predicaments of Federated Learning in Healthcare. Front Artif Intell 2021; 4:746497. [PMID: 34693280 PMCID: PMC8528445 DOI: 10.3389/frai.2021.746497] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/23/2021] [Accepted: 09/15/2021] [Indexed: 11/13/2022]
Abstract
Artificial intelligence and its subdomain, machine learning (ML), have shown the potential to make an unprecedented impact in healthcare. Federated learning (FL) has been introduced to alleviate some of the limitations of ML, particularly the ability to train on larger datasets for improved performance, which is usually cumbersome in inter-institutional collaborations because of existing patient protection laws and regulations. Moreover, FL may also play a crucial role in circumventing ML's pressing bias problem by accessing data from underrepresented groups spanning geographically distributed locations. In this paper, we discuss three FL challenges: privacy of the model exchange, ethical perspectives, and legal considerations. Lastly, we propose a model that could aid in assessing data contributions in an FL implementation. In light of the expediency and adaptability of the Sørensen-Dice coefficient compared with Shapley values, which are more limited in applicability (e.g., to horizontal FL) and computationally expensive, we sought to demonstrate a new paradigm that, we hope, will become invaluable for sharing any profits and responsibilities that may accompany an FL endeavor.
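The abstract does not describe how the authors apply the Sørensen-Dice coefficient, so the Python sketch below shows only the coefficient itself, $2|A \cap B| / (|A| + |B|)$, applied to a hypothetical use: comparing the demographic strata a study needs with the strata each site can supply. The strata labels, site names, and the empty-set convention are all assumptions for illustration.

```python
def dice_coefficient(a, b):
    """Sørensen-Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|), in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical: strata a target study needs vs. strata each site can contribute.
target = {"age<40", "age40-65", "age>65", "female", "male", "rural"}
sites = {
    "site_A": {"age40-65", "age>65", "male"},
    "site_B": {"age<40", "age40-65", "female", "male", "rural"},
}

for name, strata in sites.items():
    print(name, round(dice_coefficient(target, strata), 3))
```

Each score needs only one set intersection per site. Exact Shapley values, by contrast, require evaluating the utility of every coalition of participants, which grows as 2^n with the number of sites. This is one reason a set-overlap measure can be far cheaper to compute.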
Affiliation(s)
- Suraj Rajendran
- Wake Forest School of Medicine, Winston Salem, NC, United States
- Umit Topaloglu
- Wake Forest School of Medicine, Winston Salem, NC, United States
43
Danilevicz MF, Bayer PE, Nestor BJ, Bennamoun M, Edwards D. Resources for image-based high-throughput phenotyping in crops and data sharing challenges. PLANT PHYSIOLOGY 2021; 187:699-715. [PMID: 34608963 PMCID: PMC8561249 DOI: 10.1093/plphys/kiab301] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/04/2020] [Accepted: 05/26/2021] [Indexed: 05/06/2023]
Abstract
High-throughput phenotyping (HTP) platforms can monitor the phenotypic variation of plants through multiple types of sensors, such as red, green, and blue (RGB) cameras, hyperspectral sensors, and computed tomography, and these measurements can be associated with environmental and genotypic data. Because of the wide range of information they provide, HTP datasets are a valuable asset for characterizing crop phenotypes. As HTP becomes widely employed and more tools and data are released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may use these datasets either for phenotype comparison or as benchmarks to assess tool performance and to support the development of tools that generalize better across different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detection of pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and we offer a comprehensive list of available datasets to help crop breeders and tool developers leverage these resources to accelerate crop breeding.
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Benjamin J. Nestor
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Mohammed Bennamoun
- Department of Computer Science and Software Engineering, University of Western Australia, Perth, Western Australia 6009, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Author for communication:
44
Hallock H, Marshall SE, 't Hoen PAC, Nygård JF, Hoorne B, Fox C, Alagaratnam S. Federated Networks for Distributed Analysis of Health Data. Front Public Health 2021; 9:712569. [PMID: 34660512 PMCID: PMC8514765 DOI: 10.3389/fpubh.2021.712569] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/20/2021] [Accepted: 08/16/2021] [Indexed: 11/13/2022]
Abstract
Access to health data, which is important for population health planning, basic and clinical research, and health industry use, remains problematic. Legislation intended to improve access to personal data across national borders has proven to be a double-edged sword: its complexity and the consequences of misinterpreting it have paradoxically left data more siloed. As a result, the potential of health-specific AI and clinical decision support tools built on real-world data has yet to be fully realized. In this perspective, we propose federated networks as a solution that enables access to diverse datasets to tackle known and emerging health problems. The perspective draws on experience from the World Economic Forum Breaking Barriers to Health Data project, the Personal Health Train and Vantage6 infrastructures, and industry insights. We first define the concept of federated networks in a healthcare context, present the value they can bring to multiple stakeholders, and discuss their establishment, operation, and implementation. We then highlight the challenges of federated networks in healthcare and the resulting need for, and value of, an independent orchestrator for their safe, sustainable, and scalable implementation.
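To make "distributed analysis" concrete, the Python sketch below shows one simple federated computation: each site reduces its private records to a count, a sum, and a sum of squares, and a coordinator combines these into a pooled mean and sample variance without seeing any row-level data. This is a hypothetical illustration, not the API of Vantage6 or the Personal Health Train. The minimum-count suppression rule, the readings, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SiteSummary:
    """Aggregate statistics a site releases in place of row-level records."""
    n: int
    total: float
    total_sq: float

def local_summary(values):
    """Runs inside a site: reduce private measurements to count, sum, and sum of squares."""
    return SiteSummary(len(values), float(sum(values)), float(sum(v * v for v in values)))

def pooled_mean_var(summaries, min_count=5):
    """Runs at the coordinator: combine site summaries into a pooled mean and sample variance.

    Sites reporting fewer than min_count records are dropped (a hypothetical
    small-cell suppression rule, not taken from the source)."""
    usable = [s for s in summaries if s.n >= min_count]
    n = sum(s.n for s in usable)
    if n < 2:
        raise ValueError("not enough records across usable sites")
    total = sum(s.total for s in usable)
    total_sq = sum(s.total_sq for s in usable)
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return mean, var

# Hypothetical systolic blood pressure readings held privately at three sites.
site_a = [120, 130, 125, 140, 135]
site_b = [110, 115, 118, 122, 119, 121]
site_c = [150, 155]  # too few records; suppressed by the coordinator

summaries = [local_summary(v) for v in (site_a, site_b, site_c)]
mean, var = pooled_mean_var(summaries)
print(f"pooled mean {mean:.2f}, variance {var:.2f}")
```

The sum-of-squares formula is easy to federate but can lose precision for large, tightly clustered values. A production system might instead exchange per-site means and centered sums of squares.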
Affiliation(s)
- Harry Hallock
- Healthcare Programme, Group Research and Development, DNV, Oslo, Norway
- Peter A. C. 't Hoen
- Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Jan F. Nygård
- Department of Registry Informatics, Cancer Registry of Norway, Oslo, Norway
- Bert Hoorne
- Industry Technology Strategy for Western Europe Health, Microsoft, Bruges, Belgium
- Cameron Fox
- Platform for Shaping the Future of Health and Healthcare, World Economic Forum, New York, NY, United States
45
Sadilek A, Liu L, Nguyen D, Kamruzzaman M, Serghiou S, Rader B, Ingerman A, Mellem S, Kairouz P, Nsoesie EO, MacFarlane J, Vullikanti A, Marathe M, Eastham P, Brownstein JS, Arcas BAY, Howell MD, Hernandez J. Privacy-first health research with federated learning. NPJ Digit Med 2021; 4:132. [PMID: 34493770 PMCID: PMC8423792 DOI: 10.1038/s41746-021-00489-2] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 03/08/2021] [Accepted: 07/21/2021] [Indexed: 11/29/2022]
Abstract
Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show, on a diverse set of single- and multi-site health studies, that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models, while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research across a spectrum of units of federation, model architectures, learning-task complexities, and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science, two aspects that used to be at odds with each other.
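The abstract refers to federated learning methods that explicitly incorporate differential privacy, but it does not specify the mechanism. The Python sketch below shows a generic aggregation step in the style of differentially private federated averaging, offered as an assumption rather than the authors' implementation. Each client update is clipped to an L2 norm bound, the clipped updates are summed, Gaussian noise scaled to that bound is added, and the result is averaged. The default values of `clip_norm` and `noise_multiplier` are hypothetical.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale an update so its L2 norm is at most clip_norm, bounding any one client's influence."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Average clipped client updates after adding Gaussian noise with std = noise_multiplier * clip_norm."""
    rng = rng or random.Random()
    clipped = [clip_update(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm
    summed = [sum(u[i] for u in clipped) for i in range(len(clipped[0]))]
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(updates) for v in noisy]

# Usage with hypothetical client updates; seeded only so the example is repeatable.
client_updates = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.4], [0.2, 0.2, 0.2]]
print(dp_aggregate(client_updates, rng=random.Random(0)))
```

Clipping bounds each client's contribution, so the noise can be calibrated to that bound. This sketch does not compute the resulting privacy guarantee (epsilon, delta). That guarantee depends on the noise multiplier, client sampling rate, and number of rounds, and it requires a privacy accountant.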
Affiliation(s)
- Dung Nguyen
- Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Methun Kamruzzaman
- Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Benjamin Rader
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA, USA
- Department of Epidemiology, Boston University, Boston, MA, USA
- Anil Vullikanti
- Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Madhav Marathe
- Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- John S Brownstein
- Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
46
Lee TH, Lee J, Jun CH. Bilingual autoencoder-based efficient harmonization of multi-source private data for accurate predictive modeling. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.03.064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
47
Abdulkareem M, Petersen SE. The Promise of AI in Detection, Diagnosis, and Epidemiology for Combating COVID-19: Beyond the Hype. Front Artif Intell 2021; 4:652669. [PMID: 34056579 PMCID: PMC8160471 DOI: 10.3389/frai.2021.652669] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/12/2021] [Accepted: 04/13/2021] [Indexed: 12/24/2022]
Abstract
COVID-19 has created enormous suffering, affecting lives, and causing deaths. The ease with which this type of coronavirus can spread has exposed weaknesses of many healthcare systems around the world. Since its emergence, many governments, research communities, commercial enterprises, and other institutions and stakeholders around the world have been fighting in various ways to curb the spread of the disease. Science and technology have helped in the implementation of policies of many governments that are directed toward mitigating the impacts of the pandemic and in diagnosing and providing care for the disease. Recent technological tools, artificial intelligence (AI) tools in particular, have also been explored to track the spread of the coronavirus, identify patients with high mortality risk and diagnose patients for the disease. In this paper, areas where AI techniques are being used in the detection, diagnosis and epidemiological predictions, forecasting and social control for combating COVID-19 are discussed, highlighting areas of successful applications and underscoring issues that need to be addressed to achieve significant progress in battling COVID-19 and future pandemics. Several AI systems have been developed for diagnosing COVID-19 using medical imaging modalities such as chest CT and X-ray images. These AI systems mainly differ in their choices of the algorithms for image segmentation, classification and disease diagnosis. Other AI-based systems have focused on predicting mortality rate, long-term patient hospitalization and patient outcomes for COVID-19. 
AI has huge potential in the battle against the COVID-19 pandemic, but successful practical deployments of AI-based tools have so far been limited by challenges such as limited data accessibility, the need for external evaluation of AI models, AI experts' lack of awareness of the regulatory landscape governing the deployment of AI tools in healthcare, the need for clinicians and other experts to work with AI experts in a multidisciplinary context, and the need to address public concerns over data collection, privacy, and protection. Having a dedicated team with expertise in medical data collection, privacy, access, and sharing; using federated learning, whereby AI scientists hand over training algorithms to healthcare institutions to train models locally; and taking full advantage of biomedical data stored in biobanks can alleviate some of the problems posed by these challenges. Addressing these challenges will ultimately accelerate the translation of AI research into practical and useful solutions for combating pandemics.
Affiliation(s)
- Musa Abdulkareem
- Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
- National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- Steffen E. Petersen
- Barts Heart Centre, Barts Health National Health Service (NHS) Trust, London, United Kingdom
- National Institute for Health Research (NIHR) Barts Biomedical Research Centre, William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Health Data Research UK, London, United Kingdom
- The Alan Turing Institute, London, United Kingdom
48
Chen R, Zhang Y, Dou Z, Chen F, Xie K, Wang S. Data Sharing and Privacy in Pharmaceutical Studies. Curr Pharm Des 2021; 27:911-918. [PMID: 33438533 DOI: 10.2174/1381612827999210112204732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/01/2020] [Accepted: 09/30/2020] [Indexed: 11/22/2022]
Abstract
Adverse drug events have long been a concern because of their wide-ranging harms to public health and their substantial disease burden. The key to diminishing or eliminating these impacts is to build a comprehensive pharmacovigilance system. The "big data" approach has been shown to assist the detection of adverse drug events by drawing on previously unavailable data sources and promoting health information exchange. Even so, challenges and potential risks remain. The lack of effective privacy-preserving measures in the flow of medical data is the most important one, and urgent action is required to counter these threats and facilitate the construction of pharmacovigilance systems. This article reviews several privacy protection methods that may help break this barrier.
Affiliation(s)
- Rufan Chen
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Yi Zhang
- Department of Cardiology, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Zuochao Dou
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Feng Chen
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
- Kang Xie
- Key Lab of Information Network Security of Ministry of Public Security, the Third Research Institute of Ministry of Public Security, Shanghai, China
- Shuang Wang
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, China
49
Kirienko M, Sollini M, Ninatti G, Loiacono D, Giacomello E, Gozzi N, Amigoni F, Mainardi L, Lanzi PL, Chiti A. Distributed learning: a reliable privacy-preserving strategy to change multicenter collaborations using AI. Eur J Nucl Med Mol Imaging 2021; 48:3791-3804. [PMID: 33847779 PMCID: PMC8041944 DOI: 10.1007/s00259-021-05339-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/01/2021] [Accepted: 03/24/2021] [Indexed: 12/12/2022]
Abstract
Purpose The present scoping review aims to assess the non-inferiority of distributed learning compared with centrally and locally trained machine learning (ML) models in medical applications. Methods We performed a literature search using the term “distributed learning” OR “federated learning” in the PubMed/MEDLINE and EMBASE databases. No start date limit was used, and the search was extended until July 21, 2020. We excluded articles outside the field of interest; guidelines or expert opinions, review articles and meta-analyses, editorials, letters or commentaries, and conference abstracts; articles not in the English language; and studies not using medical data. Selected studies were classified and analysed according to their aim(s). Results We included 26 papers aimed at predicting one or more outcomes, namely risk, diagnosis, prognosis, and treatment side effects/adverse drug reactions. Distributed learning was compared with centralized and localized training in 21/26 and 14/26 of the selected papers, respectively. Regardless of the aim, the type of input, the method, and the classifier, distributed learning performed close to centralized training, except in two experiments focused on diagnosis. In all but 2 cases, distributed learning outperformed locally trained models. Conclusion Distributed learning proved to be a reliable strategy for model development; indeed, it performed equally to models trained on centralized datasets. Sensitive data are preserved because they are not shared for model development. Distributed learning is a promising solution for ML-based research and practice, since large, diverse datasets are crucial for success.
Affiliation(s)
- Margarita Kirienko
- Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Martina Sollini
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
- Gaia Ninatti
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Noemi Gozzi
- IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
- Arturo Chiti
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
50
Liu JC, Goetz J, Sen S, Tewari A. Learning From Others Without Sacrificing Privacy: Simulation Comparing Centralized and Federated Machine Learning on Mobile Health Data. JMIR Mhealth Uhealth 2021; 9:e23728. [PMID: 33783362 PMCID: PMC8044739 DOI: 10.2196/23728] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/21/2020] [Revised: 12/10/2020] [Accepted: 02/25/2021] [Indexed: 12/27/2022]
Abstract
Background The use of wearables facilitates data collection at a previously unobtainable scale, enabling the construction of complex predictive models with the potential to improve health. However, the highly personal nature of these data requires strong privacy protection against data breaches and the use of data in a way that users do not intend. One method to protect user privacy while taking advantage of sharing data across users is federated learning, a technique that allows a machine learning model to be trained using data from all users while only storing a user’s data on that user’s device. By keeping data on users’ devices, federated learning protects users’ private data from data leaks and breaches on the researcher’s central server and provides users with more control over how and when their data are used. However, there are few rigorous studies on the effectiveness of federated learning in the mobile health (mHealth) domain. Objective We review federated learning and assess whether it can be useful in the mHealth field, especially for addressing common mHealth challenges such as privacy concerns and user heterogeneity. The aims of this study are to describe federated learning in an mHealth context, apply a simulation of federated learning to an mHealth data set, and compare the performance of federated learning with the performance of other predictive models. Methods We applied a simulation of federated learning to predict the affective state of 15 subjects using physiological and motion data collected from a chest-worn device for approximately 36 minutes. We compared the results from this federated model with those from a centralized or server model and with the results from training individual models for each subject. 
Results In a 3-class classification problem using physiological and motion data to predict whether the subject was undertaking a neutral, amusing, or stressful task, the federated model achieved 92.8% accuracy on average, the server model achieved 93.2% accuracy on average, and the individual model achieved 90.2% accuracy on average. Conclusions Our findings support the potential for using federated learning in mHealth. The results showed that the federated model performed better than a model trained separately on each individual and nearly as well as the server model. As federated learning offers more privacy than a server model, it may be a valuable option for designing sensitive data collection methods.
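The study compares a federated model, a server (centralized) model, and per-subject models. The toy Python sketch below does not use the authors' code, data, or model. It generates hypothetical one-feature subjects, trains a binary logistic regression with federated averaging (FedAvg), and compares the result with centralized gradient descent on the pooled data. It also shows a useful sanity check: with one full-batch local step per round and sample-count weighting, FedAvg reproduces centralized gradient descent exactly.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_train(w, data, lr=0.5, steps=1):
    """Full-batch gradient descent for one-feature logistic regression; w = [bias, weight]."""
    b, a = w
    for _ in range(steps):
        g_b = g_a = 0.0
        for x, y in data:
            err = sigmoid(b + a * x) - y  # derivative of the log loss w.r.t. the logit
            g_b += err
            g_a += err * x
        b -= lr * g_b / len(data)
        a -= lr * g_a / len(data)
    return [b, a]

def fedavg(clients, rounds, local_steps=1, lr=0.5):
    """Clients train from the current global model on their own data; the server
    averages the returned weights, weighting each client by its sample count."""
    w = [0.0, 0.0]
    n_total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_models = [local_train(w, d, lr, local_steps) for d in clients]
        w = [sum(len(d) * m[i] for d, m in zip(clients, local_models)) / n_total for i in range(2)]
    return w

def accuracy(w, data):
    return sum((sigmoid(w[0] + w[1] * x) >= 0.5) == (y == 1) for x, y in data) / len(data)

def make_subject(rng, shift, n):
    """Hypothetical subject: a noisy signal whose distribution is shifted per subject."""
    data = []
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        y = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        data.append((x, y))
    return data

rng = random.Random(42)
subjects = [make_subject(rng, s, n) for s, n in [(-0.5, 30), (0.0, 40), (0.3, 50), (0.8, 60)]]
pooled = [point for subject in subjects for point in subject]

w_fed = fedavg(subjects, rounds=50, local_steps=1)
w_central = local_train([0.0, 0.0], pooled, steps=50)  # "server model" on pooled data
w_fed5 = fedavg(subjects, rounds=10, local_steps=5)    # more local work, fewer rounds

for name, w in [("FedAvg, 1 local step", w_fed), ("centralized", w_central), ("FedAvg, 5 local steps", w_fed5)]:
    print(f"{name}: weights={[round(v, 3) for v in w]}, accuracy={accuracy(w, pooled):.3f}")
```

The equivalence holds because the sample-weighted average of per-client mean gradients equals the pooled mean gradient. With several local steps per round on heterogeneous subjects, the federated and centralized models generally diverge somewhat. The accuracies here are in-sample; a real comparison, like the study's, would evaluate on held-out data.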
Affiliation(s)
- Jessica Chia Liu
- Department of Statistics, University of Michigan, Ann Arbor, MI, United States
- Jack Goetz
- Department of Statistics, University of Michigan, Ann Arbor, MI, United States
- Srijan Sen
- Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, United States
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, United States
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, MI, United States