1
Ankolekar A, Eppings L, Bottari F, Pinho IF, Howard K, Baker R, Nan Y, Xing X, Walsh SLF, Vos W, Yang G, Lambin P. Using artificial intelligence and predictive modelling to enable learning healthcare systems (LHS) for pandemic preparedness. Comput Struct Biotechnol J 2024; 24:412-419. [PMID: 38831762 PMCID: PMC11145382 DOI: 10.1016/j.csbj.2024.05.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 05/07/2024] [Accepted: 05/07/2024] [Indexed: 06/05/2024] Open
Abstract
In anticipation of potential future pandemics, we examined the challenges and opportunities presented by the COVID-19 outbreak. This analysis highlights how artificial intelligence (AI) and predictive models can support both patients and clinicians in managing subsequent infectious diseases, and how legislators and policymakers could support these efforts, to bring the learning healthcare system (LHS) from guidelines to real-world implementation. This report chronicles the trajectory of the COVID-19 pandemic, emphasizing the diverse data sets generated throughout its course. We propose strategies for harnessing these data via AI and predictive modelling to enhance the functioning of LHS. The challenges faced by patients and healthcare systems around the world during this unprecedented crisis could have been mitigated with an informed and timely adoption of the three pillars of the LHS: Knowledge, Data and Practice. By harnessing AI and predictive analytics, we can develop tools that not only detect potential pandemic-prone diseases early on but also assist in patient management, provide decision support, offer treatment recommendations, deliver patient outcome triage, predict post-recovery long-term disease impacts, monitor viral mutations and variant emergence, and assess vaccine and treatment efficacy in real time. A patient-centric approach remains paramount, ensuring patients are both informed and actively involved in disease mitigation strategies.
Affiliation(s)
- Anshu Ankolekar
  - Department of Precision Medicine, GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
- Lisanne Eppings
  - Department of Precision Medicine, GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
- Yang Nan
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Xiaodan Xing
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Simon LF Walsh
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Wim Vos
  - Radiomics (Oncoradiomics SA), Liege, Belgium
- Guang Yang
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
  - Bioengineering Department and I-X, Imperial College London, London, United Kingdom
- Philippe Lambin
  - Department of Precision Medicine, GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, the Netherlands
2
Kong F, Wang X, Xiang J, Yang S, Wang X, Yue M, Zhang J, Zhao J, Han X, Dong Y, Zhu B, Wang F, Liu Y. Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading. Comput Struct Biotechnol J 2024; 23:1439-1449. [PMID: 38623561 PMCID: PMC11016961 DOI: 10.1016/j.csbj.2024.03.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 03/29/2024] [Accepted: 03/29/2024] [Indexed: 04/17/2024] Open
Abstract
Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data. This study introduces a federated attention-consistent learning (FACL) framework to address challenges associated with large-scale pathological images and data heterogeneity. FACL enhances model generalization by maximizing attention consistency between local clients and the server model. To ensure privacy and validate robustness, we incorporated differential privacy by introducing noise during parameter transfer. We assessed the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole-slide images of prostate cancer from multiple centers. In the diagnosis task, FACL achieved an area under the curve (AUC) of 0.9718, outperforming seven centers with an average AUC of 0.9499 when categories are relatively balanced. For the Gleason grading task, FACL attained a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training model for prostate cancer pathology while maintaining effective data safeguards.
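The abstract states that differential privacy is incorporated by introducing noise during parameter transfer, but gives no mechanism. The following is a minimal sketch of one common pattern (clip the update, then add Gaussian noise), not the authors' implementation. The function names, the clipping bound `max_norm`, and the `noise_multiplier` are all illustrative assumptions.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def privatize_update(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise to every coordinate.

    The noise standard deviation is noise_multiplier * max_norm, which is
    the usual calibration for the Gaussian mechanism (assumed here).
    """
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Hypothetical local update with L2 norm 5, clipped to norm 1 before noising.
rng = random.Random(0)
local_update = [3.0, 4.0]
clipped = clip_update(local_update, max_norm=1.0)
noisy = privatize_update(local_update, max_norm=1.0, noise_multiplier=0.1, rng=rng)
```

Clipping bounds how much any single client can move the shared model, which is what lets the added noise be calibrated to a fixed scale. A larger `noise_multiplier` gives stronger privacy at the cost of accuracy.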
Affiliation(s)
- Fei Kong
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Xiyue Wang
  - College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China
- Sen Yang
  - AI Lab, Tencent, Shenzhen, 518057, China
- Xinran Wang
  - Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
- Meng Yue
  - Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
- Jun Zhang
  - AI Lab, Tencent, Shenzhen, 518057, China
- Junhan Zhao
  - Massachusetts General Hospital, Boston, MA, 02114, United States
  - Harvard T.H. Chan School of Public Health, Boston, MA, 02115, United States
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, United States
- Xiao Han
  - AI Lab, Tencent, Shenzhen, 518057, China
- Yuhan Dong
  - Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Biyue Zhu
  - Department of Pharmacy, Children's Hospital of Chongqing Medical University, Chongqing, 400014, China
- Fang Wang
  - Department of Pathology, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai, 264000, China
- Yueping Liu
  - Department of Pathology, The Fourth Hospital of Hebei Medical University, Shijiazhuang, 050035, China
3
Qiu W, Quan C, Zhu L, Yu Y, Wang Z, Ma Y, Sun M, Chang Y, Qian K, Hu B, Yamamoto Y, Schuller BW. Heart Sound Abnormality Detection From Multi-Institutional Collaboration: Introducing a Federated Learning Framework. IEEE Trans Biomed Eng 2024; 71:2802-2813. [PMID: 38700959 DOI: 10.1109/tbme.2024.3393557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
OBJECTIVE: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may cause privacy issues. Unfortunately, it is difficult to collect large amounts of healthcare data from a single centre. METHODS: In this study, we propose federated learning (FL) optimisation strategies for practical application in multi-centre institutional heart sound databases. Horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of the participating institutions without information leakage. In addition, techniques based on deep learning have poor interpretability due to their "black-box" property, which limits the feasibility of AI in real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. CONCLUSION: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space helps to balance the interpretability of the vertical FL and the privacy of the data. SIGNIFICANCE: This work realises the potential of FL from research to clinical practice, and is expected to have extensive application in the federated smart medical system.
4
Tang T, Han Z, Cai Z, Yu S, Zhou X, Oseni T, Das SK. Personalized Federated Graph Learning on Non-IID Electronic Health Records. IEEE Trans Neural Netw Learn Syst 2024; 35:11843-11856. [PMID: 38502617 DOI: 10.1109/tnnls.2024.3370297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Understanding the latent disease patterns embedded in electronic health records (EHRs) is crucial for making precise and proactive healthcare decisions. Federated graph learning-based methods are commonly employed to extract complex disease patterns from the distributed EHRs without sharing the client-side raw data. However, distributed EHRs are typically non-independent and identically distributed (Non-IID), which poses significant challenges related to data imbalance and notably reduces the effectiveness of healthcare decisions derived from the global model. To address these challenges, we introduce a novel personalized federated learning framework named PEARL, which is designed for disease prediction on Non-IID EHRs. Specifically, PEARL incorporates disease diagnostic code attention and admission record attention to extract patient embeddings from all EHRs. Then, PEARL integrates self-supervised learning into a federated learning framework to train a global model for hierarchical disease prediction. To improve the performance of the client model, we further introduce a fine-tuning scheme to personalize the global model using local EHRs. During the global model updating process, a differential privacy (DP) scheme is implemented, providing a high-level privacy guarantee. Extensive experiments conducted on the real-world MIMIC-III dataset validate the effectiveness of PEARL, demonstrating competitive results when compared with baselines.
5
Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024; 154:102900. [PMID: 38878555 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we have also included articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
  - Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
  - Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
  - Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
  - Department of Biomedical Engineering, University of Florida, Gainesville, United States
6
Du H, Xu J, Du Z, Chen L, Ma S, Wei D, Wang X. MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records. Interdiscip Sci 2024; 16:489-502. [PMID: 38578388 PMCID: PMC11289171 DOI: 10.1007/s12539-024-00624-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 01/13/2024] [Accepted: 02/25/2024] [Indexed: 04/06/2024]
Abstract
To address the problem of poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method, MF-MNER, using a fusion technique combining BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the obtained semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured using a bidirectional long short-term memory (Bi-LSTM) network. Finally, the conditional random field (CRF) is used to decode and output multi-task entity recognition. Experiments are performed on the CCKS2019 dataset, with micro avg Precision, macro avg Recall, weighted avg Precision reaching 0.880, 0.887, and 0.883, and micro avg F1-score, macro avg F1-score, weighted avg F1-score reaching 0.875, 0.876, and 0.876 respectively. Compared with existing models, our method outperforms the existing literature in three evaluation metrics (micro average, macro average, weighted average) under the same dataset conditions. In the case of weighted average, the Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher than the existing BERT-BiLSTM-CRF model respectively. In experiments on the actual clinical dataset, MF-MNER achieves a Precision, Recall, and F1-score of 0.638, 0.825, and 0.719 under the micro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.685, 0.800, and 0.733 under the macro-avg evaluation mechanism. The Precision, Recall, and F1-score are 0.647, 0.825, and 0.722 under the weighted avg evaluation mechanism.
The above results show that our method MF-MNER can integrate the advantages of BART, Bi-LSTM, and CRF layers, significantly improving the performance of downstream named entity recognition tasks with a small amount of annotation, and achieving excellent performance in terms of recall score, which has certain practical significance. Source code and datasets to reproduce the results in this paper are available at https://github.com/xfwang1969/MF-MNER .
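The abstract reports results under micro, macro, and weighted averaging without defining them. As a rough illustration of how the three differ, the sketch below computes all three averages of precision from per-entity-type counts. The entity types and counts are made-up toy values, not figures from the paper, and the function names are illustrative.

```python
def precision(tp, fp):
    """Precision for one class; defined as 0.0 when nothing was predicted."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def averaged_precision(counts):
    """Return (micro, macro, weighted) precision.

    counts maps an entity type to a (tp, fp, fn) tuple.
    """
    # Micro: pool the counts over all classes, then compute one precision.
    total_tp = sum(tp for tp, fp, fn in counts.values())
    total_fp = sum(fp for tp, fp, fn in counts.values())
    micro = precision(total_tp, total_fp)

    # Macro: compute precision per class, then take the unweighted mean.
    per_class = {k: precision(tp, fp) for k, (tp, fp, fn) in counts.items()}
    macro = sum(per_class.values()) / len(per_class)

    # Weighted: mean of per-class precision weighted by support (tp + fn),
    # i.e. the number of true entities of that class.
    support = {k: tp + fn for k, (tp, fp, fn) in counts.items()}
    total_support = sum(support.values())
    weighted = sum(per_class[k] * support[k] for k in counts) / total_support
    return micro, macro, weighted


# Toy counts (hypothetical): one frequent class and one rare class.
counts = {"disease": (8, 2, 2), "drug": (1, 1, 3)}
micro, macro, weighted = averaged_precision(counts)
```

With these toy counts the micro average is 0.75, the macro average is 0.65, and the weighted average is about 0.714, which shows how a rare, poorly recognised class pulls the macro average down more than the other two.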
Affiliation(s)
- Haoze Du
  - Department of Computer Science, North Carolina State University, Raleigh, NC, 27695, USA
- Jiahao Xu
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Zhiyong Du
  - School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Lihui Chen
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Shaohui Ma
  - School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Dongqing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, 200240, China
  - Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiaotong University, Shanghai, 200240, China
  - Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Nanyang, 473000, China
- Xianfang Wang
  - School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
7
Kim C, Kwon JM, Lee J, Jo H, Gwon D, Jang JH, Sung MK, Park SW, Kim C, Oh MY. Deep learning model integrating radiologic and clinical data to predict mortality after ischemic stroke. Heliyon 2024; 10:e31000. [PMID: 38826743 PMCID: PMC11141274 DOI: 10.1016/j.heliyon.2024.e31000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 06/04/2024] Open
Abstract
Objective: Most prognostic indexes for ischemic stroke mortality lack radiologic information. We aimed to create and validate a deep learning-based mortality prediction model using brain diffusion weighted imaging (DWI), apparent diffusion coefficient (ADC), and clinical factors. Methods: Data from patients with ischemic stroke who were admitted to a tertiary hospital during the acute period from 2013 to 2019 were collected and split into training (n = 1109), validation (n = 437), and internal test (n = 654) sets. Data from patients at a secondary cardiovascular center were used as the external test set (n = 507). The algorithm for predicting mortality, based on DWI and ADC (DLP_DWI), was initially trained. Subsequently, important clinical factors were integrated into this model to create the integrated model (DLP_INTG). The performance of DLP_DWI and DLP_INTG was evaluated by using time-dependent area under the receiver operating characteristic curves (TD AUCs) and Harrell concordance index (C-index) at one-year mortality. Results: The TD AUC of DLP_DWI was 0.643 in the internal test set, and 0.785 in the external dataset. DLP_INTG had a higher performance at predicting one-year mortality than the PREMISE score in the internal dataset (TD AUC: 0.859 vs. 0.746; p = 0.046), and in the external dataset (TD AUC: 0.876 vs. 0.808; p = 0.007). DLP_DWI and DLP_INTG exhibited strong discrimination for the high-risk group for one-year mortality. Interpretation: A deep learning model using brain DWI, ADC, and clinical factors was capable of predicting mortality in patients with ischemic stroke.
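For readers unfamiliar with the Harrell concordance index used as an evaluation metric above, the following is a minimal sketch of how it can be computed. It is a simplified textbook version, not the study's evaluation code: pairs with tied survival times are skipped, ties in risk score count as half-concordant, and the example data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died and did so strictly
            # before patient j's last observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: four patients, two observed deaths, two censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.05, 0.1]
c_index = harrell_c_index(times, events, risk_scores)
```

In this example there are four comparable pairs and the scores order three of them correctly, so the C-index is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.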
Affiliation(s)
- Changi Kim
  - Department of Bioengineering, Seoul National University, Seoul, Republic of Korea
- Joon-myoung Kwon
  - Medical Research Team, Medical AI Inc, DC, USA
  - Department of Critical Care Emergency Medicine, Incheon Sejong Hospital, Incheon, Republic of Korea
  - Artificial Intelligence and Big Data Research Center, Sejong Medical Research Institute, Bucheon, Republic of Korea
- Jiyeong Lee
  - Department of Neurology, Bucheon Sejong Hospital, Bucheon, Republic of Korea
- Dowan Gwon
  - Department of Digital&Biohealth, Group of AI/DX Business, KT, Seoul, Republic of Korea
- Jae Hoon Jang
  - Department of Family Medicine, College of Medicine, KyungHee University, Seoul, Republic of Korea
- Min Kyu Sung
  - Department of Family Medicine, College of Medicine, KyungHee University, Seoul, Republic of Korea
- Sang Won Park
  - Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon, Republic of Korea
  - Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon, Republic of Korea
- Chulho Kim
  - Department of Neurology, Hallym University College of Medicine, Chuncheon, Republic of Korea
- Mi-Young Oh
  - Department of Neurology, Bucheon Sejong Hospital, Bucheon, Republic of Korea
8
Vo VTT, Shin TH, Yang HJ, Kang SR, Kim SH. A comparison between centralized and asynchronous federated learning approaches for survival outcome prediction using clinical and PET data from non-small cell lung cancer patients. Comput Methods Programs Biomed 2024; 248:108104. [PMID: 38457959 DOI: 10.1016/j.cmpb.2024.108104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 03/10/2024]
Abstract
BACKGROUND AND OBJECTIVE: Survival analysis plays an essential role in the medical field for optimal treatment decision-making. Recently, survival analysis based on the deep learning (DL) approach has been proposed and is demonstrating promising results. However, developing an ideal prediction model requires integrating large datasets across multiple institutions, which poses challenges concerning medical data privacy. METHODS: In this paper, we propose FedSurv, an asynchronous federated learning (FL) framework designed to predict survival time using clinical information and positron emission tomography (PET)-based features. This study used two datasets: a public radiogenomic dataset of non-small cell lung cancer (NSCLC) from the Cancer Imaging Archive (RNSCLC), and an in-house dataset from the Chonnam National University Hwasun Hospital (CNUHH) in South Korea, consisting of clinical risk factors and F-18 fluorodeoxyglucose (FDG) PET images in NSCLC patients. Initially, each dataset was divided into multiple clients according to histological attributes, and each client was trained using the proposed DL model to predict individual survival time. The FL framework collected weights and parameters from the clients, which were then incorporated into the global model. Finally, the global model aggregated all weights and parameters and redistributed the updated model weights to each client. We evaluated different frameworks, including a single-client-based approach, centralized learning, and FL. RESULTS: We evaluated our method on two independent datasets. First, on the RNSCLC dataset, the mean absolute error (MAE) was 490.80±22.95 d and the C-Index was 0.69±0.01. Second, on the CNUHH dataset, the MAE was 494.25±40.16 d and the C-Index was 0.71±0.01. The FL approach matched the performance of the centralized method in PET-based survival time prediction and outperformed single-client-based approaches.
CONCLUSIONS: Our results demonstrated the feasibility and effectiveness of employing FL for individual survival prediction in NSCLC patients, using clinical information and PET-based features.
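The collect, aggregate, and redistribute loop described in the methods can be illustrated with a small sketch. The abstract does not state the aggregation rule and describes the framework as asynchronous, so the synchronous, sample-size-weighted average below (in the style of Federated Averaging) is a swapped-in assumption chosen for simplicity, and all names and numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine client parameter vectors into one global vector.

    Each client's parameters are weighted by its share of the total number
    of training samples (an assumed rule, not taken from the paper).
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for k in range(n_params):
            global_weights[k] += weights[k] * size / total
    return global_weights


# Two hypothetical clients; the second holds three times as many samples,
# so it contributes three quarters of the global model.
clients = [[1.0, 2.0], [3.0, 6.0]]
sizes = [1, 3]
global_model = federated_average(clients, sizes)

# Redistribution step: every client starts the next round from a copy
# of the aggregated global model.
next_round_start = [list(global_model) for _ in clients]
```

An asynchronous variant would apply each client's update as it arrives rather than waiting for all clients, typically with a mixing factor that discounts stale updates.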
Affiliation(s)
- Vi Thi-Tuong Vo
  - Department of Artificial Intelligence Convergence, Chonnam National University, Gwangju, 61186, South Korea
- Tae-Ho Shin
  - Interdisciplinary Program of Information Security, Chonnam National University, Gwangju, 61186, South Korea
- Hyung-Jeong Yang
  - Department of Artificial Intelligence Convergence, Chonnam National University, Gwangju, 61186, South Korea
- Sae-Ryung Kang
  - Department of Nuclear Medicine, Chonnam National University Hwasun Hospital and Medical School, Hwasun, 58128, South Korea
- Soo-Hyung Kim
  - Department of Artificial Intelligence Convergence, Chonnam National University, Gwangju, 61186, South Korea
9
Fu S, Jia H, Vassilaki M, Keloth VK, Dang Y, Zhou Y, Garg M, Petersen RC, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. J Biomed Inform 2024; 152:104623. [PMID: 38458578 PMCID: PMC11005095 DOI: 10.1016/j.jbi.2024.104623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/12/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across the four healthcare sites.
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework achieved higher performance than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.
Affiliation(s)
- Sunyang Fu
  - Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
- Heling Jia
  - Mayo Clinic, Rochester, MN, United States
- Yifang Dang
  - University of Texas Health Science Center, Houston, TX, United States
- Yujia Zhou
  - University of Texas Health Science Center, Houston, TX, United States
- Liwei Wang
  - Mayo Clinic, Rochester, MN, United States
- Andrew Wen
  - University of Texas Health Science Center, Houston, TX, United States
- Fang Li
  - University of Texas Health Science Center, Houston, TX, United States
- Hua Xu
  - Yale University, New Haven, CT, United States
- Cui Tao
  - University of Texas Health Science Center, Houston, TX, United States
- Hongfang Liu
  - Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
10
Bazarov Ravshan Ugli D, Mohammed AFY, Na T, Lee J. Deep Reinforcement Learning-Empowered Cost-Effective Federated Video Surveillance Management Framework. Sensors (Basel) 2024; 24:2158. [PMID: 38610369 PMCID: PMC11014212 DOI: 10.3390/s24072158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/16/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]
Abstract
Video surveillance systems are integral to bolstering safety and security across multiple settings. With the advent of deep learning (DL), a specialization within machine learning (ML), these systems have been significantly augmented to facilitate DL-based video surveillance services with notable precision. Nevertheless, DL-based video surveillance services, which necessitate the tracking of object movement and motion tracking (e.g., to identify unusual object behaviors), can demand a significant portion of computational and memory resources. This includes utilizing GPU computing power for model inference and allocating GPU memory for model loading. To tackle the computational demands inherent in DL-based video surveillance, this study introduces a novel video surveillance management system designed to optimize operational efficiency. At its core, the system is built on a two-tiered edge computing architecture (i.e., client and server through socket transmission). In this architecture, the primary edge (i.e., client side) handles the initial processing tasks, such as object detection, and is connected via a Universal Serial Bus (USB) cable to the Closed-Circuit Television (CCTV) camera, directly at the source of the video feed. This immediate processing reduces the latency of data transfer by detecting objects in real time. Meanwhile, the secondary edge (i.e., server side) plays a vital role by hosting a dynamically controlling threshold module targeted at releasing DL-based models, reducing needless GPU usage. This module is a novel addition that dynamically adjusts the threshold time value required to release DL models. By dynamically optimizing this threshold, the system can effectively manage GPU usage, ensuring resources are allocated efficiently. 
Moreover, we utilize federated learning (FL) to streamline the training of a Long Short-Term Memory (LSTM) network for predicting imminent object appearances by amalgamating data from diverse camera sources while ensuring data privacy and optimized resource allocation. Furthermore, in contrast to the static threshold values or moving average techniques used in previous approaches for the controlling threshold module, we employ a Deep Q-Network (DQN) methodology to manage threshold values dynamically. This approach efficiently balances the trade-off between GPU memory conservation and the reloading latency of the DL model, which is enabled by incorporating LSTM-derived predictions as inputs to determine the optimal timing for releasing the DL model. The results highlight the potential of our approach to significantly improve the efficiency and effective usage of computational resources in video surveillance systems, opening the door to enhanced security in various domains.
Affiliation(s)
| | - Alaelddin F. Y. Mohammed
- Department of International Studies, Dongshin University, 67, Dongshindae-gil, Naju-si 58245, Republic of Korea;
| | - Taeheum Na
- Electronics and Telecommunications Research Institute (ETRI), Yuseong-gu, Daejeon 34129, Republic of Korea;
| | - Joohyung Lee
- Department of Computing, Gachon University, Seongnam-si 13120, Republic of Korea;
| |
|
11
|
Mitrovska A, Safari P, Ritter K, Shariati B, Fischer JK. Secure federated learning for Alzheimer's disease detection. Front Aging Neurosci 2024; 16:1324032. [PMID: 38515517 PMCID: PMC10954782 DOI: 10.3389/fnagi.2024.1324032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Accepted: 02/22/2024] [Indexed: 03/23/2024] Open
Abstract
Machine Learning (ML) is considered a promising tool to aid and accelerate diagnosis in various medical areas, including neuroimaging. However, its success is held back by the lack of large-scale public datasets. Indeed, medical institutions possess a large amount of data; however, open-sourcing is prevented by the legal requirements to protect patients' privacy. Federated Learning (FL) is a viable alternative that can overcome this issue. This work proposes training an ML model for Alzheimer's Disease (AD) detection based on structural MRI (sMRI) data in a federated setting. We implement two aggregation algorithms, Federated Averaging (FedAvg) and Secure Aggregation (SecAgg), and compare their performance with centralized ML model training. We simulate heterogeneous environments and explore the impact of demographic (sex, age, and diagnosis) and imbalanced data distributions. The simulated heterogeneous environments allow us to observe the effect of these statistical differences on the ML models trained using FL and highlight the importance of studying such differences when training ML models for AD detection. Moreover, as part of the evaluation, we demonstrate the increased privacy guarantees of FL with SecAgg via simulated membership inference attacks.
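The two aggregation ideas named in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: `fedavg` is a sample-size-weighted mean of client parameter vectors, and `mask_updates` mimics the core idea of secure aggregation with pairwise random masks that cancel in the sum. Real secure aggregation protocols work over finite fields with key agreement and dropout handling, and all names and values here are hypothetical.

```python
import random


def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def mask_updates(updates, seed=0):
    """Add pairwise masks so single updates are hidden but the sum is kept."""
    rng = random.Random(seed)
    masked = [list(u) for u in updates]
    dim = len(updates[0])
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            for k in range(dim):
                m = rng.uniform(-1.0, 1.0)
                masked[i][k] += m  # client i adds the shared mask
                masked[j][k] -= m  # client j subtracts the same mask
    return masked


# Two hypothetical clients holding 1 and 3 samples.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))

updates = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
masked = mask_updates(updates)
# The server sees only masked vectors, yet their sum matches the true sum.
print([sum(col) for col in zip(*masked)])
```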
Affiliation(s)
- Angela Mitrovska
- Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institute (HHI), Berlin, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
| | - Pooyan Safari
- Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institute (HHI), Berlin, Germany
| | - Kerstin Ritter
- Bernstein Center for Computational Neuroscience, Berlin, Germany
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany
| | - Behnam Shariati
- Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institute (HHI), Berlin, Germany
| | - Johannes Karl Fischer
- Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institute (HHI), Berlin, Germany
| |
|
12
|
Chweidan H, Rudyuk N, Tzur D, Goldstein C, Almoznino G. Statistical Methods and Machine Learning Algorithms for Investigating Metabolic Syndrome in Temporomandibular Disorders: A Nationwide Study. Bioengineering (Basel) 2024; 11:134. [PMID: 38391620 PMCID: PMC10886027 DOI: 10.3390/bioengineering11020134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2023] [Revised: 01/23/2024] [Accepted: 01/25/2024] [Indexed: 02/24/2024] Open
Abstract
The objective of this study was to analyze the associations between temporomandibular disorders (TMDs) and metabolic syndrome (MetS) components, consequences, and related conditions. This research analyzed data from the Dental, Oral, Medical Epidemiological (DOME) records-based study which integrated comprehensive socio-demographic, medical, and dental databases from a nationwide sample of dental attendees aged 18-50 years at military dental clinics for 1 year. Statistical and machine learning models were fitted with TMDs as the dependent variable. The independent variables included age, sex, smoking, each of the MetS components, and consequences and related conditions, including hypertension, hyperlipidemia, diabetes, impaired glucose tolerance (IGT), obesity, cardiac disease, obstructive sleep apnea (OSA), nonalcoholic fatty liver disease (NAFLD), transient ischemic attack (TIA), stroke, deep venous thrombosis (DVT), and anemia. The study included 132,529 subjects, of whom 1899 (1.43%) had been diagnosed with TMDs. The following parameters retained a statistically significant positive association with TMDs in the multivariable binary logistic regression analysis: female sex [OR = 2.65 (2.41-2.93)], anemia [OR = 1.69 (1.48-1.93)], and age [OR = 1.07 (1.06-1.08)]. Feature importance generated by the XGBoost machine learning algorithm ranked the features, with TMDs as the target variable, as follows: sex was ranked first followed by age (second), anemia (third), hypertension (fourth), and smoking (fifth). Metabolic morbidity and anemia should be included in the systemic evaluation of TMD patients.
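To make the "OR (95% CI)" notation concrete, the sketch below computes an unadjusted odds ratio with a Woolf (log-based) confidence interval from a 2x2 table, and re-derives the reported TMD prevalence. It is only an illustration: the study's odds ratios come from a multivariable logistic regression, and the 2x2 counts used here are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exp_cases, exp_noncases, unexp_cases, unexp_noncases, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table."""
    a, b, c, d = exp_cases, exp_noncases, unexp_cases, unexp_noncases
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Prevalence reported in the abstract: 1899 of 132,529 subjects.
prevalence_pct = round(100 * 1899 / 132529, 2)
print(prevalence_pct)

# Hypothetical counts, chosen only to show the calculation.
print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that excludes 1, as for female sex, anemia, and age in the abstract, indicates a statistically significant association at the chosen level.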
Affiliation(s)
- Harry Chweidan
- Department of Prosthodontics, Oral and Maxillofacial Center, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
| | - Nikolay Rudyuk
- Department of Prosthodontics, Oral and Maxillofacial Center, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
| | - Dorit Tzur
- Medical Information Department, General Surgeon Headquarters, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
| | - Chen Goldstein
- Big Biomedical Data Research Laboratory, Dean's Office, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
| | - Galit Almoznino
- Big Biomedical Data Research Laboratory, Dean's Office, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
- Department of Oral Medicine, Sedation & Maxillofacial Imaging, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
| |
|
13
|
Goldstein A, Shahar Y, Weisman Raymond M, Peleg H, Ben-Chetrit E, Ben-Yehuda A, Shalom E, Goldstein C, Shiloh SS, Almoznino G. Multi-Dimensional Validation of the Integration of Syntactic and Semantic Distance Measures for Clustering Fibromyalgia Patients in the Rheumatic Monitor Big Data Study. Bioengineering (Basel) 2024; 11:97. [PMID: 38275577 PMCID: PMC10813477 DOI: 10.3390/bioengineering11010097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 12/28/2023] [Accepted: 01/11/2024] [Indexed: 01/27/2024] Open
Abstract
This study primarily aimed at developing a novel multi-dimensional methodology to discover and validate the optimal number of clusters. The secondary objective was to deploy it for the task of clustering fibromyalgia patients. We present a comprehensive methodology that includes the use of several different clustering algorithms, quality assessment using several syntactic distance measures (the Silhouette Index (SI), Calinski-Harabasz index (CHI), and Davies-Bouldin index (DBI)), stability assessment using the adjusted Rand index (ARI), and the validation of the internal semantic consistency of each clustering option via the performance of multiple clustering iterations after the repeated bagging of the data to select multiple partial data sets. Then, we perform a statistical analysis of the (clinical) semantics of the most stable clustering options using the full data set. Finally, the results are validated through a supervised machine learning (ML) model that classifies the patients back into the discovered clusters and is interpreted by calculating the Shapley additive explanations (SHAP) values of the model. Thus, we refer to our methodology as the clustering, distance measures and iterative statistical and semantic validation (CDI-SSV) methodology. We applied our method to the analysis of a comprehensive data set acquired from 1370 fibromyalgia patients. The results demonstrate that the K-means was highly robust in the syntactic and the internal semantic consistency analysis phases and was therefore followed by a semantic assessment to determine the optimal number of clusters (k), which suggested k = 3 as a more clinically meaningful solution, representing three distinct severity levels. The random forest model validated the results by classification into the discovered clusters with high accuracy (AUC: 0.994; accuracy: 0.946). SHAP analysis emphasized the clinical relevance of "functional problems" in distinguishing the most severe condition.
In conclusion, the CDI-SSV methodology offers significant potential for improving the classification of complex patients. Our findings suggest a classification system for different profiles of fibromyalgia patients, which has the potential to improve clinical care, by providing clinical markers for the evidence-based personalized diagnosis, management, and prognosis of fibromyalgia patients.
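The stability step mentioned in this abstract relies on the adjusted Rand index (ARI), which compares two partitions of the same items while correcting for chance agreement. The sketch below is a plain-Python version of the standard ARI formula, shown only to illustrate the measure. It is not the authors' pipeline, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of items grouped together in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition with swapped label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Partitions that split every pair differently: below chance level.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

In a stability analysis, the ARI would typically be computed between clusterings obtained on repeated resamples, with values near 1 indicating a stable solution.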
Affiliation(s)
- Ayelet Goldstein
- Computer Science Department, Hadassah Academic College, Jerusalem 9101001, Israel;
| | - Yuval Shahar
- Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel; (Y.S.)
| | - Michal Weisman Raymond
- Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel; (Y.S.)
| | - Hagit Peleg
- Rheumatology Unit, Hadassah Medical Center, Jerusalem 9112102, Israel
| | - Eldad Ben-Chetrit
- Rheumatology Unit, Hadassah Medical Center, Jerusalem 9112102, Israel
| | - Arie Ben-Yehuda
- Division of Internal Medicine, Hadassah Medical Center, Jerusalem 9112102, Israel
| | - Erez Shalom
- Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel; (Y.S.)
| | - Chen Goldstein
- Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
| | - Shmuel Shay Shiloh
- Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
| | - Galit Almoznino
- Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
- Department of Oral Medicine, Sedation & Maxillofacial Imaging, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
| |
|
14
|
Jiang S, Li Y, Firouzi F, Chakrabarty K. Federated clustered multi-domain learning for health monitoring. Sci Rep 2024; 14:903. [PMID: 38195834 PMCID: PMC10776721 DOI: 10.1038/s41598-024-51344-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Accepted: 01/03/2024] [Indexed: 01/11/2024] Open
Abstract
Wearable Internet of Things (WIoT) and Artificial Intelligence (AI) are rapidly emerging technologies for healthcare. These technologies enable seamless data collection and precise analysis toward fast, resource-abundant, and personalized patient care. However, conventional machine learning workflows require data to be transferred to a remote cloud server, which leads to significant privacy concerns. To tackle this problem, researchers have proposed federated learning, where end-point users collaboratively learn a shared model without sharing local data. However, data heterogeneity, i.e., variations in data distributions within a client (intra-client) or across clients (inter-client), degrades the performance of federated learning. Existing state-of-the-art methods mainly consider inter-client data heterogeneity, whereas intra-client variations have not received much attention. To address intra-client variations in federated learning, we propose a federated clustered multi-domain learning algorithm based on ClusterGAN, multi-domain learning, and graph neural networks. We applied the proposed algorithm to a case study on stress-level prediction, and our proposed algorithm outperforms two state-of-the-art methods by 4.4% in accuracy and 0.06 in the F1 score. In addition, we demonstrate the effectiveness of the proposed algorithm by investigating variants of its different modules.
Affiliation(s)
- Shiyi Jiang
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA.
| | - Yuan Li
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
- Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, 215316, Jiangsu, China
| | - Farshad Firouzi
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85281, USA
| | - Krishnendu Chakrabarty
- School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, 85281, USA
| |
|
15
|
Tan AZ, Yu H, Cui L, Yang Q. Towards Personalized Federated Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9587-9603. [PMID: 35344498 DOI: 10.1109/tnnls.2022.3160699] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
In parallel with the rapid adoption of artificial intelligence (AI) empowered by advances in AI research, there has been growing awareness of and concern about data privacy. Recent significant developments in the data regulation landscape have prompted a seismic shift in interest toward privacy-preserving AI. This has contributed to the popularity of Federated Learning (FL), the leading paradigm for the training of machine learning models on data silos in a privacy-preserving manner. In this survey, we explore the domain of personalized FL (PFL) to address the fundamental challenges of FL on heterogeneous data, a universal characteristic inherent in all real-world datasets. We analyze the key motivations for PFL and present a unique taxonomy of PFL techniques categorized according to the key challenges and personalization strategies in PFL. We highlight their key ideas, challenges, and opportunities, and envision promising future trajectories of research toward a new PFL architectural design, realistic PFL benchmarking, and trustworthy PFL approaches.
|
16
|
Liu Y, Bi D. Quantitative risk analysis of treatment plans for patients with tumor by mining historical similar patients from electronic health records using federated learning. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2023; 43:2422-2449. [PMID: 36906293 DOI: 10.1111/risa.14124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/24/2022] [Revised: 12/11/2022] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
The determination of a treatment plan for a target patient with tumor is a difficult problem due to heterogeneity in patients' responses, incomplete information about tumor states, asymmetric knowledge between doctors and patients, and other factors. In this paper, a method for quantitative risk analysis of treatment plans for patients with tumor is proposed. To reduce the impacts of the heterogeneity in patients' responses on analysis results, the method conducts risk analysis by mining historical similar patients from Electronic Health Records (EHRs) in multiple hospitals using federated learning (FL). For this, the Recursive Feature Elimination based on the Support Vector Machine (SVM) and Deep Learning Important FeaTures (DeepLIFT) are extended into the FL framework to select key features and determine key feature weights for identifying historical similar patients. Then, in the database of each collaborative hospital, the similarities between the target patient and all historical patients are calculated, and the historical similar patients are determined. According to the statistics of tumor states and treatment outcomes of historical similar patients in all collaborative hospitals, the related data (including the probabilities of different tumor states and possible outcomes of different treatment plans) for risk analysis of the alternative treatment plans can be obtained, which can eliminate the asymmetric knowledge between doctors and patients. The related data are valuable for the doctor and patient to make their decisions. Experimental studies have been conducted to verify the feasibility and effectiveness of the proposed method.
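The retrieval step described above, in which each hospital scores its own historical patients against the target patient and only summary statistics are pooled, can be sketched as follows. This is a hypothetical illustration: the abstract does not specify the similarity function, so a feature-weighted Euclidean distance is assumed, and the function names, feature values, weights, and cut-off are invented for the example.

```python
import math
from collections import Counter


def weighted_similarity(a, b, weights):
    """Similarity in (0, 1] from a feature-weighted Euclidean distance."""
    dist = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + dist)


def local_outcome_counts(target, records, weights, min_sim):
    """Run inside one hospital: count outcomes among similar patients."""
    counts = Counter()
    for features, outcome in records:
        if weighted_similarity(target, features, weights) >= min_sim:
            counts[outcome] += 1
    return counts


def pooled_outcome_probabilities(per_hospital_counts):
    """Combine per-hospital counts; no patient-level record leaves a site."""
    total = Counter()
    for counts in per_hospital_counts:
        total.update(counts)
    n = sum(total.values())
    return {outcome: c / n for outcome, c in total.items()} if n else {}


target = [0.0, 0.0]
weights = [1.0, 1.0]  # stand-in for federated feature weights
hospital_a = [([0.0, 0.0], "remission"), ([3.0, 4.0], "progression")]
hospital_b = [([0.0, 1.0], "progression")]
counts = [
    local_outcome_counts(target, hospital_a, weights, min_sim=0.5),
    local_outcome_counts(target, hospital_b, weights, min_sim=0.5),
]
print(pooled_outcome_probabilities(counts))
```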
Affiliation(s)
- Yang Liu
- School of Economics and Management, Dalian University of Technology, Dalian, China
| | - Donghai Bi
- School of Economics and Management, Dalian University of Technology, Dalian, China
| |
|
17
|
Elhussein A, Megjhani M, Nametz D, Weiss M, Savarraj J, Kwon SB, Roh DJ, Agarwal S, Sander Connolly E, Velazquez A, Claassen J, Choi HA, Schubert GA, Park S, Gürsoy G. A generalizable physiological model for detection of Delayed Cerebral Ischemia using Federated Learning. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2023; 2023:1886-1889. [PMID: 38389717 PMCID: PMC10883332 DOI: 10.1109/bibm58861.2023.10385383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/24/2024]
Abstract
Delayed cerebral ischemia (DCI) is a complication seen in patients with subarachnoid hemorrhage stroke. It is a major predictor of poor outcomes and is detected late. Machine learning models have been shown to be useful for early detection; however, training such models suffers from small sample sizes due to the rarity of the condition. Here we propose a Federated Learning approach to train a DCI classifier across three institutions to overcome challenges of sharing data across hospitals. We developed a framework for federated feature selection and built a federated ensemble classifier. We compared the performance of the FL model to that obtained by training separate models at each site. FL significantly improved performance at only two sites. We found that this was due to feature distribution differences across sites. FL improves performance in sites with similar feature distributions; however, FL can worsen performance in sites with heterogeneous distributions. The results highlight both the benefit of FL and the need to assess dataset distribution similarity before conducting FL.
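One simple way to "assess dataset distribution similarity" between two sites is a per-feature standardized mean difference (SMD). The sketch below is an assumption for illustration, not the method used in the paper. The feature names and values are hypothetical, and the 0.1 cut-off is a common rule of thumb from covariate-balance checks, not a value taken from the abstract.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(site_a, site_b):
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(site_a) + variance(site_b)) / 2)
    diff = abs(mean(site_a) - mean(site_b))
    if pooled_sd == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def flag_shifted_features(site_a, site_b, threshold=0.1):
    """Names of features whose SMD between the two sites exceeds the threshold."""
    return sorted(
        name
        for name in site_a
        if standardized_mean_difference(site_a[name], site_b[name]) > threshold
    )


# Hypothetical summary features from two sites.
site_1 = {"heart_rate": [1.0, 2.0, 3.0], "map": [1.0, 2.0, 3.0]}
site_2 = {"heart_rate": [3.0, 4.0, 5.0], "map": [1.0, 2.0, 3.0]}
print(flag_shifted_features(site_1, site_2))
```

In a federated setting the means and variances would be exchanged as site-level summaries, so this check does not require sharing patient-level data.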
Affiliation(s)
- Ahmed Elhussein
- Department of Biomedical Informatics, Columbia University, New York Genome Center, New York, NY, USA
| | - Murad Megjhani
- Department of Neurology, Columbia University, New York, NY, USA
| | - Daniel Nametz
- Department of Neurology, Columbia University, New York, NY, USA
| | - Miriam Weiss
- Department of Neurosurgery, RWTH Aachen University, Aachen, Germany
| | - Jude Savarraj
- Department of Neurology, UT Health, Houston, TX, USA
| | - Soon Bin Kwon
- Department of Neurology, Columbia University, New York, NY, USA
| | - David J. Roh
- Department of Neurology, Columbia University, New York, NY, USA
| | - Sachin Agarwal
- Department of Neurology, Columbia University, New York, NY, USA
| | | | | | - Jan Claassen
- Department of Neurology, Columbia University, New York, NY, USA
| | - Huimahn A. Choi
- Department of Neurosurgery, RWTH Aachen University, Aachen, Germany
| | | | - Soojin Park
- Department of Neurology, Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Gamze Gürsoy
- Department of Biomedical Informatics, Department of Computer Science, Columbia University, New York Genome Center, New York, NY, USA
| |
|
18
|
Li S, Liu P, Nascimento GG, Wang X, Leite FRM, Chakraborty B, Hong C, Ning Y, Xie F, Teo ZL, Ting DSW, Haddadi H, Ong MEH, Peres MA, Liu N. Federated and distributed learning applications for electronic health records and structured medical data: a scoping review. J Am Med Inform Assoc 2023; 30:2041-2049. [PMID: 37639629 PMCID: PMC10654866 DOI: 10.1093/jamia/ocad170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2023] [Revised: 07/19/2023] [Indexed: 08/31/2023] Open
Abstract
OBJECTIVES Federated learning (FL) has gained popularity in clinical research in recent years to facilitate privacy-preserving collaboration. Structured data, one of the most prevalent forms of clinical data, has experienced significant growth in volume concurrently, notably with the widespread adoption of electronic health records in clinical practice. This review examines FL applications on structured medical data, identifies contemporary limitations, and discusses potential innovations. MATERIALS AND METHODS We searched 5 databases, SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL, to identify articles that applied FL to structured medical data and reported results following the PRISMA guidelines. Each selected publication was evaluated from 3 primary perspectives, including data quality, modeling strategies, and FL frameworks. RESULTS Out of the 1193 papers screened, 34 met the inclusion criteria, with each article consisting of one or more studies that used FL to handle structured clinical/medical data. Of these, 24 utilized data acquired from electronic health records, with clinical predictions and association studies being the most common clinical research tasks that FL was applied to. Only one article exclusively explored the vertical FL setting, while the remaining 33 explored the horizontal FL setting, with only 14 discussing comparisons between single-site (local) and FL (global) analysis. CONCLUSIONS The existing FL applications on structured medical data lack sufficient evaluations of clinically meaningful benefits, particularly when compared to single-site analyses. Therefore, it is crucial for future FL applications to prioritize clinical motivations and develop designs and methodologies that can effectively support and aid clinical practice and research.
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Gustavo G Nascimento
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Xinru Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Fabio Renato Manzolli Leite
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Feng Xie
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Zhen Ling Teo
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Daniel Shu Wei Ting
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Hamed Haddadi
- Department of Computing, Imperial College London, London SW7 2AZ, England, United Kingdom
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore 169608, Singapore
| | - Marco Aurélio Peres
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
| |
|
19
|
Pirmani A, De Brouwer E, Geys L, Parciak T, Moreau Y, Peeters LM. The Journey of Data Within a Global Data Sharing Initiative: A Federated 3-Layer Data Analysis Pipeline to Scale Up Multiple Sclerosis Research. JMIR Med Inform 2023; 11:e48030. [PMID: 37943585 PMCID: PMC10667980 DOI: 10.2196/48030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Revised: 08/25/2023] [Accepted: 09/30/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND Investigating low-prevalence diseases such as multiple sclerosis is challenging because of the rather small number of individuals affected by this disease and the scattering of real-world data across numerous data sources. These obstacles impair data integration, standardization, and analysis, which negatively impact the generation of significant meaningful clinical evidence. OBJECTIVE This study aims to present a comprehensive, research question-agnostic, multistakeholder-driven end-to-end data analysis pipeline that accommodates 3 prevalent data-sharing streams: individual data sharing, core data set sharing, and federated model sharing. METHODS A demand-driven methodology is employed for standardization, followed by 3 streams of data acquisition, a data quality enhancement process, a data integration procedure, and a concluding analysis stage to fulfill real-world data-sharing requirements. This pipeline's effectiveness was demonstrated through its successful implementation in the COVID-19 and multiple sclerosis global data sharing initiative. RESULTS The global data sharing initiative yielded multiple scientific publications and provided extensive worldwide guidance for the community with multiple sclerosis. The pipeline facilitated gathering pertinent data from various sources, accommodating distinct sharing streams and assimilating them into a unified data set for subsequent statistical analysis or secure data examination. This pipeline contributed to the assembly of the largest data set of people with multiple sclerosis infected with COVID-19. CONCLUSIONS The proposed data analysis pipeline exemplifies the potential of global stakeholder collaboration and underlines the significance of evidence-based decision-making. It serves as a paradigm for how data sharing initiatives can propel advancements in health care, emphasizing its adaptability and capacity to address diverse research inquiries.
Affiliation(s)
- Ashkan Pirmani
- ESAT, STADIUS, KU Leuven, Leuven, Belgium
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
| | | | - Lotte Geys
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
| | - Tina Parciak
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
| | | | - Liesbet M Peeters
- Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium
- Data Science Institute, Hasselt University, Diepenbeek, Belgium
- University Multiple Sclerosis Center, Hasselt University, Diepenbeek, Belgium
| |
|
20
|
Menegatti D, Giuseppi A, Delli Priscoli F, Pietrabissa A, Di Giorgio A, Baldisseri F, Mattioni M, Monaco S, Lanari L, Panfili M, Suraci V. CADUCEO: A Platform to Support Federated Healthcare Facilities through Artificial Intelligence. Healthcare (Basel) 2023; 11:2199. [PMID: 37570439 PMCID: PMC10418332 DOI: 10.3390/healthcare11152199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Revised: 07/26/2023] [Accepted: 07/31/2023] [Indexed: 08/13/2023] Open
Abstract
Data-driven algorithms have proven to be effective for a variety of medical tasks, including disease categorization and prediction, personalized medicine design, and imaging diagnostics. Although their performance is frequently on par with that of clinicians, their widespread use is constrained by a number of obstacles, including the requirement for high-quality data that are typical of the population, the difficulty of explaining how they operate, and ethical and regulatory concerns. The use of data augmentation and synthetic data generation methodologies, such as federated learning and explainable artificial intelligence ones, could provide a viable solution to the current issues, facilitating the widespread application of artificial intelligence algorithms in the clinical application domain and reducing the time needed for prevention, diagnosis, and prognosis by up to 70%. To this end, a novel AI-based functional framework is conceived and presented in this paper.
Affiliation(s)
| | | | | | | | | | - Federico Baldisseri
- Department of Computer, Control and Management Engineering “Antonio Ruberti”, Sapienza University of Rome, Via Ariosto 25, 00185 Rome, Italy; (D.M.); (A.G.); (F.D.P.); (A.P.); (A.D.G.); (M.M.); (S.M.); (L.L.); (M.P.); (V.S.)
| | | | | | | | | | | |
|
21
|
Tarumi S, Suzuki M, Yoshida H, Miyauchi S, Kurazume R. Personalized Federated Learning for Institutional Prediction Model using Electronic Health Records: A Covariate Adjustment Approach. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38083200 DOI: 10.1109/embc40787.2023.10339940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2023]
Abstract
Federated learning (FL) has attracted attention as a technology that allows multiple medical institutions to collaborate on AI without disclosing each other's patient data. However, FL has the challenge of being unable to robustly learn when the data of participating clients is non-independently and non-identically distributed (Non-IID). Personalized Federated Learning (PFL), which constructs a personalized model for each client, has been proposed as a solution to this problem. However, conventional PFL methods do not ensure the interpretability of personalization, specifically, the identification of which data samples contribute to each personalized learning, which is important for AI in medical applications. In this study, we propose a novel PFL framework, Federated Adjustment of Covariate (FedCov), which acquires a propensity score model representing the covariate shift among clients through prior FL, then learns a final model by weighting the contribution of each training sample to PFL based on the estimated propensity score. This approach enables both the learning of personalized models through covariate adjustment and the visualization of the contribution of each client to PFL. FedCov was evaluated in the prediction of in-hospital mortality across 50 hospitals in the eICU Collaborative Research Database, achieving an ROC-AUC of 0.750. This result outperformed the AUCs in the 0.720-0.735 range achieved by conventional FL methods and was closest to the AUC of 0.754 achieved by centralized learning. Clinical Relevance: This study demonstrates the feasibility of providing sophisticated and personalized AI-driven clinical decision support to any medical institution through personalized federated learning.
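The idea of weighting each training sample by an estimated propensity score can be illustrated with a minimal sketch. This is not the FedCov implementation: the abstract does not give the weighting formula, so the odds-based weights below are an assumed, commonly used importance-weighting choice, and the function names `propensity_weights` and `weighted_log_loss` are hypothetical.

```python
import math
from typing import List


def propensity_weights(scores: List[float], eps: float = 1e-6) -> List[float]:
    """Turn propensity scores into sample weights that average to 1.

    scores[i] is assumed to be the estimated probability that sample i
    resembles the target client's data (output of some propensity model).
    The odds p / (1 - p) are an assumed weighting rule, not the paper's.
    """
    odds = []
    for p in scores:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid division by zero
        odds.append(p / (1.0 - p))
    mean_odds = sum(odds) / len(odds)
    return [o / mean_odds for o in odds]


def weighted_log_loss(y_true: List[int], y_prob: List[float],
                      weights: List[float], eps: float = 1e-12) -> float:
    """Binary cross-entropy in which each sample counts by its weight."""
    total = 0.0
    for y, q, w in zip(y_true, y_prob, weights):
        q = min(max(q, eps), 1.0 - eps)  # keep log() finite
        total += -w * (y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / sum(weights)


if __name__ == "__main__":
    w = propensity_weights([0.8, 0.2, 0.5])
    print(w)  # samples that look more like the target client weigh more
    print(weighted_log_loss([1, 0, 1], [0.9, 0.2, 0.6], w))
```

Because the weights are visible per sample, they can also be summed per source client to show how much each client contributes to a personalized model, which is the kind of interpretability the abstract describes.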
|
22
|
Poulain R, Tarek MFB, Beheshti R. Improving Fairness in AI Models on Electronic Health Records: The Case for Federated Learning Methods. FACCT '23 : PROCEEDINGS OF THE 2023 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY. ASSOCIATION FOR COMPUTING MACHINERY 2023; 2023:1599-1608. [PMID: 37990734 PMCID: PMC10661580 DOI: 10.1145/3593013.3594102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Developing AI tools that preserve fairness is of critical importance, specifically in high-stakes applications such as those in healthcare. However, health AI models' overall prediction performance is often prioritized over the possible biases such models could have. In this study, we show one possible approach to mitigate bias concerns by having healthcare institutions collaborate through a federated learning paradigm (FL; which is a popular choice in healthcare settings). While FL methods with an emphasis on fairness have been previously proposed, their underlying model and local implementation techniques, as well as their possible applications to the healthcare domain, remain widely underinvestigated. Therefore, we propose a comprehensive FL approach with adversarial debiasing and a fair aggregation method, suitable to various fairness metrics, in the healthcare domain where electronic health records are used. Not only does our approach explicitly mitigate bias as part of the optimization process, but an FL-based paradigm would also implicitly help with addressing data imbalance and increasing the data size, offering a practical solution for healthcare applications. We empirically demonstrate our method's superior performance on multiple experiments simulating large-scale real-world scenarios and compare it to several baselines. Our method has achieved promising fairness performance with the lowest impact on overall discrimination performance (accuracy). Our code is available at https://github.com/healthylaife/FairFedAvg.
|
23
|
Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng 2023; 7:719-742. [PMID: 37380750 PMCID: PMC10632090 DOI: 10.1038/s41551-023-01056-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 10/01/2021] [Accepted: 04/13/2023] [Indexed: 06/30/2023]
Abstract
In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
Affiliation(s)
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Judy J Wang
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Boston University School of Medicine, Boston, MA, USA
| | - Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Sharifa Sahai
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA.
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.
| |
|
24
|
Manzoor HU, Khan AR, Flynn D, Alam MM, Akram M, Imran MA, Zoha A. FedBranched: Leveraging Federated Learning for Anomaly-Aware Load Forecasting in Energy Networks. SENSORS (BASEL, SWITZERLAND) 2023; 23:3570. [PMID: 37050631 PMCID: PMC10098660 DOI: 10.3390/s23073570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 03/16/2023] [Accepted: 03/27/2023] [Indexed: 06/19/2023]
Abstract
Increased demand for fast edge computation and privacy concerns have shifted researchers' focus towards a type of distributed learning known as federated learning (FL). Recently, much research has been carried out on FL; however, a major challenge is the need to tackle the high diversity in different clients. Our research shows that using highly diverse data sets in FL can lead to low accuracy of some local models, which can be categorised as anomalous behaviour. In this paper, we present FedBranched, a clustering-based framework that uses probabilistic methods to create branches of clients and assigns their respective global models. Branching is performed using hidden Markov model (HMM) clustering, and a round of branching depends on the diversity of the data. Clustering is performed on Euclidean distances of mean absolute percentage errors (MAPE) obtained from each client at the end of pre-defined communication rounds. The proposed framework was implemented on substation-level energy data with nine clients for short-term load forecasting using an artificial neural network (ANN). FedBranched took two clustering rounds and resulted in two different branches having individual global models. The results show a substantial improvement in the average MAPE of all clients; the biggest improvement of 11.36% was observed in one client.
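The clustering input described here (Euclidean distances between per-client MAPE values) can be sketched as follows. This covers only the MAPE and distance computation; the HMM-based clustering step itself is not reproduced, and the function names and sample numbers are illustrative assumptions.

```python
import math
from typing import List, Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (MAPE is undefined there).
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def pairwise_distances(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distance between every pair of client MAPE vectors.

    Each vector is assumed to hold one client's MAPE per communication round.
    """
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical per-round MAPE histories for three clients.
    histories = [[5.0, 4.5], [5.2, 4.4], [19.0, 21.0]]
    for row in pairwise_distances(histories):
        print([round(d, 2) for d in row])
```

In this toy input the third client sits far from the other two, which is the kind of separation a clustering step could use to place it on its own branch.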
Affiliation(s)
- Habib Ullah Manzoor
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
- Department of Electrical Engineering, University of Engineering and Technology, Lahore-Faisalabad Campus, Faisalabad 38000, Pakistan
| | - Ahsan Raza Khan
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
| | - David Flynn
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
| | - Muhammad Mahtab Alam
- Thomas Johann Seebeck Department of Electronics, Tallinn University of Technology, 19086 Tallinn, Estonia
| | - Muhammad Akram
- Department of Electrical Engineering, University of Engineering and Technology, Lahore-Faisalabad Campus, Faisalabad 38000, Pakistan
| | - Muhammad Ali Imran
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
| | - Ahmed Zoha
- James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
| |
|
25
|
An efficient edge/cloud medical system for rapid detection of level of consciousness in emergency medicine based on explainable machine learning models. Neural Comput Appl 2023; 35:10695-10716. [PMID: 37155550 PMCID: PMC10015549 DOI: 10.1007/s00521-023-08258-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 01/06/2023] [Indexed: 03/17/2023]
Abstract
Emergency medicine (EM) is an attractive research field in which researchers invest their efforts to diagnose and treat unforeseen illnesses or injuries. Many tests and observations are involved in EM. Detection of the level of consciousness is one of these observations, and it can be performed using several methods. Among these methods, the automatic estimation of the Glasgow coma scale (GCS) is studied in this paper. The GCS is a medical score used to describe a patient’s level of consciousness. This type of scoring system requires a medical examination that may not be available given the shortage of medical experts. Therefore, automatic calculation of a patient’s level of consciousness is highly needed. Artificial intelligence has been deployed in several applications and appears to perform well in providing automatic solutions. The main objective of this work is to introduce an edge/cloud system that improves the efficiency of consciousness measurement through efficient local data processing. Moreover, an efficient machine learning (ML) model is proposed to predict a patient’s level of consciousness based on the patient’s demographics, vital signs, and laboratory tests. Explainability is addressed using Shapley additive explanations (SHAP), which provide natural language explanations in a form that helps the medical expert understand the final prediction. The developed ML model is validated using vital signs and laboratory tests extracted from the MIMIC III dataset, and it achieves superior performance (mean absolute error (MAE) = 0.269, mean square error (MSE) = 0.625, R² score = 0.964). The resulting model is accurate, medically intuitive, and trustworthy.
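For readers unfamiliar with the three reported metrics, the sketch below shows how MAE, MSE, and the R² score are conventionally computed for a regression target such as a predicted GCS value. The function name and the sample values are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, and R^2 using their standard definitions.

    Assumes y_true is not constant (otherwise R^2 is undefined).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {"mae": mae, "mse": mse, "r2": 1.0 - ss_res / ss_tot}


if __name__ == "__main__":
    # Hypothetical true and predicted GCS totals (valid range 3 to 15).
    observed = [15, 14, 8, 3, 11]
    predicted = [15, 13, 9, 4, 11]
    print(regression_metrics(observed, predicted))
```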
|
26
|
Qi T, Chen L, Li G, Li Y, Wang C. FedAGCN: A traffic flow prediction framework based on federated learning and Asynchronous Graph Convolutional Network. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
|
27
|
Dasaradharami Reddy K, Gadekallu TR. A Comprehensive Survey on Federated Learning Techniques for Healthcare Informatics. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:8393990. [PMID: 36909974 PMCID: PMC9995203 DOI: 10.1155/2023/8393990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Revised: 04/18/2022] [Accepted: 05/18/2022] [Indexed: 03/06/2023]
Abstract
Healthcare is predominantly regarded as a crucial consideration in promoting the general physical and mental health and well-being of people around the world. The amount of data generated by healthcare systems is enormous, making it challenging to manage. Many machine learning (ML) approaches have been implemented to develop dependable and robust solutions to handle the data. However, ML cannot fully utilize data due to privacy concerns, particularly in the case of medical data. Due to a lack of precise clinical data, the application of ML to such data is challenging and may not yield the desired results. Federated learning (FL), which is a recent development in ML where the computation is offloaded to the source of data, appears to be a promising solution to this problem. In this study, we present a detailed survey of applications of FL for healthcare informatics. We initiate a discussion on the need for FL in the healthcare domain, followed by a review of recent review papers. We focus on the fundamentals of FL and the major motivations behind FL for healthcare applications. We then present the applications of FL along with the recent state of the art in several verticals of healthcare. Then, lessons learned, open issues, and challenges that are yet to be solved are also highlighted. This is followed by future directions for prospective researchers in this domain.
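As a rough illustration of computation being offloaded to the source of data, the sketch below shows the aggregation step of federated averaging (FedAvg), a widely used baseline in which each site trains locally and only model parameters are combined centrally. The survey abstract does not name a specific algorithm, so the choice of FedAvg, the flat parameter lists, and the function name are assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors, weighted by local sample counts.

    Only parameters and sample counts reach the server; raw patient
    records stay at each site.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical hospitals with 100 and 300 local records.
    hospital_a = [0.10, -0.40]
    hospital_b = [0.30, 0.00]
    print(federated_average([hospital_a, hospital_b], [100, 300]))
```

The larger site pulls the global parameters towards its own values, which is one reason non-IID data across sites is a recurring concern in the entries listed here.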
Affiliation(s)
- K. Dasaradharami Reddy
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| | - Thippa Reddy Gadekallu
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| |
|
28
|
Using machine learning on clinical data to identify unexpected patterns in groups of COVID-19 patients. Sci Rep 2023; 13:2236. [PMID: 36755135 PMCID: PMC9906583 DOI: 10.1038/s41598-022-26294-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 12/13/2022] [Indexed: 02/10/2023] Open
Abstract
As clinicians are faced with a deluge of clinical data, data science can play an important role in highlighting key features driving patient outcomes, aiding in the development of new clinical hypotheses. Insight derived from machine learning can serve as a clinical support tool by connecting care providers with reliable results from big data analysis that identify previously undetected clinical patterns. In this work, we show an example of collaboration between clinicians and data scientists during the COVID-19 pandemic, identifying sub-groups of COVID-19 patients with unanticipated outcomes or who are high-risk for severe disease or death. We apply a random forest classifier model to predict adverse patient outcomes early in the disease course, and we connect our classification results to unsupervised clustering of patient features that may underpin patient risk. The paradigm for using data science for hypothesis generation and clinical decision support, as well as our triaged classification approach and unsupervised clustering methods to determine patient cohorts, are applicable to driving rapid hypothesis generation and iteration in a variety of clinical challenges, including future public health crises.
|
29
|
Song J, Chae S, Bowles KH, McDonald MV, Barrón Y, Cato K, Collins Rossetti S, Hobensack M, Sridharan S, Evans L, Davoudi A, Topaz M. The identification of clusters of risk factors and their association with hospitalizations or emergency department visits in home health care. J Adv Nurs 2023; 79:593-604. [PMID: 36414419 PMCID: PMC10163408 DOI: 10.1111/jan.15498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 09/30/2022] [Accepted: 10/31/2022] [Indexed: 11/24/2022]
Abstract
AIMS To identify clusters of risk factors in home health care and determine if the clusters are associated with hospitalizations or emergency department visits. DESIGN A retrospective cohort study. METHODS This study included 61,454 patients pertaining to 79,079 episodes receiving home health care between 2015 and 2017 from one of the largest home health care organizations in the United States. Potential risk factors were extracted from structured data and unstructured clinical notes analysed by natural language processing. A K-means cluster analysis was conducted. Kaplan-Meier analysis was conducted to identify the association between clusters and hospitalizations or emergency department visits during home health care. RESULTS A total of 11.6% of home health episodes resulted in hospitalizations or emergency department visits. Risk factors formed three clusters. Cluster 1 is characterized by a combination of risk factors related to "impaired physical comfort with pain," defined as situations where patients may experience increased pain. Cluster 2 is characterized by "high comorbidity burden" defined as multiple comorbidities or other risks for hospitalization (e.g., prior falls). Cluster 3 is characterized by "impaired cognitive/psychological and skin integrity" including dementia or skin ulcer. Compared to Cluster 1, the risk of hospitalizations or emergency department visits increased by 1.95 times for Cluster 2 and by 2.12 times for Cluster 3 (all p < .001). CONCLUSION Risk factors were clustered into three types describing distinct characteristics for hospitalizations or emergency department visits. Different combinations of risk factors affected the likelihood of these negative outcomes. IMPACT Cluster-based risk prediction models could be integrated into early warning systems to identify patients at risk for hospitalizations or emergency department visits leading to more timely, patient-centred care, ultimately preventing these events. 
PATIENT OR PUBLIC CONTRIBUTION There was no involvement of patients in developing the research question, determining the outcome measures, or implementing the study.
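The K-means cluster analysis mentioned in the methods can be sketched with a minimal version of Lloyd's algorithm. This is a generic illustration, not the study's pipeline: the feature vectors, the fixed starting centroids, and the function name are assumptions, and a real analysis would typically use an established library with multiple random restarts.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]],
           centroids: Sequence[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid update.

    Starting centroids are passed in explicitly so the run is deterministic.
    """
    centers = [list(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical two-feature risk profiles for four episodes.
    profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(kmeans(profiles, centroids=[[0, 0], [10, 10]]))
```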
Affiliation(s)
- Jiyoun Song
- Columbia University School of Nursing, New York City, New York, USA
| | - Sena Chae
- College of Nursing, The University of Iowa, Iowa City, Iowa, USA
| | - Kathryn H. Bowles
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Margaret V. McDonald
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Yolanda Barrón
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Kenrick Cato
- Columbia University School of Nursing, New York City, New York, USA
- Emergency Medicine, Columbia University Irving Medical Center, New York City, New York, USA
| | - Sarah Collins Rossetti
- Columbia University School of Nursing, New York City, New York, USA
- Department of Biomedical Informatics, Columbia University, New York City, New York, USA
| | - Mollie Hobensack
- Columbia University School of Nursing, New York City, New York, USA
| | - Sridevi Sridharan
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Lauren Evans
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Anahita Davoudi
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
| | - Maxim Topaz
- Columbia University School of Nursing, New York City, New York, USA
- Center for Home Care Policy & Research, VNS Health, New York, New York City, USA
- Data Science Institute, Columbia University, New York City, New York, USA
| |
|
30
|
Jiménez-Sánchez A, Tardy M, González Ballester MA, Mateus D, Piella G. Memory-aware curriculum federated learning for breast cancer classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 229:107318. [PMID: 36592580 DOI: 10.1016/j.cmpb.2022.107318] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 12/17/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND AND OBJECTIVE For early breast cancer detection, regular screening with mammography imaging is recommended. Routine examinations result in datasets with a predominant amount of negative samples. The limited representativeness of positive cases can be problematic for learning Computer-Aided Diagnosis (CAD) systems. Collecting data from multiple institutions is a potential solution to mitigate this problem. Recently, federated learning has emerged as an effective tool for collaborative learning. In this setting, local models perform computation on their private data to update the global model. The order and the frequency of local updates influence the final global model. In the context of federated adversarial learning to improve multi-site breast cancer classification, we investigate the role of the order in which samples are locally presented to the optimizers. METHODS We define a novel memory-aware curriculum learning method for the federated setting. We aim to improve the consistency of the local models penalizing inconsistent predictions, i.e., forgotten samples. Our curriculum controls the order of the training samples prioritizing those that are forgotten after the deployment of the global model. Our approach is combined with unsupervised domain adaptation to deal with domain shift while preserving data privacy. RESULTS Two classification metrics: area under the receiver operating characteristic curve (ROC-AUC) and area under the curve for the precision-recall curve (PR-AUC) are used to evaluate the performance of the proposed method. Our method is evaluated with three clinical datasets from different vendors. An ablation study showed the improvement of each component of our method. The AUC and PR-AUC are improved on average by 5% and 6%, respectively, compared to the conventional federated setting. CONCLUSIONS We demonstrated the benefits of curriculum learning for the first time in a federated setting. 
Our results verified the effectiveness of the memory-aware curriculum federated learning for the multi-site breast cancer classification. Our code is publicly available at: https://github.com/ameliajimenez/curriculum-federated-learning.
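The ordering idea in the methods (present forgotten samples first) can be sketched as below. The abstract does not give the scoring function, so this assumes a simple definition: a sample counts as forgotten if the local model classified it correctly before the global model was deployed and incorrectly afterwards. The function name and inputs are hypothetical; the authors' actual implementation is in the repository linked above.

```python
from typing import List, Sequence


def curriculum_order(correct_before: Sequence[bool],
                     correct_after: Sequence[bool]) -> List[int]:
    """Return sample indices with forgotten samples first.

    A sample is treated as forgotten when it was predicted correctly
    before the global update and incorrectly after it (an assumed
    definition). The sort is stable, so ties keep their original order.
    """
    forgotten = [b and not a for b, a in zip(correct_before, correct_after)]
    return sorted(range(len(forgotten)), key=lambda i: not forgotten[i])


if __name__ == "__main__":
    before = [True, True, False, True]
    after = [True, False, False, False]
    print(curriculum_order(before, after))  # samples 1 and 3 come first
```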
Affiliation(s)
- Amelia Jiménez-Sánchez
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain; IT University of Copenhagen, Copenhagen, Denmark.
| | - Mickael Tardy
- École Centrale Nantes, LS2N, UMR 6004, Nantes, France; Hera-MI SAS, Nantes, France
| | - Miguel A González Ballester
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain; ICREA, Barcelona, Spain
| | - Diana Mateus
- École Centrale Nantes, LS2N, UMR 6004, Nantes, France
| | - Gemma Piella
- BCN MedTech, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
| |
|
31
|
Carvalho RMS, Oliveira D, Pesquita C. Knowledge Graph Embeddings for ICU readmission prediction. BMC Med Inform Decis Mak 2023; 23:12. [PMID: 36658526 PMCID: PMC9850812 DOI: 10.1186/s12911-022-02070-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/31/2022] [Accepted: 11/28/2022] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Intensive Care Unit (ICU) readmissions represent both a health risk for patients, with increased mortality rates and overall health deterioration, and a financial burden for healthcare facilities. As healthcare became more data-driven with the introduction of Electronic Health Records (EHR), machine learning methods have been applied to predict ICU readmission risk. However, these methods disregard the meaning and relationships of data objects and work blindly over clinical data without taking into account scientific knowledge and context. Ontologies and Knowledge Graphs can help bridge this gap between data and scientific context, as they are computational artefacts that represent the entities of a domain and their relationships to each other in a formalized way. METHODS AND RESULTS We have developed an approach that enriches EHR data with semantic annotations to ontologies to build a Knowledge Graph. A patient's ICU stay is represented by Knowledge Graph embeddings in a contextualized manner, which are used by machine learning models to predict 30-day ICU readmissions. This approach is based on several contributions: (1) an enrichment of the MIMIC-III dataset with patient-oriented annotations to various biomedical ontologies; (2) a Knowledge Graph that defines patient data with biomedical ontologies; (3) a predictive model of ICU readmission risk that uses Knowledge Graph embeddings; (4) a variant of the predictive model that targets different time points during an ICU stay. Our predictive approaches outperformed both a baseline and state-of-the-art works, achieving a mean Area Under the Receiver Operating Characteristic Curve of 0.827 and an Area Under the Precision-Recall Curve of 0.691. 
The application of this novel approach to help clinicians decide whether a patient can be discharged has the potential to prevent the readmission of [Formula: see text] of Intensive Care Unit patients, without unnecessarily prolonging the stay of those who would not require it. CONCLUSION The coupling of semantic annotation and Knowledge Graph embeddings affords two clear advantages: they consider scientific context and they are able to build representations of EHR information of different types in a common format. This work demonstrates the potential for impact that integrating ontologies and Knowledge Graphs into clinical machine learning applications can have.
Affiliation(s)
- Ricardo M. S. Carvalho
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| | - Daniela Oliveira
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| | - Catia Pesquita
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| |
|
32
|
Li Q, Wei X, Lin H, Liu Y, Chen T, Ma X. Inspecting the Running Process of Horizontal Federated Learning via Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4085-4100. [PMID: 33872152 DOI: 10.1109/tvcg.2021.3074010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
As a decentralized training approach, horizontal federated learning (HFL) enables distributed clients to collaboratively learn a machine learning model while keeping personal/private information on local devices. Despite the enhanced performance and efficiency of HFL over local training, clues for inspecting the behaviors of the participating clients and the federated model are usually lacking due to the privacy-preserving nature of HFL. Consequently, the users can only conduct a shallow-level analysis of potential abnormal behaviors and have limited means to assess the contributions of individual clients and implement the necessary intervention. Visualization techniques have been introduced to facilitate the HFL process inspection, usually by providing model metrics and evaluation results as a dashboard representation. Although the existing visualization methods allow a simple examination of the HFL model performance, they cannot support the intensive exploration of the HFL process. In this article, strictly following the HFL privacy-preserving protocol, we design an exploratory visual analytics system for the HFL process termed HFLens, which supports comparative visual interpretation at the overview, communication round, and client instance levels. Specifically, the proposed system facilitates the investigation of the overall process involving all clients, the correlation analysis of clients' information in one or different communication round(s), the identification of potential anomalies, and the contribution assessment of each HFL client. Two case studies confirm the efficacy of our system. Experts' feedback suggests that our approach indeed helps in understanding and diagnosing the HFL process better.
|
33
|
Kabra R, Israni S, Vijay B, Baru C, Mendu R, Fellman M, Sridhar A, Mason P, Cheung JW, DiBiase L, Mahapatra S, Kalifa J, Lubitz SA, Noseworthy PA, Navara R, McManus DD, Cohen M, Chung MK, Trayanova N, Gopinathannair R, Lakkireddy D. Emerging role of artificial intelligence in cardiac electrophysiology. CARDIOVASCULAR DIGITAL HEALTH JOURNAL 2022; 3:263-275. [PMID: 36589314 PMCID: PMC9795267 DOI: 10.1016/j.cvdhj.2022.09.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/04/2023] Open
Abstract
Artificial intelligence (AI) and machine learning (ML) have significantly impacted the field of cardiovascular medicine, especially cardiac electrophysiology (EP), on multiple fronts. The goal of this review is to familiarize readers with the field of AI and ML and their emerging role in EP. The current review is divided into 3 sections. In the first section, we discuss the definitions and basics of AI, ML, and big data. In the second section, we discuss their application to EP in the context of detection, prediction, and management of arrhythmias. Finally, we discuss the regulatory issues, challenges, and future directions of AI in EP.
Affiliation(s)
- Rajesh Kabra
- Kansas City Heart Rhythm Institute, Kansas City, Kansas
| | - Sharat Israni
- Bakar Computational Health Sciences Institute, University of California, San Francisco, California
| | | | - Chaitanya Baru
- San Diego Supercomputer Center, University of California, San Diego, San Diego, California
| | | | | | | | - Pamela Mason
- Department of Medicine, University of Virginia, Charlottesville, Virginia
| | - Jim W. Cheung
- Division of Cardiology, Department of Medicine, Weill Cornell Medicine, New York, New York
| | - Luigi DiBiase
- Albert Einstein College of Medicine at Montefiore Hospital, New York, New York
| | - Srijoy Mahapatra
- Department of Medicine, University of Minnesota, Minneapolis, Minnesota
| | - Jerome Kalifa
- Department of Cardiology, Brown University, Providence, Rhode Island
| | - Steven A. Lubitz
- Cardiac Arrhythmia Service, Massachusetts General Hospital, Boston, Massachusetts
| | | | - Rachita Navara
- Division of Cardiac Electrophysiology, University of California, San Francisco, San Francisco, California
| | - David D. McManus
- Department of Medicine, University of Massachusetts Chan Medical School, Worcester, Massachusetts
| | - Mitchell Cohen
- Division of Pediatric Cardiology, INOVA Children’s Hospital, Fairfax, Virginia
| | - Mina K. Chung
- Division of Cardiovascular Medicine, Cleveland Clinic, Cleveland, Ohio
| | - Natalia Trayanova
- Department of Biomedical Engineering and Alliance for Cardiovascular Diagnostic and Treatment Innovation, Johns Hopkins University, Baltimore, Maryland
| | | | | |
|
34
|
Wolff J, Matschinske J, Baumgart D, Pytlik A, Keck A, Natarajan A, von Schacky CE, Pauling JK, Baumbach J. Federated machine learning for a facilitated implementation of Artificial Intelligence in healthcare - a proof of concept study for the prediction of coronary artery calcification scores. J Integr Bioinform 2022; 19:jib-2022-0032. [PMID: 36054833 PMCID: PMC9800042 DOI: 10.1515/jib-2022-0032] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/14/2022] [Revised: 08/03/2022] [Accepted: 08/11/2022] [Indexed: 01/09/2023] Open
Abstract
The implementation of Artificial Intelligence (AI) still faces significant hurdles, and one key factor is access to data. Federated machine learning (FL) could help because it allows privacy-preserving data access. For this proof of concept, a prediction model for coronary artery calcification scores (CACS) was applied. The FL model was trained on the data held at the different institutions, while the centralized machine learning model was trained on one allocation of data. Both algorithms predict patients with risk scores ≥5 based on age, biological sex, waist circumference, dyslipidemia and HbA1c. The centralized model yields a sensitivity of c. 66% and a specificity of c. 70%. The FL model slightly outperforms it with a sensitivity of 67% while slightly underperforming it with a specificity of 69%. CACS prediction was shown to be feasible via both a centralized and an FL approach, with very comparable accuracy. Increasing accuracy requires additional patient data in higher volume, and for that FL is necessary. The developed "CACulator" serves as a proof of concept, is available as a research tool, and is intended to support future research to facilitate AI implementation.
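The sensitivity and specificity figures quoted in this abstract follow from a binary confusion matrix. The sketch below shows that arithmetic only; the function name and the toy labels are invented for illustration and are not taken from the study or the "CACulator" tool.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Sensitivity: share of actual positives that the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that the model clears.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data (assumption): 1 = risk score at or above the threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Comparing a centralized and a federated model in this way only requires each model's predictions on the same held-out labels.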
Affiliation(s)
- Justus Wolff
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany
- Syte – Strategy Institute for Digital Health, Hohe Bleichen 8, 20354 Hamburg, Germany
| | - Julian Matschinske
- Chair of Computational Systems Biology, University of Hamburg, Notkestreet 9-11, 22607 Hamburg, Germany
| | - Dietrich Baumgart
- Preventicum Essen, Theodor-Althoff-Str. 47, 45133 Essen, Germany
- Preventicum Duesseldorf, Koenigsallee 11, 40212 Duesseldorf, Germany
| | - Anne Pytlik
- Preventicum Essen, Theodor-Althoff-Str. 47, 45133 Essen, Germany
- Preventicum Duesseldorf, Koenigsallee 11, 40212 Duesseldorf, Germany
| | - Andreas Keck
- Syte – Strategy Institute for Digital Health, Hohe Bleichen 8, 20354 Hamburg, Germany
| | - Arunakiry Natarajan
- Independent Researcher, Digital Health, Informatics and Data Science, Lower Saxony, Germany
| | - Claudio E. von Schacky
- Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, Technical University of Munich, Ismaningerstr. 22, 81675 Munich, Germany
| | - Josch K. Pauling
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany
- LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany
| | - Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Notkestreet 9-11, 22607 Hamburg, Germany
- Computational BioMedicine Lab, Institute of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
| |
|
35
|
Federated Learning Optimization Algorithm for Automatic Weight Optimal. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:8342638. [PMID: 36407688 PMCID: PMC9668465 DOI: 10.1155/2022/8342638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2022] [Revised: 09/15/2022] [Accepted: 09/29/2022] [Indexed: 11/11/2022]
Abstract
Federated learning (FL), a distributed machine-learning framework, can effectively protect data privacy and security, and it has been widely applied in a variety of fields in recent years. However, the system heterogeneity and statistical heterogeneity of FL pose serious obstacles to the global model's quality. This study investigates server and client resource allocation in the context of FL system resource efficiency and proposes the FedAwo optimization algorithm. This approach combines adaptive learning with federated learning and makes full use of the server's computing resources to calculate the optimal weight value for each client. The global model is then aggregated according to these optimal weight values, which significantly reduces the detrimental effects of statistical and system heterogeneity. In traditional FL, we found that training on a large number of clients converges earlier than the specified epoch. However, traditional FL still requires each client to train for the specified number of epochs, so a large share of client computation is wasted. To further lower the training cost, the augmented FedAwo* algorithm is proposed. FedAwo* takes into account the heterogeneity of clients and sets criteria for local convergence. When a client's local model meets the criteria, it is returned to the server immediately. In this way, the number of client epochs is adjusted adaptively. Extensive experiments on the public MNIST and Fashion-MNIST datasets show that the global model converges faster and reaches higher accuracy with FedAwo and FedAwo* than with the FedAvg, FedProx, and FedAdp baseline algorithms.
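The two ideas in this abstract, server-side weighted aggregation and early return of a converged local model, can be sketched generically. The code below is not the FedAwo or FedAwo* algorithm: the weights here are plain sample counts (as in FedAvg-style averaging) rather than server-optimized values, and the convergence rule, function names and numbers are all assumptions made for illustration.

```python
def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (lists of floats)."""
    if not client_params or len(client_params) != len(weights):
        raise ValueError("need one weight per client")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    global_params = [0.0] * len(client_params[0])
    for params, w in zip(client_params, weights):
        for i, value in enumerate(params):
            global_params[i] += (w / total) * value
    return global_params


def epochs_until_converged(losses, tol):
    """Number of epochs run before the per-epoch loss change drops below tol.

    Hypothetical local-convergence criterion; the paper's criteria may differ.
    """
    for epoch in range(1, len(losses)):
        if abs(losses[epoch - 1] - losses[epoch]) < tol:
            return epoch + 1  # stop after this epoch and return the model
    return len(losses)  # never converged: use the full epoch budget


# Toy values (assumptions): three clients with two parameters each.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sample_counts = [10, 30, 60]
global_model = aggregate(clients, sample_counts)

# A client whose loss flattens stops after 4 of its 5 budgeted epochs.
used_epochs = epochs_until_converged([1.0, 0.5, 0.3, 0.29, 0.289], tol=0.05)
print(global_model, used_epochs)
```

Replacing `sample_counts` with weights computed on the server is where an adaptive scheme of the kind described in the abstract would plug in.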
|
36
|
Liu F, Demosthenes P. Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol 2022; 22:287. [PMID: 36335315 PMCID: PMC9636688 DOI: 10.1186/s12874-022-01768-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 04/08/2022] [Accepted: 10/22/2022] [Indexed: 11/07/2022] Open
Abstract
Background
The increased adoption of the internet, social media, wearable devices, e-health services, and other technology-driven services in medicine and healthcare has led to the rapid generation of various types of digital data, providing a valuable data source beyond the confines of traditional clinical trials, epidemiological studies, and lab-based experiments.
Methods
We provide a brief overview of the types and sources of real-world data (RWD) and the common models and approaches used to analyze them. We discuss the challenges and opportunities of using real-world data for evidence-based decision making. This review does not aim to be comprehensive or to cover all aspects of RWD (from both the research and practical perspectives) but serves as a primer and provides useful sources for readers who are interested in this topic.
Results and Conclusions
Real-world data hold great potential for generating real-world evidence for designing and conducting confirmatory trials and answering questions that may not be addressed otherwise. The volume and complexity of real-world data also call for the development of more appropriate, sophisticated, and innovative data processing and analysis techniques while maintaining scientific rigor in research findings, and for attention to data ethics, to harness the power of real-world data.
|
37
|
Federated learning review: Fundamentals, enabling technologies, and future applications. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/25/2022]
|
38
|
Guo J, Chen Z, Liu Z, Li X, Xie Z, Wang Z, Wang Y. Neural network training method for materials science based on multi-source databases. Sci Rep 2022; 12:15326. [PMID: 36096926 PMCID: PMC9468338 DOI: 10.1038/s41598-022-19426-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Accepted: 08/29/2022] [Indexed: 12/02/2022] Open
Abstract
The fourth paradigm of science has achieved great success in material discovery, and it highlights the sharing and interoperability of data. However, most material data are scattered among various research institutions, and transmitting large datasets consumes significant bandwidth and time. Meanwhile, some data owners prefer to protect their data and retain the initiative in cooperation. This dilemma has gradually led to the “data island” problem, especially in materials science. To address the problem and make full use of the material data, we propose a new strategy for neural network training based on multi-source databases. Throughout the training process, only model parameters are exchanged, and there is no external access or connection to the local databases. We demonstrate its validity by training a model that relates material structure to its corresponding formation energy, based on two and four local databases, respectively. The results show that the accuracy of the model trained by this method is almost the same as that obtained from a single database combining all the local ones. Moreover, different communication frequencies between the client and server are studied to improve model training efficiency, and an optimal frequency is recommended.
Affiliation(s)
- Jialong Guo
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Ziyi Chen
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zhiwei Liu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xianwei Li
- China Petroleum Pipeline Engineering Co., Ltd., International, Langfang, 065000, Hebei, China
| | - Zhiyuan Xie
- Department of Physics, Renmin University of China, Beijing, 100872, China
| | - Zongguo Wang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yangang Wang
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100083, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| |
|
39
|
Sun C, van Soest J, Koster A, Eussen SJPM, Schram MT, Stehouwer CDA, Dagnelie PC, Dumontier M. Studying the association of diabetes and healthcare cost on distributed data from the Maastricht Study and Statistics Netherlands using a privacy-preserving federated learning infrastructure. J Biomed Inform 2022; 134:104194. [PMID: 36064113 DOI: 10.1016/j.jbi.2022.104194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 08/26/2022] [Accepted: 08/29/2022] [Indexed: 11/28/2022]
Abstract
The mining of personal data collected by multiple organizations remains challenging in the presence of technical barriers, privacy concerns, and legal and/or organizational restrictions. While a number of privacy-preserving and data mining frameworks have recently emerged, much remains to show their practical utility. In this study, we implement and utilize a secure infrastructure using data from Statistics Netherlands and the Maastricht Study to learn the association between Type 2 Diabetes Mellitus (T2DM) and healthcare expenses considering the impact of lifestyle, physical activities, and complications of T2DM. Through experiments using real-world distributed personal data, we present the feasibility and effectiveness of the secure infrastructure for practical use cases of linking and analyzing vertically partitioned data across multiple organizations. We discovered that individuals diagnosed with T2DM had significantly higher expenses than those with prediabetes, while participants with prediabetes spent more than those without T2DM in all the included healthcare categories to different degrees. We further discuss a joint effort from technical, ethical-legal, and domain-specific experts that is highly valued for applying such a secure infrastructure to real-life use cases to protect data privacy.
Affiliation(s)
- Chang Sun
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands.
| | - Johan van Soest
- Brightlands Institute of Smart Society, Faculty of Science and Engineering, Maastricht University, Heerlen, The Netherlands
| | - Annemarie Koster
- Department of Social Medicine, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Care and Public Health Research Institute, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | - Simone J P M Eussen
- School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Department of Epidemiology, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | - Miranda T Schram
- School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Department of Internal Medicine, Maastricht University Medical Centre+, Maastricht, The Netherlands; School for Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Maastricht Heart & Vascular Center, Maastricht University Medical Center+, Maastricht, The Netherlands
| | - Coen D A Stehouwer
- School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Department of Internal Medicine, Maastricht University Medical Centre+, Maastricht, The Netherlands
| | - Pieter C Dagnelie
- School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; Department of Internal Medicine, Maastricht University Medical Centre+, Maastricht, The Netherlands
| | - Michel Dumontier
- Institute of Data Science, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands
| |
|
40
|
Wang S, Zhu X. Predictive Modeling of Hospital Readmission: Challenges and Solutions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2975-2995. [PMID: 34133285 DOI: 10.1109/tcbb.2021.3089682] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Hospital readmission prediction is a study to learn models from historical medical data to predict probability of a patient returning to hospital in a certain period, e.g. 30 or 90 days, after the discharge. The motivation is to help health providers deliver better treatment and post-discharge strategies, lower the hospital readmission rate, and eventually reduce the medical costs. Due to inherent complexity of diseases and healthcare ecosystems, modeling hospital readmission is facing many challenges. By now, a variety of methods have been developed, but existing literature fails to deliver a complete picture to answer some fundamental questions, such as what are the main challenges and solutions in modeling hospital readmission; what are typical features/models used for readmission prediction; how to achieve meaningful and transparent predictions for decision making; and what are possible conflicts when deploying predictive approaches for real-world usages. In this paper, we systematically review computational models for hospital readmission prediction, and propose a taxonomy of challenges featuring four main categories: (1) data variety and complexity; (2) data imbalance, locality and privacy; (3) model interpretability; and (4) model implementation. The review summarizes methods in each category, and highlights technical solutions proposed to address the challenges. In addition, a review of datasets and resources available for hospital readmission modeling also provides firsthand materials to support researchers and practitioners to design new approaches for effective and efficient hospital readmission prediction.
|
41
|
Review on Machine Learning Techniques for Medical Data Classification and Disease Diagnosis. REGENERATIVE ENGINEERING AND TRANSLATIONAL MEDICINE 2022. [DOI: 10.1007/s40883-022-00273-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
42
|
Palanivinayagam A, Kumar VV, Mahesh TR, Singh KK, Singh A. Machine Learning-Based COVID-19 Classification Using E-Adopted CT Scans. INTERNATIONAL JOURNAL OF E-ADOPTION 2022. [DOI: 10.4018/ijea.310001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
In recent years, several machine learning models have been successfully deployed in various fields. However, a huge quantity of data is required to train a good machine learning model. Data are stored in a distributed manner across multiple sources, and centralizing those data leads to privacy and security issues. To solve this problem, the proposed federated method works by exchanging the parameters of three locally trained machine learning models without compromising privacy. Each machine learning model uses e-adopted CT scans to improve its training knowledge. The CT scans are electronically transferred between various medical centers. Proper care is taken to prevent identity loss from the e-adopted data. To normalize the parameters, a novel weighting scheme is also exchanged along with the parameters. Thus, the global model is trained with more heterogeneous samples to increase performance. In the experiment, the proposed algorithm obtained 89% accuracy, which is 32% more than the existing machine learning models.
|
43
|
Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS. Federated learning-based AI approaches in smart healthcare: concepts, taxonomies, challenges and open issues. CLUSTER COMPUTING 2022; 26:1-41. [PMID: 35996680 PMCID: PMC9385101 DOI: 10.1007/s10586-022-03658-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/02/2022] [Revised: 05/10/2022] [Accepted: 06/17/2022] [Indexed: 06/15/2023]
Abstract
Federated Learning (FL), Artificial Intelligence (AI), and Explainable Artificial Intelligence (XAI) are among the most prominent technologies in the intelligent healthcare field. Traditionally, the healthcare system works on the basis of centralized agents sharing their raw data, so substantial vulnerabilities and challenges still exist in this system. Integrated with AI, however, the system would consist of multiple collaborating agents capable of communicating efficiently with their desired host. FL, in turn, works in a decentralized manner; it maintains communication based on a model in the preferred system without transferring the raw data. The combination of FL, AI, and XAI techniques can minimize several limitations and challenges in the healthcare system. This paper presents a complete analysis of FL using AI for smart healthcare applications. Initially, we discuss contemporary concepts of emerging technologies such as FL, AI, XAI, and the healthcare system. We integrate and classify FL-AI with healthcare technologies in different domains. Further, we address the existing problems, including security, privacy, stability, and reliability in the healthcare field. In addition, we guide readers toward strategies for solving healthcare problems using FL and AI. Finally, we address extensive research areas as well as future prospects regarding FL-based AI research in the healthcare management system.
Affiliation(s)
- Anichur Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Md. Sazzad Hossain
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Ghulam Muhammad
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Dipanjali Kundu
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
| | - Tanoy Debnath
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Muaz Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
| | - Md. Saikat Islam Khan
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
| | - Shahab S. Band
- Future Technology Research Center, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, 64002 Taiwan
| |
|
44
|
Lee SW, Lee HC, Suh J, Lee KH, Lee H, Seo S, Kim TK, Lee SW, Kim YJ. Multi-center validation of machine learning model for preoperative prediction of postoperative mortality. NPJ Digit Med 2022; 5:91. [PMID: 35821515 PMCID: PMC9276734 DOI: 10.1038/s41746-022-00625-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/07/2022] [Accepted: 06/02/2022] [Indexed: 11/09/2022] Open
Abstract
Accurate prediction of postoperative mortality is important for not only successful postoperative patient care but also for information-based shared decision-making with patients and efficient allocation of medical resources. This study aimed to create a machine-learning prediction model for 30-day mortality after a non-cardiac surgery that adapts to the manageable amount of clinical information as input features and is validated against multi-centered rather than single-centered data. Data were collected from 454,404 patients over 18 years of age who underwent non-cardiac surgeries from four independent institutions. We performed a retrospective analysis of the retrieved data. Only 12–18 clinical variables were used for model training. Logistic regression, random forest classifier, extreme gradient boosting (XGBoost), and deep neural network methods were applied to compare the prediction performances. To reduce overfitting and create a robust model, bootstrapping and grid search with tenfold cross-validation were performed. The XGBoost method in Seoul National University Hospital (SNUH) data delivers the best performance in terms of the area under receiver operating characteristic curve (AUROC) (0.9376) and the area under the precision-recall curve (0.1593). The predictive performance was the best when the SNUH model was validated with Ewha Womans University Medical Center data (AUROC, 0.941). Preoperative albumin, prothrombin time, and age were the most important features in the model for each hospital. It is possible to create a robust artificial intelligence prediction model applicable to multiple institutions through a light predictive model using only minimal preoperative information that can be automatically extracted from each hospital.
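The AUROC values reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes AUROC directly from that definition; the function name and the toy scores are assumptions for illustration, and the quadratic pairwise loop is only suitable for small examples, not for cohorts of the size used in the study.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = death within 30 days, scores = predicted risk.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(y_true, scores)
print(f"AUROC = {value:.2f}")
```

When events are rare, as with postoperative mortality, a high AUROC can coexist with a low area under the precision-recall curve, which is consistent with the two figures quoted in the abstract.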
Affiliation(s)
- Seung Wook Lee
- School of Medicine, Kyungpook National University, Daegu, Republic of Korea
| | - Hyung-Chul Lee
- Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Jungyo Suh
- Department of Urology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Kyung Hyun Lee
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea
| | - Heonyi Lee
- Bioinformatics Collaboration Unit, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Suryang Seo
- Department of Nursing, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul, South Korea
| | - Tae Kyong Kim
- Department of Anesthesiology and Pain Medicine, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul, South Korea
| | - Sang-Wook Lee
- Department of Anesthesiology and Pain Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.
| | - Yi-Jun Kim
- Institute of Convergence Medicine, Ewha Womans University Mokdong Hospital, Seoul, Republic of Korea.
| |
|
45
|
Li Y, Hu S, Li G. CVC: A Collaborative Video Caching Framework Based on Federated Learning at the Edge. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3135306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Yijing Li
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
| | - Shihong Hu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, China
| | - Guanghui Li
- School of Artificial Intelligence and Computer Science and the Research Center for IoT Technology Application Engineering (MOE), Jiangnan University, Wuxi, Jiangsu, China
| |
|
46
|
Amrollahi F, Shashikumar SP, Holder AL, Nemati S. Leveraging clinical data across healthcare institutions for continual learning of predictive risk models. Sci Rep 2022; 12:8380. [PMID: 35590018 PMCID: PMC9117839 DOI: 10.1038/s41598-022-12497-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/13/2022] [Accepted: 05/11/2022] [Indexed: 01/14/2023] Open
Abstract
The inherent flexibility of machine learning-based clinical predictive models to learn from episodes of patient care at a new institution (site-specific training) comes at the cost of performance degradation when applied to external patient cohorts. To exploit the full potential of cross-institutional clinical big data, machine learning systems must gain the ability to transfer their knowledge across institutional boundaries and learn from new episodes of patient care without forgetting previously learned patterns. In this work, we developed a privacy-preserving learning algorithm named WUPERR (Weight Uncertainty Propagation and Episodic Representation Replay) and validated the algorithm in the context of early prediction of sepsis using data from over 104,000 patients across four distinct healthcare systems. We tested the hypothesis that the proposed continual learning algorithm can maintain higher predictive performance than competing methods on previous cohorts once it has been trained on a new patient cohort. In the sepsis prediction task, after incremental training of a deep learning model across four hospital systems (namely hospitals H-A, H-B, H-C, and H-D), WUPERR maintained the highest positive predictive value across the first three hospitals compared to a baseline transfer learning approach (H-A: 39.27% vs. 31.27%, H-B: 25.34% vs. 22.34%, H-C: 30.33% vs. 28.33%). The proposed approach has the potential to construct more generalizable models that can learn from cross-institutional clinical big data in a privacy-preserving manner.
Affiliation(s)
- Fatemeh Amrollahi
- Division of Biomedical Informatics, University of California San Diego, San Diego, USA
| | | | - Andre L Holder
- Division of Pulmonary, Critical Care, Allergy and Sleep Medicine, Emory University School of Medicine, Atlanta, USA
| | - Shamim Nemati
- Division of Biomedical Informatics, University of California San Diego, San Diego, USA.
| |
|
47
|
Che S, Kong Z, Peng H, Sun L, Leow A, Chen Y, He L. Federated Multi-View Learning for Private Medical Data Integration and Analysis. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3501816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Along with the rapid expansion of information technology and the digitalization of health data, there is increasing concern about maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified. First, medical data are naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that require reconciliation. In this paper, we present a generic Federated Multi-View Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-View Learning (V-FedMV) and Horizontal Federated Multi-View Learning (H-FedMV). We experimented with real-world keyboard data collected from the BiAffect study. Our results demonstrated that the proposed approach can make full use of multi-view data in a privacy-preserving way, and both V-FedMV and H-FedMV perform better than their single-view and pairwise counterparts. In addition, the framework can be easily adapted to deal with multi-view sequential data. We have developed a sequential model (S-FedMV) that takes a sequence of multi-view data as input and demonstrated it experimentally. To the best of our knowledge, this framework is the first to consider both vertical and horizontal diversification in the multi-view setting, as well as their sequential federated learning.
Affiliation(s)
- Alex Leow
- University of Illinois at Chicago, USA
48
Safaei N, Safaei B, Seyedekrami S, Talafidaryani M, Masoud A, Wang S, Li Q, Moqri M. E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database. PLoS One 2022; 17:e0262895. [PMID: 35511882 PMCID: PMC9070907 DOI: 10.1371/journal.pone.0262895] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/24/2021] [Accepted: 01/09/2022] [Indexed: 11/19/2022] Open
Abstract
Improving the Intensive Care Unit (ICU) management network and building cost-effective and well-managed healthcare systems are high priorities for healthcare units. Creating accurate and explainable mortality prediction models helps identify the most critical risk factors in patients' survival/death status and detect the most in-need patients early. This study proposes a highly accurate and efficient machine learning model for predicting ICU mortality status upon discharge using the information available during the first 24 hours of admission. The most important features in mortality prediction are identified, and the effects of changing each feature on the prediction are studied. We used supervised machine learning models and illness severity scoring systems to benchmark the mortality prediction. We also implemented a combination of SHAP, LIME, partial dependence, and individual conditional expectation plots to explain the predictions made by the best-performing model (CatBoost). We proposed E-CatBoost, an optimized and efficient patient mortality prediction model, which can accurately predict the patients' discharge status using only ten input features. We used eICU-CRD v2.0 to train and validate the models; the dataset contains information on over 200,000 ICU admissions. The patients were divided into twelve disease groups, and models were fitted and tuned for each group. The models' predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC). The AUROC scores ranged from 0.86 [std: 0.02] to 0.92 [std: 0.02] for CatBoost and from 0.83 [std: 0.02] to 0.91 [std: 0.03] for E-CatBoost models across the defined disease groups; measured over the entire patient population, their AUROC scores were 7 to 18 and 2 to 12 percent higher than those of the baseline models, respectively. Based on SHAP explanations, we found age, heart rate, respiratory rate, blood urea nitrogen, and creatinine level to be the most critical cross-disease features in mortality predictions.
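For readers unfamiliar with the AUROC metric reported in this abstract, the following is a minimal sketch of how it can be computed from predicted risk scores. It is not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical, and the pairwise (Mann-Whitney) formulation is one standard way to define the metric, chosen here because it needs no external libraries.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = died, 0 = survived; scores are predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of patients, so for a dataset of the size described above a sort-based implementation from an established library would be the practical choice.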
Affiliation(s)
- Nima Safaei
- Department of Business Analytics and Information Systems, Tippie College of Business, University of Iowa, Iowa City, IA, United States of America
- Babak Safaei
- Civil and Environmental Engineering Department, Michigan State University, East Lansing, MI, United States of America
- Seyedhouman Seyedekrami
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States of America
- Arezoo Masoud
- Department of Business Analytics and Information Systems, Tippie College of Business, University of Iowa, Iowa City, IA, United States of America
- Shaodong Wang
- Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States of America
- Qing Li
- Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States of America
- Mahdi Moqri
- Department of Information Systems and Business Analytics, Ivy College of Business, Iowa State University, Ames, IA, United States of America
49
Dang TK, Lan X, Weng J, Feng M. Federated Learning for Electronic Health Records. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3514500] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Abstract
In data-driven medical research, multi-center studies have long been preferred over single-center ones because a single institute sometimes does not have enough data to obtain sufficient statistical power for certain hypothesis tests as well as predictive and subgroup studies. The wide adoption of electronic health records (EHRs) has made multi-institutional collaboration much more feasible. However, concerns over infrastructure, regulations, privacy and data standardization present a challenge to data sharing across healthcare institutions. Federated Learning (FL), which allows multiple sites to collaboratively train a global model without directly sharing data, has become a promising paradigm to break the data isolation. In this study, we surveyed existing works on FL applications in EHRs and evaluated the performance of current state-of-the-art FL algorithms on two EHR machine learning tasks of significant clinical importance on a real-world multi-center EHR dataset.
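The idea of training a global model without sharing data can be illustrated with federated averaging (FedAvg), a widely used baseline FL algorithm. The sketch below is illustrative only and is not taken from the paper: the function names `local_update` and `fed_avg`, the one-parameter linear model, and the two toy "sites" are all assumptions made for the example. Each site refines the current global weight on its own records, and only the resulting weights (never the records) are sent back and averaged in proportion to site size.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs for the model y ~ w * x
    using only one site's private (x, y) records."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(sites, rounds=20, w0=0.0):
    """Federated averaging: broadcast the global weight, let every site
    train locally, then average the returned weights by sample count."""
    w = w0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        local = [(local_update(w, d), len(d)) for d in sites]
        w = sum(w_site * n for w_site, n in local) / total
    return w


# Hypothetical data held at two sites; every record follows y = 2 * x.
toy_sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
]
global_w = fed_avg(toy_sites)
```

In this toy setting the averaged weight approaches the shared slope of 2; real EHR deployments add concerns this sketch ignores, such as non-identically distributed site data, secure aggregation, and communication cost.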
Affiliation(s)
- Xiang Lan
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Mengling Feng
- Institute of Data Science & Saw Swee Hock School of Public Health, National University of Singapore, Singapore
50
Fan G, Yang S, Liu H, Xu N, Chen Y, He J, Su X, Pang M, Liu B, Han L, Rong L. Machine Learning-based Prediction of Prolonged Intensive Care Unit Stay for Critical Patients with Spinal Cord Injury. Spine (Phila Pa 1976) 2022; 47:E390-E398. [PMID: 34690328 DOI: 10.1097/brs.0000000000004267] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
STUDY DESIGN: A retrospective cohort study.
OBJECTIVE: The objective of the study was to develop machine-learning (ML) classifiers for predicting prolonged intensive care unit (ICU)-stay and prolonged hospital-stay for critical patients with spinal cord injury (SCI).
SUMMARY OF BACKGROUND DATA: Critical patients with SCI in ICU need more attention. SCI patients with prolonged stay in ICU usually occupy vast medical resources and hinder the rehabilitation deployment.
METHODS: A total of 1599 critical patients with SCI were included in the study and labeled with prolonged stay or normal stay. All data were extracted from the eICU Collaborative Research Database and the Medical Information Mart for Intensive Care III-IV Database. The extracted data were randomly divided into training, validation and testing (6:2:2) subdatasets. A total of 91 initial ML classifiers were developed, and the top three initial classifiers with the best performance were further stacked into an ensemble classifier with a logistic regressor. The area under the curve (AUC) was the main indicator to assess the prediction performance of all classifiers. The primary predicting outcome was prolonged ICU-stay, while the secondary predicting outcome was prolonged hospital-stay.
RESULTS: In predicting prolonged ICU-stay, the AUC of the ensemble classifier was 0.864 ± 0.021 in the three-time five-fold cross-validation and 0.802 in the independent testing. In predicting prolonged hospital-stay, the AUC of the ensemble classifier was 0.815 ± 0.037 in the three-time five-fold cross-validation and 0.799 in the independent testing. Decision curve analysis showed the merits of the ensemble classifiers, as the curves of the top three initial classifiers varied a lot in either predicting prolonged ICU-stay or discriminating prolonged hospital-stay.
CONCLUSION: The ensemble classifiers successfully predict the prolonged ICU-stay and the prolonged hospital-stay, showing a high potential for assisting physicians in managing SCI patients in ICU and making full use of medical resources.
Level of Evidence: 3.
Affiliation(s)
- Guoxin Fan
- Department of Spine Surgery, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Intelligent and Digital Surgery Innovation Center, Southern University of Science and Technology Hospital, Shenzhen, Guangdong, China
- Sheng Yang
- Department of Orthopedics, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
- Huaqing Liu
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, China
- Ningze Xu
- Tongji University School of Medicine, Shanghai, P. R. China
- Yuyong Chen
- Department of Spine Surgery, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Intelligent and Digital Surgery Innovation Center, Southern University of Science and Technology Hospital, Shenzhen, Guangdong, China
- Jie He
- Intelligent and Digital Surgery Innovation Center, Southern University of Science and Technology Hospital, Shenzhen, Guangdong, China
- Xiuyun Su
- Intelligent and Digital Surgery Innovation Center, Southern University of Science and Technology Hospital, Shenzhen, Guangdong, China
- Mao Pang
- Department of Spine Surgery, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Bin Liu
- Department of Spine Surgery, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lanqing Han
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, China
- Limin Rong
- Department of Spine Surgery, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China