1
Oskolkova A, Oskolkov B, Liu T, Liu C. Cost-Saving Data-Driven Diabetic Retinopathy Prediction via a Sampling-Empowered Incremental Learning Approach. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2024; 2024:1-6. [PMID: 40040100 DOI: 10.1109/embc53108.2024.10782548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
Diabetic retinopathy (DR) is a serious complication of diabetes that can lead to vision impairment or even blindness if not detected and treated at an early stage. Recently, machine learning-based DR prediction from electronic health record (EHR) data has become a promising research direction for achieving timely diagnosis of DR. In practice, the EHR database usually grows periodically, creating an urgent need for an approach that updates the DR prediction model by incorporating the new data. However, repeatedly retraining the model on the combined data is costly. Therefore, this study proposes an incremental learning framework that allows the machine learning-based DR prediction model to learn continuously from new data while retaining knowledge from previous observations. Specifically, the proposed approach integrates a weighted sampling strategy so that the model can learn new information without forgetting previously learned patterns. The proposed sampling-empowered incremental learning approach was tested on different classification models. The results demonstrated that the framework with the sampling strategy enables higher efficiency and even more accurate prediction of DR, while mitigating the challenges associated with a periodically updated EHR database. By leveraging this approach, healthcare providers can achieve significant cost savings and maintain DR prediction accuracy.
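The abstract does not specify how the weighted sampling is carried out. As a minimal sketch only, assuming that each update combines the new batch with a weighted sample (drawn with replacement) of earlier records, the data-selection step might look like the following. The function name `build_update_set`, the record layout, and the weights are all hypothetical and not taken from the paper.

```python
import random


def build_update_set(old_records, old_weights, new_records, n_replay, seed=0):
    """Combine a new batch with a weighted sample of earlier records.

    Assumption: a higher weight makes an old record more likely to be replayed.
    """
    if len(old_records) != len(old_weights):
        raise ValueError("one weight per old record is required")
    rng = random.Random(seed)
    # Weighted sampling with replacement from the earlier data.
    replay = (
        rng.choices(old_records, weights=old_weights, k=n_replay)
        if old_records
        else []
    )
    # The model would then be updated on replayed + new records only,
    # instead of being retrained on the full combined dataset.
    return replay + list(new_records)


# Hypothetical (patient_id, label) records.
old = [("p1", 0), ("p2", 1), ("p3", 0)]
weights = [1.0, 3.0, 1.0]
new = [("p4", 1), ("p5", 0)]
update_set = build_update_set(old, weights, new, n_replay=2)
```

The update set stays small (here two replayed records plus the new batch), which is where the cost saving over full retraining would come from under this assumption.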
2
Pezoulas VC, Tachos NS, Olivotto I, Barlocco F, Fotiadis DI. A "smart" Imputation Approach for Effective Quality Control Across Complex Clinical Data Structures. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2022; 2022:1049-1052. [PMID: 36086027 DOI: 10.1109/embc48229.2022.9871919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The need to improve the quality of complex data structures in healthcare is more pressing than ever. Although data quality has been a point of interest in many studies, none has focused on the development of quantitative and explainable methods for data imputation. In this work, we propose a "smart" imputation workflow to address missing data across complex data structures in the context of in silico clinical trials. AI algorithms were utilized to produce high-quality virtual patient profiles. A search algorithm was then developed to extract the best virtual patient profiles through the definition of a profile matching score (PMS). A case study was conducted in which the real dataset was randomly contaminated with multiple missing values (e.g., 10 to 50%). In total, 10000 virtual patient profiles with less than 0.02 Kullback-Leibler (KL) divergence were produced to estimate the PMS distribution. The best generator achieved the lowest average squared absolute difference (0.4) and average correlation difference (0.02) with the real dataset, highlighting its increased effectiveness for data imputation across complex clinical data structures.
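To illustrate the KL-divergence criterion mentioned above, the following is a minimal sketch for discrete distributions. The direction of the divergence (real against virtual), the example distributions, and the way the 0.02 threshold is applied are assumptions for illustration; the abstract does not describe these details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            # eps guards against division by zero when q has an empty bin.
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical category frequencies of one feature.
real = [0.2, 0.5, 0.3]
candidates = {
    "virtual_a": [0.21, 0.49, 0.30],
    "virtual_b": [0.60, 0.20, 0.20],
}

# Keep only candidates below the 0.02 threshold quoted in the abstract.
accepted = [
    name for name, dist in candidates.items()
    if kl_divergence(real, dist) < 0.02
]
```

Here `virtual_a` stays close to the real distribution and passes the threshold, while `virtual_b` does not.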
3
Antunes RS, da Costa CA, Küderle A, Yari IA, Eskofier B. Federated Learning for Healthcare: Systematic Review and Architecture Proposal. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3501813] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/28/2022]
Abstract
The use of machine learning (ML) with electronic health records (EHR) is growing in popularity as a means to extract knowledge that can improve the decision-making process in healthcare. Such methods require training of high-quality learning models based on diverse and comprehensive datasets, which are hard to obtain due to the sensitive nature of medical data from patients. In this context, federated learning (FL) is a methodology that enables the distributed training of machine learning models with remotely hosted datasets without the need to accumulate data and, therefore, compromise it. FL is a promising solution to improve ML-based systems, better aligning them to regulatory requirements, improving trustworthiness and data sovereignty. However, many open questions must be addressed before the use of FL becomes widespread. This article aims at presenting a systematic literature review on current research about FL in the context of EHR data for healthcare applications. Our analysis highlights the main research topics, proposed solutions, case studies, and respective ML methods. Furthermore, the article discusses a general architecture for FL applied to healthcare data based on the main insights obtained from the literature review. The collected literature corpus indicates that there is extensive research on the privacy and confidentiality aspects of training data and model sharing, which is expected given the sensitive nature of medical data. Studies also explore improvements to the aggregation mechanisms required to generate the learning model from distributed contributions and case studies with different types of medical data.
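As a rough illustration of the aggregation step that the abstract refers to, the sketch below uses federated averaging (FedAvg), a widely used rule in which each site's parameters are weighted by its local sample count. Only the parameters leave each site, never the records. This is an assumed example and not necessarily the mechanism of any study covered by the review; the function name and the numbers are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one parameter vector and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with locally trained parameter vectors.
hospital_a = [1.0, 2.0]   # trained on 1 record
hospital_b = [3.0, 4.0]   # trained on 3 records
global_params = federated_average([hospital_a, hospital_b], [1, 3])
# global_params == [2.5, 3.5]
```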
Affiliation(s)
- Björn Eskofier
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
4
Pezoulas VC, Kourou KD, Mylona E, Papaloukas C, Liontos A, Biros D, Milionis OI, Kyriakopoulos C, Kostikas K, Milionis H, Fotiadis DI. ICU admission and mortality classifiers for COVID-19 patients based on subgroups of dynamically associated profiles across multiple timepoints. Comput Biol Med 2022; 141:105176. [PMID: 35007991 PMCID: PMC8711179 DOI: 10.1016/j.compbiomed.2021.105176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 12/22/2021] [Accepted: 12/23/2021] [Indexed: 01/08/2023]
Abstract
The coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), continues to place a profound burden on the global healthcare system due to its increased transmissibility. Currently, there is an urgent unmet need to identify the underlying dynamic associations among COVID-19 patients and distinguish patient subgroups with common clinical profiles towards the development of robust classifiers for ICU admission and mortality. To address this need, we propose a four-step pipeline which: (i) enhances the quality of multiple time-series clinical data through an automated data curation workflow, (ii) deploys Dynamic Bayesian Networks (DBNs) for the detection of features with increased connectivity based on dynamic association analysis across multiple points, (iii) utilizes Self Organizing Maps (SOMs) and trajectory analysis for the early identification of COVID-19 patients with common clinical profiles, and (iv) trains robust multiple additive regression trees (MART) for ICU admission and mortality classification based on the extracted homogeneous clusters, to identify risk factors and biomarkers for disease progression. The contribution of the extracted clusters and the dynamically associated clinical data improved the classification performance for ICU admission to sensitivity 0.83 and specificity 0.83, and for mortality to sensitivity 0.74 and specificity 0.76. Additional information was included to enhance the performance of the classifiers, yielding an increase of 4% in sensitivity and specificity for mortality. According to the risk factor analysis, the number of lymphocytes, SatO2, PO2/FiO2, and O2 supply type were highlighted as risk factors for ICU admission, and the percentage of neutrophils and lymphocytes, PO2/FiO2, LDH, and ALP for mortality, among others.
To our knowledge, this is the first study that combines dynamic modeling with clustering analysis to identify homogeneous groups of COVID-19 patients towards the development of robust classifiers for ICU admission and mortality.
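For readers unfamiliar with the two metrics reported above, the following minimal sketch computes sensitivity and specificity from binary labels, with 1 as the positive class (for example, ICU admission). The labels are invented for illustration and are unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of true positives found; specificity: share of negatives kept.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
# sens == 0.75 (3 of 4 positives), spec == 0.75 (3 of 4 negatives)
```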
Affiliation(s)
- Vasileios C Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Konstantina D Kourou
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Eugenia Mylona
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Costas Papaloukas
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece; Dept. of Biological Applications and Technology, University of Ioannina, Ioannina, GR45100, Greece
- Angelos Liontos
- Dept. of Internal Medicine, School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Dimitrios Biros
- Dept. of Internal Medicine, School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Orestis I Milionis
- Dept. of Internal Medicine, School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Chris Kyriakopoulos
- Respiratory Medicine Dept., School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Kostantinos Kostikas
- Respiratory Medicine Dept., School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Haralampos Milionis
- Dept. of Internal Medicine, School of Medicine, University of Ioannina, Ioannina, GR45110, Greece
- Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece; Institute of Biomedical Research, FORTH, Ioannina, GR45110, Greece.
5
Pezoulas VC, Kalatzis F, Exarchos TP, Chatzis L, Gandolfo S, Goules A, De Vita S, Tzioufas AG, Fotiadis DI. A federated AI strategy for the classification of patients with Mucosa Associated Lymphoma Tissue (MALT) lymphoma across multiple harmonized cohorts. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2021; 2021:1666-1669. [PMID: 34891605 DOI: 10.1109/embc46164.2021.9630014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Mucosa Associated Lymphoma Tissue (MALT) type is an extremely rare type of lymphoma which occurs in less than 3% of patients with primary Sjögren's Syndrome (pSS). No reported studies so far have been able to investigate risk factors for MALT development across multiple cohort databases with sufficient statistical power. Here, we present a generalized, federated AI (artificial intelligence) strategy which enables the training of AI algorithms across multiple harmonized databases. A case study is conducted towards the development of MALT classification models across 17 databases on pSS. Advanced AI algorithms were developed, including federated Multinomial Naïve Bayes (FMNB), federated gradient boosting trees (FGBT), FGBT with dropouts (FDART), and the federated Multilayer Perceptron (FMLP). The FDART with dropout rate 0.3 achieved the best performance with sensitivity 0.812, and specificity 0.829, yielding 8 biomarkers as prominent for MALT development.
6
Pezoulas VC, Grigoriadis GI, Gkois G, Tachos NS, Smole T, Bosnić Z, Pičulin M, Olivotto I, Barlocco F, Robnik-Šikonja M, Jakovljevic DG, Goules A, Tzioufas AG, Fotiadis DI. A computational pipeline for data augmentation towards the improvement of disease classification and risk stratification models: A case study in two clinical domains. Comput Biol Med 2021; 134:104520. [PMID: 34118751 DOI: 10.1016/j.compbiomed.2021.104520] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/08/2021] [Revised: 05/13/2021] [Accepted: 05/24/2021] [Indexed: 11/20/2022]
Abstract
Virtual population generation is an emerging field in data science with numerous applications in healthcare, aimed at augmenting clinical research databases that have insufficient population size. However, the impact of data augmentation on the development of AI (artificial intelligence) models to address clinical unmet needs has not yet been investigated. In this work, we assess, for the first time in the literature, whether the aggregation of real with virtual patient data can improve the performance of the existing risk stratification and disease classification models in two rare clinical domains, namely primary Sjögren's Syndrome (pSS) and hypertrophic cardiomyopathy (HCM). To do so, multivariate approaches, such as the multivariate normal distribution (MVND), and straightforward ones, such as Bayesian networks, artificial neural networks (ANNs), and tree ensembles, are compared on their performance in generating high-quality virtual data. Both boosting and bagging algorithms, such as gradient boosting trees (XGBoost), AdaBoost, and Random Forests (RFs), were trained on the augmented data to evaluate the performance improvement for lymphoma classification and HCM risk stratification. Our results revealed the favorable performance of the tree ensemble generators in both domains, yielding virtual data with goodness-of-fit 0.021 and KL-divergence 0.029 in pSS and 0.029 and 0.027 in HCM, respectively. The application of XGBoost on the augmented data revealed an increase of 10.9% in accuracy, 10.7% in sensitivity, and 11.5% in specificity for lymphoma classification, and of 16.1% in accuracy, 16.9% in sensitivity, and 13.7% in specificity for HCM risk stratification.
Affiliation(s)
- Vasileios C Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Grigoris I Grigoriadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- George Gkois
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Nikolaos S Tachos
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece
- Tim Smole
- Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia
- Zoran Bosnić
- Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia
- Matej Pičulin
- Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia
- Iacopo Olivotto
- Department of Experimental and Clinical Medicine, University of Florence and Cardiomyopathies Unit, Azienda Ospedaliera Careggi, Florence, Italy
- Fausto Barlocco
- Department of Experimental and Clinical Medicine, University of Florence and Cardiomyopathies Unit, Azienda Ospedaliera Careggi, Florence, Italy
- Marko Robnik-Šikonja
- Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000, Ljubljana, Slovenia
- Djordje G Jakovljevic
- Faculty of Medical Sciences, Newcastle University, Newcastle Upon Tyne, UK and with the Faculty of Health and Life Sciences, Coventry University, Coventry, UK
- Andreas Goules
- Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), GR 15772, Athens, Greece
- Athanasios G Tzioufas
- Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), GR 15772, Athens, Greece
- Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, GR45110, Greece; Department of Biomedical Research, FORTH-IMBB, Ioannina, GR45110, Greece.