1
Shukla S, Rajkumar S, Sinha A, Esha M, Elango K, Sampath V. Federated learning with differential privacy for breast cancer diagnosis enabling secure data sharing and model integrity. Sci Rep 2025; 15:13061. [PMID: 40240790 PMCID: PMC12003885 DOI: 10.1038/s41598-025-95858-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 03/24/2025] [Indexed: 04/18/2025]
Abstract
In the digital age, privacy preservation is of paramount importance while processing health-related sensitive information. This paper explores the integration of Federated Learning (FL) and Differential Privacy (DP) for breast cancer detection, leveraging FL's decentralized architecture to enable collaborative model training across healthcare organizations without exposing raw patient data. To enhance privacy, DP injects statistical noise into the updates made by the model. This mitigates adversarial attacks and prevents data leakage. The proposed work uses the Breast Cancer Wisconsin Diagnostic dataset to address critical challenges such as data heterogeneity, privacy-accuracy trade-offs, and computational overhead. From the experimental results, FL combined with DP achieves 96.1% accuracy with a privacy budget of ε = 1.9, ensuring strong privacy preservation with minimal performance trade-offs. In comparison, the traditional non-FL model achieved 96.0% accuracy, but at the cost of requiring centralized data storage, which poses significant privacy risks. These findings validate the feasibility of privacy-preserving artificial intelligence models in real-world clinical applications, effectively balancing data protection with reliable medical predictions.
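The abstract does not say how the noise is injected. A common pattern for differentially private federated averaging is to clip each client's update to a fixed L2 norm, sum the clipped updates, and add Gaussian noise scaled to that clipping bound. The sketch below assumes that pattern; `clip_update`, `dp_federated_average`, `clip_norm`, and `noise_multiplier` are hypothetical names, not the authors' code. It does not compute ε: turning a `noise_multiplier` into a privacy budget such as ε = 1.9 requires a separate privacy accountant.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Hypothetical DP-FedAvg-style aggregation step (an assumption, not the paper's method).

    Each client update is clipped, the clipped updates are summed, Gaussian
    noise with std noise_multiplier * clip_norm is added to every coordinate
    of the sum (clip_norm bounds any single client's contribution), and the
    noisy sum is divided by the number of clients.
    """
    n_clients = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    summed = [sum(c[i] for c in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy_sum = [s + rng.gauss(0.0, sigma) for s in summed]
    return [s / n_clients for s in noisy_sum]


# Three hypothetical hospitals each send a two-parameter model update.
updates = [[3.0, 4.0], [0.0, 0.5], [-0.2, 0.1]]
rng = random.Random(42)
print(dp_federated_average(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping is what makes the noise meaningful: without a bound on each client's contribution, no fixed noise level could mask any single hospital's update.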
Affiliation(s)
- Shubhi Shukla
- School of Electrical Engineering, Vellore Institute of Technology, Vellore, 632014, India
- Suraksha Rajkumar
- School of Electronics Engineering, Vellore Institute of Technology, Vellore, 632014, India
- Aditi Sinha
- School of Electronics Engineering, Vellore Institute of Technology, Vellore, 632014, India
- Mohamed Esha
- School of Mechanical Engineering, Vellore Institute of Technology, Chennai, 600127, India
- Konguvel Elango
- School of Electronics Engineering, Vellore Institute of Technology, Vellore, 632014, India
- Vidhya Sampath
- School of Electronics Engineering, Vellore Institute of Technology, Vellore, 632014, India
2
Vincent ACSR, Sengan S. Edge computing-based ensemble learning model for health care decision systems. Sci Rep 2024; 14:26997. [PMID: 39506092 PMCID: PMC11541999 DOI: 10.1038/s41598-024-78225-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 10/29/2024] [Indexed: 11/08/2024]
Abstract
A growing number of people suffer from severe chronic illnesses, which has increased the need for diagnostic and treatment procedures that are both accurate and fast. The primary objectives of the Clinical Decision Support System (CDSS) proposed in this article are improved patient conditions and enhanced Decision-Making Systems (DMS) for healthcare professionals. The main drawback of traditional Machine Learning (ML) techniques is that their predictions are unreliable. To address this, the proposed model uses an Ensemble Extreme Learning Machine (EN-ELM) algorithm that combines predictors trained on several different data sets, which lowers the risk of overfitting. The CDSS applies several data-processing methods, including Adaptive Synthetic sampling (ADASYN) and isolation Forest (iForest), to address outliers and class imbalance, which significantly improves the framework's classification performance. The CDSS is also compatible with an Edge Computing (EC) model, which enables real-time computation while minimizing the requirement for integrated systems. Large-scale trials of the CDSS with iForest and ADASYN demonstrate high accuracy across numerous datasets. The researchers found that an ELM classification threshold of 85% was the most effective, substantially boosting the accuracy of the predictive model. When applied to various medical datasets, such as Hepatocellular Carcinoma (HCC), Cervical Cancer, Chronic Kidney Disease (CKD), Heart Disease, and Arrhythmia, the EN-ELM achieved accuracy rates of 99.36%, 98.15%, 97.85%, 97.06%, and 96.72%, respectively. These results suggest that the CDSS could substantially improve the accuracy of chronic illness diagnosis and treatment, which would also benefit clinicians.
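An Extreme Learning Machine fixes random input-to-hidden weights and solves only the output weights, in closed form, by least squares. The sketch below is an illustration under stated assumptions, not the authors' EN-ELM: it assumes a sigmoid hidden layer, bootstrap resamples as the "several different data sets", and a simple majority vote. `ELM`, `train_ensemble`, and `ensemble_predict` are hypothetical names, and the 85% threshold from the abstract is not modelled because its exact meaning is not specified. It requires NumPy.

```python
import numpy as np


class ELM:
    """Single-hidden-layer Extreme Learning Machine for 0/1 labels (illustrative)."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Random (never trained) projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def score(self, X):
        return self._hidden(X) @ self.beta


def train_ensemble(X, y, n_models=5, n_hidden=20, seed=0):
    """Train each ELM on a bootstrap resample of (X, y) (assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(ELM(n_hidden, rng).fit(X[idx], y[idx]))
    return models


def ensemble_predict(models, X, threshold=0.5):
    """Majority vote over the members' thresholded scores."""
    votes = np.array([(m.score(X) >= threshold).astype(int) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Toy data: two well-separated clusters with labels 0 and 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
models = train_ensemble(X, y)
print((ensemble_predict(models, X) == y).mean())
```

Because each member sees a different resample and a different random projection, their errors are partly independent, which is the usual rationale for ensembling to reduce overfitting.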
Affiliation(s)
- Sudhakar Sengan
- Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, 627451, India
3
Bopche R, Gustad LT, Afset JE, Ehrnström B, Damås JK, Nytrø Ø. In-hospital mortality, readmission, and prolonged length of stay risk prediction leveraging historical electronic patient records. JAMIA Open 2024; 7:ooae074. [PMID: 39282081 PMCID: PMC11401612 DOI: 10.1093/jamiaopen/ooae074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/16/2024] [Accepted: 07/26/2024] [Indexed: 09/18/2024]
Abstract
Objective This study aimed to investigate the ability of historical patient records to predict adverse patient outcomes such as mortality, readmission, and prolonged length of stay (PLOS). Methods Leveraging a de-identified dataset from a tertiary care university hospital, we developed an eXplainable Artificial Intelligence (XAI) framework combining tree-based and traditional machine learning (ML) models with interpretations and statistical analysis of predictors of mortality, readmission, and PLOS. Results Our framework demonstrated high predictive performance, with an area under the receiver operating characteristic curve (AUROC) of 0.9625 and an area under the precision-recall curve (AUPRC) of 0.8575 for 30-day mortality at discharge, and an AUROC of 0.9545 and AUPRC of 0.8419 at admission. For readmission and PLOS risk, the highest AUROCs achieved were 0.8198 and 0.9797, respectively. The tree-based models consistently outperformed the traditional ML models in all 4 prediction tasks. The key predictors were age, derived temporal features, routine laboratory tests, and diagnostic and procedural codes. Conclusion The study underscores the potential of leveraging medical history for enhanced hospital predictive analytics. We present an accurate and intuitive framework for early warning models that can be easily implemented in current and developing digital health platforms to predict adverse outcomes.
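AUROC can be read as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. AUPRC instead depends on outcome prevalence: a random classifier's AUPRC equals the positive rate, whereas its AUROC is 0.5. This helps explain why AUPRC values sit below AUROC values when positives are a minority. A minimal pure-Python sketch of both metrics follows, using average precision as the AUPRC estimate (an assumption, since the study's exact estimator is not stated); the function names are illustrative.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k over the ranks k where a positive appears (ties not handled)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))              # 0.75
print(average_precision(scores, labels))  # 0.8333...
```

The pairwise loop is O(n_pos × n_neg) and is meant for clarity; a rank-based formula gives the same AUROC in O(n log n) for large cohorts.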
Affiliation(s)
- Rajeev Bopche
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, 7491, Norway
- Lise Tuset Gustad
- Faculty of Nursing and Health Sciences, Nord University, Levanger, 7600, Norway
- Department of Medicine and Rehabilitation, Levanger Hospital, Nord-Trøndelag Hospital Trust, Levanger, 7601, Norway
- Jan Egil Afset
- Department of Medical Microbiology, St Olavs Hospital, Trondheim University Hospital, Trondheim, 7030, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, 7491, Norway
- Birgitta Ehrnström
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, 7491, Norway
- Department of Infectious Diseases, Clinic of Medicine, St Olavs Hospital, Trondheim, 7006, Norway
- Clinic of Anaesthesia and Intensive Care, St Olavs Hospital, Trondheim University Hospital, Trondheim, 7006, Norway
- Jan Kristian Damås
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, 7491, Norway
- Department of Infectious Diseases, Clinic of Medicine, St Olavs Hospital, Trondheim, 7006, Norway
- Øystein Nytrø
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, 7491, Norway
- Department of Computer Science, The Arctic University of Norway, Tromsø, 9037, Norway
4
Sewpaul R, Awe OO, Dogbey DM, Sekgala MD, Dukhi N. Classification of Obesity among South African Female Adolescents: Comparative Analysis of Logistic Regression and Random Forest Algorithms. Int J Environ Res Public Health 2023; 21:2. [PMID: 38276791 PMCID: PMC10815679 DOI: 10.3390/ijerph21010002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 01/27/2024]
Abstract
BACKGROUND This study evaluates the performance of logistic regression (LR) and random forest (RF) algorithms in modelling obesity among female adolescents in South Africa. METHODS Data were analysed for 375 females aged 15-17 years from the South African National Health and Nutrition Examination Survey 2011/2012. The primary outcome was obesity, defined as body mass index (BMI) ≥ 30 kg/m². A total of 31 explanatory variables were included, spanning socio-economic, demographic, family history, dietary, and health behaviour factors. RF and LR models were run on the imbalanced data as well as after oversampling, undersampling, and hybrid sampling of the data. RESULTS Using the imbalanced data, the RF model performed better, with higher precision, recall, F1 score, and balanced accuracy. Balanced accuracy was highest with the hybrid data (0.618 for RF and 0.668 for LR). Using the hybrid balanced data, the RF model performed better (F1-score = 0.940 for RF vs. 0.798 for LR). CONCLUSION The model with the highest overall performance metrics was the RF model, both before balancing the data and after applying hybrid balancing. Future work would benefit from larger datasets on adolescent female obesity to assess the robustness of the models.
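The abstract does not say which oversampling or undersampling methods were used. One simple form of hybrid balancing is random undersampling of the majority class combined with random oversampling (with replacement) of the minority class to a common size. The sketch below assumes binary labels; `hybrid_resample` and `target_per_class` are hypothetical names. Resampling of this kind is normally applied to the training split only, so that evaluation data keep the true class balance.

```python
import random


def hybrid_resample(X, y, minority_label, target_per_class, seed=0):
    """Undersample the majority class and oversample the minority class
    (with replacement) so that each class ends up with target_per_class rows.

    Assumes binary labels and len(minority) <= target_per_class <= len(majority).
    """
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    if not len(minority) <= target_per_class <= len(majority):
        raise ValueError("target_per_class must lie between the two class sizes")
    # Undersampling: draw majority rows without replacement.
    kept_majority = rng.sample(majority, target_per_class)
    # Oversampling: duplicate randomly chosen minority rows.
    extra_minority = [rng.choice(minority) for _ in range(target_per_class - len(minority))]
    rows = kept_majority + minority + extra_minority
    rng.shuffle(rows)
    return [x for x, _ in rows], [t for _, t in rows]


# Toy data: 2 minority (label 1) vs 8 majority (label 0) rows.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8
X_bal, y_bal = hybrid_resample(X, y, minority_label=1, target_per_class=5)
print(y_bal.count(0), y_bal.count(1))  # 5 5
```

Meeting in the middle limits both the information discarded by pure undersampling and the duplication introduced by pure oversampling.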
Affiliation(s)
- Ronel Sewpaul
- Public Health, Societies and Belonging, Human Sciences Research Council, Merchant House, 2 Dock Rail Road, Cape Town 8001, South Africa
- Olushina Olawale Awe
- Institute of Mathematics, Statistics and Scientific Computing (IMECC), University of Campinas, Campinas 13083-859, Brazil
- Dennis Makafui Dogbey
- Medical Biotechnology and Immunotherapy Research Unit, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7700, South Africa
- Machoene Derrick Sekgala
- Non-Communicable Diseases, South African Medical Research Council, Cape Town 7505, South Africa
- Natisha Dukhi
- Public Health, Societies and Belonging, Human Sciences Research Council, Merchant House, 2 Dock Rail Road, Cape Town 8001, South Africa
5
Leme DEDC, de Oliveira C. Machine Learning Models to Predict Future Frailty in Community-Dwelling Middle-Aged and Older Adults: The ELSA Cohort Study. J Gerontol A Biol Sci Med Sci 2023; 78:2176-2184. [PMID: 37209408 PMCID: PMC10613015 DOI: 10.1093/gerona/glad127] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2022] [Indexed: 05/22/2023]
Abstract
BACKGROUND Machine learning (ML) models can be used to predict future frailty in the community setting. However, outcome variables in epidemiologic data sets, such as frailty, are usually imbalanced between categories: far fewer individuals are classified as frail than as nonfrail, which adversely affects the performance of ML models when predicting the syndrome. METHODS A retrospective cohort study of participants aged 50 years or older from the English Longitudinal Study of Ageing who were nonfrail at baseline (2008-2009) and reassessed for the frailty phenotype at 4-year follow-up (2012-2013). Social, clinical, and psychosocial baseline predictors were selected to predict frailty at follow-up in ML models (Logistic Regression, Random Forest [RF], Support Vector Machine, Neural Network, K-nearest neighbor, and Naive Bayes classifier). RESULTS Of the 4 378 participants who were nonfrail at baseline, 347 became frail at follow-up. The proposed combined oversampling and undersampling method for adjusting imbalanced data improved the performance of the models, and RF performed best, with areas under the receiver-operating characteristic curve and the precision-recall curve of 0.92 and 0.97, respectively, specificity of 0.83, sensitivity of 0.88, and balanced accuracy of 85.5% on balanced data. Age, chair-rise test, household wealth, balance problems, and self-rated health were the most important frailty predictors in most of the models trained with balanced data. CONCLUSIONS ML proved useful in identifying individuals who became frail over time, and this result was made possible by balancing the data set. This study highlighted factors that may be useful in the early detection of frailty.
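Balanced accuracy is the mean of sensitivity and specificity, so the reported figures are internally consistent. The cohort counts also show why plain accuracy would mislead here: a trivial model that labels every participant nonfrail (sensitivity 0, specificity 1) would reach about 92% accuracy on these counts but only 0.5 balanced accuracy.

```latex
\text{BA} = \frac{\text{sensitivity} + \text{specificity}}{2}
          = \frac{0.88 + 0.83}{2} = 0.855 = 85.5\%

\text{Frail prevalence} = \frac{347}{4378} \approx 0.079, \qquad
\text{Accuracy}_{\text{all nonfrail}} = \frac{4378 - 347}{4378} = \frac{4031}{4378} \approx 0.921, \qquad
\text{BA}_{\text{all nonfrail}} = \frac{0 + 1}{2} = 0.5
```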
Affiliation(s)
- Cesar de Oliveira
- Department of Epidemiology and Public Health, University College London, London, UK
6
A Modified Ant Lion Optimization Method and Its Application for Instance Reduction Problem in Balanced and Imbalanced Data. Axioms 2022. [DOI: 10.3390/axioms11030095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
Instance reduction is a pre-processing step devised to improve classification. Instance reduction algorithms search for a reduced set of instances to mitigate low computational efficiency and high storage requirements. Hence, finding the optimal subset of instances is of utmost importance. Metaheuristic techniques are one potential means of searching for the optimal subset of instances. Antlion optimization (ALO) is a recent metaheuristic algorithm that simulates the foraging behavior of antlions in finding and attacking ants. However, the ALO algorithm suffers from stagnation in local optima and slow convergence on some optimization problems. In this study, a new modified antlion optimization (MALO) algorithm is proposed to improve on the basic ALO by adding a new parameter that depends on the step length of each ant when revising the antlion position. Furthermore, the proposed MALO algorithm is adapted to the instance reduction problem to obtain better results on several metrics. Results on twenty-three benchmark functions at 500 iterations and thirteen benchmark functions at 1000 iterations demonstrate that MALO escapes local optima and converges faster than the basic ALO algorithm and several well-known and recent optimization algorithms. In addition, results on 15 balanced and imbalanced datasets and 18 oversampled imbalanced datasets show that the proposed instance reduction method can statistically outperform the basic ALO algorithm and is highly competitive with other comparative algorithms in terms of four performance measures (Accuracy, Balanced Accuracy (BACC), Geometric mean (G-mean), and Area Under the Curve (AUC)) as well as run time. For some datasets, MALO increases Accuracy, BACC, G-mean, and AUC by up to 7%, 3%, 15%, and 9%, respectively, over the basic ALO algorithm while requiring less computational time.
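Metaheuristics such as ALO and MALO search over binary masks that mark which training instances to keep, so each candidate mask needs a fitness score. A common formulation, assumed here because the abstract does not give MALO's objective, trades off the accuracy of a 1-nearest-neighbor classifier built from the kept instances against the fraction of instances removed. `alpha`, `nearest_label`, and `instance_reduction_fitness` are hypothetical names, and the ALO/MALO search itself is not shown.

```python
def nearest_label(point, ref_X, ref_y):
    """Label of the nearest reference instance (squared Euclidean distance)."""
    best = min(
        range(len(ref_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, ref_X[i])),
    )
    return ref_y[best]


def instance_reduction_fitness(mask, X, y, alpha=0.9):
    """Score a candidate subset, where mask[i] == 1 keeps instance i (assumed objective).

    fitness = alpha * (1-NN accuracy on all of X, using only kept instances)
            + (1 - alpha) * (fraction of instances removed)
    Kept instances match themselves at distance 0 during evaluation.
    """
    kept = [i for i, m in enumerate(mask) if m]
    if not kept:
        return 0.0
    ref_X = [X[i] for i in kept]
    ref_y = [y[i] for i in kept]
    correct = sum(nearest_label(x, ref_X, ref_y) == t for x, t in zip(X, y))
    accuracy = correct / len(X)
    reduction = 1 - len(kept) / len(X)
    return alpha * accuracy + (1 - alpha) * reduction


X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1]
print(instance_reduction_fitness([1, 0, 0, 1, 0, 0], X, y))  # two prototypes
print(instance_reduction_fitness([1, 1, 1, 1, 1, 1], X, y))  # keep everything
```

With two well-chosen prototypes the classifier stays fully accurate while two thirds of the data are discarded, so that mask scores higher than keeping every instance; a search algorithm such as MALO would explore masks to maximize this kind of score.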
7
Magboo VPC, Magboo MSA. Machine Learning Classifiers on Breast Cancer Recurrences. Procedia Comput Sci 2021; 192:2742-2752. [DOI: 10.1016/j.procs.2021.09.044] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/04/2025]