1
Abas MZ, Li K, Choo WY, Wan KS, Hairi NN. Machine Learning Models for Predicting Type 2 Diabetes Complications in Malaysia. Asia Pac J Public Health 2025:10105395251332798. [PMID: 40251861 DOI: 10.1177/10105395251332798] [Indexed: 04/21/2025]
Abstract
This study aimed to develop machine learning (ML) models to predict diabetic complications in patients with type 2 diabetes (T2D) in Malaysia. Data from the Malaysian National Diabetes Registry and Death Register were used to develop predictive models for five complications: all-cause mortality, retinopathy, nephropathy, ischemic heart disease (IHD), and cerebrovascular disease (CeVD). Accurate predictions may enable targeted preventive intervention and optimal disease management. The cohort comprised 90 933 T2D patients treated at public health clinics in southern Malaysia from 2011 to 2021. Seven ML algorithms were tested, with the Light Gradient Boosting Machine (LGBM) demonstrating the best performance. LGBM models achieved ROC-AUC scores of 0.84 for all-cause mortality, 0.71 for retinopathy, 0.71 for nephropathy, 0.66 for IHD, and 0.74 for CeVD. These findings support integrating ML models, particularly LGBM, into clinical practice for predicting diabetes complications. Further optimization and validation are necessary to enhance applicability across diverse populations.
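The ROC-AUC figures quoted above can be read as the probability that a randomly chosen patient who developed the complication receives a higher score than a randomly chosen patient who did not. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison: P(score of positive > score of negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 negatives, 2 positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The quadratic loop is chosen for readability; a rank-based formulation would be preferable for cohorts of the size described here.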
Affiliation(s)
- Mohamad Zulfikrie Abas
  - Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Kezhi Li
  - Institute of Health Informatics, University College London, London, UK
- Wan Yuen Choo
  - Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Kim Sui Wan
  - Institute of Public Health, National Institute of Health, Selangor, Malaysia
- Noran Naqiah Hairi
  - Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
2
Voskergian D, Bakir-Gungor B, Yousef M. Engineering novel features for diabetes complication prediction using synthetic electronic health records. Front Genet 2025; 16:1451290. [PMID: 40309033 PMCID: PMC12041673 DOI: 10.3389/fgene.2025.1451290] [Received: 06/18/2024] [Accepted: 01/31/2025] [Indexed: 05/02/2025] Open
Abstract
Diabetes significantly affects millions of people worldwide, leading to substantial morbidity, disability, and mortality rates. Predicting diabetes-related complications from health records is crucial for early prevention and for the development of effective treatment plans. In order to predict four different complications of diabetes mellitus, i.e., retinopathy, chronic kidney disease, ischemic heart disease, and amputations, this study introduces a novel feature engineering approach. While developing the classification models, we utilize the XGBoost feature selection method and various supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree. These models were trained on synthetic electronic health records (EHR) generated by dual-adversarial autoencoders. These EHRs represent nearly 1 million synthetic patients derived from an authentic cohort of 979,308 individuals with diabetes. The variables considered in the models were the age range accompanied by chronic diseases that occur during patient visits starting from the onset of diabetes. Throughout the experiments, XGBoost and Random Forest demonstrated the best overall prediction performance. The final models, which are tailored to each complication and trained using our feature engineering approach, achieved an accuracy between 69% and 77% and an AUC between 77% and 84% using cross-validation, while the partitioned validation approach yielded an accuracy between 59% and 78% and an AUC between 66% and 85%. These findings imply that our method outperforms the traditional Bag-of-Features approach, highlighting the effectiveness of our approach in enhancing model accuracy and robustness.
Affiliation(s)
- Daniel Voskergian
  - Computer Engineering Department, Al-Quds University, Jerusalem, Palestine
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
3
Liu L, Bi B, Gui M, Zhang L, Ju F, Wang X, Cao L. Development and internal validation of an interpretable risk prediction model for diabetic peripheral neuropathy in type 2 diabetes: a single-centre retrospective cohort study in China. BMJ Open 2025; 15:e092463. [PMID: 40180384 PMCID: PMC11969608 DOI: 10.1136/bmjopen-2024-092463] [Received: 08/15/2024] [Accepted: 03/07/2025] [Indexed: 04/05/2025] Open
Abstract
OBJECTIVE Diabetic peripheral neuropathy (DPN) is a common and serious complication of diabetes, which can lead to foot deformity, ulceration, and even amputation. Early identification is crucial, as more than half of DPN patients are asymptomatic in the early stage. This study aimed to develop and validate multiple risk prediction models for DPN in patients with type 2 diabetes mellitus (T2DM) and to apply the Shapley Additive Explanation (SHAP) method to interpret the best-performing model and identify key risk factors for DPN. DESIGN A single-centre retrospective cohort study. SETTING The study was conducted at a tertiary teaching hospital in Hainan. PARTICIPANTS AND METHODS Data were retrospectively collected from the electronic medical records of patients with diabetes admitted between 1 January 2021 and 28 March 2023. After data preprocessing, 73 variables were retained for baseline analysis. Feature selection was performed using univariate analysis combined with recursive feature elimination (RFE). The dataset was split into training and test sets in an 8:2 ratio, with the training set balanced via the Synthetic Minority Over-sampling Technique. Six machine learning algorithms were applied to develop prediction models for DPN. Hyperparameters were optimised using grid search with 10-fold cross-validation. Model performance was assessed using various metrics on the test set, and the SHAP method was used to interpret the best-performing model. RESULTS The study included 3343 T2DM inpatients, with a median age of 60 years (IQR 53-69), and 88.6% (2962/3343) had DPN. The RFE method identified 12 key factors for model construction. Among the six models, XGBoost showed the best predictive performance, achieving an area under the curve of 0.960, accuracy of 0.927, precision of 0.969, recall of 0.948, F1-score of 0.958 and a G-mean of 0.850 on the test set. 
The SHAP analysis highlighted C reactive protein, total bile acids, gamma-glutamyl transpeptidase, age and lipoprotein(a) as the top five predictors of DPN. CONCLUSIONS The machine learning approach successfully established a DPN risk prediction model with excellent performance. The use of the interpretable SHAP method could enhance the model's clinical applicability.
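Because 88.6% of this cohort had DPN, accuracy and recall alone say little about how well the minority (non-DPN) class is recognised, which is presumably why a G-mean is reported alongside them. The sketch below computes the listed metrics from a confusion matrix, assuming the common definition of G-mean as the square root of sensitivity times specificity (the abstract does not state which definition was used). The counts in `example` are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and G-mean from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


def implied_specificity(g_mean, recall):
    """Specificity implied by a G-mean and recall, assuming G = sqrt(recall * spec)."""
    return g_mean ** 2 / recall


# Hypothetical counts for an imbalanced test set (not from the paper).
example = binary_metrics(tp=90, fp=5, fn=10, tn=15)
print(example)
```

Under that assumed definition, the reported recall of 0.948 and G-mean of 0.850 would correspond to a specificity of roughly 0.76, noticeably lower than the headline accuracy.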
Affiliation(s)
- Lianhua Liu
  - Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China
- Bo Bi
  - Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China
- Mei Gui
  - Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China
- Linli Zhang
  - Department of Mathematics, Physics, and Chemistry teaching, Hainan University, Haikou, Hainan, China
- Feng Ju
  - Department of Endocrinology, The Second Affiliated Hospital of Hainan Medical University, Haikou, Hainan, China
- Xiaodan Wang
  - Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China
- Li Cao
  - Department of Biostatistics, School of Public Health, Hainan Medical University, Haikou, Hainan, China
4
Kassaw EA, Sendekie AK, Enyew BM, Abate BB. Machine learning applications to classify and monitor medication adherence in patients with type 2 diabetes in Ethiopia. Front Endocrinol (Lausanne) 2025; 16:1486350. [PMID: 40182636 PMCID: PMC11965118 DOI: 10.3389/fendo.2025.1486350] [Received: 09/11/2024] [Accepted: 02/28/2025] [Indexed: 04/05/2025] Open
Abstract
Background Medication adherence plays a crucial role in determining the health outcomes of patients, particularly those with chronic conditions like type 2 diabetes. Despite its significance, there is limited evidence regarding the use of machine learning (ML) algorithms to predict medication adherence within the Ethiopian population. The primary objective of this study was to develop and evaluate ML models designed to classify and monitor medication adherence levels among patients with type 2 diabetes in Ethiopia, to improve patient care and health outcomes. Methods Using a random sampling technique in a cross-sectional study, we obtained data from 403 patients with type 2 diabetes at the University of Gondar Comprehensive Specialized Hospital (UoGCSH), excluding 13 subjects who were unable to respond and 6 with incomplete data from an initial cohort of 422. Medication adherence was assessed using the General Medication Adherence Scale (GMAS), an eleven-item Likert scale questionnaire. The responses served as features to train and test machine learning (ML) models. To address data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. The dataset was split using stratified K-fold cross-validation to preserve the distribution of adherence levels. Eight widely used ML algorithms were employed to develop the models, and their performance was evaluated using metrics such as accuracy, precision, recall, and F1 score. The best-performing model was subsequently deployed for further analysis. Results Out of 422 enrolled patients, 403 data samples were collected, with 11 features extracted from each respondent. To mitigate potential class imbalance, the dataset was increased to 620 samples using the Synthetic Minority Over-sampling Technique (SMOTE). 
Machine learning models including Logistic Regression (LR), Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Gradient Boost Classifier (GBC), Multilayer Perceptron (MLP), and 1D Convolutional Neural Network (1DCNN) were developed and evaluated. Although the performance differences among the models were subtle (within a range of 0.001), the SVM classifier outperformed the others, achieving a recall of 0.9979 and an AUC of 0.9998. Consequently, the SVM model was selected for deployment to monitor and detect patients' medication adherence levels, enabling timely interventions to improve patient outcomes. Conclusions This study highlights a variety of machine learning (ML) models that can be effectively used to monitor and classify medication adherence in diabetic patients in Ethiopia. However, to fully realize the potential impact of digital health applications, further studies that include patients from diverse settings are necessary. Such research could enhance the generalizability of these models and provide insights into the broader applicability of digital tools for improving medication adherence and patient outcomes in varying healthcare contexts.
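The abstract reports that SMOTE raised the dataset from 403 to 620 samples, which implies 217 synthetic records. As a rough illustration of the idea behind SMOTE, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified stand-in, not the authors' pipeline or a library implementation; the function name `smote_like`, the parameters `k` and `seed`, and the toy points are all assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (not GMAS data).
minority_points = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
for point in smote_like(minority_points, n_new=3):
    print(point)
```

A common design choice is to oversample only the training portion of each fold, so that synthetic points derived from a held-out sample never appear in training; the abstract does not say which order was used.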
Affiliation(s)
- Ewunate Assaye Kassaw
  - Department of Biomedical Engineering, Institute of Technology, University of Gondar, Gondar, Ethiopia
  - Center for Biomedical Engineering, Indian Institute of Technology, Delhi, New Delhi, India
- Ashenafi Kibret Sendekie
  - Department of Clinical Pharmacy, School of Pharmacy, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
  - Curtin Medical School, Faculty of Health Sciences, Curtin University, Bentley, WA, Australia
- Bekele Mulat Enyew
  - Department of Information Technology, College of Informatics, University of Gondar, Gondar, Ethiopia
- Biruk Beletew Abate
  - College of Health Science, Woldia University, Woldia, Ethiopia
  - School of Population Health, Curtin University, Bentley, WA, Australia
5
Li X, Yue X, Zhang L, Zheng X, Shang N. Pharmacist-led surgical medicines prescription optimization and prediction service improves patient outcomes - a machine learning based study. Front Pharmacol 2025; 16:1534552. [PMID: 40160467 PMCID: PMC11949800 DOI: 10.3389/fphar.2025.1534552] [Received: 11/26/2024] [Accepted: 02/25/2025] [Indexed: 04/02/2025] Open
Abstract
Background Optimizing prescription practices for surgical patients is crucial due to the complexity and sensitivity of their medication regimens. The objective was to enhance medication safety and improve patient outcomes by introducing a machine learning (ML)-based warning model integrated into a pharmacist-led Surgical Medicines Prescription Optimization and Prediction (SMPOP) service. Method A retrospective cohort design with a prospective implementation phase was used in a tertiary hospital. The study was divided into three phases: (1) Data analysis and ML model development (1 April 2019 to 31 March 2022), (2) Establishment of a pharmacist-led management model (1 April 2022 to 31 March 2023), and (3) Outcome evaluation (1 April 2023 to 31 March 2024). Key variables, including gender, age, number of comorbidities, type of surgery, surgery complexity, days from hospitalization to surgery, type of prescription, type of medication, route of administration, and prescriber's seniority were collected. The data set was split into training and test sets in an 8:2 ratio. The effectiveness of the SMPOP service was evaluated based on prescription appropriateness, adverse drug reactions (ADRs), length of hospital stay, total hospitalization costs, and medication expenses. Results In Phase 1, 6,983 prescriptions were identified as potential prescription errors (PPEs) for ML model development, with 43.9% of them accepted by prescribers. The Random Forest (RF) model performed the best (AUC = 0.893) and retained high accuracy with 12 features (AUC = 0.886). External validation showed an AUC of 0.786. In Phase 2, SMPOP services were implemented, which promoted effective communication between pharmacists and physicians and ensured the successful implementation of intervention measures. The SMPOP service was fully implemented.
In Phase 3, the acceptance rate of pharmacist recommendations rose to 71.3%, while the length of stay, total hospitalization costs, and medication costs significantly decreased (p < 0.05), indicating overall improvement compared to Phase 1. Conclusion SMPOP service enhances prescription appropriateness, reduces ADRs, shortens stays, and lowers costs, underscoring the need for continuous innovation in healthcare.
Affiliation(s)
- Xianlin Li
  - Department of Pharmacy, The First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
  - School of Pharmacy, Shanxi Medical University, Taiyuan, Shanxi, China
- Xiunan Yue
  - School of Pharmacy, Shanxi Medical University, Taiyuan, Shanxi, China
- Lan Zhang
  - School of Public Health, Capital Medical University, Beijing, China
- Xiaojun Zheng
  - Department of Pharmacy, The First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Nan Shang
  - Department of Pharmacy, The First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
6
Chen Y, Zhang Y, Qin S, Yu F, Ni Y, Zhong J. The correlation between TyG-BMI and the risk of osteoporosis in middle-aged and elderly patients with type 2 diabetes mellitus. Front Nutr 2025; 12:1525105. [PMID: 40135223 PMCID: PMC11932904 DOI: 10.3389/fnut.2025.1525105] [Received: 11/08/2024] [Accepted: 02/26/2025] [Indexed: 03/27/2025] Open
Abstract
Background and objectives Osteoporosis (OP) has emerged as one of the most rapidly escalating complications associated with diabetes mellitus. However, the potential risk factors contributing to OP in patients with type 2 diabetes mellitus (T2DM) remain controversial. The aim of this study was to explore the relationship between triglyceride glucose-body mass index (TyG-BMI), a marker of insulin resistance calculated as Ln [triglyceride (TG, mg/dL) × fasting plasma glucose (mg/dL)/2] × BMI, and the risk of OP in T2DM patients. Methods This retrospective cross-sectional study enrolled 386 inpatients with T2DM, comprising both male and postmenopausal female participants aged 40 years or older. Individuals with significant medical histories or medications known to influence bone mineral density were excluded. Machine learning algorithms were employed to rank factors affecting OP risk. Logistic regression analysis was performed to identify independent influencing factors for OP, while subgroup analysis was conducted to evaluate the impact of TyG-BMI on OP across different subgroups. Restricted cubic spline (RCS) analysis was used to explore the dose-response relationship between TyG-BMI and OP. Additionally, the receiver operating characteristic (ROC) curve was utilized to assess the predictive efficiency of TyG-BMI for OP. Results Machine learning analysis identified TyG-BMI as the strongest predictor for type 2 diabetic osteoporosis in middle-aged and elderly patients. After adjusting for confounding factors, multivariate logistic regression analysis revealed that age, osteocalcin, and uric acid were independent influencing factors for OP. Notably, TyG-BMI also emerged as an independent risk factor for OP (95%CI 1.031-1.054, P < 0.01). Subgroup analysis demonstrated a consistent increase in OP risk with higher TyG-BMI levels across all subgroups. RCS analysis indicated a threshold effect, with the risk of OP gradually increasing when TyG-BMI exceeded 191.52. 
Gender-specific analysis showed an increasing risk of OP when TyG-BMI surpassed 186.21 in males and 198.46 in females, with a more pronounced trend observed in females. ROC analysis suggested that the TyG-BMI index has significant discriminative power for type 2 diabetic osteoporosis. Conclusion TyG-BMI has been identified as a robust predictive biomarker for assessing OP risk in middle-aged and elderly populations with T2DM.
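The TyG-BMI definition given in this abstract, Ln[TG (mg/dL) × fasting plasma glucose (mg/dL) / 2] × BMI, translates directly into code. The sketch below is only an illustration of that formula; the function name `tyg_bmi` and the input values are hypothetical, and the comparison with 191.52 simply echoes the threshold reported in the abstract rather than any clinical rule.

```python
import math


def tyg_bmi(tg_mg_dl, fpg_mg_dl, bmi):
    """TyG-BMI = ln(TG [mg/dL] * FPG [mg/dL] / 2) * BMI, as defined in the abstract."""
    if tg_mg_dl <= 0 or fpg_mg_dl <= 0 or bmi <= 0:
        raise ValueError("all inputs must be positive")
    tyg = math.log(tg_mg_dl * fpg_mg_dl / 2)  # natural logarithm
    return tyg * bmi


# Hypothetical patient values, chosen only to show the arithmetic.
value = tyg_bmi(tg_mg_dl=150, fpg_mg_dl=100, bmi=25)
print(round(value, 2))   # ln(7500) * 25
print(value > 191.52)    # above the overall threshold reported in the abstract
```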
Affiliation(s)
- Jian Zhong
  - Department of Endocrinology, The Third Affiliated Hospital of Chongqing Medical University, Chongqing, China
7
Lee CH, Mendoza T, Huang CH, Sun TL. Vision-based postural balance assessment of sit-to-stand transitions performed by younger and older adults. Gait Posture 2025; 117:245-253. [PMID: 39798419 DOI: 10.1016/j.gaitpost.2025.01.001] [Received: 07/12/2024] [Revised: 12/17/2024] [Accepted: 01/02/2025] [Indexed: 01/15/2025]
Abstract
BACKGROUND The use of inertial measurement units (IMUs) in assessing fall risk is often limited by subject discomfort and challenges in data interpretation. Additionally, there is a scarcity of research on attitude estimation features. To address these issues, we explored novel features and representation methods in the context of sit-to-stand transitions. This study recorded sit-to-stand transition test data from three groups: community-dwelling elderly, elderly in day care centers (DCC), and college students, captured using mobile phone cameras. METHOD We employed pose estimation technology to extract key point kinematic features from the video data and used 10-fold cross-validation to train a random forest classifier, mitigating the impact of individual differences. We trained classifiers with the top 5, 10, and 15 features, calculating the average area under the receiver operating characteristic curve (AUC) for each model to compare feature importance. RESULTS Our results indicated that elbow key point features, such as (KP08) mean Y, (KP08) RMS Y, (KP09) mean Y, and (KP09) RMS Y, are crucial for distinguishing between subject groups. Statistical tests further validated the significance of these features. The application of human pose estimation and key point signals shows promise for clinical postural balance screening. The identified features can be utilized to develop non-invasive tools for assessing postural instability risk, contributing to fall prevention efforts. CONCLUSION This study lays the groundwork for integrating additional measurement modalities into sit-to-stand transition analysis to enhance clinical strategies.
Affiliation(s)
- Chia-Hsuan Lee
  - Department of Data Science, Soochow University, No.70, Linhsi Road, Shihlin District, Taipei, Taiwan
- Tomas Mendoza
  - Department of Industrial Engineering and Management, Yuan Ze University, 135 Yuan Tung Road, Chungli District, Taoyuan, Taiwan
- Chien-Hua Huang
  - Department of Long Term Care, Asia University, Taichung, Taiwan
- Tien-Lung Sun
  - Department of Industrial Engineering and Management, Yuan Ze University, 135 Yuan Tung Road, Chungli District, Taoyuan, Taiwan
8
Pedersen SM, Damslund N, Kjær T, Olsen KR. Optimising test intervals for individuals with type 2 diabetes: A machine learning approach. PLoS One 2025; 20:e0317722. [PMID: 39946322 PMCID: PMC11824975 DOI: 10.1371/journal.pone.0317722] [Received: 04/16/2024] [Accepted: 01/05/2025] [Indexed: 02/16/2025] Open
Abstract
BACKGROUND Chronic disease monitoring programs often adopt a one-size-fits-all approach that does not consider variation in need, potentially leading to excessive or insufficient support for patients at different risk levels. Machine learning (ML) developments offer new opportunities for personalised medicine in clinical practice. OBJECTIVE To demonstrate the potential of ML to guide resource allocation and tailored disease management, this study aims to predict the optimal testing interval for monitoring blood glucose (HbA1c) for patients with Type 2 Diabetes (T2D). We examine fairness across income and education levels and evaluate the risk of false-positives and false-negatives. DATA Danish administrative registers are linked with national clinical databases. Our population consists of all T2D patients from 2015-2018, a sample of more than 57,000. Data contains patient-level clinical measures, healthcare utilisation, medicine, and socio-demographics. METHODS We classify HbA1c test intervals into four categories (3, 6, 9, and 12 months) using three classification algorithms: logistic regression, random forest, and extreme gradient boosting (XGBoost). Feature importance is assessed with SHAP model explanations on the best-performing model, which was XGBoost. A training set comprising 80% of the data is used to predict optimal test intervals, with 20% reserved for testing. Cross-validation is employed to enhance the model's reliability and reduce overfitting. Model performance is evaluated using ROC-AUC, and optimal intervals are determined based on a "time-to-next-positive-test" concept, with different durations associated with specific intervals. RESULTS The model exhibits varying predictive accuracy, with AUC scores ranging from 0.53 to 0.89 across different test intervals. We find significant potential to free resources by prolonging the test interval for well-controlled patients. The fairness metric suggests models perform well in terms of equality. 
There is a sizeable risk of false negatives (predicting longer intervals than optimal), which requires attention. CONCLUSIONS We demonstrate the potential to use ML in personalised diabetes management by assisting physicians in categorising patients by testing frequencies. Clinical validation on diverse patient populations is needed to assess the model's performance in real-world settings.
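The 80%/20% train/test partition over four interval classes (3, 6, 9, and 12 months) described above could be sketched as follows. Stratifying by class is an assumption made here so that rare intervals appear in both partitions; the abstract does not state whether the split was stratified. The function name `stratified_split`, the seed, and the toy label distribution are likewise illustrative only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # random assignment within each class
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical label distribution over the four test intervals (months).
interval_labels = [3] * 50 + [6] * 30 + [9] * 10 + [12] * 10
train_idx, test_idx = stratified_split(interval_labels)
print(len(train_idx), len(test_idx))
```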
Affiliation(s)
- Sasja Maria Pedersen
  - DaCHE, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Nicolai Damslund
  - DaCHE, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Trine Kjær
  - DaCHE, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Kim Rose Olsen
  - DaCHE, Department of Public Health, University of Southern Denmark, Odense, Denmark
9
Khurshid MR, Manzoor S, Sadiq T, Hussain L, Khan MS, Dutta AK. Unveiling diabetes onset: Optimized XGBoost with Bayesian optimization for enhanced prediction. PLoS One 2025; 20:e0310218. [PMID: 39854291 PMCID: PMC11760023 DOI: 10.1371/journal.pone.0310218] [Received: 07/08/2024] [Accepted: 08/27/2024] [Indexed: 01/26/2025] Open
Abstract
Diabetes, a chronic condition affecting millions worldwide, necessitates early intervention to prevent severe complications. While accurately predicting diabetes onset or progression remains challenging due to complex and imbalanced datasets, recent advancements in machine learning offer potential solutions. Traditional prediction models, often limited by default parameters, have been superseded by more sophisticated approaches. Leveraging Bayesian optimization to fine-tune XGBoost, researchers can harness the power of complex data analysis to improve predictive accuracy. By identifying key factors influencing diabetes risk, personalized prevention strategies can be developed, ultimately enhancing patient outcomes. Successful implementation requires meticulous data management, stringent ethical considerations, and seamless integration into healthcare systems. This study focused on optimizing the hyperparameters of an XGBoost ensemble machine learning model using Bayesian optimization. Compared to grid search XGBoost (accuracy: 97.24%, F1-score: 95.72%, MCC: 81.02%), the XGBoost with Bayesian optimization achieved slightly improved performance (accuracy: 97.26%, F1-score: 95.72%, MCC: 81.18%). Although the improvements observed in this study are modest, the optimized XGBoost model with Bayesian optimization represents a promising step towards revolutionizing diabetes prevention and treatment. This approach holds significant potential to improve outcomes for individuals at risk of developing diabetes.
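The Matthews correlation coefficient (MCC) quoted above is reported as a percentage, so 81.18% corresponds to 0.8118 on the usual scale from -1 to 1. Unlike accuracy, it uses all four confusion-matrix cells, which makes it more informative on imbalanced data. The sketch below shows the standard formula; the function name `mcc` and the counts are hypothetical and are not taken from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: accuracy is 0.875, yet MCC is only about 0.60.
print(round(mcc(tp=90, tn=15, fp=5, fn=10), 4))
```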
Affiliation(s)
- Sadaf Manzoor
  - Department of Statistics, Islamia University College, Peshawar, Khyber Pakhtunkhwa, Pakistan
- Touseef Sadiq
  - Centre for Artificial Intelligence Research (CAIR), Department of Information and Communication Technology, University of Agder, Kristiansand, Grimstad, Norway
- Lal Hussain
  - Department of Computer Science & IT, Neelum Campus, The University of Azad Jammu and Kashmir, Athmuqam, Azad Kashmir, Pakistan
  - Department of Computer Science & IT, King Abdullah Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
- Ashit Kumar Dutta
  - Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh, Kingdom of Saudi Arabia
10
Abousaber I, Abdallah HF, El-Ghaish H. Robust predictive framework for diabetes classification using optimized machine learning on imbalanced datasets. Front Artif Intell 2025; 7:1499530. [PMID: 39839971 PMCID: PMC11747138 DOI: 10.3389/frai.2024.1499530] [Received: 09/21/2024] [Accepted: 12/12/2024] [Indexed: 01/23/2025] Open
Abstract
Introduction Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalances, where non-diabetic cases dominate, can significantly affect machine learning model performance, leading to biased predictions and reduced generalization. Methods A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy. Results Rigorous testing was conducted on three datasets (PIMA, Diabetes Dataset 2019, and BIT_2019), demonstrating the robustness and adaptability of the methodology across varying data environments. Discussion The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.
Affiliation(s)
- Inam Abousaber
  - Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia
- Haitham F. Abdallah
  - Department of Electronics and Electrical Communication, Higher Institute of Engineering and Technology, Kafr El Sheikh, Egypt
- Hany El-Ghaish
  - Department of Computer and Automatic Control, Faculty of Engineering, Tanta University, Tanta, Egypt
11
Sperling J, Welsh W, Haseley E, Quenstedt S, Muhigaba PB, Brown A, Ephraim P, Shafi T, Waitzkin M, Casarett D, Goldstein BA. Machine learning-based prediction models in medical decision-making in kidney disease: patient, caregiver, and clinician perspectives on trust and appropriate use. J Am Med Inform Assoc 2025; 32:51-62. [PMID: 39545362 DOI: 10.1093/jamia/ocae255] [Received: 06/17/2024] [Revised: 08/22/2024] [Accepted: 09/25/2024] [Indexed: 11/17/2024] Open
Abstract
OBJECTIVES This study aims to improve the ethical use of machine learning (ML)-based clinical prediction models (CPMs) in shared decision-making for patients with kidney failure on dialysis. We explore factors that inform acceptability, interpretability, and implementation of ML-based CPMs among multiple constituent groups. MATERIALS AND METHODS We collected and analyzed qualitative data from focus groups with varied end users, including: dialysis support providers (clinical providers and additional dialysis support providers such as dialysis clinic staff and social workers); patients; patients' caregivers (n = 52). RESULTS Participants were broadly accepting of ML-based CPMs, but with concerns on data sources, factors included in the model, and accuracy. Use was desired in conjunction with providers' views and explanations. Differences among respondent types were minimal overall but most prevalent in discussions of CPM presentation and model use. DISCUSSION AND CONCLUSION Evidence of acceptability of ML-based CPM usage provides support for ethical use, but numerous specific considerations in acceptability, model construction, and model use for shared clinical decision-making must be considered. There are specific steps that could be taken by data scientists and health systems to engender use that is accepted by end users and facilitates trust, but there are also ongoing barriers or challenges in addressing desires for use. This study contributes to emerging literature on interpretability, mechanisms for sharing complexities, including uncertainty regarding the model results, and implications for decision-making. It examines numerous stakeholder groups including providers, patients, and caregivers to provide specific considerations that can influence health system use and provide a basis for future research.
Affiliation(s)
- Jessica Sperling
- Social Science Research Institute, Duke University, Durham, NC 27708, United States
- Clinical and Translational Science Institute, Duke University School of Medicine, Durham, NC 27701, United States
- Department of Medicine, Duke University School of Medicine, Durham, NC 27708, United States
| | - Whitney Welsh
- Social Science Research Institute, Duke University, Durham, NC 27708, United States
| | - Erin Haseley
- Social Science Research Institute, Duke University, Durham, NC 27708, United States
| | - Stella Quenstedt
- Clinical and Translational Science Institute, Duke University School of Medicine, Durham, NC 27701, United States
| | - Perusi B Muhigaba
- Clinical and Translational Science Institute, Duke University School of Medicine, Durham, NC 27701, United States
| | - Adrian Brown
- Social Science Research Institute, Duke University, Durham, NC 27708, United States
| | - Patti Ephraim
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY 11030, United States
| | - Tariq Shafi
- Department of Medicine, Houston Methodist, Houston, TX 77030, United States
| | - Michael Waitzkin
- Science & Society, Duke University, Durham, NC 27708, United States
| | - David Casarett
- Department of Medicine, Duke University School of Medicine, Durham, NC 27708, United States
| | - Benjamin A Goldstein
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708, United States
|
12
|
Khamis A, Abdul F, Dsouza S, Sulaiman F, Farooqi M, Al Awadi F, Hassanein M, Ahmed FS, Alsharhan M, AlOlama A, Ali N, Abdulaziz A, Rafie AM, Goswami N, Bayoumi R. Risk of Microvascular Complications in Newly Diagnosed Type 2 Diabetes Patients Using Automated Machine Learning Prediction Models. J Clin Med 2024; 13:7422. [PMID: 39685880 DOI: 10.3390/jcm13237422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 11/27/2024] [Accepted: 12/03/2024] [Indexed: 12/18/2024]
Abstract
Background/Objectives: In type 2 diabetes (T2D), collective damage to the eyes, kidneys, and peripheral nerves constitutes microvascular complications, which significantly affect patients' quality of life. This study aimed to prospectively evaluate the risk of microvascular complications in newly diagnosed T2D patients in Dubai, UAE. Methods: Supervised automated machine learning in the Auto-Classifier model of the IBM SPSS Modeler package was used to predict microvascular complications in a training data set of 348 long-term T2D patients with complications using 24 independent variables as predictors and complications as targets. Three automated model scenarios were tested: Full All-Variable Model; Univariate-Selected Model, and Backward Stepwise Logistic Regression Model. An independent cohort of 338 newly diagnosed T2D patients with no complications was used for the model validation. Results: Long-term T2D patients with complications (duration = ~14.5 years) were significantly older (mean age = 56.3 ± 10.9 years) than the newly diagnosed patients without complications (duration = ~2.5 years; mean age = 48.9 ± 9.6 years). The Bayesian Network was the most reliable algorithm for predicting microvascular complications in all three scenarios with an area under the curve (AUC) of 77-87%, accuracy of 68-75%, sensitivity of 86-95%, and specificity of 53-75%. Among newly diagnosed T2D patients, 22.5% were predicted positive and 49.1% negative across all models. Logistic regression applied to the 16 significant predictors between the two sub-groups showed that BMI, HDL, adjusted for age at diagnosis of T2D, age at visit, and urine albumin explained >90% of the variation in microvascular measures. Conclusions: the Bayesian Network model effectively predicts microvascular complications in newly diagnosed T2D patients, highlighting the significant roles of BMI, HDL, age at diagnosis, age at visit, and urine albumin.
Affiliation(s)
- Amar Khamis
- Hamdan Bin Mohammed College of Dental Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
| | - Fatima Abdul
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
| | - Stafny Dsouza
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
| | - Fatima Sulaiman
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
| | - Muhammad Farooqi
- Dubai Diabetes Center, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Fatheya Al Awadi
- Endocrinology Department, Dubai Hospital, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Mohammed Hassanein
- Endocrinology Department, Dubai Hospital, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Fayha Salah Ahmed
- Pathology and Genetics Department, Dubai Hospital, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Mouza Alsharhan
- Pathology and Genetics Department, Dubai Hospital, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Ayesha AlOlama
- Primary Healthcare Centre, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Noorah Ali
- Primary Healthcare Centre, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Aaesha Abdulaziz
- Primary Healthcare Centre, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Alia Mohammad Rafie
- Primary Healthcare Centre, Dubai Health, Dubai P.O. Box 7272, United Arab Emirates
| | - Nandu Goswami
- Center for Space and Aviation Health, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
| | - Riad Bayoumi
- College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai P.O. Box 505055, United Arab Emirates
|
13
|
Cao S, Yang S, Chen B, Chen X, Fu X, Tang S. Establishing a differential diagnosis model between primary membranous nephropathy and non-primary membranous nephropathy by machine learning algorithms. Ren Fail 2024; 46:2380752. [PMID: 39039848 PMCID: PMC11268222 DOI: 10.1080/0886022x.2024.2380752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 07/11/2024] [Indexed: 07/24/2024]
Abstract
CONTEXT Four machine learning classification algorithms with a relatively balanced trade-off between complexity and accuracy were selected for the differential diagnosis of primary membranous nephropathy (PMN). OBJECTIVE This study sought to identify the most suitable classification algorithm for PMN identification and to provide reference data for PMN diagnosis research. METHODS A total of 500 patients were referred to Luohe Central Hospital from 2019 to 2021. All patients had primary glomerular disease confirmed by renal biopsy, comprising 322 cases of PMN and 178 cases of non-PMN. Decision tree, random forest, support vector machine, and extreme gradient boosting (XGBoost) algorithms were used to establish differential diagnosis models for PMN and non-PMN. The best-performing model was chosen based on the true-positive rate, true-negative rate, false-positive rate, false-negative rate, accuracy, and area under the receiver operating characteristic curve (AUC). RESULTS The XGBoost model performed best on these evaluation indicators, with a sensitivity of 92% and a specificity of 96% for the diagnosis of PMN. CONCLUSIONS A differential diagnosis model for PMN was established successfully, and the XGBoost model performed best. It could be used for the clinical diagnosis of PMN.
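The evaluation indicators named in this abstract all derive from the four cells of a confusion matrix. The sketch below shows those relationships. The abstract does not report the underlying counts, so the counts used here (100 PMN and 100 non-PMN test cases) are hypothetical and were chosen only so that the output matches the reported 92% sensitivity and 96% specificity. The names `ConfusionCounts` and `diagnostic_rates` are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (PMN is the positive class)."""
    tp: int  # PMN cases predicted as PMN
    fn: int  # PMN cases predicted as non-PMN
    tn: int  # non-PMN cases predicted as non-PMN
    fp: int  # non-PMN cases predicted as PMN


def diagnostic_rates(c: ConfusionCounts) -> dict:
    """Compute the rates listed in the abstract from confusion-matrix counts."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    return {
        "sensitivity": c.tp / positives,          # true-positive rate
        "specificity": c.tn / negatives,          # true-negative rate
        "false_positive_rate": c.fp / negatives,  # 1 - specificity
        "false_negative_rate": c.fn / positives,  # 1 - sensitivity
        "accuracy": (c.tp + c.tn) / (positives + negatives),
    }


# Hypothetical counts, NOT from the paper: picked to reproduce 92% / 96%.
example = ConfusionCounts(tp=92, fn=8, tn=96, fp=4)
rates = diagnostic_rates(example)

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With balanced hypothetical classes the accuracy is simply the mean of sensitivity and specificity. On the study's actual 322/178 split the accuracy would weight sensitivity more heavily, which is one reason to report the separate rates rather than accuracy alone.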
Affiliation(s)
- Shangmei Cao
- Department of Science and Technology Innovation Center, Luohe Central Hospital, The First Affiliated Hospital of Luohe Medical College, Henan Key Laboratory of Fertility Protection and Aristogenesis, Luohe, China
| | - Shaozhe Yang
- Department of Science and Technology Innovation Center, Luohe Central Hospital, The First Affiliated Hospital of Luohe Medical College, Henan Key Laboratory of Fertility Protection and Aristogenesis, Luohe, China
| | - Bolin Chen
- Department of Science and Technology Innovation Center, Luohe Central Hospital, The First Affiliated Hospital of Luohe Medical College, Henan Key Laboratory of Fertility Protection and Aristogenesis, Luohe, China
| | - Xixia Chen
- Division of Nephrology, First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China
| | - Xiuhong Fu
- Department of Science and Technology Innovation Center, Luohe Central Hospital, The First Affiliated Hospital of Luohe Medical College, Henan Key Laboratory of Fertility Protection and Aristogenesis, Luohe, China
| | - Shuifu Tang
- Division of Nephrology, First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong Province, China
|
14
|
Campanella S, Paragliola G, Cherubini V, Pierleoni P, Palma L. Towards Personalized AI-Based Diabetes Therapy: A Review. IEEE J Biomed Health Inform 2024; 28:6944-6957. [PMID: 39137085 DOI: 10.1109/jbhi.2024.3443137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Insulin pumps and other smart devices have recently made significant advancements in the treatment of diabetes, a disorder that affects people all over the world. The development of medical AI has been influenced by AI methods designed to help physicians make diagnoses, choose a course of therapy, and predict outcomes. In this article, we thoroughly analyse how AI is being used to enhance and personalize diabetes treatment. The search turned up 77 original research papers, from which we have selected the most crucial information regarding the learning models employed, the data typology, the deployment stage, and the application domains. We identified two key trends, enabled mostly by AI: patient-based therapy personalization and therapeutic algorithm optimization. We also point out various shortcomings in the existing literature, such as a lack of multimodal database analysis and a lack of interpretability. The rapid improvements in AI and the growth in the amount of available data offer the possibility of overcoming these difficulties soon and enabling wider deployment of this technology in clinical settings.
|
15
|
Heo S, Kang J, Barbé T, Kim J, Bertulfo TF, Troyan P, Streit L, Slocumb RH. Relationships of Multidimensional Factors to Diabetes Complications: A Cross-Sectional, Correlational Study. West J Nurs Res 2024; 46:664-673. [PMID: 39171415 PMCID: PMC11380359 DOI: 10.1177/01939459241271332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
Abstract
BACKGROUND Diabetes complications are prevalent in people with diabetes, causing considerable individual suffering and increased health costs. However, the relationships of multidimensional, modifiable, and nonmodifiable factors to diabetes complications and the role of diabetes distress have been rarely examined. OBJECTIVE The aims of this study were to examine the associations of age, sex, knowledge, self-efficacy, self-compassion, resilience, self-esteem, depressive symptoms, diabetes distress, social support, and body mass index with diabetes complications and to investigate the mediating role of diabetes distress. METHODS In this cross-sectional, correlational study, data on all study variables were collected from 148 people with diabetes through REDCap in 2023. Multiple regression analysis and the PROCESS macro for SPSS were used to address the aims. RESULTS Older age and higher levels of diabetes distress were associated with more diabetes complications. Depressive symptoms were associated with diabetes distress; and diabetes distress, but not depressive symptoms, was associated with diabetes complications, controlling for all other variables. CONCLUSIONS Depressive symptoms and diabetes distress were directly or indirectly associated with diabetes complications, and diabetes distress was a mediator in the relationship between depressive symptoms and diabetes complications. Health care providers can target reduction of depressive symptoms and diabetes distress to reduce diabetes complications.
Affiliation(s)
- Seongkum Heo
- Georgia Baptist College of Nursing, Mercer University, Atlanta, GA, USA
| | - JungHee Kang
- College of Nursing, University of Kentucky, Lexington, KY, USA
| | - Tammy Barbé
- Georgia Baptist College of Nursing, Mercer University, Atlanta, GA, USA
| | - JinShil Kim
- College of Nursing, Gachon University, Incheon, South Korea
| | - Tara F. Bertulfo
- Georgia Baptist College of Nursing, Mercer University, Atlanta, GA, USA
| | - Pattie Troyan
- Georgia Baptist College of Nursing, Mercer University, Atlanta, GA, USA
| | - Linda Streit
- Georgia Baptist College of Nursing, Mercer University, Atlanta, GA, USA
| | - Rhonda H. Slocumb
- College of Nursing and Health Sciences, Georgia Southwestern State University, Americus, GA, USA
|
16
|
Andersen JD, Stoltenberg CW, Jensen MH, Vestergaard P, Hejlesen O, Hangaard S. Machine Learning-Driven Prediction of Comorbidities and Mortality in Adults With Type 1 Diabetes. J Diabetes Sci Technol 2024:19322968241267779. [PMID: 39091237 PMCID: PMC11571562 DOI: 10.1177/19322968241267779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
BACKGROUND Comorbidities such as cardiovascular disease (CVD) and diabetic kidney disease (DKD) are major burdens of type 1 diabetes (T1D). Predicting people at high risk of developing comorbidities would enable early intervention. This study aimed to develop models incorporating socioeconomic status (SES) to predict CVD, DKD, and mortality in adults with T1D to improve early identification of comorbidities. METHODS Nationwide Danish registry data were used. Logistic regression models were developed to predict the development of CVD, DKD, and mortality within five years of T1D diagnosis. Features included age, sex, personal income, and education. Performance was evaluated by five-fold cross-validation with area under the receiver operating characteristic curve (AUROC) and the precision-recall area under the curve (PR-AUC). The importance of SES was assessed from feature importance plots. RESULTS Of the 6572 included adults (≥21 years) with T1D, 379 (6%) developed CVD, 668 (10%) developed DKD, and 921 (14%) died within the five-year follow-up. The AUROC (±SD) was 0.79 (±0.03) for CVD, 0.61 (±0.03) for DKD, and 0.87 (±0.01) for mortality. The PR-AUC was 0.18 (±0.01), 0.15 (±0.03), and 0.49 (±0.02), respectively. Based on feature importance plots, SES was the most important feature in the DKD model but had minimal impact on models for CVD and mortality. CONCLUSIONS The developed models showed good performance for predicting CVD and mortality, suggesting they could help in the early identification of these outcomes in individuals with T1D. The importance of SES in individual prediction within diabetes remains uncertain.
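The PR-AUC values in this abstract are easier to read against outcome prevalence, because a classifier with no skill has an expected PR-AUC roughly equal to the fraction of positive cases. The following is a minimal sketch of that comparison using only the event counts and PR-AUC point estimates reported above. The function name `pr_auc_vs_chance` is illustrative, and the "lift" ratio is an informal reading aid, not a metric used by the authors.

```python
from typing import Dict, Tuple


def pr_auc_vs_chance(events: int, cohort_size: int, pr_auc: float) -> Tuple[float, float]:
    """Return (prevalence, PR-AUC divided by prevalence).

    Prevalence approximates the PR-AUC of a no-skill classifier, so the
    ratio indicates how far a model sits above that chance baseline.
    """
    prevalence = events / cohort_size
    return prevalence, pr_auc / prevalence


COHORT_SIZE = 6572  # adults with T1D, as reported in the abstract

# outcome -> (number of events, reported mean PR-AUC)
reported = {
    "CVD": (379, 0.18),
    "DKD": (668, 0.15),
    "mortality": (921, 0.49),
}

summary: Dict[str, Tuple[float, float]] = {
    outcome: pr_auc_vs_chance(events, COHORT_SIZE, pr_auc)
    for outcome, (events, pr_auc) in reported.items()
}

for outcome, (prevalence, lift) in summary.items():
    print(f"{outcome}: chance-level PR-AUC ~ {prevalence:.3f}, lift ~ {lift:.1f}x")
```

Read this way, the DKD model's PR-AUC of 0.15 sits only about 1.5 times above its chance level, which is in line with its weak AUROC of 0.61, whereas the CVD and mortality models sit roughly 3 to 3.5 times above theirs.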
Affiliation(s)
- Jonas Dahl Andersen
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Steno Diabetes Center North Denmark, Aalborg, Denmark
| | - Carsten Wridt Stoltenberg
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Steno Diabetes Center North Denmark, Aalborg, Denmark
| | - Morten Hasselstrøm Jensen
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Data Science, Novo Nordisk, Søborg, Denmark
| | - Peter Vestergaard
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Steno Diabetes Center North Denmark, Aalborg, Denmark
- Department of Endocrinology, Aalborg University Hospital, Aalborg, Denmark
| | - Ole Hejlesen
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | - Stine Hangaard
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Steno Diabetes Center North Denmark, Aalborg, Denmark
|
17
|
Johns E, Alkanj A, Beck M, Dal Mas L, Gourieux B, Sauleau EA, Michel B. Using machine learning or deep learning models in a hospital setting to detect inappropriate prescriptions: a systematic review. Eur J Hosp Pharm 2024; 31:289-294. [PMID: 38050067 PMCID: PMC11265547 DOI: 10.1136/ejhpharm-2023-003857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 11/07/2023] [Indexed: 12/06/2023]
Abstract
OBJECTIVES The emergence of artificial intelligence (AI) is catching the interest of hospital pharmacists. A massive collection of health data is now available to train AI models and hold the promise of disrupting codes and practices. The objective of this systematic review was to examine the state of the art of machine learning or deep learning models that detect inappropriate hospital medication orders. METHODS A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. MEDLINE and Embase databases were searched from inception to May 2023. Studies were included if they reported and described an AI model intended for use by clinical pharmacists in hospitals. Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). RESULTS 13 articles were selected after review: 12 studies were judged to have high risk of bias; 11 studies were published between 2020 and 2023; 8 were conducted in North America and Asia; 6 analysed orders and detected inappropriate prescriptions according to patient profiles and medication orders; and 7 detected specific inappropriate prescriptions, such as detecting antibiotic resistance, dosage abnormality in prescriptions, high alert drugs errors from prescriptions or predicting the risk of adverse drug events. Various AI models were used, mainly supervised learning techniques. The training datasets used were very heterogeneous; the length of study varied from 2 weeks to 7 years and the number of prescription orders analysed went from 31 to 5 804 192. CONCLUSIONS This systematic review points out that, to date, few original research studies report AI tools based on machine or deep learning in the field of hospital clinical pharmacy. However, these original articles, while preliminary, highlighted the potential value of integrating AI into clinical hospital pharmacy practice.
Affiliation(s)
- Erin Johns
- Direction de la Qualité, de la Performance et de l'Innovation, Agence Régionale de Santé Grand Est Site de Strasbourg, Strasbourg, Grand Est, France
- IMAGeS, Laboratoire des Sciences de l'Ingénieur de l'Informatique et de l'Imagerie, Illkirch, Grand Est, France
| | - Ahmad Alkanj
- Laboratoire de Pharmacologie et de Toxicologie Neurocardiovasculaire, Université de Strasbourg, Strasbourg, Grand Est, France
| | - Morgane Beck
- Direction de la Qualité, de la Performance et de l'Innovation, Agence Régionale de Santé Grand Est Site de Strasbourg, Strasbourg, Grand Est, France
| | - Laurent Dal Mas
- Direction de la Qualité, de la Performance et de l'Innovation, Agence Régionale de Santé Grand Est Site de Strasbourg, Strasbourg, Grand Est, France
| | - Benedicte Gourieux
- Laboratoire de Pharmacologie et de Toxicologie Neurocardiovasculaire, Université de Strasbourg, Strasbourg, Grand Est, France
- Service Pharmacie - Stérilisation, Les Hopitaux Universitaires de Strasbourg, Strasbourg, Grand Est, France
| | - Erik-André Sauleau
- IMAGeS, Laboratoire des Sciences de l'Ingénieur de l'Informatique et de l'Imagerie, Illkirch, Grand Est, France
- Département de Santé Publique - Groupe Méthodes Recherche Clinique, Les Hopitaux Universitaires de Strasbourg, Strasbourg, Grand Est, France
| | - Bruno Michel
- Laboratoire de Pharmacologie et de Toxicologie Neurocardiovasculaire, Université de Strasbourg, Strasbourg, Grand Est, France
- Service Pharmacie - Stérilisation, Les Hopitaux Universitaires de Strasbourg, Strasbourg, Grand Est, France
|
18
|
Carrillo-Larco RM, Bravo-Rocca G, Castillo-Cara M, Xu X, Bernabe-Ortiz A. A multimodal approach using fundus images and text meta-data in a machine learning classifier with embeddings to predict years with self-reported diabetes - An exploratory analysis. Prim Care Diabetes 2024; 18:327-332. [PMID: 38616442 DOI: 10.1016/j.pcd.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/17/2024] [Accepted: 04/09/2024] [Indexed: 04/16/2024]
Abstract
AIMS Machine learning models can use image and text data to predict the number of years since diabetes diagnosis; such a model can be applied to new patients to predict, approximately, how long the new patient may have lived with diabetes unknowingly. We aimed to develop a model to predict self-reported diabetes duration. METHODS We used the Brazilian Multilabel Ophthalmological Dataset. The unit of analysis was the fundus image and its meta-data, regardless of the patient. We included people aged 40+ years and fundus images without diabetic retinopathy. Fundus images and meta-data (sex, age, comorbidities and taking insulin) were passed to the MedCLIP model to extract the embedding representation. The embedding representation was passed to an Extra Tree Classifier to predict: 0-4, 5-9, 10-14 and 15+ years with self-reported diabetes. RESULTS There were 988 images from 563 people (mean age = 67 years; 64% were women). Overall, the F1 score was 57%. The group with 15+ years of self-reported diabetes had the highest precision (64%) and F1 score (63%), while the highest recall (69%) was observed in the group with 0-4 years. The proportion of correctly classified observations was 55% for the group with 0-4 years, 51% for 5-9 years, 58% for 10-14 years, and 64% for 15+ years with self-reported diabetes. CONCLUSIONS The machine learning model had acceptable accuracy and F1 score, and correctly classified more than half of the patients according to diabetes duration. Using large foundational models to extract image and text embeddings seems a feasible and efficient approach to predict years living with self-reported diabetes.
Affiliation(s)
- Rodrigo M Carrillo-Larco
- Hubert Department of Global Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA; Emory Global Diabetes Research Center, Emory University, Atlanta, GA, USA.
| | | | | | - Xiaolin Xu
- School of Public Health, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China; The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou, China; School of Public Health, Faculty of Medicine, The University of Queensland, Brisbane, Australia
|
19
|
Balagopalan A, Baldini I, Celi LA, Gichoya J, McCoy LG, Naumann T, Shalit U, van der Schaar M, Wagstaff KL. Machine learning for healthcare that matters: Reorienting from technical novelty to equitable impact. PLOS DIGITAL HEALTH 2024; 3:e0000474. [PMID: 38620047 PMCID: PMC11018283 DOI: 10.1371/journal.pdig.0000474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 02/18/2024] [Indexed: 04/17/2024]
Abstract
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also due to structural issues in the machine learning for healthcare (MLHC) community which broadly reward technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large, and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare-the former of which sees healthcare as merely a source of interesting technical challenges, and the latter of which regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians, to the institutions in which they work, and the governments which regulate their data access.
Affiliation(s)
- Aparna Balagopalan
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
| | - Ioana Baldini
- IBM Research; Yorktown Heights, New York, United States of America
| | - Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center; Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
| | - Judy Gichoya
- Department of Radiology and Imaging Sciences, School of Medicine, Emory University; Atlanta, Georgia, United States of America
| | - Liam G. McCoy
- Division of Neurology, Department of Medicine, University of Alberta; Edmonton, Alberta, Canada
| | - Tristan Naumann
- Microsoft Research; Redmond, Washington, United States of America
| | - Uri Shalit
- The Faculty of Data and Decision Sciences, Technion; Haifa, Israel
| | - Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge; Cambridge, United Kingdom
- The Alan Turing Institute; London, United Kingdom
|
20
|
Karmand H, Andishgar A, Tabrizi R, Sadeghi A, Pezeshki B, Ravankhah M, Taherifard E, Ahmadizar F. Machine-learning algorithms in screening for type 2 diabetes mellitus: Data from Fasa Adults Cohort Study. Endocrinol Diabetes Metab 2024; 7:e00472. [PMID: 38411386 PMCID: PMC10897867 DOI: 10.1002/edm2.472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/10/2024] [Accepted: 01/30/2024] [Indexed: 02/28/2024]
Abstract
INTRODUCTION The application of machine learning (ML) is growing in biomedical sciences. This study aimed to evaluate factors associated with type 2 diabetes mellitus (T2DM) and compare the performance of ML methods in identifying individuals with the disease in an Iranian setting. METHODS Using the baseline data from Fasa Adult Cohort Study (FACS) and in a sex-stratified manner, we studied factors associated with T2DM by applying seven different ML methods including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbours (KNN), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB) and Bagging classifier (BAG). We further compared the performance of these methods; for each algorithm, accuracy, precision, sensitivity, specificity, F1 score, and Area Under Curve (AUC) were calculated. RESULTS 10,112 participants were recruited between 2014 and 2016, of whom 1246 had T2DM at baseline. 4566 (45%) participants were males, aged between 35 and 70 years. For males, age, sugar consumption, and history of hospitalization were, in that order, the most important variables in screening for T2DM with the GBM model; for females, they were sugar consumption, urine blood, and age. GBM outperformed other models for both males and females with AUC of 0.75 (0.69-0.82) and 0.76 (0.71-0.80), and F1 score of 0.33 (0.27-0.39) and 0.42 (0.38-0.46), respectively. GBM also showed a sensitivity of 0.24 (0.19-0.29) and a specificity of 0.98 (0.96-1.0) in males and a sensitivity of 0.38 (0.34-0.42) and specificity of 0.92 (0.89-0.95) in females. Notably, the other ML models showed similar performance characteristics. CONCLUSIONS The GBM model might achieve better performance in screening for T2DM in a south Iranian population.
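This abstract reports F1 score and sensitivity (recall) for the GBM model but not precision. Because F1 is the harmonic mean of precision and recall, precision can be approximately back-calculated from the two reported values. The sketch below does this for the point estimates only. It assumes F1 and sensitivity were computed on the same predictions, and the inputs are rounded to two decimals, so the results are rough approximations rather than figures from the paper. The function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        # F1 can never reach 2 * recall, so such inputs are inconsistent.
        raise ValueError("F1 must be smaller than twice the recall")
    return f1 * recall / denominator


# Reported GBM point estimates from the abstract: (F1, sensitivity).
reported = {"males": (0.33, 0.24), "females": (0.42, 0.38)}

implied = {
    group: implied_precision(f1, recall)
    for group, (f1, recall) in reported.items()
}

for group, precision in implied.items():
    print(f"{group}: implied precision ~ {precision:.2f}")
```

Under these assumptions the implied precision is about 0.53 for males and about 0.47 for females, meaning roughly half of the people flagged by the model would actually have T2DM, while the low sensitivity means most true cases would be missed at the chosen threshold.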
Affiliation(s)
- Hanieh Karmand
- Student Research Committee, School of Medicine, Fasa University of Medical Sciences, Fasa, Iran
| | | | - Reza Tabrizi
- Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
| | - Alireza Sadeghi
- Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Health Policy Research Center, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Babak Pezeshki
- Clinical Research Development Unit, Valiasr Hospital, Fasa University of Medical Sciences, Fasa, Iran
| | - Mahdi Ravankhah
- Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Erfan Taherifard
- Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Health Policy Research Center, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Fariba Ahmadizar
- Data Science and Biostatistics Department, Julius Global Health, Utrecht, The Netherlands
|
21
|
Zrubka Z, Kertész G, Gulácsi L, Czere J, Hölgyesi Á, Nezhad HM, Mosavi A, Kovács L, Butte AJ, Péntek M. The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review. J Med Internet Res 2024; 26:e47430. [PMID: 38241075 PMCID: PMC10837761 DOI: 10.2196/47430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 04/29/2023] [Accepted: 11/17/2023] [Indexed: 01/23/2024]
Abstract
BACKGROUND: Diabetes mellitus (DM) is a major health concern among children with the widespread adoption of advanced technologies. However, concerns are growing about the transparency, replicability, biasedness, and overall validity of artificial intelligence studies in medicine. OBJECTIVE: We aimed to systematically review the reporting quality of machine learning (ML) studies of pediatric DM using the Minimum Information About Clinical Artificial Intelligence Modelling (MI-CLAIM) checklist, a general reporting guideline for medical artificial intelligence studies. METHODS: We searched the PubMed and Web of Science databases from 2016 to 2020. Studies were included if the use of ML was reported in children with DM aged 2 to 18 years, including studies on complications, screening studies, and in silico samples. In studies following the ML workflow of training, validation, and testing of results, reporting quality was assessed via MI-CLAIM by consensus judgments of independent reviewer pairs. Positive answers to the 17 binary items regarding sufficient reporting were qualitatively summarized and counted as a proxy measure of reporting quality. The synthesis of results included testing the association of reporting quality with publication and data type, participants (human or in silico), research goals, level of code sharing, and the scientific field of publication (medical or engineering), as well as with expert judgments of clinical impact and reproducibility. RESULTS: After screening 1043 records, 28 studies were included. The sample size of the training cohort ranged from 5 to 561. Six studies featured only in silico patients. The reporting quality was low, with great variation among the 21 studies assessed using MI-CLAIM. The number of items with sufficient reporting ranged from 4 to 12 (mean 7.43, SD 2.62). The items on research questions and data characterization were reported adequately most often, whereas items on patient characteristics and model examination were reported adequately least often. The representativeness of the training and test cohorts to real-world settings and the adequacy of model performance evaluation were the most difficult to judge. Reporting quality improved over time (r=0.50; P=.02); it was higher than average in prognostic biomarker and risk factor studies (P=.04) and lower in noninvasive hypoglycemia detection studies (P=.006), higher in studies published in medical versus engineering journals (P=.004), and higher in studies sharing any code of the ML pipeline versus not sharing (P=.003). The association between expert judgments and MI-CLAIM ratings was not significant. CONCLUSIONS: The reporting quality of ML studies in the pediatric population with DM was generally low. Important details for clinicians, such as patient characteristics; comparison with the state-of-the-art solution; and model examination for valid, unbiased, and robust results, were often the weak points of reporting. To assess their clinical utility, the reporting standards of ML studies must evolve, and algorithms for this challenging population must become more transparent and replicable.
Affiliation(s)
- Zsombor Zrubka
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
| | - Gábor Kertész
- John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
| | - László Gulácsi
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
| | - János Czere
- Doctoral School of Innovation Management, Óbuda University, Budapest, Hungary
| | - Áron Hölgyesi
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Doctoral School of Molecular Medicine, Semmelweis University, Budapest, Hungary
| | - Hossein Motahari Nezhad
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Doctoral School of Business and Management, Corvinus University of Budapest, Budapest, Hungary
| | - Amir Mosavi
- John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
| | - Levente Kovács
- Physiological Controls Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
| | - Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States
| | - Márta Péntek
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
| |
|
22
|
Abas MZ, Li K, Hairi NN, Choo WY, Wan KS. Machine learning based predictive model of Type 2 diabetes complications using Malaysian National Diabetes Registry: A study protocol. J Public Health Res 2024; 13:22799036241231786. [PMID: 38434578 PMCID: PMC10906050 DOI: 10.1177/22799036241231786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 01/24/2024] [Indexed: 03/05/2024] Open
Abstract
Background: The prevalence of diabetes in Malaysia is increasing, and identifying patients with higher risk of complications is crucial for effective management. The use of machine learning (ML) to develop prediction models has been shown to outperform non-ML models. This study aims to develop predictive models for Type 2 Diabetes (T2D) complications in Malaysia using ML techniques. Design and methods: This 10-year retrospective cohort study uses clinical audit datasets from the Malaysian National Diabetes Registry from 2011 to 2021. T2D patients who received treatment in public health clinics in the southern region of Malaysia with at least two data points in 10 years are included. Patients with diabetes complications at baseline are excluded to ensure temporality between predictors and the target variable. Appropriate methods are used to address issues related to data cleaning, missing data imputation, data splitting, feature selection, and class imbalance. The study uses 7 ML algorithms (logistic regression, support vector machine, k-nearest neighbours, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) to develop predictive models for four target variables: nephropathy, retinopathy, ischaemic heart disease, and stroke. Hyperparameter tuning is performed for each algorithm. The model training is performed using a stratified k-fold cross-validation technique. The best model for each algorithm is evaluated on a hold-out dataset using multiple metrics. Expected impact of the study on public health: The prediction model may be a valuable tool for diabetes management and secondary prevention by enabling earlier interventions and optimal resource allocation, leading to better health outcomes.
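The stratified k-fold step named in this protocol can be illustrated with a short sketch. This is a minimal pure-Python illustration and not the authors' pipeline: the helper name `stratified_k_fold`, the toy outcome vector, and the choice of k = 5 are assumptions made for the example, and a real analysis would more likely rely on an established ML library.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios mirror `labels`.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Toy, imbalanced outcome vector (assumed data): 1 = complication, 0 = none.
labels = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(labels, k=5))
for train_idx, val_idx in splits:
    positives = sum(labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), positives)
```

Because each class is dealt across the folds separately, every validation fold keeps roughly the same share of positive cases as the full dataset, which matters when complications are rare relative to the cohort size.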
Affiliation(s)
| | - Ken Li
- University College London, London, UK
| | | | | | - Kim Sui Wan
- Institute of Public Health, Ministry of Health Malaysia, Selangor, Malaysia
| |
|
23
|
García-Domínguez A, Galván-Tejada CE, Magallanes-Quintanar R, Gamboa-Rosales H, Curiel IG, Peralta-Romero J, Cruz M. Diabetes Detection Models in Mexican Patients by Combining Machine Learning Algorithms and Feature Selection Techniques for Clinical and Paraclinical Attributes: A Comparative Evaluation. J Diabetes Res 2023; 2023:9713905. [PMID: 37404324 PMCID: PMC10317588 DOI: 10.1155/2023/9713905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 06/08/2023] [Accepted: 06/18/2023] [Indexed: 07/06/2023] Open
Abstract
The development of medical diagnostic models to support healthcare professionals has witnessed remarkable growth in recent years. Among the prevalent health conditions affecting the global population, diabetes stands out as a significant concern. In the domain of diabetes diagnosis, machine learning algorithms have been widely explored for generating disease detection models, leveraging diverse datasets primarily derived from clinical studies. The performance of these models heavily relies on the selection of the classifier algorithm and the quality of the dataset. Therefore, optimizing the input data by selecting relevant features becomes essential for accurate classification. This research presents a comprehensive investigation into diabetes detection models by integrating two feature selection techniques: the Akaike information criterion and genetic algorithms. These techniques are combined with six prominent classifier algorithms, including support vector machine, random forest, k-nearest neighbor, gradient boosting, extra trees, and naive Bayes. By leveraging clinical and paraclinical features, the generated models are evaluated and compared to existing approaches. The results demonstrate superior performance, surpassing accuracies of 94%. Furthermore, the use of feature selection techniques allows for working with a reduced dataset. The significance of feature selection is underscored in this study, showcasing its pivotal role in enhancing the performance of diabetes detection models. By judiciously selecting relevant features, this approach contributes to the advancement of medical diagnostic capabilities and empowers healthcare professionals in making informed decisions regarding diabetes diagnosis and treatment.
Affiliation(s)
- Antonio García-Domínguez
- Academic Unit of Electrical Engineering, Autonomous University of Zacatecas, Juárez Garden 147, Downtown, Zacatecas 98000, Mexico
| | - Carlos E. Galván-Tejada
- Academic Unit of Electrical Engineering, Autonomous University of Zacatecas, Juárez Garden 147, Downtown, Zacatecas 98000, Mexico
| | - Rafael Magallanes-Quintanar
- Academic Unit of Electrical Engineering, Autonomous University of Zacatecas, Juárez Garden 147, Downtown, Zacatecas 98000, Mexico
| | - Hamurabi Gamboa-Rosales
- Academic Unit of Electrical Engineering, Autonomous University of Zacatecas, Juárez Garden 147, Downtown, Zacatecas 98000, Mexico
| | - Irma González Curiel
- Academic Unit of Chemical Sciences, Autonomous University of Zacatecas, Juárez Garden 147, Downtown, Zacatecas 98000, Mexico
| | - Jesús Peralta-Romero
- Medical Research Unit in Biochemistry, Specialties Hospital, National Medical Center Siglo XXI, Mexican Social Security Institute, Mexico City, Mexico
| | - Miguel Cruz
- Medical Research Unit in Biochemistry, Specialties Hospital, National Medical Center Siglo XXI, Mexican Social Security Institute, Mexico City, Mexico
| |
|
24
|
Chemello G, Salvatori B, Morettini M, Tura A. Artificial Intelligence Methodologies Applied to Technologies for Screening, Diagnosis and Care of the Diabetic Foot: A Narrative Review. BIOSENSORS 2022; 12:985. [PMID: 36354494 PMCID: PMC9688674 DOI: 10.3390/bios12110985] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/27/2022] [Revised: 10/26/2022] [Accepted: 11/04/2022] [Indexed: 06/16/2023]
Abstract
Diabetic foot syndrome is a multifactorial pathology with at least three main etiological factors, i.e., peripheral neuropathy, peripheral arterial disease, and infection. In addition to complexity, another distinctive trait of diabetic foot syndrome is its insidiousness, due to a frequent lack of early symptoms. In recent years, it has become clear that the prevalence of diabetic foot syndrome is increasing, and it is among the diabetes complications with a stronger impact on patients' quality of life. Considering the complex nature of this syndrome, artificial intelligence (AI) methodologies appear adequate to address aspects such as timely screening for the identification of the risk for foot ulcers (or, even worse, for amputation), based on appropriate sensor technologies. In this review, we summarize the main findings of the pertinent studies in the field, paying attention to both the AI-based methodological aspects and the main physiological/clinical study outcomes. The analyzed studies show that AI application to data derived from different technologies provides promising results, but in our opinion future studies may benefit from inclusion of quantitative measures based on simple sensors, which are still scarcely exploited.
Affiliation(s)
- Gaetano Chemello
- CNR Institute of Neuroscience, Corso Stati Uniti 4, 35127 Padova, Italy
| | | | - Micaela Morettini
- Department of Information Engineering, Università Politecnica delle Marche, Via Brecce Bianche, 12, 60131 Ancona, Italy
| | - Andrea Tura
- CNR Institute of Neuroscience, Corso Stati Uniti 4, 35127 Padova, Italy
| |
|
25
|
Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:7378307. [PMID: 35399848 PMCID: PMC8993553 DOI: 10.1155/2022/7378307] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2021] [Revised: 03/10/2022] [Accepted: 03/21/2022] [Indexed: 12/17/2022]
Abstract
Background: Diabetic kidney disease (DKD), one of the complications of diabetes in patients, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes. Therefore, screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to identify high-risk patients by building a predictive model. Objective: This study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance using WEKA machine learning software. Methods: The performance of nine different classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data preprocessing was carried out using the PartitionMembershipFilter. A 10-fold cross validation was performed on the dataset. The performance was assessed on the basis of the execution time, accuracy, correctly and incorrectly classified instances, kappa statistics (K), mean absolute error, root mean squared error, and true values of the confusion matrix. Results: With an accuracy of 93.6585% and a higher K value (0.8731), IBK and random tree classification techniques were found to be the best performing techniques. Moreover, they also exhibited the lowest root mean squared error rate (0.2496). There were 15 false-positive instances and 11 false-negative instances with these prediction models. Conclusions: This study identified IBK and random tree classification techniques as the best performing classifiers and accurate prediction methods for DKD.
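The reported accuracy can be checked from the counts given in this abstract: 410 instances with 15 false positives and 11 false negatives leave 384 correct classifications. The sketch below does that arithmetic and also shows how Cohen's kappa is computed from a full 2x2 confusion matrix. The function names are illustrative, and the split into true positives and true negatives is not given in the abstract, so the kappa call uses explicitly hypothetical counts and its output should not be read as a reconstruction of the reported value.

```python
def accuracy(n_total, false_pos, false_neg):
    """Share of correctly classified instances."""
    return (n_total - false_pos - false_neg) / n_total


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the actual and predicted class marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts taken from the abstract.
acc = accuracy(n_total=410, false_pos=15, false_neg=11)
print(f"accuracy = {acc:.4%}")  # accuracy = 93.6585%

# Hypothetical TP/TN split (assumption; only FP and FN are reported).
kappa_example = cohen_kappa(tp=190, fp=15, fn=11, tn=194)
print(f"kappa (hypothetical split) = {kappa_example:.4f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is reported alongside accuracy when class proportions could otherwise inflate the headline figure.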
|