1
Voskergian D, Bakir-Gungor B, Yousef M. Engineering novel features for diabetes complication prediction using synthetic electronic health records. Front Genet 2025; 16:1451290. [PMID: 40309033 PMCID: PMC12041673 DOI: 10.3389/fgene.2025.1451290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 01/31/2025] [Indexed: 05/02/2025] Open
Abstract
Diabetes significantly affects millions of people worldwide, leading to substantial morbidity, disability, and mortality. Predicting diabetes-related complications from health records is crucial for early prevention and for the development of effective treatment plans. This study introduces a novel feature engineering approach to predict four complications of diabetes mellitus: retinopathy, chronic kidney disease, ischemic heart disease, and amputations. While developing the classification models, we use the XGBoost feature selection method and various supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree. These models were trained on synthetic electronic health records (EHRs) generated by dual-adversarial autoencoders. These EHRs represent nearly 1 million synthetic patients derived from an authentic cohort of 979,308 individuals with diabetes. The variables considered in the models were the age range together with the chronic diseases recorded during patient visits, starting from the onset of diabetes. Throughout the experiments, XGBoost and Random Forest demonstrated the best overall prediction performance. The final models, which are tailored to each complication and trained using our feature engineering approach, achieved an accuracy between 69% and 77% and an AUC between 77% and 84% using cross-validation, while the partitioned validation approach yielded an accuracy between 59% and 78% and an AUC between 66% and 85%. These findings imply that our method outperforms the traditional Bag-of-Features approach, highlighting its effectiveness in enhancing model accuracy and robustness.
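The abstract reports results as accuracy and AUC. As a reminder of what those two numbers measure, here is a minimal sketch in plain Python. The labels and scores are made up for illustration and do not come from the study; the function names `accuracy` and `roc_auc` are hypothetical helpers, not part of the authors' pipeline.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = complication) and model scores, illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the two can diverge.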
Affiliation(s)
- Daniel Voskergian
  - Computer Engineering Department, Al-Quds University, Jerusalem, Palestine
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
2
Hong X, Wang W, Yan S, Shen X, Zhang Y, Ye X. Related Factors Mining of Diabetes Complications Based on Manifold-Constrained Multi-Label Feature Selection. IEEE J Biomed Health Inform 2025; 29:643-656. [PMID: 38805335 DOI: 10.1109/jbhi.2024.3406135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
The primary cause of mortality among individuals with diabetes stems from complications. Identifying related factors for these complications holds immense potential for early prevention. Previous research predominantly employed traditional machine-learning techniques to establish prediction models utilizing medical indicators for related factor selection. However, uncovering the intricate correlations among complication labels and identifying similar characteristics among medical indicators has been challenging. We propose a novel embedded multi-label feature selection approach called LCFSM (Label Cosine and Feature Similar Manifold) to address the issue. LCFSM introduces manifold constraints into the objective function to uncover risk factors associated with diabetes complications. Label cosine similarity is used to optimize feature weights, forming label manifold constraints. Similarly, feature manifold constraints are established to utilize feature kernel similarity in optimizing feature weights. LCFSM formulates an objective function based on regularized least squares and the preceding manifold constraints, employing the Sylvester equation for convergence assurance. The experimental evaluation compares LCFSM against eight baselines, demonstrating superior performance in top-10 feature selection and feature stacking. LCFSM is applied to identify primary risk factors for diabetes complications. Related factors include Electromyogram, Urine Routine Protein Positive, etc., offering valuable insights for early treatment.
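The abstract mentions label cosine similarity without giving a formula. A minimal sketch of the generic quantity follows, assuming the similarity is taken between label columns (one vector per complication across patients); that reading, the toy label matrix, and the helper name `cosine_similarity` are assumptions, not details confirmed by the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.
    Returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical multi-label matrix: rows are patients, columns are complications.
labels = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]

# Transpose so that each complication becomes one vector over all patients.
columns = list(zip(*labels))
similarity = [[cosine_similarity(a, b) for b in columns] for a in columns]

for row in similarity:
    print([round(value, 3) for value in row])
```

Complications that tend to co-occur in the same patients get a similarity near 1, and complications that never co-occur get 0; a matrix like this is the kind of input a label manifold constraint could be built from.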
3
Nguyen PBH, Garger D, Lu D, Maalmi H, Prokisch H, Thorand B, Adamski J, Kastenmüller G, Waldenberger M, Gieger C, Peters A, Suhre K, Bönhof GJ, Rathmann W, Roden M, Grallert H, Ziegler D, Herder C, Menden MP. Interpretable multimodal machine learning (IMML) framework reveals pathological signatures of distal sensorimotor polyneuropathy. COMMUNICATIONS MEDICINE 2024; 4:265. [PMID: 39681608 DOI: 10.1038/s43856-024-00637-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 10/09/2024] [Indexed: 12/18/2024] Open
Abstract
BACKGROUND Distal sensorimotor polyneuropathy (DSPN) is a common neurological disorder in elderly adults and people with obesity, prediabetes and diabetes and is associated with high morbidity and premature mortality. DSPN is a multifactorial disease that is not yet fully understood. METHODS Here, we developed the Interpretable Multimodal Machine Learning (IMML) framework for predicting DSPN prevalence and incidence based on sparse multimodal data. Exploiting IMML's interpretability further empowered biomarker identification. We leveraged the population-based KORA F4/FF4 cohort including 1091 participants and their deep multimodal characterisation, i.e. clinical data, genomics, methylomics, transcriptomics, proteomics, inflammatory proteins and metabolomics. RESULTS Clinical data alone is sufficient to stratify individuals with and without DSPN (AUROC = 0.752), whilst predicting DSPN incidence 6.5 ± 0.2 years later strongly benefits from clinical data complemented with two or more molecular modalities (ΔAUROC > 0.1, achieved AUROC of 0.714). Important and interpretable features of incident DSPN prediction include up-regulation of proinflammatory cytokines and down-regulation of the SUMOylation pathway and essential fatty acids, thus yielding novel insights into the disease pathophysiology. CONCLUSIONS These may become biomarkers for incident DSPN, guide prevention strategies and serve as proof of concept for the utility of IMML in studying complex diseases.
Affiliation(s)
- Phong B H Nguyen
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Faculty of Biology, Ludwig-Maximilians University Munich, 82152, Martinsried, Germany
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
- Daniel Garger
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Faculty of Biology, Ludwig-Maximilians University Munich, 82152, Martinsried, Germany
- Diyuan Lu
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
- Haifa Maalmi
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Holger Prokisch
  - Institute of Neurogenomics, Helmholtz Munich, 85764, Neuherberg, Germany
  - Institute of Human Genetics, Technical University Munich, 80333, Munich, Germany
- Barbara Thorand
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute of Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians University Munich, 81377, Munich, Germany
- Jerzy Adamski
  - Institute of Experimental Genetics, Helmholtz Munich, 85764, Neuherberg, Germany
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Singapore
  - Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, 1000, Ljubljana, Slovenia
- Gabi Kastenmüller
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Institute of Bioinformatics and Systems Biology, Helmholtz Munich, 85764, Neuherberg, Germany
- Melanie Waldenberger
  - Research Unit Molecular Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
- Christian Gieger
  - Research Unit Molecular Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
- Annette Peters
  - Institute of Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians University Munich, 81377, Munich, Germany
- Karsten Suhre
  - Department of Physiology and Biophysics, Weill Cornell Medicine - Qatar, Education City, Doha, 24144, Qatar
- Gidon J Bönhof
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Wolfgang Rathmann
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute of Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Michael Roden
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Harald Grallert
  - Institute of Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Research Unit Molecular Epidemiology, Helmholtz Munich, 85764, Neuherberg, Germany
- Dan Ziegler
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Christian Herder
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
- Michael P Menden
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Faculty of Biology, Ludwig-Maximilians University Munich, 82152, Martinsried, Germany
  - German Center for Diabetes Research (DZD), 85764, Neuherberg, Germany
  - Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, Australia
4
Sánchez CA, De Vries E, Gil F, Niño ME. Prediction model for lower limb amputation in hospitalized diabetic foot patients using classification and regression trees. Foot Ankle Surg 2024; 30:471-479. [PMID: 38575484 DOI: 10.1016/j.fas.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 03/01/2024] [Accepted: 03/16/2024] [Indexed: 04/06/2024]
Abstract
BACKGROUND The decision to amputate a limb in a patient with diabetic foot ulcer (DFU) is not an easy one. Prediction models aim to help the surgeon in decision-making scenarios. Currently there is no prediction model to determine lower limb amputation during the first 30 days of hospitalization for patients with DFU. METHODS Classification And Regression Tree analysis was applied to data from a retrospective cohort of patients hospitalized for the management of diabetic foot ulcer, using an existing database from two Orthopaedics and Traumatology departments. The secondary analysis identified independent variables that can predict lower limb amputation (major or minor) during the first 30 days of hospitalization. RESULTS Of the 573 patients in the database, 290 feet underwent a lower limb amputation during the first 30 days of hospitalization. Six different models were developed using a loss matrix to account for the cost of false negatives. The selected tree produced 13 terminal nodes and after the pruning process, only one division remained in the optimal tree (Sensitivity: 69%, Specificity: 75%, Area Under the Curve: 0.76, Complexity Parameter: 0.01, Error: 0.85). Among the studied variables, the Wagner classification with a cut-off grade of 3 exceeded others in its predictive capacity. CONCLUSIONS Wagner classification was the variable with the best capacity for predicting amputation within 30 days. The infectious state and vascular occlusion described indirectly by this classification reflect the importance of taking quick decisions in those patients with a higher compromise of these two conditions. Finally, an external validation of the model is still required. LEVEL OF EVIDENCE III.
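The pruned tree described above reduces to a single split on the Wagner grade. The sketch below shows how sensitivity and specificity follow from such a one-split rule. The patient data are invented for illustration, and treating the cut-off as "grade 3 or higher predicts amputation" is an assumption; the abstract does not say on which side of the split grade 3 falls.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = event."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Wagner grades and 30-day amputation outcomes (1 = amputated).
wagner_grade = [1, 2, 3, 4, 5, 2, 3, 4, 1, 3]
amputated = [0, 0, 1, 1, 1, 1, 0, 1, 0, 0]

CUTOFF = 3  # assumed direction: grade >= 3 predicts amputation
predicted = [1 if grade >= CUTOFF else 0 for grade in wagner_grade]

sensitivity, specificity = sensitivity_specificity(amputated, predicted)
print(sensitivity, specificity)
```

A loss matrix that makes false negatives more expensive than false positives pushes a tree toward splits with higher sensitivity, usually at some cost in specificity.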
Affiliation(s)
- C A Sánchez
  - Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, Colombia
  - Department of Orthopaedics and Traumatology, Hospital Universitario de la Samaritana, Bogotá, Colombia
- E De Vries
  - Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, Colombia
- F Gil
  - Department of Orthopaedics and Traumatology, Hospital Universitario de la Samaritana, Bogotá, Colombia
- M E Niño
  - Foot and ankle surgery, Clínica del Country and Hospital Militar Central, Bogotá, Colombia
5
Mesquita F, Bernardino J, Henriques J, Raposo JF, Ribeiro RT, Paredes S. Machine learning techniques to predict the risk of developing diabetic nephropathy: a literature review. J Diabetes Metab Disord 2024; 23:825-839. [PMID: 38932857 PMCID: PMC11196462 DOI: 10.1007/s40200-023-01357-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 11/20/2023] [Indexed: 06/28/2024]
Abstract
Purpose Diabetes is a major public health challenge with widespread prevalence, often leading to complications such as Diabetic Nephropathy (DN)-a chronic condition that progressively impairs kidney function. In this context, it is important to evaluate if Machine learning models can exploit the inherent temporal factor in clinical data to predict the risk of developing DN faster and more accurately than current clinical models. Methods Three different databases were used for this literature review: Scopus, Web of Science, and PubMed. Only articles written in English and published between January 2015 and December 2022 were included. Results We included 11 studies, from which we discuss a number of algorithms capable of extracting knowledge from clinical data, incorporating dynamic aspects in patient assessment, and exploring their evolution over time. We also present a comparison of the different approaches, their performance, advantages, disadvantages, interpretation, and the value that the time factor can bring to a more successful prediction of diabetic nephropathy. Conclusion Our analysis showed that some studies ignored the temporal factor, while others partially exploited it. Greater use of the temporal aspect inherent in Electronic Health Records (EHR) data, together with the integration of omics data, could lead to the development of more reliable and powerful predictive models.
Affiliation(s)
- F. Mesquita
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- J. Bernardino
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- J. Henriques
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- JF. Raposo
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- RT. Ribeiro
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- S. Paredes
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
6
Maleki Varnosfaderani S, Forouzanfar M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering (Basel) 2024; 11:337. [PMID: 38671759 PMCID: PMC11047988 DOI: 10.3390/bioengineering11040337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024] Open
Abstract
As healthcare systems around the world face challenges such as escalating costs, limited access, and growing demand for personalized care, artificial intelligence (AI) is emerging as a key force for transformation. This review is motivated by the urgent need to harness AI's potential to mitigate these issues and aims to critically assess AI's integration in different healthcare domains. We explore how AI empowers clinical decision-making, optimizes hospital operation and management, refines medical image analysis, and revolutionizes patient care and monitoring through AI-powered wearables. Through several case studies, we review how AI has transformed specific healthcare domains and discuss the remaining challenges and possible solutions. Additionally, we will discuss methodologies for assessing AI healthcare solutions, ethical challenges of AI deployment, and the importance of data privacy and bias mitigation for responsible technology use. By presenting a critical assessment of AI's transformative potential, this review equips researchers with a deeper understanding of AI's current and future impact on healthcare. It encourages an interdisciplinary dialogue between researchers, clinicians, and technologists to navigate the complexities of AI implementation, fostering the development of AI-driven solutions that prioritize ethical standards, equity, and a patient-centered approach.
Affiliation(s)
- Mohamad Forouzanfar
  - Département de Génie des Systèmes, École de Technologie Supérieure (ÉTS), Université du Québec, Montréal, QC H3C 1K3, Canada
  - Centre de Recherche de L’institut Universitaire de Gériatrie de Montréal (CRIUGM), Montréal, QC H3W 1W5, Canada
7
P A, KS G, S RS, J BP, KN T, D D. Diabetic Foot Complication Avoidance Through a Wearable Sensor and Random Forest Classifier for Automated Evaluation. 2024 10TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS) 2024:846-851. [DOI: 10.1109/icaccs60874.2024.10717102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
Affiliation(s)
- Arunkumar P
  - KPR Institute of Engineering and Technology, Department of Biomedical Engineering, Coimbatore, India
- Gayathri KS
  - KPR Institute of Engineering and Technology, Department of Biomedical Engineering, Coimbatore, India
- Ridana Sri S
  - KPR Institute of Engineering and Technology, Department of Biomedical Engineering, Coimbatore, India
- Bharathi Prabha J
  - KPR Institute of Engineering and Technology, Department of Biomedical Engineering, Coimbatore, India
- Thangaraj KN
  - Balaclinic & Diabetes Centre, Department of Diabetology, Tiruppur, India
- Deepak D
  - Balaclinic & Diabetes Centre, Department of Diabetology, Tiruppur, India
8
Abas MZ, Li K, Hairi NN, Choo WY, Wan KS. Machine learning based predictive model of Type 2 diabetes complications using Malaysian National Diabetes Registry: A study protocol. J Public Health Res 2024; 13:22799036241231786. [PMID: 38434578 PMCID: PMC10906050 DOI: 10.1177/22799036241231786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 01/24/2024] [Indexed: 03/05/2024] Open
Abstract
Background The prevalence of diabetes in Malaysia is increasing, and identifying patients with higher risk of complications is crucial for effective management. The use of machine learning (ML) to develop prediction models has been shown to outperform non-ML models. This study aims to develop predictive models for Type 2 Diabetes (T2D) complications in Malaysia using ML techniques. Design and methods This 10-year retrospective cohort study uses clinical audit datasets from Malaysian National Diabetes Registry from 2011 to 2021. T2D patients who received treatment in public health clinics in the southern region of Malaysia with at least two data points in 10 years are included. Patients with diabetes complications at baseline are excluded to ensure temporality between predictors and the target variable. Appropriate methods are used to address issues related to data cleaning, missing data imputation, data splitting, feature selection, and class imbalance. The study uses 7 ML algorithms, including logistic regression, support vector machine, k-nearest neighbours, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine, to develop predictive models for four target variables: nephropathy, retinopathy, ischaemic heart disease, and stroke. Hyperparameter tuning is performed for each algorithm. The model training is performed using a stratified k-fold cross-validation technique. The best model for each algorithm is evaluated on a hold-out dataset using multiple metrics. Expected impact of the study on public health The prediction model may be a valuable tool for diabetes management and secondary prevention by enabling earlier interventions and optimal resource allocation, leading to better health outcomes.
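The protocol names stratified k-fold cross-validation as its training scheme. A minimal sketch of the splitting step in plain Python follows, assuming a simple round-robin assignment within each class; the function name `stratified_k_fold` and the toy labels are hypothetical, and the protocol does not specify k or the implementation.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of sample indices in which every class is spread
    as evenly as possible across the folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical imbalanced labels: 4 positives and 8 negatives.
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
folds = stratified_k_fold(labels, k=4)

for test_fold in folds:
    # Every other fold forms the training set for this round.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    positives = sum(labels[i] for i in test_fold)
    print(sorted(test_fold), "positives in test fold:", positives)
```

Stratifying keeps the complication rate in each fold close to the overall rate, which matters when the positive class is rare.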
Affiliation(s)
- Ken Li
  - University College London, London, UK
- Kim Sui Wan
  - Institute of Public Health, Ministry of Health Malaysia, Selangor, Malaysia
9
Shojaee-Mend H, Velayati F, Tayefi B, Babaee E. Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study. Healthc Inform Res 2024; 30:73-82. [PMID: 38359851 PMCID: PMC10879823 DOI: 10.4258/hir.2024.30.1.73] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 01/24/2024] [Accepted: 01/24/2024] [Indexed: 02/17/2024] Open
Abstract
OBJECTIVES This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life. METHODS This cross-sectional study analyzed data from 3376 adults over 30 years old at 16 comprehensive health service centers in Tehran, Iran who participated in a diabetes screening program. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE). The dataset was split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five different machine learning algorithms, including CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network, were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve were used to evaluate the model. RESULTS Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Though the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an area under the curve (AUC) of 0.737. CONCLUSIONS A gradient boosted decision tree model accurately identified the most important risk factors related to diabetes. Age, waist-to-hip ratio, body mass index, and systolic blood pressure were, in that order, the most important risk factors for diabetes. This model can support planning for diabetes management and prevention.
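The abstract balances its dataset with SMOTE. The core idea, creating a synthetic minority sample by interpolating between a minority point and a nearby minority neighbour, can be sketched in plain Python as below. This is a simplified variant that uses only the single nearest neighbour rather than a random choice among k neighbours, and the data points and the function name `oversample` are hypothetical.

```python
import math
import random


def oversample(minority, n_new, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and its nearest minority neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        point = rng.choice(minority)
        others = [m for m in minority if m is not point]
        neighbour = min(others, key=lambda other: math.dist(point, other))
        gap = rng.random()  # interpolation weight, uniform in [0, 1)
        synthetic.append(
            [p + gap * (n - p) for p, n in zip(point, neighbour)]
        )
    return synthetic


# Hypothetical minority-class samples with two numerical features.
minority = [[1.0, 2.0], [1.5, 2.5], [3.0, 1.0]]
new_points = oversample(minority, n_new=5)

for row in new_points:
    print([round(value, 3) for value in row])
```

Because every synthetic point lies between two real minority samples, oversampling of this kind should be applied to the training split only, so that no interpolated copy of a test sample leaks into training.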
Affiliation(s)
- Hassan Shojaee-Mend
  - Infectious Diseases Research Center, Gonabad University of Medical Sciences, Gonabad, Iran
- Farnia Velayati
  - Telemedicine Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Batool Tayefi
  - Preventive Medicine and Public Health Research Center, Psychosocial Health Research Institute, Department of Community and Family Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Ebrahim Babaee
  - Preventive Medicine and Public Health Research Center, Psychosocial Health Research Institute, Department of Community and Family Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
  - Vaccine Research Center, Iran University of Medical Sciences, Tehran, Iran
10
Spallone V. Diabetic neuropathy: Current issues in diagnosis and prevention. CHRONIC COMPLICATIONS OF DIABETES MELLITUS 2024:117-163. [DOI: 10.1016/b978-0-323-88426-6.00016-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
11
Abegaz TM, Ahmed M, Sherbeny F, Diaby V, Chi H, Ali AA. Application of Machine Learning Algorithms to Predict Uncontrolled Diabetes Using the All of Us Research Program Data. Healthcare (Basel) 2023; 11:1138. [PMID: 37107973 PMCID: PMC10137945 DOI: 10.3390/healthcare11081138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 04/11/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023] Open
Abstract
There is a paucity of predictive models for uncontrolled diabetes mellitus. The present study applied different machine learning algorithms to multiple patient characteristics to predict uncontrolled diabetes. Patients with diabetes above the age of 18 from the All of Us Research Program were included. Random forest, extreme gradient boost, logistic regression, and weighted ensemble model algorithms were employed. Patients who had a record of uncontrolled diabetes based on the International Classification of Diseases code were identified as cases. A set of features including basic demographics, biomarkers, and hematological indices were included in the model. The random forest model demonstrated high performance in predicting uncontrolled diabetes, yielding an accuracy of 0.80 (95% CI: 0.79-0.81) as compared to the extreme gradient boost 0.74 (95% CI: 0.73-0.75), the logistic regression 0.64 (95% CI: 0.63-0.65) and the weighted ensemble model 0.77 (95% CI: 0.76-0.79). The maximum area under the receiver operating characteristic curve value was 0.77 (random forest model), while the minimum value was 0.7 (logistic regression model). Potassium levels, body weight, aspartate aminotransferase, height, and heart rate were important predictors of uncontrolled diabetes. Serum electrolytes and physical measurements were important features in predicting uncontrolled diabetes. Machine learning techniques may be used to predict uncontrolled diabetes by incorporating these clinical characteristics.
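The accuracies above are reported with 95% confidence intervals, but the abstract does not say how the intervals were obtained. One common choice is the normal-approximation (Wald) interval for a proportion, sketched below as an assumption rather than as the authors' method; the counts and the function name `accuracy_wald_ci` are hypothetical.

```python
import math


def accuracy_wald_ci(correct, total, z=1.96):
    """Return (accuracy, lower, upper) using the normal approximation
    to the binomial; z = 1.96 corresponds to a 95% interval."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical test set: 800 correct predictions out of 1000.
acc, low, high = accuracy_wald_ci(correct=800, total=1000)
print(f"accuracy = {acc:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval narrows with the square root of the test set size, so the tight intervals in the abstract are consistent with evaluation on a large number of patients. Bootstrap or Wilson intervals are common alternatives and behave better near 0 or 1.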
Affiliation(s)
- Tadesse M. Abegaz
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL 32307, USA
- Muktar Ahmed
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- Fatimah Sherbeny
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL 32307, USA
- Vakaramoko Diaby
  - College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
- Hongmei Chi
  - The Department of Computer and Information Sciences, Florida A&M University, Tallahassee, FL 32307, USA
- Askal Ayalew Ali
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL 32307, USA
12
Chang L, Fukuoka Y, Aouizerat BE, Zhang L, Flowers E. Prediction of Weight Loss to Decrease the Risk for Type 2 Diabetes Using Multidimensional Data in Filipino Americans: Secondary Analysis. JMIR Diabetes 2023; 8:e44018. [PMID: 37040172 PMCID: PMC10131631 DOI: 10.2196/44018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 02/26/2023] [Accepted: 02/28/2023] [Indexed: 03/04/2023] Open
Abstract
BACKGROUND: Type 2 diabetes (T2D) has an immense disease burden, affecting millions of people worldwide and costing billions of dollars in treatment. As T2D is a multifactorial disease with both genetic and nongenetic influences, accurate risk assessments for patients are difficult to perform. Machine learning has served as a useful tool in T2D risk prediction, as it can analyze and detect patterns in large and complex data sets such as RNA sequencing data. However, before machine learning can be implemented, feature selection is a necessary step to reduce the dimensionality of high-dimensional data and optimize modeling results. Different combinations of feature selection methods and machine learning models have been used in studies reporting disease predictions and classifications with high accuracy.
OBJECTIVE: The purpose of this study was to assess the use of feature selection and classification approaches that integrate different data types to predict weight loss for the prevention of T2D.
METHODS: The data of 56 participants (ie, demographic and clinical factors, dietary scores, step counts, and transcriptomics) were obtained from a previously completed randomized clinical trial adaptation of the Diabetes Prevention Program study. Feature selection methods were used to select subsets of transcripts for use in the selected classification approaches: support vector machine, logistic regression, decision trees, random forest, and extremely randomized decision trees (extra-trees). Data types were included in the classification approaches in an additive manner to assess model performance for the prediction of weight loss.
RESULTS: Average waist and hip circumference differed between those who exhibited weight loss and those who did not (P=.02 and P=.04, respectively). The incorporation of dietary and step count data did not improve modeling performance compared to classifiers that included only demographic and clinical data. Optimal subsets of transcripts identified through feature selection yielded higher prediction accuracy than when all available transcripts were included. After comparison of different feature selection methods and classifiers, DESeq2 as a feature selection method and an extra-trees classifier with and without ensemble learning provided the best results, as defined by differences in training and testing accuracy, cross-validated area under the curve, and other factors. We identified 5 genes in two or more of the feature selection subsets (ie, CDP-diacylglycerol-inositol 3-phosphatidyltransferase [CDIPT], mannose receptor C type 2 [MRC2], PAT1 homolog 2 [PATL2], regulatory factor X-associated ankyrin containing protein [RFXANK], and small ubiquitin like modifier 3 [SUMO3]).
CONCLUSIONS: Our results suggest that the inclusion of transcriptomic data in classification approaches has the potential to improve weight loss prediction models. Identifying which individuals are likely to respond to interventions for weight loss may help to prevent incident T2D. Of the 5 genes identified as optimal predictors, 3 (ie, CDIPT, MRC2, and SUMO3) have been previously shown to be associated with T2D or obesity.
TRIAL REGISTRATION: ClinicalTrials.gov NCT02278939; https://clinicaltrials.gov/ct2/show/NCT02278939.
Affiliation(s)
- Lisa Chang
  - Department of Physiological Nursing, University of California, San Francisco, San Francisco, CA, United States
  - Keck Graduate Institute, Claremont, CA, United States
- Yoshimi Fukuoka
  - Department of Physiological Nursing, University of California, San Francisco, San Francisco, CA, United States
- Bradley E Aouizerat
  - Bluestone Center for Clinical Research, New York University, New York, NY, United States
  - Department of Oral and Maxillofacial Surgery, New York University, New York, NY, United States
- Li Zhang
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, United States
  - Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Elena Flowers
  - Department of Physiological Nursing, University of California, San Francisco, San Francisco, CA, United States
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, United States
|
13
|
Chou CY, Hsu DY, Chou CH. Predicting the Onset of Diabetes with Machine Learning Methods. J Pers Med 2023; 13:406. [PMID: 36983587 PMCID: PMC10057336 DOI: 10.3390/jpm13030406] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/23/2022] [Revised: 02/16/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023] Open
Abstract
The number of people suffering from diabetes in Taiwan has continued to rise in recent years. According to the statistics of the International Diabetes Federation, about 537 million people worldwide (10.5% of the global population) suffer from diabetes, and it is estimated that 643 million people (11.3% of the total population) will have the condition by 2030. If this trend continues, the number will jump to 783 million (12.2%) by 2045. At present, the number of people with diabetes in Taiwan has reached 2.18 million, with an average of one in ten people suffering from the disease. In addition, according to the Bureau of National Health Insurance in Taiwan, the prevalence rate of diabetes among adults in Taiwan has reached 5% and is increasing each year. Diabetes can cause acute and chronic complications that can be fatal. Meanwhile, chronic complications can result in a variety of disabilities or organ decline. If holistic treatment and prevention are not provided to diabetic patients, more medical resources will be consumed and the quality of life of society as a whole will decline rapidly.
In this study, based on the outpatient examination data of a Taipei Municipal medical center, 15,000 women aged between 20 and 80 were selected as the subjects. These women were patients who had gone to the medical center during 2018-2020 and 2021-2022 with or without the diagnosis of diabetes. This study investigated eight different characteristics of the subjects: the number of pregnancies, plasma glucose level, diastolic blood pressure, sebum thickness, insulin level, body mass index, diabetes pedigree function, and age. After sorting out the complete data of the patients, this study used Microsoft Machine Learning Studio to train various kinds of neural network models, and the prediction results were used to compare the predictive ability of the various parameters for diabetes. Finally, a comparison of two-class logistic regression, two-class neural network, two-class decision jungle, and two-class boosted decision tree models showed that the two-class boosted decision tree performed best, with an area under the curve of 0.991, higher than that of the other models.
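The area under the curve used above to rank the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below illustrates that definition only; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = diabetes, 0 = no diabetes.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.991, as reported for the boosted decision tree, would mean that almost every such pair is ordered correctly.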
Affiliation(s)
- Chun-Yang Chou
  - Research Center for Healthcare Industry Innovation, National Taipei University of Nursing and Health Sciences, Taipei 112, Taiwan
- Ding-Yang Hsu
  - Department of Industrial Design, Ming Chi University of Technology, Taipei 243, Taiwan
- Chun-Hung Chou
  - Industrial Technology Research Institute, Hsinchu 310401, Taiwan
|
14
|
Gosak L, Martinović K, Lorber M, Stiglic G. Artificial intelligence based prediction models for individuals at risk of multiple diabetic complications: A systematic review of the literature. J Nurs Manag 2022; 30:3765-3776. [PMID: 36329678 PMCID: PMC10100477 DOI: 10.1111/jonm.13894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 10/03/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022]
Abstract
AIM: The aim of this review is to examine the effectiveness of artificial intelligence in predicting multimorbid diabetes-related complications.
BACKGROUND: In diabetic patients, several complications are often present, which have a significant impact on the quality of life; therefore, it is crucial to predict the level of risk for diabetes and its complications.
EVALUATION: International databases PubMed, CINAHL, MEDLINE and Scopus were searched using the terms artificial intelligence, diabetes mellitus and prediction of complications to identify studies on the effectiveness of artificial intelligence for predicting multimorbid diabetes-related complications. The results were organized by outcomes to allow more efficient comparison.
KEY ISSUES: Based on the inclusion/exclusion criteria, 11 articles were included in the final analysis. The most frequently predicted complication was diabetic neuropathy (n = 7). Studies included between two and 14 complications. The most commonly used prediction models were penalized regression, random forest and Naïve Bayes model neural network.
CONCLUSION: The use of artificial intelligence can predict the risks of diabetes complications with greater precision based on available multidimensional datasets and provides an important tool for nurses working in preventive health care.
IMPLICATIONS FOR NURSING MANAGEMENT: Using artificial intelligence contributes to a better quality of care, better autonomy of patients in diabetes management and reduction of complications, costs of medical care and mortality.
Affiliation(s)
- Lucija Gosak
  - Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
- Kristina Martinović
  - Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
  - Faculty of Health Sciences, University of Primorska, Izola, Slovenia
- Mateja Lorber
  - Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
- Gregor Stiglic
  - Faculty of Health Sciences, University of Maribor, Maribor, Slovenia
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
  - Usher Institute, University of Edinburgh, Edinburgh, UK
|
15
|
Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease. J Healthc Eng 2022; 2022:7378307. [PMID: 35399848 PMCID: PMC8993553 DOI: 10.1155/2022/7378307] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2021] [Revised: 03/10/2022] [Accepted: 03/21/2022] [Indexed: 12/17/2022]
Abstract
Background: Diabetic kidney disease (DKD), a complication of diabetes, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes. Therefore, screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to identify high-risk patients by building a predictive model.
Objective: This study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance using WEKA machine learning software.
Methods: The performance of nine different classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data preprocessing was carried out using the PartitionMembershipFilter. A 10-fold cross validation was performed on the dataset. The performance was assessed on the basis of the execution time, accuracy, correctly and incorrectly classified instances, kappa statistics (K), mean absolute error, root mean squared error, and true values of the confusion matrix.
Results: With an accuracy of 93.6585% and a higher K value (0.8731), IBk and random tree classification techniques were found to be the best performing techniques. They also exhibited the lowest root mean squared error (0.2496). There were 15 false-positive instances and 11 false-negative instances with these prediction models.
Conclusions: This study identified IBk and random tree classification techniques as the best performing classifiers and accurate prediction methods for DKD.
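The reported accuracy can be checked directly against the counts in the abstract: 410 instances with 15 false positives and 11 false negatives leave 384 correct classifications. The sketch below performs that check and shows how Cohen's kappa is computed from a 2x2 confusion matrix. The abstract does not report how the 384 correct instances divide into true positives and true negatives, so the `TP`/`TN` values are a hypothetical assumption chosen only to illustrate the formula; the helper names are likewise illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of instances classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the actual and predicted class totals.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Counts stated in the abstract.
N, FP, FN = 410, 15, 11
correct = N - FP - FN  # 384 correctly classified instances
print(f"{100 * correct / N:.4f}%")  # 93.6585%

# HYPOTHETICAL split of the 384 correct instances (not given in the abstract).
TP, TN = 197, 187
print(round(accuracy(TP, FP, FN, TN), 6))
print(round(cohen_kappa(TP, FP, FN, TN), 4))
```

Accuracy depends only on the total number of correct classifications, whereas kappa also depends on the class balance, which is why the split has to be assumed here.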
|