1
Delpino FM, Costa ÂK, César do Nascimento M, Dias Moura HS, Geremias Dos Santos H, Wichmann RM, Porto Chiavegatto Filho AD, Arcêncio RA, Nunes BP. Does machine learning have a high performance to predict obesity among adults and older adults? A systematic review and meta-analysis. Nutr Metab Cardiovasc Dis 2024; 34:2034-2045. [PMID: 39004592 DOI: 10.1016/j.numecd.2024.05.020] [Received: 01/17/2024] [Revised: 03/27/2024] [Accepted: 05/23/2024] [Indexed: 07/16/2024]
Abstract
AIM: Machine learning may be a tool with the potential for obesity prediction. This study aims to review the literature on the performance of machine learning models in predicting obesity and to quantify the pooled results through a meta-analysis. DATA SYNTHESIS: A systematic review and meta-analysis were conducted, including studies that used machine learning to predict obesity. Searches were conducted in October 2023 across databases including LILACS, Web of Science, Scopus, Embase, and CINAHL. We included studies that utilized classification models and reported results in the Area Under the ROC Curve (AUC) (PROSPERO registration: CRD42022306940), without imposing restrictions on the year of publication. The risk of bias was assessed using an adapted version of the Transparent Reporting of a multivariable prediction model for individual Prognosis or Diagnosis (TRIPOD). Meta-analysis was conducted using MedCalc software. A total of 14 studies were included, with the majority demonstrating satisfactory performance for obesity prediction, with AUCs exceeding 0.70. The random forest algorithm emerged as the top performer in obesity prediction, achieving an AUC of 0.86 (95% CI: 0.76-0.96; I²: 99.8%), closely followed by logistic regression with an AUC of 0.85 (95% CI: 0.75-0.95; I²: 99.6%). The least effective model was gradient boosting, with an AUC of 0.77 (95% CI: 0.71-0.82; I²: 98.1%). CONCLUSION: Machine learning models demonstrated satisfactory predictive performance for obesity. However, future research should utilize more comparable data, larger databases, and a broader range of machine learning models.
Affiliation(s)
- Felipe Mendes Delpino
- Postgraduate Program in Nursing, Federal University of Pelotas. Pelotas, Rio Grande do Sul, Brazil; Postgraduate Program in Public Health Nursing, University of São Paulo, Ribeirão Preto, Brazil.
- Ândria Krolow Costa
- Postgraduate Program in Nursing, Federal University of Pelotas. Pelotas, Rio Grande do Sul, Brazil
- Bruno Pereira Nunes
- Postgraduate Program in Nursing, Federal University of Pelotas. Pelotas, Rio Grande do Sul, Brazil
2
Yagin FH, Aygun U, Algarni A, Colak C, Al-Hashem F, Ardigò LP. Platelet Metabolites as Candidate Biomarkers in Sepsis Diagnosis and Management Using the Proposed Explainable Artificial Intelligence Approach. J Clin Med 2024; 13:5002. [PMID: 39274215 PMCID: PMC11395774 DOI: 10.3390/jcm13175002] [Received: 08/01/2024] [Revised: 08/16/2024] [Accepted: 08/22/2024] [Indexed: 09/16/2024] Open
Abstract
Background: Sepsis is characterized by an atypical immune response to infection and is a dangerous health problem leading to significant mortality. Current diagnostic methods exhibit insufficient sensitivity and specificity and require the discovery of precise biomarkers for the early diagnosis and treatment of sepsis. Platelets, known for their hemostatic abilities, also play an important role in immunological responses. This study aims to develop a model integrating machine learning and explainable artificial intelligence (XAI) to identify novel platelet metabolomics markers of sepsis. Methods: A total of 39 participants, 25 diagnosed with sepsis and 14 control subjects, were included in the study. The profiles of platelet metabolites were analyzed using quantitative 1H-nuclear magnetic resonance (NMR) technology. Data were processed using the synthetic minority oversampling method (SMOTE)-Tomek to address the issue of class imbalance. In addition, missing data were filled using a technique based on random forests. Three machine learning models, namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and kernel tree boosting (KTBoost), were used for sepsis prediction. The models were validated using cross-validation. Clinical annotations of the optimal sepsis prediction model were analyzed using SHapley Additive exPlanations (SHAP), an XAI technique. Results: The results showed that the KTBoost model (0.900 accuracy and 0.943 AUC) achieved better performance than the other models in sepsis diagnosis. SHAP results revealed that metabolites such as carnitine, glutamate, and myo-inositol are important biomarkers in sepsis prediction and intuitively explained the prediction decisions of the model. Conclusion: Platelet metabolites identified by the KTBoost model and XAI have significant potential for the early diagnosis and monitoring of sepsis and improving patient outcomes.
Affiliation(s)
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Türkiye
- Umran Aygun
- Department of Anesthesiology and Reanimation, Malatya Yesilyurt Hasan Calık State Hospital, Malatya 44929, Türkiye
- Abdulmohsen Algarni
- Central Labs, King Khalid University, AlQura'a, Abha, P.O. Box 960, Saudi Arabia
- Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Türkiye
- Fahaid Al-Hashem
- Department of Physiology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia
- Luca Paolo Ardigò
- Department of Teacher Education, NLA University College, 0166 Oslo, Norway
3
Palmieri F, Akhtar NF, Pané A, Jiménez A, Olbeyra RP, Viaplana J, Vidal J, de Hollanda A, Gama-Perez P, Jiménez-Chillarón JC, Garcia-Roves PM. Machine learning allows robust classification of visceral fat in women with obesity using common laboratory metrics. Sci Rep 2024; 14:17263. [PMID: 39068287 PMCID: PMC11283481 DOI: 10.1038/s41598-024-68269-y] [Received: 02/28/2024] [Accepted: 07/22/2024] [Indexed: 07/30/2024] Open
Abstract
The excessive accumulation and malfunctioning of visceral adipose tissue (VAT) is a major determinant of increased risk of obesity-related comorbidities. Thus, risk stratification of people living with obesity according to their amount of VAT is of clinical interest. Currently, the most common VAT measurement methods include mathematical formulae based on anthropometric dimensions, often biased by human measurement errors, bio-impedance, and image techniques such as X-ray absorptiometry (DXA) analysis, which requires specialized equipment. However, previous studies showed the possibility of classifying people living with obesity according to their VAT through blood chemical concentrations by applying machine learning techniques. In addition, most of the efforts were spent on men living with obesity while little was done for women. Therefore, this study aims to compare the performance of the multilinear regression model (MLR) in estimating VAT and six different supervised machine learning classifiers, including logistic regression (LR), support vector machine and decision tree-based models, to categorize 149 women living with obesity. For clustering, the study population was categorized into classes 0, 1, and 2 according to their VAT and the accuracy of each MLR and classification model was evaluated using DXA-data (DXAdata), blood chemical concentrations (BLDdata), and both DXAdata and BLDdata together (ALLdata). Estimation error and R² were computed for MLR, while receiver operating characteristic (ROC) and precision-recall curves (PR) area under the curve (AUC) were used to assess the performance of every classification model. MLR models showed a poor ability to estimate VAT with mean absolute error ≥ 401.40 and R² ≤ 0.62 in all the datasets. The highest accuracy was found for LR with values of 0.57, 0.63, and 0.53 for ALLdata, DXAdata, and BLDdata, respectively. The ROC AUC showed a poor ability of both ALLdata and DXAdata to distinguish class 1 from classes 0 and 2 (AUC = 0.31, 0.71, and 0.85, respectively) as also confirmed by PR (AUC = 0.24, 0.57, and 0.73, respectively). However, improved performances were obtained when applying LR model to BLDdata (ROC AUC ≥ 0.61 and PR AUC ≥ 0.42), especially for class 1. These results seem to suggest that, while a direct and reliable estimation of VAT was not possible in our cohort, blood sample-derived information can robustly classify women living with obesity by machine learning-based classifiers, a fact that could benefit the clinical practice, especially in those health centres where medical imaging devices are not available. Nonetheless, these promising findings should be further validated over a larger population.
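The per-class ROC AUC figures above are of the one-vs-rest kind: one VAT class is treated as positive and the other two as negative. The abstract does not describe the authors' code, so the following is only a minimal sketch of that idea, computing AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The helper names, class labels, and scores are all hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(classes, target_scores, target):
    """AUC for separating `target` from every other class."""
    binary_labels = [1 if c == target else 0 for c in classes]
    return roc_auc(binary_labels, target_scores)


# Hypothetical data: true class per sample and the model's score for class 1.
hypothetical_classes = [0, 0, 1, 1, 2, 2]
hypothetical_class1_scores = [0.2, 0.6, 0.5, 0.4, 0.7, 0.3]
auc_class1 = one_vs_rest_auc(hypothetical_classes, hypothetical_class1_scores, target=1)
print(f"one-vs-rest AUC for class 1 = {auc_class1:.2f}")
```

An AUC of 0.5 means the scores carry no ranking information for that class, and a value below 0.5, such as the 0.31 reported above, means the ranking is worse than chance for that class.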
Affiliation(s)
- Flavio Palmieri
- Biophysics unit, Department of Physiological Sciences, Faculty of Medicine and Health, Universitat de Barcelona, Bellvitge campus, 08907, Barcelona, Spain.
- Nutrition, Metabolism and Gene Therapy Group; Diabetes and Metabolism Program; Bellvitge Biomedical Research Institute (IDIBELL), 08908, Barcelona, Spain.
- Nidà Farooq Akhtar
- Escola d'Enginyeria de Barcelona Est (EEBE) Universitat Politècnica De Catalunya. Barcelona Tech-UPC, 08019, Barcelona, Spain
- Adriana Pané
- Obesity Unit, Endocrinology and Nutrition Department, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Centro de Investigación Biomédica en Red de la Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III (ISCIII), 28029, Madrid, Spain
- Amanda Jiménez
- Obesity Unit, Endocrinology and Nutrition Department, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Centro de Investigación Biomédica en Red de la Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III (ISCIII), 28029, Madrid, Spain
- Fundació Clínic per a la Recerca Biomèdica (FCRB)-Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036, Barcelona, Spain
- Romina Paula Olbeyra
- Fundació Clínic per a la Recerca Biomèdica (FCRB)-Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036, Barcelona, Spain
- Judith Viaplana
- Fundació Clínic per a la Recerca Biomèdica (FCRB)-Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036, Barcelona, Spain
- Josep Vidal
- Obesity Unit, Endocrinology and Nutrition Department, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Fundació Clínic per a la Recerca Biomèdica (FCRB)-Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Instituto de Salud Carlos III (ISCIII), 28029, Madrid, Spain
- Ana de Hollanda
- Obesity Unit, Endocrinology and Nutrition Department, Hospital Clínic de Barcelona, 08036, Barcelona, Spain
- Centro de Investigación Biomédica en Red de la Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III (ISCIII), 28029, Madrid, Spain
- Fundació Clínic per a la Recerca Biomèdica (FCRB)-Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036, Barcelona, Spain
- Pau Gama-Perez
- Biophysics unit, Department of Physiological Sciences, Faculty of Medicine and Health, Universitat de Barcelona, Bellvitge campus, 08907, Barcelona, Spain
- Josep C Jiménez-Chillarón
- Biophysics unit, Department of Physiological Sciences, Faculty of Medicine and Health, Universitat de Barcelona, Bellvitge campus, 08907, Barcelona, Spain
- Metabolic diseases of pediatric origin unit, Institut de Recerca Sant Joan de Déu - Barcelona Children's Hospital, 08950, Esplugues del Llobregat, Spain
- Pablo M Garcia-Roves
- Biophysics unit, Department of Physiological Sciences, Faculty of Medicine and Health, Universitat de Barcelona, Bellvitge campus, 08907, Barcelona, Spain.
- Nutrition, Metabolism and Gene Therapy Group; Diabetes and Metabolism Program; Bellvitge Biomedical Research Institute (IDIBELL), 08908, Barcelona, Spain.
- Centro de Investigación Biomédica en Red de la Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III (ISCIII), 28029, Madrid, Spain.
4
Yagin FH, Al-Hashem F, Ahmad I, Ahmad F, Alkhateeb A. Pilot-Study to Explore Metabolic Signature of Type 2 Diabetes: A Pipeline of Tree-Based Machine Learning and Bioinformatics Techniques for Biomarkers Discovery. Nutrients 2024; 16:1537. [PMID: 38794775 PMCID: PMC11124278 DOI: 10.3390/nu16101537] [Received: 03/24/2024] [Revised: 05/13/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024] Open
Abstract
BACKGROUND: This study aims to identify unique metabolomics biomarkers associated with Type 2 Diabetes (T2D) and develop an accurate diagnostics model using tree-based machine learning (ML) algorithms integrated with bioinformatics techniques. METHODS: Univariate and multivariate analyses such as fold change, a receiver operating characteristic curve (ROC), and Partial Least-Squares Discriminant Analysis (PLS-DA) were used to identify biomarker metabolites that showed significant concentration in T2D patients. Three tree-based algorithms [eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost)] that demonstrated robustness in high-dimensional data analysis were used to create a diagnostic model for T2D. RESULTS: As a result of the biomarker discovery process validated with three different approaches, Pyruvate, D-Rhamnose, AMP, pipecolate, Tetradecenoic acid, Tetradecanoic acid, Dodecanediothioic acid, Prostaglandin E3/D3 (isobars), ADP and Hexadecenoic acid were determined as potential biomarkers for T2D. Our results showed that the XGBoost model [accuracy = 0.831, F1-score = 0.845, sensitivity = 0.882, specificity = 0.774, positive predictive value (PPV) = 0.811, negative predictive value (NPV) = 0.857 and Area under the ROC curve (AUC) = 0.887] had slightly the highest performance measures. CONCLUSIONS: ML integrated with bioinformatics techniques offers accurate and positive T2D candidate biomarker discovery. The XGBoost model can successfully distinguish T2D based on metabolites.
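The metrics listed for the XGBoost model are all functions of one confusion matrix, which the abstract does not report. The sketch below shows how each is derived from the four cell counts. The counts used (30 true positives, 4 false negatives, 24 true negatives, 7 false positives) are an illustrative assumption, chosen only because they reproduce the reported values to three decimals; they are not taken from the paper, and the function name is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Derive common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true cases detected
    specificity = tn / (tn + fp)          # share of controls correctly cleared
    ppv = tp / (tp + fp)                  # precision: share of positive calls that are right
    npv = tn / (tn + fn)                  # share of negative calls that are right
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Illustrative counts only (assumed, not reported in the abstract).
metrics = confusion_metrics(tp=30, fn=4, tn=24, fp=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The AUC of 0.887 cannot be recovered this way, because it depends on the full ranking of predicted scores and not on a single classification threshold.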
Affiliation(s)
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
- Fahaid Al-Hashem
- Department of Physiology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia
- Irshad Ahmad
- Department of Medical Rehabilitation Sciences, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia
- Fuzail Ahmad
- Department of Respiratory Care, College of Applied Sciences, Almaarefa University, Diriya, Riyadh 13713, Saudi Arabia
- Abedalrhman Alkhateeb
- Department of Computer Science, Lakehead University, Thunder Bay, ON P7B 5E1, Canada
5
Aygun U, Yagin FH, Yagin B, Yasar S, Colak C, Ozkan AS, Ardigò LP. Assessment of Sepsis Risk at Admission to the Emergency Department: Clinical Interpretable Prediction Model. Diagnostics (Basel) 2024; 14:457. [PMID: 38472930 DOI: 10.3390/diagnostics14050457] [Received: 01/23/2024] [Revised: 02/18/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open
Abstract
This study aims to develop an interpretable prediction model based on explainable artificial intelligence to predict bacterial sepsis and discover important biomarkers. A total of 1572 adult patients, 560 of whom were sepsis positive and 1012 of whom were negative, who were admitted to the emergency department with suspicion of sepsis, were examined. We investigated the performance characteristics of sepsis biomarkers alone and in combination for confirmed sepsis diagnosis using Sepsis-3 criteria. Three different tree-based algorithms, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost), were used for sepsis prediction, and after examining comprehensive performance metrics, descriptions of the optimal model were obtained with the SHAP method. The XGBoost model achieved an accuracy of 0.898 (95% CI: 0.868-0.929) and an area under the ROC curve (AUC) of 0.940 (95% CI: 0.898-0.980). The five biomarkers for predicting sepsis were age, respiratory rate, oxygen saturation, procalcitonin, and positive blood culture. SHAP results revealed that older age, higher respiratory rate, procalcitonin, neutrophil-lymphocyte count ratio, C-reactive protein, plaque, leukocyte particle concentration, as well as lower oxygen saturation, systolic blood pressure, and hemoglobin levels increased the risk of sepsis. As a result, the Explainable Artificial Intelligence (XAI)-based prediction model can guide clinicians in the early diagnosis and treatment of sepsis, providing more effective sepsis management and potentially reducing mortality rates and medical costs.
Affiliation(s)
- Umran Aygun
- Department of Anesthesiology and Reanimation, Malatya Yesilyurt Hasan Calık State Hospital, Malatya 44929, Turkey
- Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
- Burak Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
- Seyma Yasar
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
- Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
- Ahmet Selim Ozkan
- Department of Anesthesiology and Reanimation, Malatya Turgut Ozal University School of Medicine, Malatya 44210, Turkey
- Luca Paolo Ardigò
- Department of Teacher Education, NLA University College, 0166 Oslo, Norway
6
Shu C, Zheng C, Luo D, Song J, Jiang Z, Ge L. Acute ischemic stroke prediction and predictive factors analysis using hematological indicators in elderly hypertensives post-transient ischemic attack. Sci Rep 2024; 14:695. [PMID: 38184714 PMCID: PMC10771433 DOI: 10.1038/s41598-024-51402-2] [Received: 06/20/2023] [Accepted: 01/04/2024] [Indexed: 01/08/2024] Open
Abstract
Elderly hypertensive patients diagnosed with transient ischemic attack (TIA) are at a heightened risk for developing acute ischemic stroke (AIS). This underscores the critical need for effective risk prediction and identification of predictive factors. In our study, we utilized patient data from peripheral blood tests and clinical profiles within hospital information systems. These patients were followed for a three-year period to document incident AIS. Our cohort of 11,056 individuals was randomly divided into training, validation, and testing sets in a 5:2:3 ratio. The XGBoost model, developed using selected indicators, provides an effective and non-invasive method for predicting the risk of AIS in elderly hypertensive patients diagnosed with TIA. Impressively, this model achieved a balanced accuracy of 0.9022, a recall of 0.8688, and a PR-AUC of 0.9315. Notably, our model effectively encapsulates essential data variations involving mixed nonlinear interactions, providing competitive performance against more complex models that incorporate a wider range of variables. Further, we conducted an in-depth analysis of the importance and sensitivity of each selected indicator and their interactions. This research equips clinicians with the necessary tools for more precise identification of high-risk individuals, thereby paving the way for more effective stroke prevention and management strategies.
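Balanced accuracy and recall are linked by a simple identity in the binary case, which makes the reported figures easier to read. The sketch below assumes the usual definition, balanced accuracy = (recall + specificity) / 2, and a two-class AIS versus no-AIS setup; the abstract states neither explicitly, and the function names are hypothetical. Under those assumptions, the specificity implied by the reported values can be back-calculated.

```python
def balanced_accuracy(recall, specificity):
    """Mean of the per-class recalls for a binary classifier."""
    return (recall + specificity) / 2.0


def implied_specificity(balanced_acc, recall):
    """Solve balanced_acc = (recall + specificity) / 2 for specificity."""
    return 2.0 * balanced_acc - recall


# Values reported in the abstract; the result is an inference, not a reported figure.
reported_balanced_accuracy = 0.9022
reported_recall = 0.8688
specificity = implied_specificity(reported_balanced_accuracy, reported_recall)
print(f"implied specificity = {specificity:.4f}")
```

Because balanced accuracy averages the two class-wise recalls, it is not inflated by a large majority class, which is a common reason for preferring it over plain accuracy in cohorts where the outcome is relatively rare.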
Affiliation(s)
- Chang Shu
- Tianjin Key Laboratory of Cerebral Vascular and Neurodegenerative Diseases, Tianjin Neurosurgical Institute, Tianjin Huanhu Hospital, Tianjin, 300350, China.
- Chenguang Zheng
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin, China
- Da Luo
- Tianjin Key Laboratory of Cerebral Vascular and Neurodegenerative Diseases, Tianjin Neurosurgical Institute, Tianjin Huanhu Hospital, Tianjin, 300350, China
- Jie Song
- Academy of Medical Engineering and Translational Medicine, Intelligent Medical Engineering, Tianjin University, Tianjin, China
- Zhengyi Jiang
- Academy of Medical Engineering and Translational Medicine, Intelligent Medical Engineering, Tianjin University, Tianjin, China
- Le Ge
- Tianjin Key Laboratory of Cerebral Vascular and Neurodegenerative Diseases, Tianjin Neurosurgical Institute, Tianjin Huanhu Hospital, Tianjin, 300350, China.