Cui Z, Dong Y, Yang H, Li K, Li X, Ding R, Yin Z. Machine learning prediction models for multidrug-resistant organism infections in ICU ventilator-associated pneumonia patients: Analysis using the MIMIC-IV database.
Comput Biol Med 2025; 190:110028. [PMID: 40154202 DOI: 10.1016/j.compbiomed.2025.110028]
[Received: 09/23/2024] [Revised: 03/09/2025] [Accepted: 03/12/2025] [Indexed: 04/01/2025]
Abstract
OBJECTIVE
This study aims to construct and compare four machine learning models using the MIMIC-IV database to identify high-risk factors for multidrug-resistant organism (MDRO) infection in ventilator-associated pneumonia (VAP) patients.
METHODS
The study included 972 VAP patients from the MIMIC-IV database. Data encompassing demographic information, vital signs, laboratory results, and other relevant variables were collected. Class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). The dataset was randomly split into training and testing sets (8:2). LASSO regression and feature importance scores were used for feature selection. Clinical prediction models were built using logistic regression, XGBoost, random forest, and gradient boosting machine. Model performance was evaluated through receiver operating characteristic (ROC) curve analysis. Model calibration was assessed using calibration curves and Brier scores. Clinical utility was evaluated through decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) was used for model interpretation.
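The oversampling step named above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority row is placed at a random point on the segment between a minority row and one of its nearest minority neighbours. This is not the study's code. The function name `smote_sketch`, its parameters, and the toy feature values are hypothetical, and the sketch assumes at least two minority rows with numeric features.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    Hypothetical helper for illustration; assumes len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: three rows with two made-up numeric features.
minority_rows = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
new_rows = smote_sketch(minority_rows, n_new=4, k=2)
print(len(new_rows), new_rows[0])
```

Because every synthetic row is an interpolation of two existing minority rows, it always lies within the range of the observed minority values for each feature.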
RESULTS
Among 972 patients, 824 were non-MDROs-VAP and 128 were MDROs-VAP. Comparative analysis revealed statistically significant differences in various clinical parameters. XGBoost exhibited the best predictive performance, incorporating 20 features with an AUC of 0.831 (95% CI: 0.785-0.877) on the test set. Calibration curves showed good agreement between predicted and observed risk, and DCA supported the model's clinical utility. SHAP analysis identified the most important features: red cell distribution width, duration of mechanical ventilation, anion gap, basophil percentage, and neutrophil percentage.
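For readers unfamiliar with the three evaluation measures reported here, the following is a minimal sketch of how discrimination (AUC), calibration (Brier score), and the net benefit plotted in a decision curve are defined. It uses their standard textbook formulas on made-up labels and probabilities, not the study's data or code. The function names and the 0.3 threshold are illustrative assumptions.

```python
def auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Made-up outcomes (1 = MDRO infection) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc_value = auc(y_true, y_prob)
brier_value = brier_score(y_true, y_prob)
nb_value = net_benefit(y_true, y_prob, threshold=0.3)
print(auc_value, brier_value, nb_value)
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and compares the model against the treat-all and treat-none strategies.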
CONCLUSION
This study established and compared four machine learning models for predicting MDRO infection in VAP patients. XGBoost was identified as the optimal predictor, and SHAP values provided insight into 20 independent risk factors, supporting the model's excellent predictive value.
IMPLICATIONS FOR CLINICAL PRACTICE
VAP is a common infection in ICU patients and carries a heightened risk of MDRO infection and increased mortality. The recognition of high bias in existing models calls for future research to employ rigorous methodologies and robust data sources, aiming to develop and validate more accurate and clinically applicable predictive models for MDRO infections in VAP patients.