Xu Z, Zhang K, Liu D, Fang X. Predicting mortality and risk factors of sepsis related ARDS using machine learning models. Sci Rep 2025; 15:13509. [PMID: 40251182 PMCID: PMC12008361 DOI: 10.1038/s41598-025-96501-w]
[Received: 11/30/2024] [Accepted: 03/28/2025] [Indexed: 04/20/2025]
Abstract
Sepsis related acute respiratory distress syndrome (ARDS) is a common and serious condition in clinical practice. Accurate prediction of in-hospital mortality is crucial for optimizing treatment and improving prognosis under the new global definition of ARDS. Our study aimed to develop machine learning models that can effectively predict the in-hospital mortality of patients with sepsis related ARDS, to calculate the mortality, and to identify related risk factors under the new global definition of ARDS. Using the MIMIC database, our study included 3470 first-time admission records of patients with sepsis related ARDS. After excluding 4 patients under the age of 18, 75 patients with an ICU stay of less than 24 h, and 5 cases with more than 30% of indicators missing, 3386 cases were retained. Variance inflation factor (VIF) analysis was used to test the collinearity of the explanatory variables. The data were divided into a training set and a test set at a ratio of 7:3. Six models, extreme gradient boosting (XGBoost), light gradient boosting (LightGBM), random forest (RF), classification and regression tree (CART), naive Bayes (NB) and logistic regression (LR), were trained and tested. In the training set, the results were: XGBoost (AUROC = 0.951, 95% CI 0.942-0.961), LR (AUROC = 0.835, 95% CI 0.817-0.854), RF (AUROC = 1.0, 95% CI 1.0-1.0), LightGBM (AUROC = 1.0, 95% CI 1.0-1.0), CART (AUROC = 0.831, 95% CI 0.811-0.852), NB (AUROC = 0.793, 95% CI 0.772-0.814). In the test set, the results were: XGBoost (AUROC = 0.833, 95% CI 0.804-0.861), LR (AUROC = 0.826, 95% CI 0.796-0.856), RF (AUROC = 0.846, 95% CI 0.818-0.874), LightGBM (AUROC = 0.827, 95% CI 0.798-0.856), CART (AUROC = 0.753, 95% CI 0.718-0.787), NB (AUROC = 0.799, 95% CI 0.768-0.831). The RF model had the best performance on the test set. We further analyzed the feature importance ranking and partial dependence plots of the random forest model.
Acute Physiology and Chronic Health Evaluation III (APACHE III), bicarbonate, anion gap and non-invasive systolic blood pressure were identified as the four most important risk features. In this study, several machine learning models were successfully constructed to predict the in-hospital mortality of patients with sepsis related ARDS, among which the RF model performed well. Key risk factors identified include APACHE III, bicarbonate, anion gap and non-invasive systolic blood pressure. The identification of these factors helps clinicians to assess patients' conditions more accurately and develop personalized treatment plans, thereby improving the survival rate and prognosis of patients under the new global definition of ARDS.
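
The evaluation steps summarized above (cohort exclusions, a 7:3 split, and an AUROC reported with a 95% CI) can be illustrated with a minimal standard-library Python sketch. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, as are the function names `auroc`, `bootstrap_ci` and `split_7_3`. The labels and scores are synthetic and are not the study data.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Approximate percentile-bootstrap CI for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacks one class; draw again
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def split_7_3(n, seed=0):
    """Shuffle indices 0..n-1 and split them 70% / 30%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * n)
    return idx[:cut], idx[cut:]


# Cohort size after the exclusions reported in the abstract.
n_cohort = 3470 - 4 - 75 - 5

# Synthetic stand-in for a model's predicted risks (NOT the study data).
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
scores = [y + rng.gauss(0.0, 1.0) for y in labels]

train_idx, test_idx = split_7_3(len(labels))
test_labels = [labels[i] for i in test_idx]
test_scores = [scores[i] for i in test_idx]

test_auc = auroc(test_labels, test_scores)
ci_low, ci_high = bootstrap_ci(test_labels, test_scores)
print(f"cohort n = {n_cohort}")
print(f"test AUROC = {test_auc:.3f}, 95% CI {ci_low:.3f}-{ci_high:.3f}")
```

The pairwise AUROC is quadratic in the sample size, which is acceptable for a small illustration; a rank-based computation or a library routine would be preferable at the scale of the study cohort.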