1
Gao J, Lu Y, Ashrafi N, Domingo I, Alaei K, Pishgar M. Prediction of sepsis mortality in ICU patients using machine learning methods. BMC Med Inform Decis Mak 2024; 24:228. [PMID: 39152423 PMCID: PMC11328468 DOI: 10.1186/s12911-024-02630-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 08/05/2024] [Indexed: 08/19/2024] Open
Abstract
PROBLEM Sepsis, a life-threatening condition, accounts for the deaths of millions of people worldwide. Accurate prediction of sepsis outcomes is crucial for effective treatment and management. Previous studies have utilized machine learning for prognosis, but have limitations in feature sets and model interpretability. AIM This study aims to develop a machine learning model that enhances prediction accuracy for sepsis outcomes using a reduced set of features, thereby addressing the limitations of previous studies and enhancing model interpretability. METHODS This study analyzes intensive care patient outcomes using the MIMIC-IV database, focusing on adult sepsis cases. Employing the latest data extraction tools, such as Google BigQuery, and following stringent selection criteria, we selected 38 features in this study. This selection is also informed by a comprehensive literature review and clinical expertise. Data preprocessing included handling missing values, regrouping categorical variables, and using the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data. We evaluated several machine learning models: Decision Trees, Gradient Boosting, XGBoost, LightGBM, Multilayer Perceptrons (MLP), Support Vector Machines (SVM), and Random Forest. The Sequential Halving and Classification (SHAC) algorithm was used for hyperparameter tuning, and both train-test split and cross-validation methodologies were employed for performance and computational efficiency. RESULTS The Random Forest model was the most effective, achieving an area under the receiver operating characteristic curve (AUROC) of 0.94 with a confidence interval of ±0.01. This significantly outperformed other models and set a new benchmark in the literature. The model also provided detailed insights into the importance of various clinical features, with the Sequential Organ Failure Assessment (SOFA) score and average urine output being highly predictive. SHAP (Shapley Additive Explanations) analysis further enhanced the model's interpretability, offering a clearer understanding of feature impacts. CONCLUSION This study demonstrates significant improvements in predicting sepsis outcomes using a Random Forest model, supported by advanced machine learning techniques and thorough data preprocessing. Our approach provided detailed insights into the key clinical features impacting sepsis mortality, making the model both highly accurate and interpretable. By enhancing the model's practical utility in clinical settings, we offer a valuable tool for healthcare professionals to make data-driven decisions, ultimately aiming to minimize sepsis-induced fatalities.
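The abstract names SMOTE as the class-balancing step. As a rough illustration of the idea only (not the authors' pipeline), the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameter `k`, and the two-feature rows are all hypothetical and are not taken from the study or from MIMIC-IV.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself is excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Hypothetical two-feature minority class (illustrative values, not study data)
minority_rows = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_rows = smote_sketch(minority_rows, n_new=6)
print(len(new_rows))
```

Because every synthetic point lies on a segment between two real minority samples, the new rows stay inside the region the minority class already occupies. In practice such oversampling is normally applied to the training split only.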
Affiliation(s)
- Jiayi Gao
  - Department of Industrial System Engineering, University of Southern California, 3715 McClintock Ave, Los Angeles, CA, 90089, USA
- Yuying Lu
  - Department of Industrial System Engineering, University of Southern California, 3715 McClintock Ave, Los Angeles, CA, 90089, USA
- Negin Ashrafi
  - Department of Industrial System Engineering, University of Southern California, 3715 McClintock Ave, Los Angeles, CA, 90089, USA
- Ian Domingo
  - Department of Information and Computer Science, University of California, Irvine, Inner Ring Rd, Irvine, CA, 92697, USA
- Kamiar Alaei
  - Department of Health Science, California State University, Long Beach, 1250 Bellflower Blvd. HHS2-117, Long Beach, CA, 90840, USA
- Maryam Pishgar
  - Department of Industrial System Engineering, University of Southern California, 3715 McClintock Ave, Los Angeles, CA, 90089, USA
2
Saelee R, Alexander DS, Onufrak S, Imperatore G, Bullard KM. Household Food Security Status and Allostatic Load among United States Adults: National Health and Nutrition Examination Survey 2015-2020. J Nutr 2024; 154:785-793. [PMID: 38158187 PMCID: PMC10922609 DOI: 10.1016/j.tjnut.2023.12.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/20/2023] [Accepted: 12/26/2023] [Indexed: 01/03/2024] Open
Abstract
BACKGROUND Household food insecurity has been linked to adverse health outcomes, but the pathways driving these associations are not well understood. The stress experienced by those in food-insecure households and having to prioritize between food and other essential needs could lead to physiologic dysregulations [i.e., allostatic load (AL)] and, as a result, adversely impact their health. OBJECTIVE To assess the association between household food security status and AL and differences by gender, race and ethnicity, and Supplemental Nutrition Assistance Program (SNAP) participation. METHODS We used data from 7640 United States adults in the 2015-2016 and 2017-March 2020 National Health and Nutrition Examination Survey to estimate means and prevalence ratios (PR) for AL scores (based on cardiovascular, metabolic, and immune biomarkers) associated with self-reported household food security status from multivariable linear and logistic regression models. RESULTS Adults in marginally food-secure [mean = 3.09, standard error (SE) = 0.10] and food-insecure households (mean = 3.05; SE = 0.08) had higher mean AL than those in food-secure households (mean = 2.70; SE = 0.05). Compared with adults in food-secure households in the same category, those more likely to have an elevated AL included: SNAP participants [PR = 1.12; 95% confidence interval (CI): 1.03, 1.22] and Hispanic women (PR = 1.20; 95% CI: 1.05, 1.37) in marginally food-secure households; and non-Hispanic Black women (PR = 1.14; 95% CI: 1.03, 1.26), men (PR = 1.13; 95% CI: 1.02, 1.26), and non-SNAP non-Hispanic White adults (PR = 1.22; 95% CI: 1.08, 1.39) in food-insecure households. CONCLUSIONS AL may be one pathway by which household food insecurity affects health and may vary by gender, race and ethnicity, and SNAP participation.
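For readers unfamiliar with prevalence ratios, the sketch below computes a crude (unadjusted) PR with a 95% Wald interval on the log scale. It is a simplification: the study estimated PRs from multivariable regression models on survey data, which this example does not reproduce. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def prevalence_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Crude prevalence ratio with a 95% Wald confidence interval (log scale)."""
    p_exposed = exposed_cases / exposed_total
    p_unexposed = unexposed_cases / unexposed_total
    pr = p_exposed / p_unexposed
    # Standard error of ln(PR) for two independent binomial proportions
    se_log = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(pr) - 1.96 * se_log)
    upper = math.exp(math.log(pr) + 1.96 * se_log)
    return pr, lower, upper


# Hypothetical counts: 120 of 300 exposed and 250 of 1000 unexposed adults
# have an elevated score (illustrative values, not NHANES results).
pr, lower, upper = prevalence_ratio(120, 300, 250, 1000)
print(f"PR = {pr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1, as in the results reported in the abstract, indicates that the difference in prevalence between the two groups is unlikely to be due to chance alone.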
Affiliation(s)
- Ryan Saelee
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Dayna S Alexander
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Stephen Onufrak
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Giuseppina Imperatore
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
- Kai McKeever Bullard
  - Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
3
Li Z, Yang J, Zhong J, Zhang D. Assessment of Urban Agglomeration Ecological Sustainability and Identification of Influencing Factors: Based on the 3DEF Model and the Random Forest. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 20:422. [PMID: 36612743 PMCID: PMC9819968 DOI: 10.3390/ijerph20010422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
The evaluation of ecological sustainability is significant for high-quality urban development and scientific management and regulation. Taking the Chengdu urban agglomeration (CUA) as the research object, this paper combined the three-dimensional ecological footprint model (3DEF) and random forest to evaluate the ecological sustainability of the study area and identify the influencing factors. The study results indicate that: (1) From 2000 to 2019, the ecological sustainability of the Chengdu urban agglomeration was divided into four types, and the overall ecological sustainability of this region showed a downward trend. The areas with higher ecological sustainability were mainly distributed in the northern part of the urban agglomeration (Mianyang City) and the southern part (Leshan City and Ya'an City), while the cities in the central region (Chengdu City, Meishan City, and Ziyang City) had lower ecological sustainability. (2) The main factors affecting the ecological sustainability of urban agglomerations are industrial wastewater discharge, industrial smoke (powder) dust discharge, and green coverage of built-up areas, followed by urbanization and population size. This study makes two contributions: (a) The research method provides a new way to study the factors affecting the ecological sustainability of urban agglomerations. (b) The identified influencing factors may serve as a reference for urban environmental infrastructure construction and urban planning.
Affiliation(s)
- Zhigang Li
  - College of Management Science, Chengdu University of Technology, Chengdu 610059, China
- Jie Yang
  - College of Management Science, Chengdu University of Technology, Chengdu 610059, China
  - The Engineering & Technical College of Chengdu University of Technology, Leshan 614000, China
- Jialong Zhong
  - College of Management Science, Chengdu University of Technology, Chengdu 610059, China
- Dong Zhang
  - College of Management Science, Chengdu University of Technology, Chengdu 610059, China
4
Dritsas E, Trigka M. Data-Driven Machine-Learning Methods for Diabetes Risk Prediction. SENSORS (BASEL, SWITZERLAND) 2022; 22:5304. [PMID: 35890983 PMCID: PMC9318204 DOI: 10.3390/s22145304] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/26/2022] [Revised: 07/10/2022] [Accepted: 07/13/2022] [Indexed: 01/11/2023]
Abstract
Diabetes mellitus is a chronic condition characterized by a disturbance in the metabolism of carbohydrates, fats and proteins. The most characteristic disorder in all forms of diabetes is hyperglycemia, i.e., elevated blood sugar levels. The modern way of life has significantly increased the incidence of diabetes. Therefore, early diagnosis of the disease is a necessity. Machine Learning (ML) has gained great popularity among healthcare providers and physicians due to its high potential in developing efficient tools for risk prediction, prognosis, treatment and the management of various conditions. In this study, a supervised learning methodology is described that aims to create risk prediction tools with high efficiency for type 2 diabetes occurrence. A features analysis is conducted to evaluate their importance and explore their association with diabetes. These features are the most common symptoms that often develop slowly with diabetes, and they are utilized to train and test several ML models. Various ML models are evaluated in terms of the Precision, Recall, F-Measure, Accuracy and AUC metrics and compared under 10-fold cross-validation and data splitting. Both validation methods highlighted Random Forest and K-NN as the best performing models in comparison to the other models.
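The evaluation protocol described here (k-fold cross-validation scored with Precision, Recall, F-Measure, and Accuracy) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `k_fold_indices` and `binary_metrics` and the label vectors are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def binary_metrics(y_true, y_pred):
    """Precision, recall, F-measure, and accuracy for labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# 10-fold split over 20 hypothetical samples: each sample is tested exactly once
splits = list(k_fold_indices(20, 10))

# Hypothetical true and predicted labels for one held-out fold
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In a full evaluation, a model would be fitted on each `train` list and scored on the matching `test` list, and the per-fold metrics would then be averaged.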
Affiliation(s)
- Maria Trigka
  - Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
5
Pinto M, Marotta N, Caracò C, Simeone E, Ammendolia A, de Sire A. Quality of Life Predictors in Patients With Melanoma: A Machine Learning Approach. Front Oncol 2022; 12:843611. [PMID: 35402230 PMCID: PMC8990304 DOI: 10.3389/fonc.2022.843611] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/26/2021] [Accepted: 02/25/2022] [Indexed: 12/20/2022] Open
Abstract
Health-related quality of life (HRQoL) is an important recognized health outcome for cancer treatments, but also disease course with slower recovery and increased morbidity. These issues are relevant in melanoma, which maintains a risk of disease progression for many years after diagnosis. This study aimed to explore and weigh factors in the perception of the quality of life and possible relationships with demographic–clinical characteristics in people with melanoma via a machine learning approach. In this observational study, patients with melanoma, without metastatic disease, were recruited from January 2020 to December 2021 with a follow-up of at least one year. Demographic and clinical variables were collected, and the 12-Item Short-Form Health Survey (SF-12) was adopted as the measure of the physical and mental aspects of HRQoL. All variables were processed in a random forest regression model to estimate, at each node of each tree, their weight in the SF-12 score. We included 203 melanoma patients with a mean age of 59.25 ± 15.1 years: 56 (27%) with melanoma affecting the upper limbs and 147 (73%) with melanoma affecting the trunk. The model, built on the 142 patients with no missing values and generating 92 trees (MSE = 0.45, R² = 0.78), reported that the lesion site was the most influential variable on HRQoL, based on the decrease in Gini impurity at each node during forest generation. We therefore built two distinct models by lesion site and found that the variable that most influenced quality of life in upper limb melanoma was lymphedema, while in trunk melanoma it was BMI. Given these results, random forest regressions could play a crucial role in the clinical and rehabilitation approach. The machine learning model for detecting HRQoL predictors in melanoma patients indicates that experienced lymphedema and BMI may influence HRQoL perception. This study suggests that the prevention and treatment of lymphedema and bodyweight reduction might improve the quality of life in melanoma.
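The variable ranking in this abstract rests on the decrease in Gini impurity at tree nodes. The sketch below shows how that decrease is computed for a single split. Gini impurity is defined over class labels, whereas regression forests commonly split on variance reduction instead, so the example assumes a hypothetical outcome binned into "low" and "high" groups. The function names and labels are illustrative and do not come from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical node with four "low" and four "high" quality-of-life scores
parent = ["low"] * 4 + ["high"] * 4

# A weak split leaves both children mixed
weak = gini_decrease(parent, ["low"] * 3 + ["high"], ["low"] + ["high"] * 3)

# A perfect split separates the two groups completely
perfect = gini_decrease(parent, ["low"] * 4, ["high"] * 4)

print(weak, perfect)
```

A feature's importance in a forest is then the sum of such decreases over every node that splits on that feature, averaged across trees, which is why a variable that produces cleaner splits ranks higher.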
Affiliation(s)
- Monica Pinto
  - Rehabilitation Medicine Unit, Strategic Health Services Department, Istituto Nazionale Tumori-Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS)-Fondazione G. Pascale, Naples, Italy
- Nicola Marotta
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro "Magna Graecia", Catanzaro, Italy
- Corrado Caracò
  - Melanoma and Skin Cancer Surgery Unit, Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori-Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS)-Fondazione G. Pascale, Naples, Italy
- Ester Simeone
  - Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori-Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS)-Fondazione G. Pascale, Naples, Italy
- Antonio Ammendolia
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro "Magna Graecia", Catanzaro, Italy
- Alessandro de Sire
  - Physical Medicine and Rehabilitation Unit, Department of Medical and Surgical Sciences, University of Catanzaro "Magna Graecia", Catanzaro, Italy
6
Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6010013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/02/2023]
Abstract
Breast cancer is one of the most common malignancies among females in Saudi Arabia and has also been ranked as the most prevalent and the second deadliest disease in the country. However, the clinical diagnosis of diseases such as breast cancer, coronary artery disease, diabetes, and COVID-19 is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed. It addresses the uncertainty and ambiguity associated with the diagnosis of breast cancer, as well as the heavier burden on the overlay of the network nodes of the fuzzy neural network system that often arises when insignificant features are used to predict or diagnose the disease. The Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm was used to select the five fittest features out of the 32 features of the Wisconsin Diagnostic Breast Cancer database. The logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes learning algorithms were used to develop two sets of classification models: models with the full 32 features and models with the 5 fittest features. The two sets of classification models were evaluated and the results compared. The comparison shows that the models with the selected fittest features outperformed their full-feature counterparts in terms of accuracy, sensitivity, and specificity. Therefore, a fuzzy neural network based expert system was developed with the five selected fittest features, and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, compared with previous works that applied fuzzy neural networks or other artificial intelligence techniques to the same dataset for diagnosis of breast cancer, the system performs best in terms of accuracy, sensitivity, and specificity. A z test was also conducted, and the result shows that the accuracy achieved by the system for early diagnosis of breast cancer is significant.
7
Swin Transformer and Deep Convolutional Neural Networks for Coastal Wetland Classification Using Sentinel-1, Sentinel-2, and LiDAR Data. REMOTE SENSING 2022. [DOI: 10.3390/rs14020359] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/04/2023]
Abstract
The use of machine learning algorithms to classify complex landscapes has been revolutionized by the introduction of deep learning techniques, particularly in remote sensing. Convolutional neural networks (CNNs) have shown great success in the classification of complex high-dimensional remote sensing imagery, specifically in wetland classification. On the other hand, the state-of-the-art natural language processing (NLP) algorithms are transformers. Although transformers have been studied for a few remote sensing applications, the integration of deep CNNs and transformers has not been studied, particularly in wetland mapping. As such, in this study, we explore the potential and possible limitations to be overcome regarding the use of a multi-model deep learning network that integrates a modified version of the well-known deep CNN VGG-16, a 3D CNN, and a Swin transformer for complex coastal wetland classification. Moreover, we discuss the potential and limitations of the proposed multi-model technique relative to several solo models, including a random forest (RF), support vector machine (SVM), VGG-16, 3D CNN, and Swin transformer, in the pilot site of Saint John city located in New Brunswick, Canada. In terms of F-1 score, the multi-model network obtained values of 0.87, 0.88, 0.89, 0.91, 0.93, 0.93, and 0.93 for the recognition of shrub wetland, fen, bog, aquatic bed, coastal marsh, forested wetland, and freshwater marsh, respectively. The results suggest that the multi-model network outperforms the solo classifiers by 3.36% to 33.35% in terms of average accuracy. Results achieved in this study suggest the high potential for integrating CNNs with cutting-edge transformers for the classification of complex landscapes in remote sensing.
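As a small worked example, the per-class F-1 scores reported above can be summarized as an unweighted (macro) average. The macro average below is plain arithmetic on the seven reported values. It is not a figure stated in the abstract, and it is distinct from the "average accuracy" the authors use for their model comparison.

```python
# Per-class F-1 scores for the multi-model network, as listed in the abstract
f1_by_class = {
    "shrub wetland": 0.87,
    "fen": 0.88,
    "bog": 0.89,
    "aquatic bed": 0.91,
    "coastal marsh": 0.93,
    "forested wetland": 0.93,
    "freshwater marsh": 0.93,
}

# Macro average: every class counts equally, regardless of how many pixels it has
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

# Classes with the lowest and highest reported score
weakest = min(f1_by_class, key=f1_by_class.get)
strongest = max(f1_by_class.values())

print(round(macro_f1, 3), weakest, strongest)
```

The spread between the weakest class (shrub wetland) and the three strongest classes is 0.06, which suggests fairly even performance across wetland types.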
8
Barua L, Zou B, Zhou Y, Liu Y. Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017. TRANSPORTATION 2021; 50:437-476. [PMID: 34873350 PMCID: PMC8637526 DOI: 10.1007/s11116-021-10250-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 11/19/2021] [Indexed: 05/29/2023]
Abstract
Despite the rapid growth of online shopping and research interest in the relationship between online and in-store shopping, national-level modeling and investigation of the demand for online shopping with a prediction focus remain limited in the literature. This paper differs from prior work and leverages two recent releases of the U.S. National Household Travel Survey (NHTS) data for 2009 and 2017 to develop machine learning (ML) models, specifically gradient boosting machine (GBM), for predicting household-level online shopping purchases. The NHTS data allow for not only conducting nationwide investigation but also at the level of households, which is more appropriate than at the individual level given the connected consumption and shopping needs of members in a household. We follow a systematic procedure for model development including employing Recursive Feature Elimination algorithm to select input variables (features) in order to reduce the risk of model overfitting and increase model explainability. Among several ML models, GBM is found to yield the best prediction accuracy. Extensive post-modeling investigation is conducted in a comparative manner between 2009 and 2017, including quantifying the importance of each input variable in predicting online shopping demand, and characterizing value-dependent relationships between demand and the input variables. In doing so, two latest advances in machine learning techniques, namely Shapley value-based feature importance and Accumulated Local Effects plots, are adopted to overcome inherent drawbacks of the popular techniques in current ML modeling. The modeling and investigation are performed at the national level, with a number of findings obtained. The models developed and insights gained can be used for online shopping-related freight demand generation and may also be considered for evaluating the potential impact of relevant policies on online shopping demand.
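Shapley value-based feature importance, mentioned above, attributes a prediction to features by averaging each feature's marginal contribution over all orderings in which features could be added. The sketch below computes exact Shapley values for a toy value function. The feature names and the `toy_value` function are hypothetical stand-ins, not variables or models from the NHTS analysis, and real SHAP implementations rely on approximations because exact enumeration grows factorially with the number of features.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = set()
        for f in order:
            before = value(frozenset(coalition))
            coalition.add(f)
            totals[f] += value(frozenset(coalition)) - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if "income" in coalition:
        score += 2.0
    if "household_size" in coalition:
        score += 1.0
    if "income" in coalition and "internet_use" in coalition:
        score += 1.0  # interaction term, shared between the two features
    return score


features = ["income", "household_size", "internet_use"]
phi = shapley_values(features, toy_value)
print(phi)
```

The attributions sum to the difference between the value with all features and the value with none, and the interaction term is split evenly between the two features that produce it.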
Affiliation(s)
- Limon Barua
  - Department of Civil, Materials, and Environmental Engineering, University of Illinois Chicago, Chicago, USA
- Bo Zou
  - Department of Civil, Materials, and Environmental Engineering, University of Illinois Chicago, Chicago, USA
  - Department of Civil and Environmental Engineering, University of California, Berkeley, USA
- Yan Zhou
  - Vehicle and Energy Technology and Mobility Analysis, Argonne National Laboratory, Lemont, USA
- Yulin Liu
  - Institute of Transportation Studies, University of California, Berkeley, USA
9
Linking Agricultural Index Insurance with Factors That Influence Maize Yield in Rain-Fed Smallholder Farming Systems. SUSTAINABILITY 2021. [DOI: 10.3390/su13095176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
Weather extremes pose substantial threats to food security in areas where the main source of livelihood is rain-fed crop production. In most of these areas, agricultural index insurance (AII) is recognized as being capable of securitizing food production by providing safety nets against weather-induced crop losses. Unfortunately, however, AII does not indemnify farmers for non-weather-related crop losses. This study investigates how this gap can be filled by exploring strategies through which AII can be linked with non-weather factors that influence crop production. We do this by using an improvised variable ranking methodology to identify these factors in the O.R. Tambo District Municipality, South Africa. Results show that key agrometeorological variables comprising surface moisture content, growing degree-days, and precipitation influence maize yield even under optimal weather conditions, while seed variety, fertilizer application rate, soil pH, and ownership of machinery play an equally important role. This finding is important because it demonstrates that although AII focuses more on weather elements, there are non-weather variables that may expose farmers to production risk even under optimal weather conditions. As such, linking AII with critical non-weather, yield-determining factors can be a better risk management strategy.