Reference Citation Analysis

For: Aldrich C. Process Variable Importance Analysis by Use of Random Forests in a Shapley Regression Framework. Minerals 2020;10:420. [DOI: 10.3390/min10050420] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]

Cited by Other Article(s)

Gao J, Lu Y, Ashrafi N, Domingo I, Alaei K, Pishgar M. Prediction of sepsis mortality in ICU patients using machine learning methods. BMC Med Inform Decis Mak 2024;24:228. [PMID: 39152423 PMCID: PMC11328468 DOI: 10.1186/s12911-024-02630-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 08/05/2024] [Indexed: 08/19/2024]

Abstract

PROBLEM

Sepsis, a life-threatening condition, accounts for the deaths of millions of people worldwide. Accurate prediction of sepsis outcomes is crucial for effective treatment and management. Previous studies have used machine learning for prognosis, but they have been limited in their feature sets and model interpretability.

AIM

This study aims to develop a machine learning model that improves prediction accuracy for sepsis outcomes using a reduced set of features, thereby addressing the limitations of previous studies and enhancing model interpretability.

METHODS

This study analyzes intensive care patient outcomes using the MIMIC-IV database, focusing on adult sepsis cases. Using data extraction tools such as Google BigQuery and stringent selection criteria, we selected 38 features, a selection also informed by a comprehensive literature review and clinical expertise. Data preprocessing included handling missing values, regrouping categorical variables, and using the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data. We evaluated several machine learning models: Decision Trees, Gradient Boosting, XGBoost, LightGBM, Multilayer Perceptrons (MLP), Support Vector Machines (SVM), and Random Forest. The Sequential Halving and Classification (SHAC) algorithm was used for hyperparameter tuning, and both train-test split and cross-validation were employed, balancing performance evaluation against computational efficiency.
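The SMOTE step named above can be illustrated with a minimal, standard-library sketch of its core idea: each synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbors. The function name `smote`, its parameters and the toy points are hypothetical; this is not the authors' preprocessing code, and in practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class points in the style of SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors
    (Euclidean distance). Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=4, k=2)
```

A common precaution is to oversample only the training portion of the data, so that synthetic points derived from evaluation samples do not leak into the test set or cross-validation folds.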

RESULTS

The Random Forest model was the most effective, achieving an area under the receiver operating characteristic curve (AUROC) of 0.94 with a confidence interval of ±0.01. This significantly outperformed other models and set a new benchmark in the literature. The model also provided detailed insights into the importance of various clinical features, with the Sequential Organ Failure Assessment (SOFA) score and average urine output being highly predictive. SHAP (Shapley Additive Explanations) analysis further enhanced the model's interpretability, offering a clearer understanding of feature impacts.
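SHAP builds on Shapley values from cooperative game theory, which also underlie the Shapley regression framework named in the cited Aldrich article. As a rough sketch, assuming only a few features and a hypothetical payoff function `value` that scores any subset of features (for example, a model's fit when trained on that subset), exact Shapley values can be computed by enumerating every coalition. The toy payoffs below are invented; because the cost grows as 2**n, the SHAP library relies on model-specific estimators such as `TreeExplainer` for tree ensembles instead of this brute-force approach.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to a real-valued payoff.
    Only practical for a handful of players (cost ~ 2**n).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoffs for two features, x1 and x2.
payoff = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.3,
    frozenset({"x1", "x2"}): 1.0,
}
phi = shapley_values(["x1", "x2"], payoff.__getitem__)
```

With these payoffs, x1 receives 0.6 and x2 receives 0.4; the two values sum to v({x1, x2}) - v(∅) = 1.0, the efficiency property that makes Shapley attributions add up to the total.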

CONCLUSION

This study demonstrates significant improvements in predicting sepsis outcomes using a Random Forest model, supported by advanced machine learning techniques and thorough data preprocessing. Our approach provided detailed insights into the key clinical features impacting sepsis mortality, making the model both highly accurate and interpretable. By enhancing the model's practical utility in clinical settings, we offer a valuable tool for healthcare professionals to make data-driven decisions, ultimately aiming to minimize sepsis-induced fatalities.

Saelee R, Alexander DS, Onufrak S, Imperatore G, Bullard KM. Household Food Security Status and Allostatic Load among United States Adults: National Health and Nutrition Examination Survey 2015-2020. J Nutr 2024;154:785-793. [PMID: 38158187 PMCID: PMC10922609 DOI: 10.1016/j.tjnut.2023.12.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/20/2023] [Accepted: 12/26/2023] [Indexed: 01/03/2024]

Abstract

BACKGROUND

Household food insecurity has been linked to adverse health outcomes, but the pathways driving these associations are not well understood. The stress of living in a food-insecure household and of having to prioritize between food and other essential needs could lead to physiologic dysregulation [i.e., allostatic load (AL)] and, as a result, adversely affect health.

OBJECTIVE

To assess the association between household food security status and AL and differences by gender, race and ethnicity, and Supplemental Nutrition Assistance Program (SNAP) participation.

METHODS

We used data from 7640 United States adults in the 2015-2016 and 2017-March 2020 National Health and Nutrition Examination Survey to estimate means and prevalence ratios (PR) for AL scores (based on cardiovascular, metabolic, and immune biomarkers) associated with self-reported household food security status from multivariable linear and logistic regression models.
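For readers unfamiliar with prevalence ratios, the sketch below computes a crude (unadjusted) PR with a Wald 95% confidence interval on the log scale from a hypothetical 2×2 table. It does not reproduce the study's estimates, which come from multivariable regression models; it adjusts for no covariates and ignores survey weights and design effects, which NHANES analyses generally need to account for. The function name and all counts are illustrative.

```python
import math


def prevalence_ratio(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Crude prevalence ratio with a Wald confidence interval on the log scale."""
    p1 = exposed_cases / exposed_total      # prevalence among the exposed
    p0 = unexposed_cases / unexposed_total  # prevalence among the unexposed
    pr = p1 / p0
    # Standard error of log(PR) for two independent binomial proportions.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(pr) - z * se)
    upper = math.exp(math.log(pr) + z * se)
    return pr, lower, upper


# Hypothetical counts: 60 of 200 exposed vs. 50 of 250 unexposed with elevated AL.
pr, lower, upper = prevalence_ratio(60, 200, 50, 250)
```

Here the prevalences are 0.30 and 0.20, so the PR is 1.5; an interval whose lower bound exceeds 1, as in the PRs reported in the results, indicates higher prevalence in the exposed group.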

RESULTS

Adults in marginally food-secure [mean = 3.09, standard error (SE) = 0.10] and food-insecure households (mean = 3.05; SE = 0.08) had higher mean AL than those in food-secure households (mean = 2.70; SE = 0.05). Compared with adults in food-secure households in the same category, those more likely to have an elevated AL included: SNAP participants [PR = 1.12; 95% confidence interval (CI): 1.03, 1.22] and Hispanic women (PR = 1.20; 95% CI: 1.05, 1.37) in marginally food-secure households; and non-Hispanic Black women (PR = 1.14; 95% CI: 1.03, 1.26), men (PR = 1.13; 95% CI: 1.02, 1.26), and non-SNAP non-Hispanic White adults (PR = 1.22; 95% CI: 1.08, 1.39) in food-insecure households.

CONCLUSIONS

AL may be one pathway by which household food insecurity affects health and may vary by gender, race and ethnicity, and SNAP participation.

Li Z, Yang J, Zhong J, Zhang D. Assessment of Urban Agglomeration Ecological Sustainability and Identification of Influencing Factors: Based on the 3DEF Model and the Random Forest. Int J Environ Res Public Health 2022;20:422. [PMID: 36612743 PMCID: PMC9819968 DOI: 10.3390/ijerph20010422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]

Dritsas E, Trigka M. Data-Driven Machine-Learning Methods for Diabetes Risk Prediction. Sensors (Basel) 2022;22:5304. [PMID: 35890983 PMCID: PMC9318204 DOI: 10.3390/s22145304] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/26/2022] [Revised: 07/10/2022] [Accepted: 07/13/2022] [Indexed: 01/11/2023]

Pinto M, Marotta N, Caracò C, Simeone E, Ammendolia A, de Sire A. Quality of Life Predictors in Patients With Melanoma: A Machine Learning Approach. Front Oncol 2022;12:843611. [PMID: 35402230 PMCID: PMC8990304 DOI: 10.3389/fonc.2022.843611] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/26/2021] [Accepted: 02/25/2022] [Indexed: 12/20/2022]

Abstract

Health-related quality of life (HRQoL) is an important, recognized health outcome not only of cancer treatments but also of a disease course marked by slower recovery and increased morbidity. These issues are particularly relevant in melanoma, which carries a risk of disease progression for many years after diagnosis. This study aimed to explore and weigh the factors shaping the perception of quality of life, and their possible relationships with demographic and clinical characteristics, in people with melanoma using a machine learning approach. In this observational study, patients with non-metastatic melanoma were recruited from January 2020 to December 2021 with a follow-up of at least one year. Demographic and clinical variables were collected, and the 12-Item Short-Form Health Survey (SF-12) was adopted to measure the physical and mental components of HRQoL. All variables were entered into a random forest regression model to estimate, at each node of each tree, their actual weight on the SF-12 score. We included 203 melanoma patients (mean age 59.25 ± 15.1 years), with the lesion on the upper limbs in 56 (27%) and on the trunk in 147 (73%). The model, built on the 142 patients with no missing values and generating 92 trees (MSE = 0.45; R² = 0.78), showed that lesion site was the most influential variable on HRQoL, based on the decrease in Gini impurity at each node split during forest generation. We therefore built two distinct models by lesion site and found that the variable most influencing quality of life was lymphedema in upper limb melanoma and BMI in trunk melanoma. Given these results, random forest regression could play a crucial role in the clinical and rehabilitation approach. The machine learning model for detecting HRQoL predictors in melanoma patients indicates that experienced lymphedema and BMI may influence the perception of HRQoL. This study suggests that the prevention and treatment of lymphedema and body weight reduction might improve quality of life in melanoma.
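The splitting criterion mentioned above, the decrease in Gini impurity at each node, can be sketched as follows. Gini impurity is defined over class labels, so this example assumes a categorical outcome (for instance, a hypothetical binned SF-12 score); regression forests more commonly split on variance (mean squared error) reduction. In a random forest, a feature's mean-decrease-in-impurity importance sums such decreases over every node split on that feature, weighting each by the fraction of samples reaching the node, and averages across trees. Function names and labels are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: a perfect split of two balanced classes.
parent = ["low", "low", "high", "high"]
decrease = impurity_decrease(parent, ["low", "low"], ["high", "high"])
```

A pure split like this removes all of the parent's impurity (0.5 for two balanced classes), whereas a split that leaves both children mixed yields a smaller decrease and contributes less to the feature's importance.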

Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia. Big Data Cogn Comput 2022. [DOI: 10.3390/bdcc6010013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/02/2023]

Abstract

Breast cancer is one of the most common malignancies among women in Saudi Arabia and has been ranked as the most prevalent and the second-deadliest disease in the country. However, the clinical diagnosis of diseases such as breast cancer, coronary artery disease, diabetes, and COVID-19 is often uncertain because of the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved Gini index random forest-based feature importance measure algorithm is proposed for the early diagnosis of breast cancer in Saudi Arabia. It addresses both the uncertainty and ambiguity associated with breast cancer diagnosis and the heavier burden on the network nodes of the fuzzy neural network that often arises when insignificant features are used to predict or diagnose the disease. The improved Gini index random forest-based feature importance measure algorithm was used to select the five fittest of the 32 features in the Diagnostic Wisconsin Breast Cancer database. Logistic regression, support vector machine, k-nearest neighbor, random forest, and Gaussian naïve Bayes algorithms were used to develop two sets of classification models: models using all 32 features and models using only the five fittest features. When the two sets were evaluated and compared, the models with the selected features outperformed their full-feature counterparts in accuracy, sensitivity, and specificity. A fuzzy neural network-based expert system was therefore developed with the five selected features, achieving 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, compared with previous works that applied fuzzy neural networks or other artificial intelligence techniques to the same dataset for breast cancer diagnosis, the system performed best in accuracy, sensitivity, and specificity. A z test was also conducted, and its result indicates that the accuracy achieved by the system for early diagnosis of breast cancer is statistically significant.
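The accuracy, sensitivity and specificity figures reported above follow the standard confusion-matrix definitions. A minimal sketch, using a hypothetical function name and invented counts rather than the study's data:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total   # share of all cases classified correctly
    sensitivity = tp / (tp + fn)   # true-positive rate among actual positives
    specificity = tn / (tn + fp)   # true-negative rate among actual negatives
    return accuracy, sensitivity, specificity


# Hypothetical counts: 100 malignant and 100 benign cases.
acc, sens, spec = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these counts the function returns an accuracy of 0.925, a sensitivity of 0.90 and a specificity of 0.95. Sensitivity and specificity are computed within each true class, so unlike accuracy they are not driven by the class balance of the dataset.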

Swin Transformer and Deep Convolutional Neural Networks for Coastal Wetland Classification Using Sentinel-1, Sentinel-2, and LiDAR Data. Remote Sens 2022. [DOI: 10.3390/rs14020359] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/04/2023]

Barua L, Zou B, Zhou Y, Liu Y. Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017. Transportation 2021;50:437-476. [PMID: 34873350 PMCID: PMC8637526 DOI: 10.1007/s11116-021-10250-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 11/19/2021] [Indexed: 05/29/2023]

Abstract

Despite the rapid growth of online shopping and research interest in the relationship between online and in-store shopping, national-level modeling and investigation of online shopping demand with a prediction focus remain limited in the literature. Departing from prior work, this paper leverages two recent releases of the U.S. National Household Travel Survey (NHTS), for 2009 and 2017, to develop machine learning (ML) models, specifically a gradient boosting machine (GBM), for predicting household-level online shopping purchases. The NHTS data allow investigation not only at the national level but also at the household level, which is more appropriate than the individual level given the connected consumption and shopping needs of household members. We follow a systematic model development procedure, including using the Recursive Feature Elimination algorithm to select input variables (features), in order to reduce the risk of overfitting and increase model explainability. Among several ML models, GBM yields the best prediction accuracy. Extensive post-modeling investigation is conducted comparatively between 2009 and 2017, including quantifying the importance of each input variable in predicting online shopping demand and characterizing value-dependent relationships between demand and the input variables. To this end, two recent advances in machine learning, Shapley value-based feature importance and Accumulated Local Effects plots, are adopted to overcome inherent drawbacks of the techniques popular in current ML modeling. The modeling and investigation are performed at the national level and yield a number of findings. The models developed and insights gained can be used for online shopping-related freight demand generation and may also be considered for evaluating the potential impact of relevant policies on online shopping demand.
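Recursive Feature Elimination, as used in the study above, repeatedly refits a model and discards the least important features until a target number remains. The sketch below captures that loop with a hypothetical `importance_fn` standing in for "refit the model on the current features and read its importances"; scikit-learn's `sklearn.feature_selection.RFE` provides a maintained implementation that works with estimators exposing `coef_` or `feature_importances_`. Feature names and scores are invented.

```python
def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Drop the least important feature(s) each round until `n_keep` remain.

    `importance_fn(current)` must return {feature: score} for the features in
    `current`; in practice it would refit a model on exactly those features.
    """
    current = list(features)
    while len(current) > n_keep:
        scores = importance_fn(current)
        n_drop = min(step, len(current) - n_keep)
        ranked = sorted(current, key=lambda f: scores[f])  # least important first
        dropped = set(ranked[:n_drop])
        # Keep survivors in their original order.
        current = [f for f in current if f not in dropped]
    return current


# Hypothetical fixed importances; a real run would refit each round.
toy_scores = {"f1": 0.50, "f2": 0.30, "f3": 0.10, "f4": 0.05}
selected = recursive_feature_elimination(
    ["f1", "f2", "f3", "f4"],
    lambda current: {f: toy_scores[f] for f in current},
    n_keep=2,
)
```

Refitting inside the loop is the point of the "recursive" part: once a feature is removed, the importances of correlated features can change, which a single one-shot ranking would miss.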

Linking Agricultural Index Insurance with Factors That Influence Maize Yield in Rain-Fed Smallholder Farming Systems. Sustainability 2021. [DOI: 10.3390/su13095176] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]