1
|
Yuting Y, Shan D. Associations between urinary and blood heavy metal exposure and heart failure in elderly adults: Insights from an interpretable machine learning model based on NHANES (2003-2020). INTERNATIONAL JOURNAL OF CARDIOLOGY. CARDIOVASCULAR RISK AND PREVENTION 2025; 25:200418. [PMID: 40491714 PMCID: PMC12146108 DOI: 10.1016/j.ijcrp.2025.200418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Revised: 04/04/2025] [Accepted: 04/30/2025] [Indexed: 06/11/2025]
Abstract
Background The relationship between heavy metal exposure and heart failure is complex and poorly understood. This study employs machine learning techniques to model these associations in a population aged 50 years and older from the National Health and Nutrition Examination Survey (NHANES). Our findings emphasize the need for continued investigation into the mechanisms of these associations and highlight the importance of monitoring and regulatory measures to mitigate heavy metal exposure in populations at risk. Methods Five machine learning models were evaluated, with Gradient Boosting Decision Trees (GBDT) selected as the optimal model based on accuracy, interpretability, and ability to capture nonlinear relationships. Model performance was assessed through various metrics, and interpretability was enhanced using SHAP (SHapley Additive exPlanations), permuted Feature Importance, Individual Conditional Expectation (ICE), and Partial Dependence Plots (PDP). Results The GBDT model achieved an accuracy of 0.78, with a sensitivity of 0.93 and an AUC of 0.92. Our analysis revealed that higher levels of urinary iodine, blood cadmium, urinary cobalt, urinary tungsten, and urinary arsenic acid were significantly associated with heart failure. Synergistic effects involving age and body mass index (BMI) were also observed, further strengthening these associations.
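The permutation feature importance mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' GBDT pipeline: the threshold rule `toy_model`, the synthetic data, and every function name below are hypothetical stand-ins chosen so the example runs with the standard library alone. The idea is to shuffle one feature column at a time and record how much the accuracy drops.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data (an assumption for illustration, not NHANES data).
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the toy model ignores feature 1, shuffling that column cannot change any prediction and its importance is exactly zero, while shuffling feature 0 lowers the accuracy substantially.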
Affiliation(s)
- Yang Yuting
- Department of Cardiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Clinic Center of Human Gene Research, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1277 Jiefang Ave, Wuhan, 430022, China
| | - Deng Shan
- Department of Cardiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Clinic Center of Human Gene Research, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1277 Jiefang Ave, Wuhan, 430022, China
- Hubei Key Laboratory of Metabolic Abnormalities and Vascular Aging, Huazhong University of Science and Technology, Wuhan, China
- Hubei Clinical Research Center for Metabolic and Cardiovascular Disease, Huazhong University of Science and Technology, Wuhan, China
|
2
|
Wang S, Guo D, Chen X, Chen SZ, Cui XW, Han YH, Xiang P. Environmentally relevant concentrations of antimony pose potential risks to human health: An evaluation on human umbilical vein endothelial cells. Toxicol In Vitro 2025; 106:106054. [PMID: 40086647 DOI: 10.1016/j.tiv.2025.106054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Revised: 02/27/2025] [Accepted: 03/11/2025] [Indexed: 03/16/2025]
Abstract
Antimony (Sb) ore exploitation and the use of Sb-containing drugs pose known health risks. This study investigated the toxicity of environmentally relevant concentrations of Sb (0.12-12 mg L⁻¹) on human umbilical vein endothelial cells (HUVECs). The 50 % lethal concentration (LC50) of Sb to HUVECs was 11.4 mg L⁻¹. Exposure to high levels of Sb induced cell cycle arrest by altering the expression of cell cycle regulators, inhibiting the transitions from G0/G1 to S and from S to G2/M. At 1.2 mg L⁻¹ Sb, CDK6 and p21 expression in HUVECs changed to 0.75- and 1.32-fold that of the no-Sb control, respectively (p < 0.01). At 12 mg L⁻¹ Sb, CDK2, CDK6, and p27 expression decreased by 1.54-, 4.41-, and 1.54-fold (p < 0.001), while p21 expression increased by 3.03-fold (p < 0.001) compared with the control. Sb also led to cell apoptosis, evidenced by Annexin V-FITC/PI staining and changes in the expression of Bax (1.21-1.30-fold, p < 0.01) and Bcl-2 (0.65-0.83-fold). Oxidative damage was a pivotal factor driving cell apoptosis, probably through down-regulating antioxidant genes (CAT, GPX1, and GSTP1) and up-regulating stress response genes (HO-1, SOD1, and TrxR1). The elevated H2O2 generated in mitochondria likely contributed to cell apoptosis due to the imbalance in H2O2 metabolism. These findings suggest that environmentally relevant concentrations of Sb can be cytotoxic to HUVECs, which is of potential concern for human cardiovascular disease.
Affiliation(s)
- Shanshan Wang
- College of Integrative Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian 350122, China
| | - Dongqian Guo
- College of Integrative Medicine, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian 350122, China
| | - Xian Chen
- Fujian Key Laboratory of Pollution Control and Resource Reuse, College of Environmental and Resource Sciences, Fujian Normal University, Fuzhou, Fujian 350117, China
| | - Su-Zhu Chen
- Center of Reproductive Medicine, Fujian Maternity and Child Health Hospital, College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University, Fuzhou, Fujian 350001, China
| | - Xi-Wen Cui
- Fujian Key Laboratory of Pollution Control and Resource Reuse, College of Environmental and Resource Sciences, Fujian Normal University, Fuzhou, Fujian 350117, China
| | - Yong-He Han
- Fujian Key Laboratory of Pollution Control and Resource Reuse, College of Environmental and Resource Sciences, Fujian Normal University, Fuzhou, Fujian 350117, China.
| | - Ping Xiang
- Institute of Environmental Remediation and Human Health, School of Ecology and Environment, Southwest Forestry University, Kunming, Yunnan 650224, China.
|
3
|
Zhao M, Gu S, Liu T, Gao S, Qiao Z, Wang K, Niu Q, Ma R, Guo H, Guo S, He J. Association Between Urinary Metals and Polycyclic Aromatic Hydrocarbon Levels and Cardiovascular Disease Among Adult Americans: Data from NHANES 2011 to 2016. Cardiovasc Toxicol 2025:10.1007/s12012-025-10009-3. [PMID: 40423918 DOI: 10.1007/s12012-025-10009-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2024] [Accepted: 04/29/2025] [Indexed: 05/28/2025]
Abstract
Previous studies have inconclusively examined the associations of metals or polycyclic aromatic hydrocarbons (PAHs) with cardiovascular disease (CVD) separately, highlighting the need to explore their combined association with CVD. Based on the 2011-2016 National Health and Nutrition Examination Survey, the association of 12 metals and six PAHs in urine with CVD was analyzed using weighted logistic regression, weighted quantile sum (WQS) regression, and Bayesian kernel machine regression (BKMR). Crucial metals and PAHs were screened, and dose-response, subgroup, interactions, and mediation analyses were conducted. 4306 participants were included, of whom 406 had CVD. Weighted logistic regression showed that cadmium (OR = 1.41, 95% CI 1.11-1.78), tin (OR = 1.63, 95% CI 1.03-2.60), and 1-hydroxypyrene (1-PYR) (OR = 1.40, 95% CI 1.15-1.69) were positively correlated with CVD. These factors also showed a linear relation with CVD. The WQS and BKMR models indicated that the combined association of 12 metals and six PAHs was positively associated with CVD. Cadmium, cesium, tin, uranium, and 1-PYR played critical roles (all weights > 0.050). Subgroup analysis revealed that these substances were mostly positively associated with CVD in young and middle-aged people, smokers, drinkers, and those who were overweight. There was an interaction between tin and smoking status (P for interaction < 0.05). Cadmium and tin mediated 18.40% and 6.90% of the association of 1-PYR with CVD, respectively, whereas the proportions of the mediating effects of 1-PYR in the association of cadmium and tin with CVD were 8.10% and 7.90%, respectively. Overall, higher levels of urinary metals and PAHs mixtures may be associated with higher CVD prevalence. Cadmium, cesium, tin, uranium, and 1-PYR played crucial roles in this association. Cadmium and tin played mediating roles in the association between 1-PYR and CVD. 
Meanwhile, 1-PYR also played a mediating role in the association between cadmium and tin and CVD.
Affiliation(s)
- Minyao Zhao
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Sijie Gu
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Tingchao Liu
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Shipeng Gao
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Zheng Qiao
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Kui Wang
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Qiang Niu
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Rulin Ma
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Heng Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China
| | - Shuxia Guo
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China.
- NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, First Affiliated Hospital of Shihezi University, Shihezi, 832000, China.
| | - Jia He
- Department of Public Health, Shihezi University School of Medicine, Shihezi, 832000, China.
- NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, First Affiliated Hospital of Shihezi University, Shihezi, 832000, China.
|
4
|
Lu X, Kou H, Li C, Zhan R, Guo R, Liu S, Shen P, Shen M, Du T, Lu J, Shen X. Development and validation of an interpretable machine learning model for predicting hyperuricemia risk: Based on environmental chemical exposure. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2025; 299:118392. [PMID: 40403686 DOI: 10.1016/j.ecoenv.2025.118392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2025] [Revised: 04/30/2025] [Accepted: 05/19/2025] [Indexed: 05/24/2025]
Abstract
Hyperuricemia is a global health concern, with environmental chemicals as risk factors. This study used data on multiple environmental chemical exposures from the 2011-2012 cycle of the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning model for hyperuricemia risk prediction. The least absolute shrinkage and selection operator (LASSO) regression method was employed to select relevant variables. The dataset was split into training (80 %) and test (20 %) sets, and six machine learning models were constructed: Random Forest (RF), Gaussian Naive Bayes (GNB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), Adaptive Boosting Classifier (AB), and Support Vector Machine (SVM). Our study identified a hyperuricemia prevalence of 20.58 % in the 2011-2012 NHANES cycle, which was consistent with previous studies. The XGB model exhibited optimal performance, achieving the highest AUC (0.806; 95 % CI: 0.768-0.845), balanced accuracy (0.762; 95 % CI: 0.721-0.802), and F1 value (0.585; 95 % CI: 0.535-0.635), as well as the lowest Brier score (0.133; 95 % CI: 0.122-0.144). Estimated glomerular filtration rate (eGFR), body mass index (BMI), cobalt (Co), mono-(2-ethyl)-hexyl phthalate (MEHP), mono-(3-carboxypropyl) phthalate (MCPP), mono-(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and 2-hydroxynaphthalene (OHNa2) were identified as the key factors contributing to the predictive model. The results of Shapley additive explanations and partial dependence plots indicated that hyperuricemia was positively associated with MCPP, MEHHP, and OHNa2, while negatively associated with Co and MEHP. This study is the first to predict the risk of hyperuricemia based on multiple environmental chemical exposures using a machine learning model.
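For readers less familiar with the metrics reported above, the following sketch shows how balanced accuracy, the F1 value, and the Brier score are defined for a binary outcome. The labels and probabilities are made-up toy values, not data from the study, and the function names and the 0.5 decision threshold are assumptions introduced for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written in count form."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (hypothetical values): 3 cases among 10 participants.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.2, 0.8, 0.1, 0.4]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print(balanced_accuracy(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(brier_score(y_true, y_prob))
```

Balanced accuracy and F1 depend on the chosen threshold, whereas the Brier score is computed from the probabilities directly, so lower values indicate better-calibrated and sharper predictions.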
Affiliation(s)
- Xiaochuan Lu
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Huawei Kou
- Medical Affairs Department of Cancer Hospital, General Hospital of Ningxia Medical University, Yinchuan 750004, China.
| | - Cong Li
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | | | - Rongrong Guo
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Shengnan Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Peixuan Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Meiyue Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Tingwei Du
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Jiaqi Lu
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
| | - Xiaoli Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
|
5
|
Hu X, Zhi S, Li Y, Cheng Y, Fan H, Li H, Meng Z, Xie J, Tang S, Li W. Development and application of an early prediction model for risk of bloodstream infection based on real-world study. BMC Med Inform Decis Mak 2025; 25:186. [PMID: 40369550 PMCID: PMC12079808 DOI: 10.1186/s12911-025-03020-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2025] [Accepted: 05/05/2025] [Indexed: 05/16/2025]
Abstract
BACKGROUND Bloodstream Infection (BSI) is a severe systemic infectious disease that can lead to sepsis and Multiple Organ Dysfunction Syndrome (MODS), resulting in high mortality rates and posing a major public health burden globally. Early identification of BSI is crucial for effective intervention, reducing mortality, and improving patient outcomes. However, existing diagnostic methods are flawed by low specificity, long detection times and high demands on testing platforms. The development of artificial intelligence provides a new approach for early disease identification. This study aims to explore the optimal combination of routine laboratory data and clinical monitoring indicators, and to utilize machine learning algorithms to construct an early, rapid, and universally applicable BSI risk prediction model, to assist in the early diagnosis of BSI in clinical practice. METHODS Clinical data of 2582 suspected BSI patients admitted to the Chongqing University Central Hospital, from January 1, 2021 to December 31, 2023 were collected for this study. The data were divided into a modeling dataset and an external validation dataset based on chronological order, while the modeling dataset was further divided into a training set and an internal validation set. The occurrence rate of BSI, distribution of pathogens, and microbial primary reporting time were analyzed within the training set. During the feature selection stage, univariate regression and ML algorithms were applied. First, Univariate logistic regression was used to screen for predictive factors of BSI. Then, the Boruta algorithm, Lasso regression, and Recursive Feature Elimination with Cross-validation (RFE-CV) were employed to determine the optimal combination of predictors for predicting BSI. Based on the optimal combination, six machine learning algorithms were used to construct an early BSI risk prediction model. 
The best model was selected on the basis of model performance, and the Shapley Additive Explanations (SHAP) method was used to explain the model. The external validation set was used to evaluate the predictive performance and generalizability of the selected model, and the research findings were ultimately applied in clinical practice. RESULTS The incidence of BSI among inpatients at the Chongqing University Central Hospital was 12.91%. Following further feature selection, a set of 5 variables was determined, including white blood cell count, standard bicarbonate, base excess of extracellular fluid, interleukin-6, and body temperature. BSI early risk prediction models were constructed using six machine learning algorithms, with the XGBoost model demonstrating the best performance, achieving an AUC value of 0.782 in the internal validation set and an AUC value of 0.776 in the external validation set. This model is made publicly available as an online webpage tool for clinical use. CONCLUSIONS This study successfully identified a set of 5 features by analyzing routine laboratory data and clinical monitoring indicators among hospitalized patients. Based on this set, a machine learning-based early risk prediction model for BSI was constructed. The model is capable of early and rapid differentiation between BSI and non-BSI patients. The inclusion of minimal risk prediction factors enhances its applicability in clinical settings, particularly at the primary care level. Making the model available as an online application further improves its real-world applicability and convenience for clinical use, and could greatly improve the efficiency of BSI diagnosis and reduce patient mortality.
Affiliation(s)
- Xiefei Hu
- Department of Clinical Laboratory, Chongqing Emergency Medical Center, School of Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing, China
| | - Shenshen Zhi
- Department of Clinical Laboratory, Chongqing Emergency Medical Center, School of Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing, China
| | - Yang Li
- Peking University Chongqing Big Data Research Institute, Chongqing, China
| | - Yuming Cheng
- Beckman Coulter Commercial Enterprise (China) Co., Ltd, Shanghai, China
| | - Haiping Fan
- School of Medicine, ChongQing University, Chongqing, China
| | - Haorong Li
- Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Zihao Meng
- Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Jiaxin Xie
- School of Medicine, ChongQing University, Chongqing, China
| | - Shu Tang
- Chongqing University of Posts and Telecommunications, Chongqing, China.
| | - Wei Li
- Department of Clinical Laboratory, Chongqing Emergency Medical Center, School of Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing, China.
|
6
|
Chen H, Wang D, Shen J, Guo B, Song C, Ma D, Wu Y, Liu G, Chen G, Ni Y, Kong T, Wang F. Predicting peripartum depression using elastic net regression and machine learning: the role of remnant cholesterol. BMC Pregnancy Childbirth 2025; 25:544. [PMID: 40340559 PMCID: PMC12060319 DOI: 10.1186/s12884-025-07656-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 04/25/2025] [Indexed: 05/10/2025]
Abstract
BACKGROUND Traditional statistical methods have dominated research on peripartum depression (PPD), but innovative approaches may provide deeper insights. This study aims to identify factors influencing PPD using elastic net regression (ENR) combined with machine learning (ML) models. METHODS This longitudinal study was conducted from June 2020 to May 2023, involving healthy pregnant women in the first trimester, followed up until the completion of the assessment in the second trimester. PPD symptoms were assessed using the Edinburgh Postnatal Depression Scale (EPDS). Features with p < 0.05 from logistic regression were selected and refined using ENR. These features were then used to build six ML models to identify the best-performing one. SHapley Additive exPlanations (SHAP) analysis was employed to enhance model interpretability by visualizing its decision-making process. RESULTS A total of 608 participants were followed, resulting in 384 valid questionnaires. After excluding incomplete or incorrect baseline data, 325 participants were ultimately included in the study. Among these, 130 were classified as having mild depression, and 32 were classified as having major depression. Nineteen features were initially identified as being associated with PPD, with 14 retained after ENR refinement. The random forest (RF) model outperformed the other ML models. SHAP analysis identified the top five predictors of PPD: magnesium (Mg), remnant cholesterol (RC), calcium (Ca), mean corpuscular hemoglobin concentration (MCHc), and potassium (K). Mg, Ca, MCHc, and K were negatively correlated with PPD, while RC showed a positive correlation. CONCLUSIONS The RF model effectively identified associations between exposure factors and PPD. Mg, Ca, MCHc, and K were found to be protective factors, while RC emerged as a potential risk factor, highlighting its potential as a novel biomarker for PPD.
Affiliation(s)
- Hongxu Chen
- School of Public Health, Xinjiang Medical University, Urumqi, 830063, China
| | - Denglan Wang
- Xinjiang Key Laboratory of Neurological Disorder Research, the Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830063, China
| | - Juanjuan Shen
- Xinjiang Key Laboratory of Neurological Disorder Research, the Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830063, China
| | - Baoyan Guo
- Xinjiang Key Laboratory of Neurological Disorder Research, the Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830063, China
| | - Chun Song
- Xinjiang Key Laboratory of Neurological Disorder Research, the Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830063, China
| | - Duo Ma
- Department of Ultrasonography, The Second Affiliated Hospital of Xiamen Medical College, Xiamen, China
| | - Yan Wu
- Beijing Hui-Long-Guan Hospital, Peking University, Beijing, 100096, China
| | - Guohui Liu
- Inner Mongolia Maternity and Child Health Care Hospital, Huhhot, 010020, China
| | - Guangxue Chen
- Department of Gynaecology and Obstetrics, Beijing Jishuitan Hospital, Capital Medical University, Beijing, 100035, China
| | - Yan Ni
- Department of Women Health Care, Quzhou Maternal and Child Health Care Hospital, Quzhou, 324000, China
| | - Tiantian Kong
- Xinjiang Key Laboratory of Neurological Disorder Research, the Second Affiliated Hospital of Xinjiang Medical University, Urumqi, 830063, China.
| | - Fan Wang
- Beijing Hui-Long-Guan Hospital, Peking University, Beijing, 100096, China.
|
7
|
Xiao Z, Wang M, Zhao Y, Wang H. A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus. Food Sci Nutr 2025; 13:e70234. [PMID: 40313792 PMCID: PMC12041655 DOI: 10.1002/fsn3.70234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2025] [Revised: 04/15/2025] [Accepted: 04/21/2025] [Indexed: 05/03/2025]
Abstract
Diabetes is one of the leading causes of death and disability worldwide. Developing earlier and more accurate diagnosis methods is crucial for clinical prevention and treatment of diabetes. Here, data on biochemical indicators and physiological characteristics of 4335 participants from the National Health and Nutrition Examination Survey (NHANES) database from 2017 to 2020 were collected. After data preprocessing, the dataset was randomly divided into a training set (70%) and a test set (30%); then the Boruta algorithm was used to screen feature indicators on the training set. Next, three machine learning algorithms, including Random Forest (RF), Multi-Layer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost) were employed to build predictive models through 10-fold cross-validation on the training dataset, followed by performance evaluation on the test dataset. The RF model exhibited the best performance, with an area under the curve (AUC) of 0.958 (95% CI: 0.943-0.973), a recall of 0.897, a specificity and F1 score of 0.916 and 0.747, respectively, and an overall accuracy of 0.913. Moreover, SHapley Additive exPlanations (SHAP) and Partial Dependency Plots (PDP) were applied to interpret the RF model to analyze the risk factors for diabetes. Glycohemoglobin, glucose, fasting glucose, age, cholesterol, osmolality, BMI, blood urea nitrogen, and insulin were found to exert the greatest influence on the prevalence of diabetes. Collectively, the RF model has considerable application prospects for the diagnosis of diabetes and can serve as a valuable supplementary tool for clinical diagnosis and risk assessment in diabetes.
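The area under the ROC curve (AUC) quoted above has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way on toy values. It is an illustration of the definition only, not the study's evaluation code, and the function name and data are hypothetical.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): two negatives, then two positives.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 3 of 4 pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in the sample size, which is fine for a sketch; production libraries obtain the same value from sorted ranks.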
Affiliation(s)
- Zhihui Xiao
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China
| | - Mingfu Wang
- Shenzhen Key Laboratory of Food Nutrition and Health, College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China
| | - Yueliang Zhao
- College of Food Science and Technology, Shanghai Ocean University, Shanghai, China
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Hui Wang
- School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
8
|
Proshad R, Chandra K, Islam M, Khurram D, Rahim MA, Asif MR, Idris AM. Evaluation of machine learning models for accurate prediction of heavy metals in coal mining region soils in Bangladesh. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2025; 47:181. [PMID: 40266355 DOI: 10.1007/s10653-025-02489-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Accepted: 03/30/2025] [Indexed: 04/24/2025]
Abstract
Coal mining soils are highly susceptible to heavy metal pollution due to the discharge of mine tailings, overburden dumps, and acid mine drainage. Developing a reliable predictive model for heavy metal concentrations in this region has proven to be a significant challenge. This study employed machine learning (ML) techniques to model heavy metal pollution in soils within this critical ecosystem. A total of 91 standardized soil samples were analyzed to predict the accumulation of eight heavy metals using four distinct ML algorithms. Among them, the random forest model outperformed the others in predicting As (0.79), Cd (0.89), Cr (0.63), Ni (0.56), Cu (0.60), and Zn (0.52), achieving notable R-squared values. The feature attribute analysis identified As-K, Pb-K, Cd-S, Zn-Fe2O3, Cr-Fe2O3, Ni-Al2O3, Cu-P, and Mn-Fe2O3 relationships, which resembled the correlation coefficients among them. The developed models revealed that the contamination factor for metals in soils indicated extremely high levels of Pb contamination (CF ≥ 6). In conclusion, this research offers a robust framework for predicting heavy metal pollution in coal mining soils, highlighting critical areas that require immediate conservation efforts. These findings emphasize the necessity for targeted environmental management and mitigation to reduce heavy metal pollution in mining sites.
Affiliation(s)
- Ram Proshad
- State Key Laboratory of Mountain Hazards and Engineering Safety, Institute of Mountain Hazards and Environment, Chinese Academy of Sciences, Chengdu, 610041, Sichuan, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
| | - Krishno Chandra
- Faculty of Agricultural Engineering and Technology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
| | - Maksudul Islam
- Department of Environmental Science, Patuakhali Science and Technology University, Dumki, Patuakhali, 8602, Bangladesh
| | - Dil Khurram
- College of Ecology and Environment, Chengdu University of Technology, Chengdu, 610059, Sichuan, China
| | - Md Abdur Rahim
- University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Mountain Hazards and Engineering Resilience, Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (CAS), Chengdu, 610299, China
- Department of Disaster Resilience and Engineering, Patuakhali Science and Technology University, Dumki, Patuakhali, 8602, Bangladesh
| | - Maksudur Rahman Asif
- College of Environment and Ecology, Taiyuan University of Technology, Jinzhong, 030600, Shanxi, China
| | - Abubakr M Idris
- Department of Chemistry, College of Science, King Khalid University, 62529, Abha, Saudi Arabia.
- Research Center for Advanced Materials Science (RCAMS), King Khalid University, 62529, Abha, Saudi Arabia.
|
9
|
Liu J, Wang B, Li Q. Machine learning model for age related macular degeneration based on pesticides: the National Health and Nutrition Examination Survey 2007-2008. Front Public Health 2025; 13:1561913. [PMID: 40308919 PMCID: PMC12042703 DOI: 10.3389/fpubh.2025.1561913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2025] [Accepted: 03/24/2025] [Indexed: 05/02/2025]
Abstract
Age-related macular degeneration (AMD) is the most common cause of irreversible deterioration of vision in older adults. Previous studies have found that exposure to pesticides can lead to a worsening of AMD. In this paper, information on pesticide exposure and AMD from the National Health and Nutrition Examination Survey (NHANES) database was divided into a training set and a validation set. First, the correlation between the variables in the model was analyzed. The model was then built using nine machine learning algorithms and verified on the validation set. The random forest model was found to have high predictive value, with a Receiver Operating Characteristic (ROC) value of 0.75. Finally, SHapley Additive exPlanations (SHAP) analysis was used to rank the importance of each variable in the random forest model, and chlorpyrifos and malathion were found to have substantial effects on the occurrence and development of AMD.
Affiliation(s)
| | | | - Qiuming Li
- The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
|
10
|
Liu J, Li X, Wang Y, Xu Z, Lv Y, He Y, Chen L, Feng Y, Liu G, Bai Y, Xie W, Wu Q. Predicting postoperative pulmonary infection in elderly patients undergoing major surgery: a study based on logistic regression and machine learning models. BMC Pulm Med 2025; 25:128. [PMID: 40108569 PMCID: PMC11921591 DOI: 10.1186/s12890-025-03582-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 03/05/2025] [Indexed: 03/22/2025]
Abstract
BACKGROUND Postoperative pulmonary infection (POI) is strongly associated with a poor prognosis and has a high incidence in elderly patients undergoing major surgery. Machine learning (ML) algorithms are increasingly being used in medicine, but the predictive role of logistic regression (LR) and ML algorithms for POI in high-risk populations remains unclear. METHODS We conducted a retrospective cohort study of older adults undergoing major surgery over a period of six years. The included patients were randomly divided into training and validation sets at a ratio of 7:3. The features selected by the least absolute shrinkage and selection operator regression algorithm were used as the input variables of the ML and LR models. Multiple interpretability methods were applied to the random forest (RF) model to interpret the ML models. RESULTS Of the 9481 older adults in our study, 951 developed POI. Among the different algorithms, LR performed the best with an AUC of 0.80, whereas the decision tree performed the worst with an AUC of 0.75. Furthermore, the LR model outperformed the other ML models in terms of accuracy (88.22%), specificity (90.29%), precision (44.42%), and F1 score (54.25%). Although four interpretability methods were applied to the RF model, their results were somewhat inconsistent. Finally, to facilitate clinical application, we established a web-friendly version of the nomogram based on the LR algorithm. In addition, patients were divided into three significantly distinct risk intervals in predicting POI. CONCLUSIONS Compared with popular ML algorithms, LR was more effective at predicting POI in older patients undergoing major surgery. The constructed nomogram could identify high-risk elderly patients and facilitate perioperative management planning. TRIAL REGISTRATION The study was retrospectively registered (NCT06491459).
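The metrics listed in this abstract all derive from a confusion matrix. The sketch below shows the standard definitions and, assuming the reported F1 score is the usual harmonic mean of precision and recall, how the unreported recall can be backed out of the reported precision and F1. The function names and the toy counts are illustrative assumptions, not values from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def recall_from_precision_and_f1(precision, f1):
    """Invert F1 = 2PR / (P + R) to obtain R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Toy counts (hypothetical, chosen only to exercise the formulas).
toy = classification_metrics(tp=30, fp=10, tn=50, fn=10)
print(toy)

# Reported LR precision (44.42%) and F1 (54.25%) from the abstract.
implied_recall = recall_from_precision_and_f1(0.4442, 0.5425)
print(round(implied_recall, 4))
```

Under that assumption the implied recall is roughly 0.70. With about 10% of patients developing POI (951 of 9481), high accuracy alongside modest precision is a typical pattern for an imbalanced outcome.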
Affiliation(s)
- Jie Liu
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Department of Anesthesiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xia Li
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Yanting Wang
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Zhenzhen Xu
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Yong Lv
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Yuyao He
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Lu Chen
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Yiqi Feng
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Guoyang Liu
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Yunxiao Bai
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Wanli Xie
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Qingping Wu
- Department of Anesthesiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China.
11
Deng L, Liu K, Fan Y, Qian X, Ke T, Liu T, Li M, Xu X, Yang D, Li H. Interpretable machine learning models reveal the partnership of microplastics and perfluoroalkyl substances in sediments at a century scale. JOURNAL OF HAZARDOUS MATERIALS 2025; 486:137018. [PMID: 39740544 DOI: 10.1016/j.jhazmat.2024.137018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/13/2024] [Accepted: 12/25/2024] [Indexed: 01/02/2025]
Abstract
The complex interactions between perfluoroalkyl substances (PFASs) and microplastics in lake sediments are difficult to determine experimentally. This study utilized sediment cores from Taihu Lake to reconstruct the coexistence history and innovatively reveal the collaboration between PFASs and microplastics by using post-hoc interpretable machine learning methods. Microplastics and PFASs emerged in the 1960s and have significantly increased since the 1990s. PFASs and microplastics had the highest growth rate in the 0-10 cm range, with average growth rates of 35.96 pg/g/year and 4.40 items/year per 100 g, respectively. Among the machine learning models, extreme gradient boosting gave the best simulation of PFASs and microplastics. Feature importance and Shapley additive explanations semi-quantitatively clarified the importance of transparent and pellet microplastics for PFASs concentrations, as well as the importance of perfluorooctane sulfonate (PFOS) and ΣPFASs for microplastics. Moisture content, redox potential, χfd, and χARM were the key factors influencing the contaminants. Partial dependence plots showed that the influencing thresholds were 0.30 ng/g for ΣPFASs and 0.15 ng/g for PFOS on microplastics, and 10 items per 100 g for pellets and 12 items per 100 g for transparent plastics on PFASs. This study elucidated the interactions between two typical emerging contaminants on a century scale through the intersection of environmental geochemistry and interpretable machine learning.
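The thresholds quoted from the partial dependence plots can be understood through the way a partial dependence curve is computed: one feature is forced to each grid value in every row, and the model predictions are averaged. The sketch below illustrates this with a hypothetical stand-in model. `toy_model`, the rows, and the grid are assumptions for illustration only, and the plateau at 0.30 merely echoes the threshold mentioned in the abstract. It is not the fitted extreme gradient boosting model.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the data stay untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    # Hypothetical response: rises with feature 0 up to 0.30, then plateaus;
    # feature 1 contributes a simple linear term.
    return min(row[0], 0.30) * 10.0 + 0.5 * row[1]


rows = [[0.10, 1.0], [0.20, 2.0], [0.40, 3.0]]
grid = [0.0, 0.15, 0.30, 0.45]
curve = partial_dependence(toy_model, rows, 0, grid)
print(curve)
```

The curve rises until the grid reaches 0.30 and is flat afterwards, which is the kind of shape from which an influencing threshold is read off a partial dependence plot.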
Affiliation(s)
- Ligang Deng
- School of Environment, Nanjing Normal University, Nanjing 210023, China; State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Kai Liu
- School of Environment, Nanjing Normal University, Nanjing 210023, China; State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Yifan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Xin Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China
- Tong Ke
- School of Environment, Nanjing Normal University, Nanjing 210023, China
- Tong Liu
- Faculty of Environmental Earth Science, Hokkaido University, Sapporo, Japan
- Mingjia Li
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Xiaohan Xu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Daojun Yang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Huiming Li
- School of Environment, Nanjing Normal University, Nanjing 210023, China; Jiangsu Province Engineering Research Center of Environmental Risk Prevention and Emergency Response Technology, Nanjing 210023, China.
12
Araujo-Moura K, Souza L, de Oliveira TA, Rocha MS, De Moraes ACF, Chiavegatto Filho A. Prediction of Hypertension in the Pediatric Population Using Machine Learning and Transfer Learning: A Multicentric Analysis of the SAYCARE Study. Int J Public Health 2025; 70:1607944. [PMID: 40145015 PMCID: PMC11937837 DOI: 10.3389/ijph.2025.1607944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Accepted: 02/25/2025] [Indexed: 03/28/2025] Open
Abstract
Objective To develop a machine learning (ML) model utilizing transfer learning (TL) techniques to predict hypertension in children and adolescents across South America. Methods Data from two cohorts (children and adolescents) in seven South American cities were analyzed. A TL strategy was implemented by transferring knowledge from a CatBoost model trained on the children's sample and adapting it to the adolescent sample. Model performance was evaluated using standard metrics. Results Among children, the prevalence of normal blood pressure was 88.9% (301 participants), while 14.1% (50 participants) had elevated blood pressure (EBP). In the adolescent group, the prevalence of normal blood pressure was 92.5% (284 participants), with 7.5% (23 participants) presenting with EBP. Random Forest, XGBoost, and LightGBM achieved high accuracy (0.90) for children, with XGBoost and LightGBM demonstrating superior recall (0.50) and AUC-ROC (0.74). For adolescents, models without TL showed poor performance, with accuracy and recall values remaining low and AUC-ROC ranging from 0.46 to 0.56. After applying TL, model performance improved significantly, with CatBoost achieving an AUC-ROC of 0.82, accuracy of 1.0, and recall of 0.18. Conclusion Soft drinks, filled cookies, and chips were key dietary predictors of elevated blood pressure, with higher intake in adolescents. Machine learning with transfer learning effectively identified these risks, emphasizing the need for early dietary interventions to prevent hypertension and support cardiovascular health in pediatric populations.
Affiliation(s)
- Keisyanne Araujo-Moura
- Department of Epidemiology, School of Public Health, University of São Paulo, São Paulo, Brazil
- Letícia Souza
- Department of Epidemiology, School of Public Health, University of São Paulo, São Paulo, Brazil
- Mateus Silva Rocha
- Department of Statistic, State University of Paraíba, Campina Grande, Paraíba, Brazil
- Augusto César Ferreira De Moraes
- School of Public Health in Austin, Department of Epidemiology, Michael and Susan Dell Center for Healthy Living, Texas Physical Activity Research Collaborative (Texas PARC), University of Texas Health Science Center at Houston, Houston, TX, United States
13
Tang Q, Wang Y, Luo Y. An interpretable machine learning model with demographic variables and dietary patterns for ASCVD identification: from U.S. NHANES 1999-2018. BMC Med Inform Decis Mak 2025; 25:105. [PMID: 40033349 PMCID: PMC11874124 DOI: 10.1186/s12911-025-02937-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 02/18/2025] [Indexed: 03/05/2025] Open
Abstract
Current research on the association of demographic variables and dietary patterns with atherosclerotic cardiovascular disease (ASCVD) is limited in breadth and depth. This study aimed to construct a machine learning (ML) algorithm that can accurately and transparently establish correlations between demographic variables, dietary habits, and ASCVD. The dataset used in this research originates from the United States National Health and Nutrition Examination Survey (U.S. NHANES) spanning 1999-2018. Five ML models were developed to predict ASCVD, and the best-performing model was selected for further analysis. The study included 40,298 participants. Using 20 population characteristics, the eXtreme Gradient Boosting (XGBoost) model demonstrated high performance, achieving an area under the curve value of 0.8143 and an accuracy of 88.4%. The model showed a positive correlation between male sex and ASCVD risk, while age and smoking also exhibited positive associations with ASCVD risk. Dairy product intake exhibited a negative correlation, while a lower intake of refined grains did not reduce the risk of ASCVD. Additionally, the poverty income ratio and calorie intake exhibited non-linear associations with the disease. The XGBoost model demonstrated significant efficacy and precision in determining the relationship between ASCVD and the demographic characteristics and dietary intake of participants in the U.S. NHANES 1999-2018 dataset.
Affiliation(s)
- Qun Tang
- Department of Cardiovascular Medicine, Wuhu City Second People's Hospital, Wuhu, 241000, China
- Yong Wang
- Department of Cardiovascular Medicine, Wuhu City Second People's Hospital, Wuhu, 241000, China
- Yan Luo
- Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100020, China.
14
Hu J, Yang L, Kang N, Wang N, Shen L, Zhang X, Liu S, Li H, Xue T, Ma S, Zhu T. Associations between long-term exposure to fine particulate matter and its constituents with lung cancer incidence: Evidence from a prospective cohort study in Beijing, China. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2025; 368:125686. [PMID: 39842494 DOI: 10.1016/j.envpol.2025.125686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 12/29/2024] [Accepted: 01/12/2025] [Indexed: 01/24/2025]
Abstract
The association between long-term exposure to ambient fine particulate matter (PM2.5) and lung cancer incidence is well documented. However, the role of different PM2.5 constituents [black carbon (BC), ammonium (NH₄⁺), nitrate (NO₃⁻), organic matter (OM), and inorganic sulfate (SO₄²⁻)] remains unclear. The study aimed to specify the associations between PM2.5 constituents and lung cancer incidence. Based on a prospective cohort of 130,860 participants in Beijing, the present study used a Cox model to explore the associations between PM2.5 constituents and lung cancer incidence. We further used mixed exposure models [weighted quantile sum (WQS) and quantile-based g-computation (Qgcomp)] and a machine learning model [random forest model with SHapley Additive exPlanations (SHAP)] to specify the importance of each constituent. Results indicated that PM2.5 mass and its constituents were significantly associated with increased lung cancer incidence. The hazard ratios (HRs) and 95% confidence intervals (CIs) for a 1-μg/m³ increase in the 5-year average concentrations were 1.01 (95% CI: 1.00, 1.02) for PM2.5 mass, 1.23 (95% CI: 1.06, 1.42) for BC, 1.15 (95% CI: 1.04, 1.27) for NH₄⁺, 1.08 (95% CI: 1.02, 1.16) for NO₃⁻, 1.04 (95% CI: 1.01, 1.06) for OM, and 1.08 (95% CI: 1.03, 1.15) for SO₄²⁻. Both the WQS and Qgcomp models assigned the two highest positive weights to BC and SO₄²⁻. SHAP analysis identified SO₄²⁻ and BC as the first and third most important contributors, respectively. Our results indicated that PM2.5 mass and its constituents were significantly associated with lung cancer incidence, and that BC and SO₄²⁻ were the key constituents in these associations.
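Because the hazard ratios above are reported per 1-μg/m³ increase, comparing them with studies that report per 10 μg/m³ requires rescaling. A minimal sketch follows, assuming the exposure enters the Cox model as a single linear term, which the abstract does not state explicitly. Under that assumption the log hazard ratio scales linearly with the increment. The helper name `rescale_hazard_ratio` is hypothetical.

```python
import math


def rescale_hazard_ratio(hr_per_unit, increment):
    """Hazard ratio for `increment` units given the per-unit hazard ratio.

    With a linear exposure term, HR(delta) = exp(beta * delta), where
    beta = ln(HR per unit), so the HR scales as a power of the increment.
    """
    beta = math.log(hr_per_unit)
    return math.exp(beta * increment)


# Reported estimate and 95% CI for PM2.5 mass per 1 ug/m3 (from the abstract).
pm25_hr, pm25_low, pm25_high = 1.01, 1.00, 1.02

# The same estimate expressed per 10 ug/m3; CI bounds rescale the same way.
per_10 = [rescale_hazard_ratio(x, 10) for x in (pm25_hr, pm25_low, pm25_high)]
print([round(x, 3) for x in per_10])
```

The point estimate of 1.01 per 1 μg/m³ corresponds to about 1.10 per 10 μg/m³. The rounded inputs limit the precision of any rescaled value.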
Affiliation(s)
- Jinlong Hu
- College of Environmental Sciences and Engineering, Peking University, Beijing, China
- Lei Yang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Beijing Office for Cancer Prevention and Control, Peking University Cancer Hospital & Institute, Beijing, 100142, China; Peking University Cancer Hospital (Inner Mongolia Campus)/Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Center, Hohhot, 010020, China
- Ning Kang
- Institute of Reproductive and Child Health / National Health Commission Key Laboratory of Reproductive Health and Department of Epidemiology and Biostatistics / Ministry of Education Key Laboratory of Epidemiology of Major Diseases (PKU), School of Public Health, Peking University Health Science Centre, Beijing, China.
- Ning Wang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Beijing Office for Cancer Prevention and Control, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Luyan Shen
- Key Laboratory of Carcinogenesis and Translational Research, Department of Thoracic Surgery I, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Xi Zhang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Beijing Office for Cancer Prevention and Control, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Shuo Liu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Beijing Office for Cancer Prevention and Control, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Huichao Li
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Beijing Office for Cancer Prevention and Control, Peking University Cancer Hospital & Institute, Beijing, 100142, China
- Tao Xue
- Institute of Reproductive and Child Health / National Health Commission Key Laboratory of Reproductive Health and Department of Epidemiology and Biostatistics / Ministry of Education Key Laboratory of Epidemiology of Major Diseases (PKU), School of Public Health, Peking University Health Science Centre, Beijing, China; Advanced Institute of Information Technology, Peking University, Hangzhou, China; State Environmental Protection Key Laboratory of Atmospheric Exposure and Health Risk Management, Center for Environment and Health, Peking University, Beijing, China
- Shaohua Ma
- State Key Laboratory of Molecular Oncology, Beijing, Key Laboratory of Carcinogenesis and Translational Research, Department of Thoracic Surgery I, Peking University Cancer Hospital & Institute, Beijing, 100142, China.
- Tong Zhu
- College of Environmental Sciences and Engineering, Peking University, Beijing, China; State Environmental Protection Key Laboratory of Atmospheric Exposure and Health Risk Management, Center for Environment and Health, Peking University, Beijing, China
15
Wang X, Chen G, He R, Gao Y, Lu J, Xu T, Liu H, Jiang Z. Machine learning prediction of glaucoma by heavy metal exposure: results from the National Health and Nutrition Examination Survey 2005 to 2008. Sci Rep 2025; 15:4891. [PMID: 39929915 PMCID: PMC11811145 DOI: 10.1038/s41598-025-88698-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Accepted: 01/30/2025] [Indexed: 02/13/2025] Open
Abstract
Using follow-up data from the National Health and Nutrition Examination Survey (NHANES) database, we collected information on 2572 subjects and used a generalized linear model to investigate the association between urinary heavy metal levels and glaucoma risk. In addition, we developed an individualized risk prediction model using machine learning algorithms and further interpreted the model results through feature importance analysis, local cumulative analysis, and interaction effects. We found a significant association between log-transformed arsenic (As) metabolites, especially arsenocholine (AC), and glaucoma after adjusting for a series of confounders, including urinary creatinine (β = 1.090, 95% CI: 0.313-1.835). The Shapley Additive Explanations (SHAP) analysis results and clinical risk scores also indicated that As metabolites promoted glaucoma more strongly than other variables. This study applied machine learning for the first time to explore the relationship between heavy metals and glaucoma while analyzing the effects of multiple heavy metal exposures on the disease, improving the predictive power compared to conventional models. Our results provide important insights into the potential role of heavy metals in the pathogenesis of glaucoma, may facilitate the discovery of new biomarkers for early diagnosis, risk assessment, and timely treatment of glaucoma, and can guide public health measures to reduce heavy metal exposure.
Affiliation(s)
- Xinchen Wang
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China
- Gang Chen
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China
- Department of Ophthalmology, The Traditional Chinese Medicine Hospital of Jinzhai County, 233 Hongjun Avenue, Lu'an, 237000, China
- Rui He
- Department of Ophthalmology, The Lu'an Hospital Affiliated to Anhui Medical University, 21 West Anhui Road, Lu'an, 237005, China
- Department of Ophthalmology, The Lu'an People's Hospital, 21 West Anhui Road, Lu'an, 237005, China
- Yuting Gao
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China
- Jingwen Lu
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China
- Tongcheng Xu
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China
- Heting Liu
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China.
- Zhengxuan Jiang
- Department of Ophthalmology, The Second Affiliated Hospital of Anhui Medical University, 678 Furong Road, Hefei, 230601, China.
16
Wu H, Li Y, Jiang Y, Li X, Wang S, Zhao C, Yang X, Chang B, Yang J, Qiao J. Machine learning prediction of obesity-associated gut microbiota: identifying Bifidobacterium pseudocatenulatum as a potential therapeutic target. Front Microbiol 2025; 15:1488656. [PMID: 39974372 PMCID: PMC11839209 DOI: 10.3389/fmicb.2024.1488656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Accepted: 12/05/2024] [Indexed: 02/21/2025] Open
Abstract
Background The rising prevalence of obesity and related metabolic disorders highlights the urgent need for innovative research approaches. Utilizing machine learning (ML) algorithms to predict obesity-associated gut microbiota and validating their efficacy with specific bacterial strains could significantly enhance obesity management strategies. Methods We leveraged gut microbiome data from 1,563 healthy individuals and 2,043 overweight patients sourced from the GMrepo database. We assessed the anti-obesity effects of Bifidobacterium pseudocatenulatum through experimentation with Caenorhabditis elegans and C3H10T1/2 cells. Results Our analysis revealed a significant correlation between gut bacterial composition and body weight. The top 40 bacterial species were utilized to develop ML models, with XGBoost demonstrating the highest predictive accuracy. SHAP analysis indicated a negative association between the relative abundance of six bacterial species, including B. pseudocatenulatum, and body mass index (BMI). Furthermore, B. pseudocatenulatum was shown to reduce lipid accumulation in C. elegans and inhibit lipid differentiation in C3H10T1/2 cells. Conclusion Bifidobacterium pseudocatenulatum holds potential as a therapeutic agent for managing diet-induced obesity, underscoring its relevance in microbiome-based obesity research and intervention.
Affiliation(s)
- Hao Wu
- Zhejiang Institute of Tianjin University (Shaoxing), Shaoxing, China
- Yuan Li
- NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China
- Yuxuan Jiang
- Yidu Cloud (Beijing) Technology Co., Ltd., Beijing, China
- Xinran Li
- NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China
- Shenglan Wang
- NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China
- Changle Zhao
- Zhejiang Institute of Tianjin University (Shaoxing), Shaoxing, China
- Department of Pharmaceutical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Ximiao Yang
- Zhejiang Institute of Tianjin University (Shaoxing), Shaoxing, China
- Baocheng Chang
- NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China
- Juhong Yang
- NHC Key Lab of Hormones and Development and Tianjin Key Lab of Metabolic Diseases, Tianjin Medical University Chu Hsien-I Memorial Hospital & Institute of Endocrinology, Tianjin, China
- Guangdong Medical University, Zhanjiang, China
- Jianjun Qiao
- Zhejiang Institute of Tianjin University (Shaoxing), Shaoxing, China
- Department of Pharmaceutical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
17
Chen J. Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015-2018. BMC Public Health 2025; 25:455. [PMID: 39905341 PMCID: PMC11796195 DOI: 10.1186/s12889-025-21658-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Accepted: 01/28/2025] [Indexed: 02/06/2025] Open
Abstract
BACKGROUND Alveolar bone loss (ABL) is common in modern society. Heavy metal exposure is usually considered to be a risk factor for ABL. Some studies have reported a positive association between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine regression. Overfitting from the kernel function, long computation times, the choice of prior distribution, and the lack of a ranking of heavy metals all affect the performance of such statistical models. The optimal model for this topic remains controversial. This study aimed to: (1) develop an algorithm for exploring the association between heavy metal exposure and ABL; (2) filter the actual causal variables and investigate how heavy metals were associated with ABL; and (3) identify the potential risk factors for ABL. METHODS Data were collected from the National Health and Nutrition Examination Survey (NHANES) between 2015 and 2018 to develop a machine learning (ML) model. Feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. The selected data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) and divided into a training set and testing set at a 3:1 ratio. Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), and XGBoost were used to construct the ML model. Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), Precision, Recall, and F1 score were used to select the optimal model for further analysis. The contribution of the variables to the ML model was explained using the Shapley Additive Explanations (SHAP) method. RESULTS RF showed the best performance in exploring the association between heavy metal exposure and ABL, with an AUC of 0.88, accuracy of 0.78, precision of 0.76, recall of 0.83, and F1 score of 0.79. Age was the most important factor in the ML model (mean |SHAP value| = 0.09), and Cd was the primary heavy metal contributor. Sex had little effect on the ML model contribution. CONCLUSION In this study, RF showed superior performance compared with the other five algorithms. Among the 12 heavy metals, Cd was the most important factor in the ML model. The associations of Co and Pb with ABL were weaker than that of Cd. Among all the independent variables, age was considered the most important factor for this model. As for the poverty income ratio (PIR), low-income participants showed an association with ABL. Mexican American and Non-Hispanic White participants showed a lower association with ABL than Non-Hispanic Black participants and those of other races. Gender showed a weak association with ABL. In the future, more advanced algorithms should be developed to validate these results, and related parameters can be tuned to improve the accuracy of the model. CLINICAL TRIAL NUMBER not applicable.
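The balancing step named in the methods, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, standard-library illustration of that interpolation idea. It is not the implementation used in the study, which the abstract does not specify, and the function name, parameters, and toy points are all assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class points (hypothetical two-feature data).
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote_like_oversample(minority, n_new=5, k=2, seed=1)
print(new_points)
```

Every synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region already occupied by that class.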
Affiliation(s)
- Jiayi Chen
- Department of stomatology, Suzhou Wujiang District Hospital of Traditional Chinese Medicine, Dachun road 999, Wujiang District, Suzhou, 215221, PR China.
18
Shen M, Zhang Y, Zhan R, Du T, Shen P, Lu X, Liu S, Guo R, Shen X. Predicting the risk of cardiovascular disease in adults exposed to heavy metals: Interpretable machine learning. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2025; 290:117570. [PMID: 39721423 DOI: 10.1016/j.ecoenv.2024.117570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 12/16/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]
Abstract
Machine learning exhibits excellent performance in terms of predictive power. We aimed to construct an interpretable machine learning model utilizing National Health and Nutrition Examination Survey data to investigate the relationship between heavy metal exposure and cardiovascular disease (CVD). A total of 4600 adults were included in the analysis. The Least Absolute Shrinkage and Selection Operator regression method was employed to select relevant feature variables. Subsequently, six machine learning models were constructed: random forest, decision tree, gradient boosting decision tree, k-nearest neighbor, support vector machine, and AdaBoost. Feature importance analysis, partial dependence plots, and Shapley additive explanations were integrated to enhance the interpretability of the CVD prediction model. Among all models, the random forest exhibited the best performance, with an accuracy of 90%, an area under the curve of 0.85, and an F1 score of 0.86. Urine cadmium (Cd), blood lead (Pb), urine thallium (Tl), and urine tungsten (W) were identified as the most significant predictors of CVD, with importance scores of 0.062, 0.057, 0.051, and 0.050, respectively. At the overall level, higher levels of urine Cd, blood Pb, and urine W were associated with an increased risk of CVD, whereas a lower level of urine Tl was linked to a reduced CVD risk. Additionally, the analysis of synergistic effects revealed that Cd was the predominant determinant of CVD risk. The random forest-based CVD prediction model demonstrated excellent predictive power and provided valuable insights for personalized patient care and optimal resource allocation in populations exposed to heavy metals.
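Shapley additive explanations attribute a prediction to individual features by averaging each feature's marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values for a small hypothetical model against a single baseline. The SHAP library typically approximates these values against a background dataset, so this illustrates the underlying idea rather than that library's algorithm. `toy_risk` and its coefficients are invented for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline`."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = instance[j]      # switch feature j to its real value
            value = predict(current)
            totals[j] += value - previous  # marginal contribution of j
            previous = value
    return [t / len(orderings) for t in totals]


def toy_risk(x):
    # Hypothetical risk score with an interaction between features 0 and 1.
    return 0.1 + 0.2 * x[0] + 0.1 * x[1] + 0.3 * x[0] * x[1] + 0.05 * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, instance, baseline)
print([round(v, 3) for v in phi])
```

The interaction term is split equally between the two interacting features, and the attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.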
Affiliation(s)
- Meiyue Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
| | - Yine Zhang
- Ningxia Center for Disease Control and Prevention, Yinchuan, China
| | | | - Tingwei Du
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
| | - Peixuan Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
| | - Xiaochuan Lu
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
| | - Shengnan Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China; Ningxia Center for Disease Control and Prevention, Yinchuan, China; Qingdao Haici Hospital, Qingdao 266033, China
| | - Rongrong Guo
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
| | - Xiaoli Shen
- Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China.
|
19
|
Yu L, Cao S, Song B, Hu Y. Predicting grip strength-related frailty in middle-aged and older Chinese adults using interpretable machine learning models: a prospective cohort study. Front Public Health 2024; 12:1489848. [PMID: 39741944 PMCID: PMC11685125 DOI: 10.3389/fpubh.2024.1489848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Accepted: 12/02/2024] [Indexed: 01/03/2025] Open
Abstract
Introduction Frailty is an emerging global health burden, and there is no consensus on the precise prediction of frailty. We aimed to explore the association between grip strength and frailty and interpret the optimal machine learning (ML) model using the SHapley Additive exPlanation (SHAP) to predict the risk of frailty. Methods Data for the study were extracted from the China Health and Retirement Longitudinal Study (CHARLS) database. Socio-demographic, medical history, anthropometric, psychological, and sleep parameters were analyzed in this study. We used the least absolute shrinkage and selection operator (LASSO) regression to select the best predictor variables and constructed six ML models for predicting frailty. The performance of the six ML models was compared based on the area under the receiver operating characteristic curve (AUROC), and the light gradient boosting machine (LightGBM) model was selected as the best predictive frailty model. We used SHAP to interpret the LightGBM model and to reveal the decision-making process by which the model predicts frailty. Results A total of 10,834 eligible participants were included in the study. Using the lowest quartile of grip strength as a reference, grip strength was negatively associated with the risk of frailty when grip strength was >29.00 kg for males or >19.00 kg for females (p < 0.001). The LightGBM model predicted frailty with optimal performance, with an AUROC of 0.768 (95% CI 0.741-0.795). The SHAP summary plot ranked the features by importance, with cognitive function the most important predictive feature. Poorer cognitive function, shorter nighttime sleep duration, lower body mass index (BMI), and weaker grip strength were associated with a higher risk of frailty in middle-aged and older adults. The SHAP individual force plot shows how the LightGBM model arrives at its frailty prediction for an individual.
Conclusion The grip strength-related LightGBM prediction model based on SHAP has high accuracy and robustness in predicting the risk of frailty. Increasing grip strength, cognitive function, nighttime sleep duration, and BMI reduce the risk of frailty and may provide strategies for individualized management of frailty.
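The Shapley attribution that SHAP builds on can be illustrated with a small exact computation. The sketch below is not the authors' pipeline: the `toy_model` risk score, its coefficients, and the feature values are hypothetical, and features outside a coalition are simply set to a baseline value, which is a simplification of what SHAP libraries estimate for tree models such as LightGBM.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline point.

    Features not yet "switched on" keep their baseline value (illustrative
    simplification). Cost grows as n!, so this only suits a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]          # add feature j to the coalition
            updated = predict(current)
            phi[j] += updated - previous  # marginal contribution of j
            previous = updated
    n_orders = factorial(n)
    return [value / n_orders for value in phi]


def toy_model(features):
    """Hypothetical frailty risk score; coefficients are invented."""
    cognition, sleep_hours, grip_kg = features
    return (1.0 - 0.05 * cognition - 0.02 * sleep_hours
            - 0.01 * grip_kg + 0.001 * cognition * grip_kg)


x = [10.0, 6.0, 25.0]         # hypothetical individual
baseline = [15.0, 7.0, 30.0]  # hypothetical reference individual
phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["cognition", "sleep_hours", "grip_kg"], phi):
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that force plots visualize for a single individual.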
Affiliation(s)
- Lisheng Yu
- Neurosurgery, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Wenzhou Municipal Key Laboratory of Neurodevelopmental Pathology and Physiology, Wenzhou Medical University, Wenzhou, China
| | - Shunshun Cao
- Pediatric Endocrinology, Genetics and Metabolism, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Botian Song
- Reproductive Medicine Center, Obstetrics and Gynecology, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
| | - Yangyang Hu
- Reproductive Medicine Center, Obstetrics and Gynecology, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
|
20
|
Deng L, Fan Y, Li M, Wang S, Xu X, Gao X, Li H, Qian X, Li X. Integration of interpretable machine learning and environmental magnetism elucidates reduction mechanism of bioavailable potentially toxic elements in lakes after monsoon. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 954:176418. [PMID: 39322082 DOI: 10.1016/j.scitotenv.2024.176418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 09/01/2024] [Accepted: 09/18/2024] [Indexed: 09/27/2024]
Abstract
Little information is available on the influence of substantial precipitation and particulate matter entering during the monsoon process on the release of potentially toxic elements (PTEs) into lake sediments. Sediments from a typical subtropical lake across three periods, pre-monsoon, monsoon, and post-monsoon, were collected to determine the chemical forms of 12 PTEs (As, Cd, Co, Cr, Cu, Fe, Hg, Pb, Mn, Ni, Sb, and Zn), magnetic properties, and physicochemical indicators. Feature importance, Shapley additive explanations, and partial dependence plots were used to explore the factors influencing bioavailable PTEs. The proportion of bioavailable forms of PTEs decreased by 3.85 % (Cd) to 87.84 % (Hg) after the monsoon. Extreme gradient boosting demonstrated robust fitting accuracy for the prediction of the bioavailable forms of the 12 PTEs (R² > 0.84). Shapley additive explanations identified that the bioavailable forms were influenced by the total PTE concentrations, wind, shortwave radiation, and particle inputs (25.1 %-88.5 % for total importance), either individually or in combination. The partial dependence plots highlighted the influence thresholds of background values and anthropogenic factors on the bioavailable forms of PTEs. Changes in environmental properties could indicate the process of external sediment influx into lakes. The optimized model combined with magnetic parameters showed strong performance in other cases (coefficient of determination > 0.58), confirming the ubiquitous decrease in bioavailable forms of PTEs in sediments across subtropical lakes after monsoons.
Affiliation(s)
- Ligang Deng
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China; School of Environment, Nanjing Normal University, Nanjing 210023, China
| | - Yifan Fan
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Mingjia Li
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Shuo Wang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Xiaohan Xu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Xiang Gao
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Huiming Li
- School of Environment, Nanjing Normal University, Nanjing 210023, China.
| | - Xin Qian
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China.
| | - Xiaolong Li
- School of Earth and Environment, Anhui University of Science and Technology, Huainan 232001, China
|
21
|
Liu J, Li X, Zhu P. Effects of Various Heavy Metal Exposures on Insulin Resistance in Non-diabetic Populations: Interpretability Analysis from Machine Learning Modeling Perspective. Biol Trace Elem Res 2024; 202:5438-5452. [PMID: 38409445 DOI: 10.1007/s12011-024-04126-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/22/2024] [Indexed: 02/28/2024]
Abstract
Increasing and compelling evidence indicates that heavy metal exposure is involved in the development of insulin resistance (IR). We trained an interpretable predictive machine learning (ML) model for IR in non-diabetic populations based on levels of heavy metal exposure. A total of 4354 participants from the NHANES (2003-2020) with complete information were randomly divided into a training set and a test set. Twelve ML algorithms, including random forest (RF), XGBoost (XGB), logistic regression (LR), GaussianNB (GNB), ridge regression (RR), support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), AdaBoost (AB), Gradient Boosting Decision Tree (GBDT), Voting Classifier (VC), and K-Nearest Neighbour (KNN), were constructed for IR prediction using the training set. Among these models, the RF algorithm had the best predictive performance, showing an accuracy of 80.14%, an AUC of 0.856, and an F1 score of 0.74 in the test set. We embedded three interpretable methods, permutation feature importance analysis, partial dependence plot (PDP), and Shapley additive explanations (SHAP), in the RF model for model interpretation. Urinary Ba, urinary Mo, blood Pb, and blood Cd levels were identified as the main influencers of IR. Within a specific range, urinary Ba (0.56-3.56 µg/L) and urinary Mo (1.06-20.25 µg/L) levels exhibited the most pronounced upward trend with the risk of IR, while blood Pb (0.05-2.81 µg/dL) and blood Cd (0.24-0.65 µg/L) levels showed a declining trend with IR. The findings on the synergistic effects demonstrated that controlling urinary Ba levels might be more crucial for the management of IR. The SHAP decision plot offered a basis for personalized care for IR based on heavy metal control. In conclusion, by utilizing interpretable ML approaches, we emphasize the predictive value of heavy metals for IR, especially Ba, Mo, Pb, and Cd.
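Permutation feature importance, one of the three interpretation methods named above, can be sketched in a few lines. This is a minimal stdlib illustration, not the study's code: the threshold classifier, the synthetic two-feature data, and the helper names `accuracy` and `permutation_importance` are assumptions made for the example.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's label matches the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def model(row):
    """Hypothetical fitted classifier that thresholds feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling the feature the model relies on costs roughly half of its accuracy here, while shuffling the ignored feature costs nothing, which is the contrast a permutation importance ranking reports.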
Affiliation(s)
- Jun Liu
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Chongqing Medical University, 74 Linjiang Road, Yuzhong District, Chongqing, 400010, China
| | - Xingyu Li
- Cardiovascular Medicine, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Peng Zhu
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Chongqing Medical University, 74 Linjiang Road, Yuzhong District, Chongqing, 400010, China.
|
22
|
Zhang Y, Chen Y, Su Q, Huang X, Li Q, Yang Y, Zhang Z, Chen J, Xiao Z, Xu R, Zu Q, Du S, Zheng W, Ye W, Xiang J. The use of machine and deep learning to model the relationship between discomfort temperature and labor productivity loss among petrochemical workers. BMC Public Health 2024; 24:3269. [PMID: 39587532 PMCID: PMC11587756 DOI: 10.1186/s12889-024-20713-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 11/12/2024] [Indexed: 11/27/2024] Open
Abstract
BACKGROUND Workplace heat exposure may not only increase the risk of heat-related illnesses and injuries but also compromise work efficiency, particularly in a warming climate. This study aimed to utilize machine learning (ML) and deep learning (DL) algorithms to quantify the impact of temperature discomfort on productivity loss among petrochemical workers and to identify key influencing factors. METHODS A cross-sectional face-to-face questionnaire survey was conducted among petrochemical workers between May and September 2023 in Fujian Province, China. Initial feature selection was performed using Lasso regression. The dataset was divided into training (70%), validation (20%), and testing (10%) sets. Six predictive models were evaluated: support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), Gaussian Naive Bayes (GNB), multilayer perceptron (MLP), and logistic regression (LR). The most effective model was further analyzed with SHapley Additive exPlanations (SHAP). RESULTS Among the 2393 workers surveyed, 58.4% (1,747) reported productivity loss when working in high temperatures. Lasso regression identified twenty-seven predictive factors such as educational level and smoking. All six models displayed strong prediction accuracy (SVM = 0.775, RF = 0.760, XGBoost = 0.727, GNB = 0.863, MLP = 0.738, LR = 0.680). The GNB model showed the best performance, with a cutoff of 0.869, accuracy of 0.863, precision of 0.897, sensitivity of 0.918, specificity of 0.715, and an F1-score of 0.642, indicating its efficacy as a predictive tool. SHAP analysis showed that occupational health training (SHAP value: -3.56), protective measures (-2.61), and less physically demanding jobs (-1.75) were negatively associated with heat-attributed productivity loss, whereas lack of air conditioning (1.92), noise (2.64), vibration (1.15), and dust (0.95) increased the risk of heat-induced productivity loss.
CONCLUSIONS Temperature discomfort significantly undermined labor productivity in the petrochemical sector, and this impact may worsen in a warming climate if adaptation and prevention measures are insufficient. To effectively reduce heat-related productivity loss, there is a need to strengthen occupational health training and implement strict controls for occupational hazards, minimizing the potential combined effects of heat with other exposures.
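For readers comparing the reported accuracy, precision, sensitivity, specificity, and F1-score, the standard definitions can be written out from a binary confusion matrix. The sketch below assumes those standard definitions; the counts are invented for illustration and are not taken from the study, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)      # positive predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for a test set of 200 workers.
metrics = classification_metrics(tp=90, fp=10, tn=70, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and sensitivity, which gives a quick sanity check when reading a reported set of metrics.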
Affiliation(s)
- Yilin Zhang
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Yifeng Chen
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Qingling Su
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou, 350122, Fujian Province, China
| | - Xiaoyin Huang
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou, 350122, Fujian Province, China
| | - Qingyu Li
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Yan Yang
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Zitong Zhang
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Jiake Chen
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Zhihong Xiao
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China
| | - Rong Xu
- Minnan Branch of the First Affiliated Hospital of Fujian Medical University, Quangang, Quanzhou, 362100, Fujian Province, China
| | - Qing Zu
- Minnan Branch of the First Affiliated Hospital of Fujian Medical University, Quangang, Quanzhou, 362100, Fujian Province, China
| | - Shanshan Du
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou, 350122, Fujian Province, China
| | - Wei Zheng
- Minnan Branch of the First Affiliated Hospital of Fujian Medical University, Quangang, Quanzhou, 362100, Fujian Province, China.
| | - Weimin Ye
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou, 350122, Fujian Province, China.
| | - Jianjun Xiang
- Department of Preventive Medicine, School of Public Health, Fujian Medical University; and Key Laboratory of Environment and Health, Fujian Province University, 1 North Xue-Fu Rd, Minhou, Fuzhou, 350122, Fujian Province, China.
- School of Public Health, The University of Adelaide, North Terrace Campus, Adelaide, South Australia, 5005, Australia.
|
23
|
Nong X, Lai C, Chen L, Wei J. A novel coupling interpretable machine learning framework for water quality prediction and environmental effect understanding in different flow discharge regulations of hydro-projects. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 950:175281. [PMID: 39117235 DOI: 10.1016/j.scitotenv.2024.175281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/01/2024] [Accepted: 08/02/2024] [Indexed: 08/10/2024]
Abstract
Machine learning models (MLMs) have been increasingly used to forecast water pollution. However, their "black box" character, which obscures the underlying mechanisms, still limits the applicability of MLMs for water quality management in hydro-projects under complex and frequent artificial regulation. This study proposes an interpretable machine learning framework for water quality prediction that couples a hydrodynamic (flow discharge) scenario-based Random Forest (RF) model with multiple model-agnostic techniques and quantifies global, local, and joint interpretations (i.e., partial dependence, individual conditional expectation, and accumulated local effects) of environmental factor implications. The framework was applied and verified to predict the permanganate index (CODMn) under different flow discharge regulation scenarios in the Middle Route of the South-to-North Water Diversion Project of China (MRSNWDPC). Data matrices for a total of 4664 sampling cases, including water quality, meteorological, and hydrological indicators from eight national stations along the main canal of the MRSNWDPC, were collected from May 2019 to December 2020. The results showed that the RF models were effective in forecasting CODMn in all flow discharge scenarios, with a mean square error, coefficient of determination, and mean absolute error of 0.006-0.026, 0.481-0.792, and 0.069-0.104, respectively, in the testing dataset. A global interpretation indicated that dissolved oxygen, flow discharge, and surface pressure are the three most important variables for CODMn. Local and joint interpretations indicated that the RF-based prediction model provides a basic understanding of the physical mechanisms of environmental systems.
The proposed framework can effectively learn the fundamental environmental implications of water quality variations and provide reliable prediction performance, highlighting the importance of model interpretability for trustworthy machine learning applications in water management projects. This study provides scientific references for applying advanced data-driven MLMs to water quality forecasting and a reliable methodological framework for water quality management and similar hydro-projects.
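Two of the model-agnostic techniques named above, individual conditional expectation (ICE) and partial dependence, are closely related: a partial dependence curve is the average of the ICE curves. The following is a minimal sketch under stated assumptions, not the study's framework: `toy_codmn_model`, its coefficients, and the sample rows are hypothetical, and accumulated local effects are not covered.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`
    while the row's other feature values stay fixed."""
    return [
        [predict(row[:feature] + [g] + row[feature + 1:]) for g in grid]
        for row in X
    ]


def partial_dependence(predict, X, feature, grid):
    """Partial dependence is the pointwise mean of the ICE curves."""
    curves = ice_curves(predict, X, feature, grid)
    n = len(curves)
    return [sum(curve[k] for curve in curves) / n for k in range(len(grid))]


def toy_codmn_model(row):
    """Hypothetical CODMn predictor; coefficients are invented."""
    dissolved_oxygen, flow_discharge = row
    return 3.0 - 0.2 * dissolved_oxygen + 0.1 * flow_discharge


# Hypothetical samples: [dissolved oxygen (mg/L), flow discharge (scaled)].
X = [[8.0, 1.0], [9.0, 2.0], [10.0, 3.0]]
grid = [6.0, 8.0, 10.0]  # dissolved oxygen values to evaluate

curves = ice_curves(toy_codmn_model, X, feature=0, grid=grid)
pdp = partial_dependence(toy_codmn_model, X, feature=0, grid=grid)
print(pdp)
```

For an additive model like this toy one, all ICE curves are parallel; curves that cross or diverge in a real model are a sign of interactions that the averaged partial dependence curve would hide.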
Affiliation(s)
- Xizhi Nong
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China; State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China; Centre for Urban Sustainability and Resilience, Department of Civil, Environmental and Geomatic Engineering, University College London, London WC1E 6BT, UK; School of Computing and Engineering, University of West London, London W5 5RF, UK
| | - Cheng Lai
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China
| | - Lihua Chen
- College of Civil Engineering and Architecture, Guangxi University, Nanning 530004, China.
| | - Jiahua Wei
- State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing 100084, China
|
24
|
Gao X, Liu C, Yin L, Wang A, Li J, Gao Z. Machine learning model for age-related macular degeneration based on heavy metals: The National Health and Nutrition Examination Survey 2005 to 2008. Sci Rep 2024; 14:26913. [PMID: 39506000 PMCID: PMC11541880 DOI: 10.1038/s41598-024-78412-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 10/30/2024] [Indexed: 11/08/2024] Open
Abstract
Age-related macular degeneration (AMD) is the leading cause of blindness in older people in developed countries. It has been suggested that heavy metal exposure may be associated with the development of AMD, but most studies have focused on the effects of a single metal with traditional methods. In this study, we analyzed the relationship between 13 urinary heavy metal concentrations and AMD using NHANES data between 2005 and 2008. We constructed and compared 11 machine learning models to identify the best model for predicting AMD risk. We further interpreted the models by Permutation Feature Importance (PFI), Partial Dependence Plot (PDP) analysis, and SHapley Additive exPlanations (SHAP) analysis. Of the 2380 participants, 216 had AMD. The random forest (RF) model performed optimally in predicting the risk of AMD, with an AUC value of 0.970. PFI analyses revealed that age and urinary cadmium (Cd) were the main factors influencing the risk of AMD. SHAP analyses further confirmed the significance of Cd concentration in predicting the risk of AMD and revealed a significant interaction with race. Our study is the first to explore the relationship between heavy metal exposure levels and AMD using machine learning techniques; it found that urinary Cd concentration had the greatest impact on AMD and revealed the superior predictive performance of machine learning methods. Furthermore, our study provided a new perspective for early screening and intervention of AMD.
Affiliation(s)
- Xiang Gao
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China
| | - Chao Liu
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China
| | - Linkang Yin
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China
| | - Aiqin Wang
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China
| | - Juan Li
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China.
| | - Ziqing Gao
- Department of Ophthalmology, The First Affiliated Hospital of Bengbu Medical University, 287 Changhuai Road, Bengbu, 233000, China.
|
25
|
Xiao H, Liang X, Li H, Chen X, Li Y. Trends in the prevalence of osteoporosis and effects of heavy metal exposure using interpretable machine learning. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 286:117238. [PMID: 39490102 DOI: 10.1016/j.ecoenv.2024.117238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 09/30/2024] [Accepted: 10/19/2024] [Indexed: 11/05/2024]
Abstract
There is limited evidence that heavy metal exposure contributes to osteoporosis. Multi-parameter scoring machine learning (ML) techniques were developed using National Health and Nutrition Examination Survey data to predict osteoporosis based on heavy metal exposure levels. Twelve ML models were compared to obtain an optimal predictive model for osteoporosis, and the best-performing model was used for identification. For model interpretation, Shapley additive explanation (SHAP) methods and partial dependence plots (PDP) were integrated into the ML pipeline. Linear trends in osteoporosis over time were evaluated by logistic regression of osteoporosis on survey cycle. For training and validating the predictive models, 5745 eligible participants were randomly divided into training and testing sets. The gradient boosting decision tree model performed best among the predictive models, with an accuracy of 89.40 % in the testing set; its area under the curve and F1 score were 0.88 and 0.39, respectively. SHAP analysis identified urinary Co, urinary Tu, blood Cd, and urinary Hg levels as the most influential factors for osteoporosis. The risk of osteoporosis showed a distinct upward trend with increasing urinary Co (0.20-6.10 μg/mg creatinine), urinary Tu (0.06-1.93 μg/mg creatinine), blood Cd (0.07-0.50 μg/L), and urinary Hg (0.06-0.75 μg/mg creatinine) levels. Our analysis revealed that urinary Co, urinary Tu, blood Cd, and urinary Hg played a significant role in prediction.
Affiliation(s)
- Hewei Xiao
- Department of Scientific Research, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
| | - Xueyan Liang
- Phase 1 Clinical Trial Laboratory, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
| | - Huijuan Li
- Phase 1 Clinical Trial Laboratory, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China; Department of Pharmacy, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China
| | - Xiaoyu Chen
- Phase 1 Clinical Trial Laboratory, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China; Department of Pharmacy, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China.
| | - Yan Li
- Department of Pharmacy, Guangxi Academy of Medical Sciences and the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi, China.
|
26
|
Li J, Zou L, Ma H, Zhao J, Wang C, Li J, Hu G, Yang H, Wang B, Xu D, Xia Y, Jiang Y, Jiang X, Li N. Interpretable machine learning based on CT-derived extracellular volume fraction to predict pathological grading of hepatocellular carcinoma. Abdom Radiol (NY) 2024; 49:3383-3396. [PMID: 38703190 DOI: 10.1007/s00261-024-04313-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/23/2024] [Accepted: 03/25/2024] [Indexed: 05/06/2024]
Abstract
PURPOSE To develop a non-invasive auxiliary assessment method based on CT-derived extracellular volume (ECV) to predict the pathological grading (PG) of hepatocellular carcinoma (HCC). METHODS The study retrospectively analyzed 238 patients who underwent HCC resection surgery between January 2013 and April 2023. Six machine learning algorithms were employed to construct predictive models for HCC PG: logistic regression, extreme gradient boosting, Light Gradient Boosting Machine (LightGBM), random forest, adaptive boosting, and Gaussian naive Bayes. Model performance was evaluated using receiver operating characteristic curve analysis, including area under the curve (AUC), sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F1 score. Calibration plots were used for visual evaluation of model calibration. Clinical decision curve analysis was performed to assess potential clinical utility by calculating net benefit. RESULTS 166 patients from Hospital A were allocated to the training set, while 72 patients from Hospital B (constituting 30.25% of the total sample) were assigned to the test set. The model achieved an AUC of 1.000 (95%CI: 1.000-1.000) in the training set and 0.927 (95%CI: 0.837-0.999) in the validation set, respectively. Ultimately, the model achieved an AUC of 0.909 (95%CI: 0.837-0.980) in the test set, with an accuracy of 0.778, sensitivity of 0.906, specificity of 0.789, negative predictive value of 0.556, and F1 score of 0.908. CONCLUSION This study successfully developed and validated a non-invasive auxiliary assessment method based on CT-derived ECV to predict the HCC PG, providing important supplementary information for clinical decision-making.
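The net benefit used in decision curve analysis has a simple closed form: true positives per patient minus false positives per patient, with false positives weighted by the odds of the threshold probability. The sketch below assumes that standard definition; the labels and predicted probabilities are invented for illustration and the function names are hypothetical, so it does not reproduce the study's analysis.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = high-grade) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7, 0.05, 0.5]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}, treat none: 0.000")
```

A decision curve repeats this calculation over a range of thresholds and plots the model against the treat-all and treat-none strategies; the model is potentially useful at thresholds where its net benefit exceeds both.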
Affiliation(s)
- Jie Li
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China
| | - Linxuan Zou
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China
| | - Heng Ma
- Department of Radiology, Yantai Yuhuangding Hospital, Qingdao University, Yantai, 264000, China
| | - Jifu Zhao
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China
| | - Chengyan Wang
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China
| | - Jun Li
- Department of Radiology, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, 264000, China
| | - Guangchao Hu
- School of Medical Imaging, Binzhou Medical University, No. 346 Guanhai Road, Laishan District, Yantai, 264003, China
| | - Haoran Yang
- School of Medical Imaging, Binzhou Medical University, No. 346 Guanhai Road, Laishan District, Yantai, 264003, China
| | - Beizhong Wang
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China
| | - Donghao Xu
- School of Medical Imaging, Binzhou Medical University, No. 346 Guanhai Road, Laishan District, Yantai, 264003, China
| | - Yuanhao Xia
- Department of Radiology, Yantai Yuhuangding Hospital, Qingdao University, Yantai, 264000, China
- School of Medical Imaging, Binzhou Medical University, No. 346 Guanhai Road, Laishan District, Yantai, 264003, China
| | - Yi Jiang
- Department of Vascular Interventional Surgery, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, 264000, China
| | - Xingyue Jiang
- Department of Radiology, Binzhou Medical University Hospital, No. 661 Huanghe 2nd Road, Bincheng District, Binzhou, 256600, China.
| | - Naixuan Li
- Department of Vascular Interventional Surgery, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, 264000, China.
| |
Collapse
|
27
|
Olshvang D, Harris C, Chellappa R, Santhanam P. Predictive modeling of lean body mass, appendicular lean mass, and appendicular skeletal muscle mass using machine learning techniques: A comprehensive analysis utilizing NHANES data and the Look AHEAD study. PLoS One 2024; 19:e0309830. [PMID: 39240958 PMCID: PMC11379308 DOI: 10.1371/journal.pone.0309830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/19/2024] [Indexed: 09/08/2024]
Abstract
This study addresses the pressing need for improved methods to predict lean mass in adults, and in particular lean body mass (LBM), appendicular lean mass (ALM), and appendicular skeletal muscle mass (ASMM) for the early detection and management of sarcopenia, a condition characterized by muscle loss and dysfunction. Sarcopenia presents significant health risks, especially in populations with chronic diseases like cancer and the elderly. Current assessment methods, primarily relying on Dual-energy X-ray absorptiometry (DXA) scans, lack widespread applicability, hindering timely intervention. Leveraging machine learning techniques, this research aimed to develop and validate predictive models using data from the National Health and Nutrition Examination Survey (NHANES) and the Action for Health in Diabetes (Look AHEAD) study. The models were trained on anthropometric data, demographic factors, and DXA-derived metrics to accurately estimate LBM, ALM, and ASMM normalized to weight. Results demonstrated consistent performance across various machine learning algorithms, with LassoNet, a non-linear extension of the popular LASSO method, exhibiting superior predictive accuracy. Notably, the integration of bone mineral density measurements into the models had minimal impact on predictive accuracy, suggesting potential alternatives to DXA scans for lean mass assessment in the general population. Despite the robustness of the models, limitations include the absence of outcome measures and cohorts highly vulnerable to muscle mass loss. Nonetheless, these findings hold promise for revolutionizing lean mass assessment paradigms, offering implications for chronic disease management and personalized health interventions. Future research endeavors should focus on validating these models in diverse populations and addressing clinical complexities to enhance prediction accuracy and clinical utility in managing sarcopenia.
Affiliation(s)
- Daniel Olshvang
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Carl Harris
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Rama Chellappa
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
  - Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Prasanna Santhanam
  - Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
|
28
|
Wu D, Shi Y, Wang C, Li C, Lu Y, Wang C, Zhu W, Sun T, Han J, Zheng Y, Zhang L. Investigating the impact of extreme weather events and related indicators on cardiometabolic multimorbidity. Arch Public Health 2024; 82:128. [PMID: 39160599 PMCID: PMC11331640 DOI: 10.1186/s13690-024-01361-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 08/11/2024] [Indexed: 08/21/2024]
Abstract
BACKGROUND The impact of weather on human health is well established, but the impact of extreme weather events on cardiometabolic multimorbidity (CMM) urgently needs to be explored. OBJECTIVES To investigate the impact of extreme temperature, relative humidity (RH), and laboratory testing parameters at admission on adverse events in CMM hospitalizations. DESIGNS Time-stratified case-crossover design. METHODS A distributed lag nonlinear model with a time-stratified case-crossover design was used to explore the nonlinear lagged association between environmental factors and CMM. Subsequently, unbalanced data were processed by 1:2 propensity score matching (PSM), and conditional logistic regression was employed to analyze the association between laboratory indicators and unplanned readmissions for CMM. Finally, the previously identified environmental factors and relevant laboratory indicators were incorporated into different machine learning models to predict the risk of unplanned readmission for CMM. RESULTS There are nonlinear associations and hysteresis effects between temperature, RH and hospital admissions for a variety of CMM. In addition, the risk of admission is higher under low temperature and high RH conditions with the addition of particulate matter (PM, PM2.5 and PM10) and O3_8h. The risk is greater for females and adults aged 65 and older. Compared with the first quartile (Q1), the fourth quartile (Q4) of serum calcium (HR = 1.3632, 95% CI: 1.0732 ~ 1.7334), serum creatinine (HR = 1.7987, 95% CI: 1.3528 ~ 2.3958), fasting plasma glucose (HR = 1.2579, 95% CI: 1.0839 ~ 1.4770), aspartate aminotransferase/alanine aminotransferase ratio (HR = 2.3131, 95% CI: 1.9844 ~ 2.6418), alanine aminotransferase (HR = 1.7687, 95% CI: 1.2388 ~ 2.2986), and gamma-glutamyltransferase (HR = 1.4951, 95% CI: 1.2551 ~ 1.7351) was independently and positively associated with unplanned readmission for CMM. 
However, serum total bilirubin and High-Density Lipoprotein (HDL) showed negative correlations. After incorporating environmental factors and their lagged terms, eXtreme Gradient Boosting (XGBoost) demonstrated a more prominent predictive performance for unplanned readmission of CMM patients, with an average area under the receiver operating characteristic curve (AUC) of 0.767 (95% CI:0.7486 ~ 0.7854). CONCLUSIONS Extreme cold or wet weather is linked to worsened adverse health effects in female patients with CMM and in individuals aged 65 years and older. Moreover, meteorologic factors and environmental pollutants may elevate the likelihood of unplanned readmissions for CMM.
Affiliation(s)
- Di Wu
  - School of Public Health, Xinjiang Medical University, Urumqi, China
- Yu Shi
  - School of Public Health, Xinjiang Medical University, Urumqi, China
- ChenChen Wang
  - Center for Disease Control and Prevention of Xinjiang Uygur Autonomous Region, Urumqi, China
- Cheng Li
  - The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Yaoqin Lu
  - Center for Disease Control and Prevention of Urumqi, Urumqi, China
- Chunfang Wang
  - School of Public Health, Nanjing Medical University, Nanjing, China
- Weidong Zhu
  - School of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi, China
- Tingting Sun
  - School of Agriculture, Xinjiang Agricultural University, Urumqi, China
- Junjie Han
  - School of Nursing and Public Health, Yangzhou University, Yangzhou, China
- Yanling Zheng
  - School of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China
- Liping Zhang
  - School of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China.
|
29
|
Feng Z, Chen Y, Guo Y, Lyu J. Deciphering the environmental chemical basis of muscle quality decline by interpretable machine learning models. Am J Clin Nutr 2024; 120:407-418. [PMID: 38825185 DOI: 10.1016/j.ajcnut.2024.05.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 05/07/2024] [Accepted: 05/28/2024] [Indexed: 06/04/2024]
Abstract
BACKGROUND Sarcopenia is an age-associated decline in skeletal muscle quality and function. Sarcopenia is linked to diverse health problems, including endocrine-related diseases. Environmental chemicals (ECs), a broad class of chemicals released from industry, may influence muscle quality decline. OBJECTIVES In this work, we aimed to simultaneously elucidate the associations between muscle quality decline and diverse EC exposures based on data from the 2011-2012 and 2013-2014 survey cycles of the National Health and Nutrition Examination Survey (NHANES) using machine learning models. METHODS Six machine learning models were trained on the EC and non-EC exposures from NHANES to distinguish low from normal muscle quality index status. Different machine learning metrics were evaluated for these models. The Shapley additive explanations (SHAP) approach was used to provide explainability for the machine learning models. RESULTS Random forest (RF) performed best on the independent testing data set. Based on the testing data set, ECs can independently predict the binary muscle quality status with good performance by RF (area under the receiver operating characteristic curve = 0.793; area under the precision-recall curve = 0.808). SHAP ranked the importance of ECs for the RF model. As a result, several metals and chemicals in urine, including 3-phenoxybenzoic acid and cobalt, were more strongly associated with muscle quality decline. CONCLUSIONS Altogether, our analyses suggest that ECs can independently predict muscle quality decline with good performance by RF, and the SHAP-identified ECs can be closely related to muscle quality decline and sarcopenia. Our analyses may provide valuable insights into ECs that may be an important basis of sarcopenia and endocrine-related diseases in United States populations.
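The area under the receiver operating characteristic curve reported above has a simple probabilistic reading: it is the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The function name and the toy labels and scores are illustrative assumptions, not the study's data or code, and the pairwise loop is quadratic, so it suits small examples only.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative).

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (1 = low muscle quality, 0 = normal).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ordered correctly
```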
Affiliation(s)
- Zhen Feng
  - Joint Centre of Translational Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, People's Republic of China; Joint Centre of Translational Medicine, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China; College of Information and Engineering, Wenzhou Medical University, Wenzhou, Zhejiang, People's Republic of China
- Ying'ao Chen
  - Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, People's Republic of China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China
- Yuxin Guo
  - College of Information and Engineering, Wenzhou Medical University, Wenzhou, Zhejiang, People's Republic of China
- Jie Lyu
  - Joint Centre of Translational Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, People's Republic of China; Joint Centre of Translational Medicine, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, People's Republic of China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, People's Republic of China.
|
30
|
Bu ZJ, Jiang N, Li KC, Lu ZL, Zhang N, Yan SS, Chen ZL, Hao YH, Zhang YH, Xu RB, Chi HW, Chen ZY, Liu JP, Wang D, Xu F, Liu ZL. Development and Validation of an Interpretable Machine Learning Model for Early Prognosis Prediction in ICU Patients with Malignant Tumors and Hyperkalemia. Medicine (Baltimore) 2024; 103:e38747. [PMID: 39058887 PMCID: PMC11272258 DOI: 10.1097/md.0000000000038747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/07/2024] [Indexed: 07/28/2024]
Abstract
This study aims to develop and validate a machine learning (ML) predictive model for assessing mortality in patients with malignant tumors and hyperkalemia (MTH). We extracted data on patients with MTH from the Medical Information Mart for Intensive Care-IV, version 2.2 (MIMIC-IV v2.2) database. The dataset was split into a training set (75%) and a validation set (25%). We used Least Absolute Shrinkage and Selection Operator (LASSO) regression to identify potential predictors, which included clinical laboratory indicators and vital signs. Pearson correlation analysis tested the correlation between predictors. In-hospital death was the prediction target. The Area Under the Curve (AUC) and accuracy of 7 ML algorithms on the training and validation sets were compared, and the best-performing one was selected to develop the model. The calibration curve was used to further evaluate the prediction accuracy of the model. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) enhanced model interpretability. A total of 496 patients with MTH in the Intensive Care Unit (ICU) were included. After screening, 17 clinical features were included in the construction of the ML model, and all pairwise Pearson correlation coefficients were <0.8, indicating that no pair of clinical features was strongly correlated. eXtreme Gradient Boosting (XGBoost) outperformed other algorithms, achieving perfect scores in the training set (accuracy: 1.000, AUC: 1.000) and high scores in the validation set (accuracy: 0.734, AUC: 0.733). The calibration curves indicated good predictive calibration of the model. SHAP analysis identified the top 8 predictive factors: urine output, mean heart rate, maximum urea nitrogen, minimum oxygen saturation, minimum mean blood pressure, maximum total bilirubin, mean respiratory rate, and minimum pH. In addition, SHAP and LIME performed in-depth individual case analyses. 
This study demonstrates the effectiveness of ML methods in predicting mortality risk in ICU patients with MTH. It highlights the importance of predictors like urine output and mean heart rate. SHAP and LIME significantly enhanced the model's interpretability.
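The abstract above screens predictors with pairwise Pearson correlation against a 0.8 cut-off. A minimal sketch of that kind of check follows, assuming features are held as equal-length lists of floats. The helper names, the feature names, and the toy values are hypothetical, and the authors' actual procedure may differ.

```python
import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(
    features: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return feature pairs whose |r| meets or exceeds the threshold."""
    flagged = []
    for name_a, name_b in combinations(features, 2):
        r = pearson_r(features[name_a], features[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy values for illustration only; not MIMIC-IV data.
toy_features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated with feature_a
    "feature_c": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with the other two
}
flagged = highly_correlated_pairs(toy_features)
```

An empty result at the chosen threshold corresponds to the situation the abstract describes, where every coefficient is below 0.8.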
Affiliation(s)
- Zhi-Jun Bu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Nan Jiang
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - The Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing, China
- Ke-Cheng Li
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - Department of Andrology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhi-Lin Lu
  - First Clinical College, Hubei University of Chinese Medicine, Wuhan, China
- Nan Zhang
  - School of International Studies, University of International Business and Economics, Beijing, China
- Shao-Shuai Yan
  - Department of Thyropathy, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhi-Lin Chen
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yu-Han Hao
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Yu-Huan Zhang
  - School of Acupuncture and Orthopedics, Hubei University of Chinese Medicine, Wuhan, China
- Run-Bing Xu
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - Department of Hematology and Oncology, Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Han-Wei Chi
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Zu-Yi Chen
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Jian-Ping Liu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
- Dan Wang
  - Surgery of Thyroid Gland and Breast, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
  - Hubei Shizhen Laboratory, Wuhan, China
- Feng Xu
  - The Third Affiliated Hospital, Beijing University of Chinese Medicine, Beijing, China
- Zhao-Lan Liu
  - Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, China
|
31
|
Zuo W, Yang X. A machine learning model predicts stroke associated with blood cadmium level. Sci Rep 2024; 14:14739. [PMID: 38926494 PMCID: PMC11208606 DOI: 10.1038/s41598-024-65633-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 06/21/2024] [Indexed: 06/28/2024]
Abstract
Stroke is a leading cause of death and disability worldwide. Cadmium is a prevalent environmental toxicant that may contribute to cardiovascular disease, including stroke. We aimed to build an effective and interpretable machine learning (ML) model that links blood cadmium to the identification of stroke. Data on the association between blood cadmium and stroke came from the National Health and Nutrition Examination Survey (NHANES, 2013-2014). In total, 2664 participants were eligible for this study. We divided these data into a training set (80%) and a test set (20%). To analyze the relationship between blood cadmium and stroke, a multivariate logistic regression analysis was performed. We constructed and tested five ML algorithms: K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), multilayer perceptron (MLP), and random forest (RF). The best-performing model was selected to identify stroke in US adults. Finally, the features were interpreted using the Shapley Additive exPlanations (SHAP) tool. In the total population, participants in the second, third, and fourth quartiles of blood cadmium had odds ratios for stroke of 1.32 (95% CI 0.55, 3.14), 1.65 (95% CI 0.71, 3.83), and 2.67 (95% CI 1.10, 6.49), respectively, compared with the lowest (reference) quartile. The blood cadmium-based LR approach demonstrated the greatest performance in identifying stroke (area under the receiver operating characteristic curve: 0.800, accuracy: 0.966). Employing interpretable methods, we found blood cadmium to be a notable contributor to the predictive model. We found that blood cadmium was positively correlated with stroke risk and that stroke risk from cadmium exposure could be effectively predicted by using ML modeling.
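The quartile-wise odds ratios and 95% confidence intervals quoted above come from multivariate logistic regression, so they are adjusted for covariates. As a simpler point of reference, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from a single 2x2 table. This is a swapped-in, simplified technique rather than the authors' model, and the function name and counts are made up for illustration.

```python
import math


def crude_odds_ratio(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
    z: float = 1.96,
) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    Returns (odds_ratio, lower, upper). Assumes all four cells are
    positive; z = 1.96 gives an approximate 95% interval.
    """
    a, b = exposed_cases, exposed_noncases
    c, d = unexposed_cases, unexposed_noncases
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts (NOT NHANES data): highest vs. lowest exposure quartile.
or_q4, ci_low, ci_high = crude_odds_ratio(
    exposed_cases=20, exposed_noncases=80,
    unexposed_cases=10, unexposed_noncases=90,
)
```

An interval that excludes 1, like the fourth-quartile interval reported in the abstract, indicates an association that is statistically significant at the corresponding level.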
Affiliation(s)
- Wenwei Zuo
  - School of Gongli Hospital Medical Technology, University of Shanghai for Science and Technology, No. 516, Jungong Road, Yangpu Area, Shanghai, 200093, China
- Xuelian Yang
  - Department of Neurology, Shanghai Pudong New Area Gongli Hospital, No. 219 Miaopu Road, Pudong New Area, Shanghai, 200135, China.
|
32
|
Jiang X, Zhou R, Jiang F, Yan Y, Zhang Z, Wang J. Construction of diagnostic models for the progression of hepatocellular carcinoma using machine learning. Front Oncol 2024; 14:1401496. [PMID: 38812780 PMCID: PMC11133637 DOI: 10.3389/fonc.2024.1401496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 04/29/2024] [Indexed: 05/31/2024]
Abstract
Liver cancer is one of the most prevalent forms of cancer worldwide. A significant proportion of patients with hepatocellular carcinoma (HCC) are diagnosed at advanced stages, leading to unfavorable treatment outcomes. Generally, the development of HCC occurs in distinct stages. However, the diagnostic and intervention markers for each stage remain unclear. Therefore, there is an urgent need to explore precise grading methods for HCC. Machine learning has emerged as an effective technique for studying precise tumor diagnosis. In this research, we employed random forest and LightGBM machine learning algorithms for the first time to construct diagnostic models for HCC at various stages of progression. We categorized 118 samples from GSE114564 into three groups: normal liver, precancerous lesion (including chronic hepatitis, liver cirrhosis, dysplastic nodule), and HCC (including early stage HCC and advanced HCC). The LightGBM model exhibited outstanding performance (accuracy = 0.96, precision = 0.96, recall = 0.96, F1-score = 0.95). The random forest model also demonstrated good performance (accuracy = 0.83, precision = 0.83, recall = 0.83, F1-score = 0.83). When the progression of HCC was divided into the six most refined stages (normal liver, chronic hepatitis, liver cirrhosis, dysplastic nodule, early stage HCC, and advanced HCC), the diagnostic model still exhibited high efficacy. In this setting, the LightGBM model exhibited good performance (accuracy = 0.71, precision = 0.71, recall = 0.71, F1-score = 0.72) and was superior to the random forest model. Overall, we have constructed a diagnostic model for the progression of HCC and identified potential diagnostic characteristic genes for the progression of HCC.
Affiliation(s)
- Xin Jiang
  - Innovation Center for Cancer Research, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
  - Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fuzhou, China
- Ruilong Zhou
  - Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Fengle Jiang
  - Innovation Center for Cancer Research, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
  - Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fuzhou, China
- Yanan Yan
  - Innovation Center for Cancer Research, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
  - Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fuzhou, China
- Zheting Zhang
  - Innovation Center for Cancer Research, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
  - Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fuzhou, China
- Jianmin Wang
  - Innovation Center for Cancer Research, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China
  - Fujian Key Laboratory of Advanced Technology for Cancer Screening and Early Diagnosis, Fuzhou, China
|
33
|
Zhu G, Wen Y, Cao K, He S, Wang T. A review of common statistical methods for dealing with multiple pollutant mixtures and multiple exposures. Front Public Health 2024; 12:1377685. [PMID: 38784575 PMCID: PMC11113012 DOI: 10.3389/fpubh.2024.1377685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 04/15/2024] [Indexed: 05/25/2024]
Abstract
Traditional environmental epidemiology has consistently focused on studying the impact of single exposures on specific health outcomes, treating concurrent exposures as variables to be controlled. However, as the environment continues to change, humans increasingly face complex exposures to multi-pollutant mixtures. In this context, accurately assessing the impact of multi-pollutant mixtures on health has become a central concern in current environmental research. At the same time, the continuous development and optimization of statistical methods offer robust support for handling large datasets, strengthening the capability to conduct in-depth research on the effects of multiple exposures on health. To examine complicated exposure mixtures, we introduce commonly used statistical methods and their developments, such as weighted quantile sum, Bayesian kernel machine regression, toxic equivalency analysis, and others, delineating their applications, advantages, weaknesses, and the interpretability of their results. This review also provides guidance for researchers studying multi-pollutant mixtures, aiding them in selecting appropriate statistical methods and in using R software for more accurate and comprehensive assessments of the impact of multi-pollutant mixtures on human health.
Affiliation(s)
- Guiming Zhu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, Taiyuan, China
- Yanchao Wen
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, Taiyuan, China
- Kexin Cao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, Taiyuan, China
- Simin He
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, Taiyuan, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, Taiyuan, China
|
34
|
Abstract
Heavy metals are harmful environmental pollutants that have attracted widespread attention because of the hazards they pose to human cardiovascular health. Heavy metals, including lead, cadmium, mercury, arsenic, and chromium, are found in various sources such as air, water, soil, food, and industrial products. Recent research strongly suggests a connection between cardiovascular disease and exposure to toxic heavy metals. Epidemiological, basic, and clinical studies have revealed that heavy metals can promote the production of reactive oxygen species, which can then exacerbate reactive oxygen species generation and induce inflammation, resulting in endothelial dysfunction, lipid metabolism distribution, disruption of ion homeostasis, and epigenetic changes. Over time, heavy metal exposure results in an increased risk of hypertension, arrhythmia, and atherosclerosis. Strengthening public health prevention and applying chelation or antioxidants, such as vitamins and beta-carotene, along with minerals, such as selenium and zinc, can diminish the burden of cardiovascular disease attributable to metal exposure.
Affiliation(s)
- Ziwei Pan
  - Key Laboratory of Combined Multi Organ Transplantation, Ministry of Public Health, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China (Z.P., P.L.)
  - Institute of Translational Medicine, Zhejiang University, Hangzhou, China (Z.P., P.L.)
- Tingyu Gong
  - Shulan International Medical College, Zhejiang Shuren University, Hangzhou, China (T.G.)
- Ping Liang
  - Key Laboratory of Combined Multi Organ Transplantation, Ministry of Public Health, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China (Z.P., P.L.)
  - Institute of Translational Medicine, Zhejiang University, Hangzhou, China (Z.P., P.L.)
|
35
|
Song W, Wu F, Yan Y, Li Y, Wang Q, Hu X, Li Y. Gut microbiota landscape and potential biomarker identification in female patients with systemic lupus erythematosus using machine learning. Front Cell Infect Microbiol 2023; 13:1289124. [PMID: 38169617 PMCID: PMC10758415 DOI: 10.3389/fcimb.2023.1289124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/28/2023] [Indexed: 01/05/2024]
Abstract
Objectives Systemic Lupus Erythematosus (SLE) is a complex autoimmune disease that disproportionately affects women. Early diagnosis and prevention are crucial for women's health, and the gut microbiota has been found to be strongly associated with SLE. This study aimed to identify potential biomarkers for SLE by characterizing the gut microbiota landscape using feature selection and exploring the use of machine learning (ML) algorithms with significantly dysregulated microbiotas (SDMs) for early identification of SLE patients. Additionally, we used the SHapley Additive exPlanations (SHAP) interpretability framework to visualize the impact of SDMs on the risk of developing SLE in females. Methods Stool samples were collected from 54 SLE patients and 55 negative controls (NC) for microbiota analysis using 16S rRNA sequencing. Feature selection was performed using Elastic Net and Boruta on species-level taxonomy. Subsequently, four ML algorithms, namely logistic regression (LR), Adaptive Boosting (AdaBoost), Random Forest (RF), and eXtreme gradient boosting (XGBoost), were used to achieve early identification of SLE with SDMs. Finally, the best-performing algorithm was combined with SHAP to explore how SDMs affect the risk of developing SLE in females. Results Both alpha and beta diversity differed in the SLE group. Following feature selection, 68 and 21 microbiota were retained by Elastic Net and Boruta, respectively, with 16 microbiota overlapping between the two; these were taken as the SDMs for SLE. The four ML algorithms with SDMs could effectively identify SLE patients, with XGBoost performing the best, achieving Accuracy, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and AUC values of 0.844, 0.750, 0.938, 0.923, 0.790, and 0.930, respectively. 
The SHAP interpretability framework showed a complex non-linear relationship between the relative abundance of SDMs and the risk of SLE, with Escherichia_fergusonii having the largest SHAP value. Conclusions This study revealed dysbiosis in the gut microbiota of female SLE patients. ML classifiers combined with SDMs can facilitate early identification of female patients with SLE, particularly XGBoost. The SHAP interpretability framework provides insight into the impact of SDMs on the risk of SLE and may inform future scientific treatment for SLE.
Affiliation(s)
- Wenzhu Song
  - School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Feng Wu
  - Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Yan Yan
  - Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Yaheng Li
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, Shanxi, China
- Qian Wang
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, Shanxi, China
- Xueli Hu
  - Department of Nephrology, Hejin People’s Hospital, Yuncheng, Shanxi, China
- Yafeng Li
  - Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, Shanxi, China
  - Core Laboratory, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
  - Academy of Microbial Ecology, Shanxi Medical University, Taiyuan, China
|
36
|
Li Q, Zheng JX, Jia TW, Feng XY, Lv C, Zhang LJ, Yang GJ, Xu J, Zhou XN. Optimized strategy for schistosomiasis elimination: results from marginal benefit modeling. Parasit Vectors 2023; 16:419. [PMID: 37968661 PMCID: PMC10652544 DOI: 10.1186/s13071-023-06001-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 10/06/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Poverty contributes to the transmission of schistosomiasis via multiple pathways, with the insufficiency of appropriate interventions being a crucial factor. The aim of this article is to provide more economical and feasible intervention measures for endemic areas with varying levels of poverty. METHODS We collected and analyzed the prevalence patterns along with the cost of control measures in 11 counties over the last 20 years in China. Seven machine learning models, including XGBoost, support vector machine, generalized linear model, regression tree, random forest, gradient boosting machine and neural network, were used to develop the model and calculate marginal benefits. RESULTS The XGBoost model had the highest prediction accuracy, with an R² of 0.7308. Results showed that risk surveillance, snail control with molluscicides and treatment were the most effective interventions in controlling schistosomiasis prevalence. The best combination interlaced seven interventions: risk surveillance, treatment, toilet construction, health education, snail control with molluscicides, cattle slaughter and animal chemotherapy. Among the nine interventions, risk surveillance had the highest marginal benefit, which was influenced by the prevalence of schistosomiasis and cost. CONCLUSIONS In the elimination phase of the national schistosomiasis program, emphasizing risk surveillance holds significant importance in terms of cost-saving.
Affiliation(s)
- Qin Li
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- Jin-Xin Zheng
- Ruijin Hospital Affiliated to The Shanghai Jiao Tong University Medical School, Shanghai, 200025, China
- Tie-Wu Jia
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- Xin-Yu Feng
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- Chao Lv
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- School of Global Health, Chinese Center for Tropical Diseases Research and Shanghai Jiao Tong University School of Medicine, One Health Center, Shanghai Jiao Tong University and The Edinburgh University, Shanghai, 200025, China
- Li-Juan Zhang
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- Guo-Jing Yang
- School of Tropical Medicine, Hainan Medical University, Haikou, 571199, China
- Jing Xu
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- Xiao-Nong Zhou
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China.
- School of Global Health, Chinese Center for Tropical Diseases Research and Shanghai Jiao Tong University School of Medicine, One Health Center, Shanghai Jiao Tong University and The Edinburgh University, Shanghai, 200025, China.
|
37
|
Li X, Zhang D, Zhao Y, Kuang L, Huang H, Chen W, Fu X, Wu Y, Li T, Zhang J, Yuan L, Hu H, Liu Y, Hu F, Zhang M, Sun X, Hu D. Correlation of heavy metals' exposure with the prevalence of coronary heart disease among US adults: findings of the US NHANES from 2003 to 2018. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2023; 45:6745-6759. [PMID: 37378736 DOI: 10.1007/s10653-023-01670-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Accepted: 06/19/2023] [Indexed: 06/29/2023]
Abstract
We sought to explore the association between heavy metal exposure and coronary heart disease (CHD) based on data from the US National Health and Nutrition Examination Survey (NHANES, 2003-2018). In the analyses, participants were all aged > 20 and had participated in heavy metal sub-tests with valid CHD status. The Mann-Kendall test was employed to assess the trends in heavy metals' exposure and the trends in CHD prevalence over 16 years. Spearman's rank correlation coefficient and a logistic regression (LR) model were used to estimate the association between heavy metals and CHD prevalence. A total of 42,749 participants were included in our analyses, 1802 of whom had a CHD diagnosis. Total arsenic, dimethylarsonic acid, monomethylarsonic acid, barium, cadmium, lead, and antimony in urine, and cadmium, lead, and total mercury in blood all showed a substantial decreasing exposure level tendency over the 16 years (all P for trend < 0.05). CHD prevalence varied from 3.53 to 5.23% between 2003 and 2018. The correlation between 15 heavy metals and CHD ranged from -0.238 to 0.910. There was also a significant positive correlation between total arsenic, monomethylarsonic acid, and thallium in urine and CHD by data release cycles (all P < 0.05). The cesium in urine showed a negative correlation with CHD (P < 0.05). We found that exposure trends of total arsenic, dimethylarsonic acid, monomethylarsonic acid, barium, cadmium, lead, and antimony in urine and blood decreased. CHD prevalence fluctuated, however. Moreover, total arsenic, monomethylarsonic acid, and thallium in urine all showed positive relationships with CHD, while cesium in urine showed a negative relationship with CHD.
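The Mann-Kendall trend test mentioned in this abstract can be sketched in a few lines. The version below is a minimal standard-library implementation with the usual tie correction and a two-sided normal approximation; it is not the authors' code, and the eight values in `toy_levels` are invented placeholders standing in for one metal's level across eight survey cycles.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall monotonic trend test."""
    n = len(values)
    # S counts later-minus-earlier sign agreements over all ordered pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal tail
    return s, z, p


# Invented, strictly decreasing placeholder values (NOT NHANES data).
toy_levels = [1.30, 1.21, 1.15, 1.02, 0.95, 0.90, 0.84, 0.80]
s, z, p = mann_kendall(toy_levels)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
```

A negative S with a small p indicates a decreasing trend, which is the pattern the abstract describes for most of the metals. With only eight cycles the normal approximation is rough, so an exact test may be preferable in practice.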
Affiliation(s)
- Xi Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Dongdong Zhang
- Department of General Practice, The Affiliated Luohu Hospital of Shenzhen University Medical School, Shenzhen, People's Republic of China
- Yang Zhao
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Lei Kuang
- Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Medical School, Shenzhen, Guangdong, People's Republic of China
- Hao Huang
- Department of General Practice, The Affiliated Luohu Hospital of Shenzhen University Medical School, Shenzhen, People's Republic of China
- Weiling Chen
- Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Medical School, Shenzhen, Guangdong, People's Republic of China
- Xueru Fu
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Yuying Wu
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Tianze Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Jinli Zhang
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Lijun Yuan
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Huifang Hu
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China
- Yu Liu
- Department of General Practice, The Affiliated Luohu Hospital of Shenzhen University Medical School, Shenzhen, People's Republic of China
- Fulan Hu
- Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Medical School, Shenzhen, Guangdong, People's Republic of China
- Ming Zhang
- Department of Biostatistics and Epidemiology, School of Public Health, Shenzhen University Medical School, Shenzhen, Guangdong, People's Republic of China
- Xizhuo Sun
- Department of General Practice, The Affiliated Luohu Hospital of Shenzhen University Medical School, Shenzhen, People's Republic of China
- Dongsheng Hu
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, People's Republic of China.
|
38
|
Li W, Huang G, Tang N, Lu P, Jiang L, Lv J, Qin Y, Lin Y, Xu F, Lei D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. CHEMOSPHERE 2023; 337:139435. [PMID: 37422210 DOI: 10.1016/j.chemosphere.2023.139435] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/11/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/10/2023]
Abstract
Heavy metal exposure is a common risk factor for hypertension. To develop an interpretable predictive machine learning (ML) model for hypertension based on levels of heavy metal exposure, data from the NHANES (2003-2016) were employed. Random forest (RF), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), ridge regression (RR), AdaBoost (AB), gradient boosting decision tree (GBDT), voting classifier (VC), and K-nearest neighbour (KNN) algorithms were utilized to generate an optimal predictive model for hypertension. Three interpretable methods, the permutation feature importance analysis, partial dependence plot (PDP), and Shapley additive explanations (SHAP) methods, were integrated into a pipeline and embedded in ML for model interpretation. A total of 9005 eligible individuals were randomly allocated into two distinct sets for predictive model training and validation. The results showed that among the predictive models, the RF model demonstrated the highest performance, achieving an accuracy rate of 77.40% in the validation set. The AUC and F1 score for the model were 0.84 and 0.76, respectively. Blood Pb, urinary Cd, urinary Tl, and urinary Co levels were identified as the main influencers of hypertension, and their contribution weights were 0.0504 ± 0.0482, 0.0389 ± 0.0256, 0.0307 ± 0.0179, and 0.0296 ± 0.0162, respectively. Blood Pb (0.55-2.93 μg/dL) and urinary Cd (0.06-0.15 μg/L) levels exhibited the most pronounced upwards trend with the risk of hypertension within a specific value range, while urinary Tl (0.06-0.26 μg/L) and urinary Co (0.02-0.32 μg/L) levels demonstrated a declining trend with hypertension. The findings on the synergistic effects indicated that Pb and Cd were the primary determinants of hypertension. Our findings underscore the predictive value of heavy metals for hypertension. 
By utilizing interpretable methods, we discerned that Pb, Cd, Tl, and Co emerged as noteworthy contributors within the predictive model.
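The "contribution weights" quoted in this abstract as mean ± spread are the kind of output permutation feature importance produces. The sketch below is a minimal standard-library version under stated assumptions: the model, the two features, and the data are all invented for illustration, and importance is measured as the drop in accuracy when one column is shuffled. It is not the authors' pipeline.

```python
import random
import statistics


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Return one (mean drop, std of drop) pair per feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    results = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, breaking its link with the labels.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        results.append((statistics.mean(drops), statistics.stdev(drops)))
    return results


# Invented toy data: column 0 drives the label, column 1 is irrelevant.
rows = [[float(i % 2), float(i % 5)] for i in range(40)]
labels = [int(r[0] > 0.5) for r in rows]


def toy_model(row):
    """Hypothetical classifier that looks only at column 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, rows, labels)
for col, (mean_drop, sd) in enumerate(importances):
    print(f"feature {col}: {mean_drop:.4f} ± {sd:.4f}")
```

Because the toy model ignores column 1, shuffling it leaves accuracy unchanged and its importance is exactly zero, while column 0 shows a clear drop. A large spread relative to the mean, as in some of the weights reported above, signals that the ranking of features is less stable than the means alone suggest.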
Affiliation(s)
- Wenxiang Li
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
- Guangyi Huang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Ningning Tang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Peng Lu
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Li Jiang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Jian Lv
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Yuanjun Qin
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Yunru Lin
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Fan Xu
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
- Daizai Lei
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
|
39
|
Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One 2023; 18:e0281922. [PMID: 36821544 PMCID: PMC9949629 DOI: 10.1371/journal.pone.0281922] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/23/2022] [Accepted: 02/05/2023] [Indexed: 02/24/2023] Open
Abstract
Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, making it difficult for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, Sensitivity, Specificity) through bootstrap simulation and SHapley Additive exPlanations (SHAP) could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort were used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was used as the machine-learning model of choice in this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to provide explanations of machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. For the XGBoost modeling method, we observed (through 10,000 completed simulations) that the AUROC ranged from 0.771 to 0.947, a difference of 0.176, the balanced accuracy ranged from 0.688 to 0.894, a 0.205 difference, the sensitivity ranged from 0.632 to 0.939, a 0.307 difference, and the specificity ranged from 0.595 to 0.944, a 0.394 difference. Among 10,000 simulations completed, we observed that the gain for Angina ranged from 0.225 to 0.456, a difference of 0.231, for Cholesterol ranged from 0.148 to 0.326, a difference of 0.178, for maximum heart rate (MaxHR) ranged from 0.081 to 0.200, a difference of 0.119, and for Age ranged from 0.059 to 0.157, a difference of 0.098. Use of simulations to empirically evaluate the variability of model metrics, and of explanatory algorithms to observe whether covariates match the literature, is necessary for increased transparency, reliability, and utility of machine learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset.
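The idea of reporting a range for each metric rather than a single point estimate can be illustrated with a small bootstrap. The sketch below is a simplification and an assumption of this example: it resamples a fixed set of labels and predictions, whereas the study describes repeated simulation of the full modeling procedure. The labels and predictions are invented.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return tp / positives, tn / negatives


def bootstrap_ranges(y_true, y_pred, n_boot=1000, seed=0):
    """Return ((min, max) sensitivity, (min, max) specificity) over resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    sens, spec = [], []
    for _ in range(n_boot):
        # Draw n indices with replacement to form one bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        if 0 < sum(t) < n:  # skip degenerate resamples containing one class only
            se, sp = sensitivity_specificity(t, p)
            sens.append(se)
            spec.append(sp)
    return (min(sens), max(sens)), (min(spec), max(spec))


# Invented labels and predictions (NOT from the heart disease cohort).
y_true = [1] * 30 + [0] * 30
y_pred = [1] * 24 + [0] * 6 + [0] * 25 + [1] * 5

sens_range, spec_range = bootstrap_ranges(y_true, y_pred)
print(f"sensitivity range: {sens_range[0]:.3f} to {sens_range[1]:.3f}")
print(f"specificity range: {spec_range[0]:.3f} to {spec_range[1]:.3f}")
```

Min and max grow with the number of resamples, so percentile intervals (for example the 2.5th and 97.5th percentiles) are often a more stable summary than the raw range.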
Affiliation(s)
- Alexander A. Huang
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
- Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Samuel Y. Huang
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
- Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, United States of America
|