1
Lu X, Chen Y, Zhang G, Zeng X, Lai L, Qu C. Application of interpretable machine learning algorithms to predict acute kidney injury in patients with cerebral infarction in ICU. J Stroke Cerebrovasc Dis 2024; 33:107729. [PMID: 38657830 DOI: 10.1016/j.jstrokecerebrovasdis.2024.107729] [Received: 02/05/2024] [Revised: 04/14/2024] [Accepted: 04/20/2024] [Indexed: 04/26/2024] Open
Abstract
BACKGROUND Acute kidney injury (AKI) is both a complication of cerebral infarction (CI) and a serious threat to affected patients. This study aimed to explore the application of interpretable machine learning algorithms in predicting AKI in patients with CI. METHODS The study included 3920 patients with CI admitted to the Intensive Care Unit and the Emergency Medicine department of the Central Hospital of Lishui City, Zhejiang Province. Nine machine learning techniques, including XGBoost, logistic regression, LightGBM, random forest (RF), AdaBoost, GaussianNB (GNB), Multi-Layer Perceptron (MLP), support vector machine (SVM), and k-nearest neighbors (KNN) classification, were used to develop a predictive model for AKI in these patients. SHapley Additive exPlanations (SHAP) analysis provided visual explanations for each patient. Finally, model effectiveness was assessed using metrics such as average precision (AP), sensitivity, specificity, accuracy, F1 score, precision-recall (PR) curve, calibration plot, and decision curve analysis (DCA). RESULTS The XGBoost model performed best in both the internal and the external validation sets, with AUCs of 0.940 and 0.887, respectively. The five most important variables in the model were, in order, glomerular filtration rate, low-density lipoprotein, total cholesterol, hemiplegia, and serum potassium. CONCLUSION This study demonstrates the potential of interpretable machine learning algorithms for predicting AKI in patients with CI.
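SHAP attributes a single prediction to individual features using Shapley values. As a rough illustration of the idea (not the authors' pipeline, and not the `shap` library, which uses much faster algorithms for tree models), the sketch below computes exact Shapley values by enumerating feature subsets. The toy risk score, its coefficients, and the patient and reference values are hypothetical and exist only to make the example runnable.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical risk score; coefficients are invented for illustration."""
    egfr, ldl, potassium = features
    return 0.02 * (90 - egfr) + 0.1 * ldl + 0.05 * ldl * potassium


patient = [45.0, 3.5, 5.2]      # hypothetical patient values
reference = [90.0, 2.5, 4.0]    # hypothetical reference (baseline) values
phi = shapley_values(toy_model, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's score and the reference score, which is the property that makes per-patient SHAP plots additive.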
Affiliation(s)
- Xiaochi Lu
- Department of Emergency medicine, Lishui Municipal Central Hospital, Lishui, 323000, PR China
- Yi Chen
- Department of Emergency medicine, Lishui Municipal Central Hospital, Lishui, 323000, PR China
- Gongping Zhang
- Department of Emergency medicine, Lishui Municipal Central Hospital, Lishui, 323000, PR China
- Xu Zeng
- Department of Emergency medicine, Lishui Municipal Central Hospital, Lishui, 323000, PR China
- Linjie Lai
- Department of Emergency medicine, Lishui Municipal Central Hospital, Lishui, 323000, PR China
- Chaojun Qu
- Department of Intensive care unit, Lishui Municipal Central Hospital, Lishui, 323000, PR China.
2
Hyun S, Lee H, Park W. Individual-specific postural discomfort prediction using decision tree models. Applied Ergonomics 2024; 118:104282. [PMID: 38574593 DOI: 10.1016/j.apergo.2024.104282] [Received: 11/07/2023] [Revised: 03/19/2024] [Accepted: 03/29/2024] [Indexed: 04/06/2024]
Abstract
The objective of the current study was to explore the utilization of the decision tree (DT) algorithm to model posture-discomfort relationships at the individual level. The DT algorithm has the advantage that it makes no assumptions about the distribution of data, is robust in representing non-linear data with noise, and produces white-box models that are interpretable. Individual-level modelling is essential for examining individual-specific postural discomfort perception processes and understanding the inter-individual variability. It also has practical applications, including the development of individual-specific digital human models and more precise and informative population accommodation analysis. Individual-specific DT models were generated using postural discomfort rating data for various seated upper body postures to predict discomfort based on postural and task variables. The individual-specific DT models accurately predicted postural discomfort and revealed large inter-individual variability in the modelling results. DT modelling is expected to greatly facilitate investigating the human discomfort perception process.
Affiliation(s)
- Soomin Hyun
- Industrial Engineering, Seoul National University, Seoul, 151-744, South Korea
- Hyunju Lee
- Industrial Engineering, Seoul National University, Seoul, 151-744, South Korea
- Woojin Park
- Industrial Engineering, Seoul National University, Seoul, 151-744, South Korea; Institute for Industrial Systems Innovation, Seoul National University, Seoul, 151-744, South Korea.
3
Li X, Zhang F, Zheng L, Guo J. Advancing ecotoxicity assessment: Leveraging pre-trained model for bee toxicity and compound degradability prediction. Journal of Hazardous Materials 2024; 475:134828. [PMID: 38876015 DOI: 10.1016/j.jhazmat.2024.134828] [Received: 03/15/2024] [Revised: 05/09/2024] [Accepted: 06/03/2024] [Indexed: 06/16/2024]
Abstract
The prediction of ecological toxicity plays an increasingly important role in modern society. However, existing models often suffer from poor performance and limited predictive capabilities. In this study, we propose a novel approach for ecological toxicity assessment based on pre-trained models. By leveraging pre-training techniques and graph neural network models, we establish a high-performance predictive model. Furthermore, we incorporate a variational autoencoder to optimize the model, enabling simultaneous discrimination of toxicity to bees and molecular degradability. Additionally, despite the low similarity between the endogenous hormones in bees and the compounds in our dataset, our model confidently predicts that these hormones are non-toxic to bees, which further strengthens the credibility and accuracy of our model. We also found a negative correlation between the degradation and the bee toxicity of compounds. In summary, this study presents an ecological toxicity assessment model with outstanding performance. The proposed model accurately predicts the toxicity of chemicals to bees and their degradability, offering valuable technical support to relevant fields.
Affiliation(s)
- Xinkang Li
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao
- Feng Zhang
- College of Plant Protection, Nanjing Agricultural University, Nanjing 210095, China
- Liangzhen Zheng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518000, China; Zelixir Biotech Company Ltd. Shanghai, China.
- Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, 999078, Macao.
4
Sun Z, Wang Z, Qi X, Wang D, Gu X, Wang J, Lu H, Chen Y. Understanding key contributing factors on the severity of traffic violations by elderly drivers: a hybrid approach of latent class analysis and XGBoost based SHAP. Int J Inj Contr Saf Promot 2024; 31:273-293. [PMID: 38284989 DOI: 10.1080/17457300.2023.2300479] [Received: 03/16/2023] [Accepted: 12/24/2023] [Indexed: 01/30/2024]
Abstract
Traffic violations are one of the leading causes of traffic crashes. As the global population ages, studying traffic violations by elderly drivers is important for improving traffic safety. In this study, a hybrid approach combining Latent Class Analysis (LCA) and XGBoost-based SHAP is proposed to identify hidden clusters and to understand the key factors contributing to the severity of traffic violations by elderly drivers, based on the police-reported traffic violation dataset of Beijing (China). First, LCA is applied to segment the dataset into several latent homogeneous clusters; then XGBoost-based SHAP is applied to each cluster to identify feature contributions and the interaction effects of the key contributing factors on violation severity. Two comparison groups were set up to analyze the factors responsible for the different severities of traffic violations. The results show that: (1) elderly drivers can be classified into four groups by age, urban or not, license, and season; (2) factors such as a lower annual number of traffic violations, national and provincial highways, night, and winter are key contributing factors for higher violation severity, which is consistent with common understanding; (3) key contributing factors are similar but not identical across clusters, for example, a higher annual number of traffic violations contributes to more severe violations for all clusters except Cluster 2; and (4) some factors that are not key contributing factors on their own may affect violation severity when combined with other factors, for example, the combination of a lower annual number of traffic violations and county and township highways contributes to more severe violations for Cluster 1. These findings can help governments formulate targeted countermeasures to decrease the severity of traffic violations by specific groups of elderly drivers and improve road service for the driving population.
Affiliation(s)
- Zhiyuan Sun
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
- Zhicheng Wang
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
- Xin Qi
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
- Duo Wang
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
- Xin Gu
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
- Jianyu Wang
- Beijing Key Laboratory of General Aviation Technology, Beijing University of Civil Engineering and Architecture, Beijing, China
- Huapu Lu
- Institute of Transportation Engineering, Tsinghua University, Beijing, China
- Yanyan Chen
- Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing, China
5
Liu H, Dong S, Yang H, Wang L, Liu J, Du Y, Liu J, Lyu Z, Wang Y, Jiang L, Yu S, Fu X. Comparing the accuracy of four machine learning models in predicting type 2 diabetes onset within the Chinese population: a retrospective study. J Int Med Res 2024; 52:3000605241253786. [PMID: 38870271 DOI: 10.1177/03000605241253786] [Indexed: 06/15/2024] Open
Abstract
OBJECTIVE To evaluate the effectiveness of machine learning (ML) models in predicting 5-year type 2 diabetes mellitus (T2DM) risk within the Chinese population by retrospectively analyzing annual health checkup records. METHODS We included 46,247 patients (32,372 and 13,875 in the training and validation sets, respectively) from a national health checkup center database. Univariate and multivariate Cox analyses were performed to identify factors influencing T2DM risk. Extreme Gradient Boosting (XGBoost), support vector machine (SVM), logistic regression (LR), and random forest (RF) models were trained to predict 5-year T2DM risk. Model performance was analyzed using receiver operating characteristic (ROC) curves for discrimination and calibration plots for prediction accuracy. RESULTS Key variables included fasting plasma glucose, age, and sedentary time. The LR model showed good accuracy, with areas under the ROC curve (AUCs) of 0.914 and 0.913 in the training and validation sets, respectively; the RF model exhibited AUCs of 0.998 and 0.838. In calibration analysis, the LR model displayed good fit for low-risk patients; the RF model exhibited satisfactory fit for low- and high-risk patients. CONCLUSIONS LR and RF models can effectively predict T2DM risk in the Chinese population. These models may help identify high-risk patients and guide interventions to prevent complications and disabilities.
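The AUC values reported here measure discrimination: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that definition follows; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from the study. The last lines compute the training-to-validation AUC gap from the figures in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n_pos * n_neg) version is meant
    for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# AUCs reported in the abstract: (training, validation)
reported = {"LR": (0.914, 0.913), "RF": (0.998, 0.838)}
gaps = {name: round(train - valid, 3) for name, (train, valid) in reported.items()}
print(toy_auc, gaps)
```

A large gap between training and validation AUC, as for the RF model, is commonly read as a sign of overfitting, although the abstract itself does not draw that conclusion.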
Affiliation(s)
- Hongzhou Liu
- Department of Endocrinology, Aerospace Center Hospital, Beijing, China
- Department of Endocrinology, First Hospital of Handan City, Handan, China
- Song Dong
- Department of Endocrinology, Aerospace Center Hospital, Beijing, China
- Hua Yang
- Department of Outpatient, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Linlin Wang
- Department of Endocrinology, Aerospace Center Hospital, Beijing, China
- Jia Liu
- Department of Endocrinology, Aerospace Center Hospital, Beijing, China
- Yangfan Du
- Department of Endocrinology, Aerospace Center Hospital, Beijing, China
- Jing Liu
- Clinics of Cadre, Department of Outpatient, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Zhaohui Lyu
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yuhan Wang
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Li Jiang
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Shasha Yu
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Xiaomin Fu
- Clinics of Cadre, Department of Outpatient, The First Medical Center, Chinese PLA General Hospital, Beijing, China
6
Ziaikin E, Tello E, Peterson DG, Niv MY. BitterMasS: Predicting Bitterness from Mass Spectra. Journal of Agricultural and Food Chemistry 2024; 72:10537-10547. [PMID: 38685906 PMCID: PMC11082931 DOI: 10.1021/acs.jafc.3c09767] [Received: 12/25/2023] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
Bitter compounds are common in nature and among drugs. Previously, machine learning tools were developed to predict bitterness from the chemical structure. However, known structures are estimated to represent only 5-10% of the metabolome, and the rest remain unassigned or "dark". We present BitterMasS, a Random Forest classifier that was trained on 5414 experimental mass spectra of bitter and nonbitter compounds, achieving precision = 0.83 and recall = 0.90 for an internal test set. Next, the model was tested against spectra newly extracted from the literature for 106 bitter and nonbitter compounds and against additional spectra measured for 26 compounds. For these external test cases, BitterMasS exhibited 67% precision and 93% recall for the first and 58% accuracy and 99% recall for the second. The spectrum-bitterness prediction strategy was more effective than the spectrum-structure-bitterness prediction strategy and covered more compounds. These encouraging results suggest that BitterMasS can be used to predict bitter compounds in the metabolome without the need for structural assignment of individual molecules. This may enable identifying bitter compounds from metabolomics analyses, comparing potential bitterness levels obtained by different treatments of samples, and monitoring bitterness changes over time.
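Precision and recall, the two metrics quoted for BitterMasS, are both derived from the counts of true positives, false positives, and false negatives. The sketch below shows the calculation on a small made-up set of labels (1 = bitter, 0 = nonbitter); the function name and the data are hypothetical and unrelated to the study's spectra.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: the classifier finds 3 of 4 bitter compounds (recall 0.75)
# but also flags 2 nonbitter compounds as bitter (precision 0.6).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
precision, recall = precision_recall(truth, predicted)
print(precision, recall)
```

A classifier with high recall and lower precision, the pattern reported for the external test cases, misses few bitter compounds at the cost of more false alarms.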
Affiliation(s)
- Evgenii Ziaikin
- Food Science and Nutrition, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Institute of Biochemistry, Food and Nutrition, The Hebrew University of Jerusalem, 76100 Rehovot, Israel
- Edisson Tello
- Department of Food Science and Technology, College of Food, Agriculture, and Environmental Sciences, The Ohio State University, Columbus, Ohio 43210, United States
- Devin G. Peterson
- Department of Food Science and Technology, College of Food, Agriculture, and Environmental Sciences, The Ohio State University, Columbus, Ohio 43210, United States
- Masha Y. Niv
- Food Science and Nutrition, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Institute of Biochemistry, Food and Nutrition, The Hebrew University of Jerusalem, 76100 Rehovot, Israel
7
Matin M, Dehghanian A, Dastranj M, Darijani H. Explainable artificial intelligence modeling of internal arc in a medium voltage switchgear based on different CFD simulations. Heliyon 2024; 10:e29594. [PMID: 38665570 PMCID: PMC11044042 DOI: 10.1016/j.heliyon.2024.e29594] [Received: 01/19/2024] [Revised: 03/17/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
An internal arc is an unintentional release of electrical energy within switchgear. Manufacturers must address this electro-thermal issue in their switchgears. Over the past decades, various researchers and engineering groups have examined the internal arc pressure rise in switchgears to mitigate damage. Pressure rise varies widely among switchgears because of diverse factors such as design, manufacturing, and electrical parameters, so reported pressure increases differ. This issue motivates the application of artificial intelligence (AI) to interpreting internal arc modeling. The present paper explores the impact of manufacturing parameters such as total duct width (TDW), height (H), and ducts condition (DC), along with environmental parameters like initial pressure (IP) and initial temperature (IT), on the maximum pressure (MP) generated during an internal arc in a medium voltage (MV) switchgear. For this purpose, 54 different computational fluid dynamics (CFD) models were built using the parameters indicated. An extreme gradient boosting (XGBoost) machine learning (ML) model was trained on the results of these CFD models, with MP serving as the target variable. The results reveal that the MP of the internal arc varies with the mentioned parameters, ranging from 17835.45 Pa to 144423.2 Pa. SHAP analysis revealed that IP, TDW, and DC were the most significant factors affecting the pressure increase of the internal arc phenomenon.
Affiliation(s)
- Mahmood Matin
- R&D Department, Kerman Tablo Corporation, Kerman, Iran
- Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Amir Dehghanian
- Department of Mechanical Engineering, Shiraz University of Technology, Shiraz, Iran
- Mohammad Dastranj
- R&D Department, Kerman Tablo Corporation, Kerman, Iran
- Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
- Hossein Darijani
- R&D Department, Kerman Tablo Corporation, Kerman, Iran
- Mechanical Engineering Department, Shahid Bahonar University of Kerman, Kerman, Iran
8
Liu X, Niu H, Peng J. Improving predictions: Enhancing in-hospital mortality forecast for ICU patients with sepsis-induced coagulopathy using a stacking ensemble model. Medicine (Baltimore) 2024; 103:e37634. [PMID: 38579092 PMCID: PMC10994494 DOI: 10.1097/md.0000000000037634] [Received: 11/13/2023] [Accepted: 02/26/2024] [Indexed: 04/07/2024] Open
Abstract
The incidence of sepsis-induced coagulopathy (SIC) is high, leading to increased mortality rates and prolonged hospitalization and intensive care unit (ICU) stays. Early identification of SIC patients at risk of in-hospital mortality can improve patient prognosis. The objective of this study was to develop and validate machine learning (ML) models to dynamically predict in-hospital mortality risk in SIC patients. ML models were established based on the Medical Information Mart for Intensive Care IV (MIMIC-IV) database to predict in-hospital mortality in SIC patients. Univariate feature selection was used for feature screening. The optimal model was determined by calculating the area under the curve (AUC) with a 95% confidence interval (CI). The optimal model was interpreted using Shapley Additive Explanation (SHAP) values. Among the 3112 SIC patients included in MIMIC-IV, a total of 757 (25%) patients experienced mortality during their ICU stay. Univariate feature selection identified the 20 most critical variables from the original feature set. Among the 10 developed machine learning models, the stacking ensemble model exhibited the highest AUC (0.795, 95% CI: 0.763-0.827). Anion gap and age emerged as the most significant features for predicting the mortality risk in SIC. In this study, an ML model was constructed that exhibited excellent performance in predicting in-hospital mortality risk in SIC patients. Specifically, the stacking ensemble model demonstrated superior predictive ability.
Affiliation(s)
- Xuhui Liu
- Youjiang Medical University for Nationalities, Baise, China
- Baise People’s Hospital, Baise, China
- Hao Niu
- Beijing Neurosurgical Institute, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
9
Liang X, Liu S, Li Z, Deng Y, Jiang Y, Yang H. Efficient cocrystal coformer screening based on a Machine learning Strategy: A case study for the preparation of imatinib cocrystal with enhanced physicochemical properties. Eur J Pharm Biopharm 2024; 196:114201. [PMID: 38309538 DOI: 10.1016/j.ejpb.2024.114201] [Received: 11/25/2023] [Revised: 01/18/2024] [Accepted: 01/29/2024] [Indexed: 02/05/2024]
Abstract
Cocrystal engineering, which involves the self-assembly of two or more components into a solid-state supramolecular structure through non-covalent interactions, has emerged as a promising approach to tailor the physicochemical properties of active pharmaceutical ingredients (APIs). Efficient coformer screening for cocrystals remains a challenge. Herein, a prediction strategy based on machine learning algorithms was employed to predict cocrystal formation, and seven reliable models with accuracy over 0.890 were successfully constructed. Imatinib was selected as the model drug, and the established models were applied to screen 31 potential coformers. Experimental verification indicated that RF-8 was the best of the seven models, with an accuracy of 0.839. When the seven models were combined for coformer screening of imatinib, the combined model achieved an accuracy of 0.903, and eight new solid forms were observed and characterized. Benefiting from intermolecular interactions, the obtained multicomponent crystals displayed enhanced physicochemical properties. Dissolution and solubility experiments showed that the prepared multicomponent crystals had a higher cumulative dissolution rate and remarkably improved the solubility of imatinib, and IM-MC exhibited solubility comparable to that of the imatinib mesylate α form. Stability tests and cytotoxicity results showed that the multicomponent crystals exhibited excellent stability and that the drug-drug cocrystal IM-5F exhibited higher cytotoxicity than the pure API.
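The abstract does not say how the seven models were combined. One common option, shown here purely as an assumption, is majority voting over the individual predictions. The function names and the toy predictions are hypothetical. The final lines note that the reported accuracies of 0.839 and 0.903 on 31 coformers are numerically consistent with 26 and 28 correct predictions, although the abstract does not state these counts.

```python
def majority_vote(predictions_per_model):
    """Combine binary predictions (1 = cocrystal forms) by simple majority.

    predictions_per_model is a list of equal-length lists, one per model.
    A sample is labeled 1 only if strictly more than half of the models say 1.
    """
    n_models = len(predictions_per_model)
    n_items = len(predictions_per_model[0])
    combined = []
    for i in range(n_items):
        votes = sum(model[i] for model in predictions_per_model)
        combined.append(1 if votes * 2 > n_models else 0)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the experimental outcome."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical predictions from three models for four coformers
models = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
experiment = [1, 0, 0, 1]   # hypothetical experimental outcomes
combined = majority_vote(models)
print(combined, accuracy(experiment, combined))

# Consistency check on the reported figures for 31 screened coformers
print(round(26 / 31, 3), round(28 / 31, 3))
```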
Affiliation(s)
- Xiaoxiao Liang
- Guangdong Provincial Key Lab of Green Chemical Product Technology, School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou 510640, China
- Shiyuan Liu
- Guangdong Provincial Key Lab of Green Chemical Product Technology, School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou 510640, China
- Zebin Li
- Guangdong Provincial Key Lab of Green Chemical Product Technology, School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou 510640, China
- Yuehua Deng
- Guangdong Provincial Key Lab of Green Chemical Product Technology, School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou 510640, China
- Yanbin Jiang
- Guangdong Provincial Key Lab of Green Chemical Product Technology, School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou 510640, China; School of Chemical Engineering, Guangdong University of Petrochemical Technology, Maoming 525000, China.
- Huaiyu Yang
- Department of Chemical Engineering, Loughborough University, Loughborough Leicestershire LE11 3TU, UK
10
Joe H, Kim HG. Multi-label classification with XGBoost for metabolic pathway prediction. BMC Bioinformatics 2024; 25:52. [PMID: 38297220 PMCID: PMC10832249 DOI: 10.1186/s12859-024-05666-0] [Received: 07/14/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024] Open
Abstract
BACKGROUND Metabolic pathway prediction is one possible approach to the systems biology problem of reconstructing an organism's metabolic network from its genome sequence. Recent work on machine learning-based pathway prediction has concluded that such approaches perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, which decreases its performance. RESULTS In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance are needed for them to be competitive. Furthermore, we introduce mlXGPR, an XGBoost-based metabolic pathway prediction method built on the multi-label classification pathway prediction framework introduced by mlLGPR. We also improve on this multi-label framework by exploiting correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower-performing classifiers are placed later in the chain, allowing them to make more use of the correlations between labels. We evaluate mlXGPR with and without classifier chains on single-organism and multi-organism benchmarks. Our results indicate that mlXGPR outperforms previous pathway prediction methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, precision, and F1 score on single-organism benchmarks. CONCLUSIONS The results from our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even surpass that of PathoLogic with taxonomic pruning.
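Two ideas from this abstract are easy to illustrate in a few lines: Hamming loss as a multi-label metric, and ordering a classifier chain so that weaker per-label classifiers come later. The sketch below is an assumption-laden illustration, not the mlXGPR implementation: the per-label scores, the pathway identifiers, and the choice of sorting by a single validation score are all hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong.

    y_true and y_pred are lists of rows; each row holds one 0/1 value per label
    (here, one per pathway).
    """
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def chain_order(label_scores):
    """Order labels from best to worst score.

    In a classifier chain, each classifier sees the predictions of those before
    it, so placing weaker classifiers later lets them use more label context.
    """
    return sorted(label_scores, key=label_scores.get, reverse=True)


# Hypothetical ground truth and predictions for 2 organisms and 3 pathways
truth = [[1, 0, 1], [0, 1, 0]]
predicted = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(truth, predicted)

# Hypothetical per-pathway validation scores (for example, F1)
scores = {"PWY-A": 0.9, "PWY-B": 0.4, "PWY-C": 0.7}
order = chain_order(scores)
print(loss, order)
```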
Affiliation(s)
- Hyunwhan Joe
- Biomedical Knowledge Engineering Lab., Seoul National University, Seoul, Republic of Korea
- Hong-Gee Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, Seoul, Republic of Korea.
- School of Dentistry and Dental Research Institute, Seoul National University, Seoul, Republic of Korea.
11
Qian S, Qiao X, Zhang W, Yu Z, Dong S, Feng J. Machine learning-based prediction for settling velocity of microplastics with various shapes. Water Research 2024; 249:121001. [PMID: 38113602 DOI: 10.1016/j.watres.2023.121001] [Received: 09/11/2023] [Revised: 11/22/2023] [Accepted: 12/07/2023] [Indexed: 12/21/2023]
Abstract
Microplastics can easily enter the aquatic environment and be transported between water bodies. The terminal settling velocity of microplastics, which affects their transport and distribution in the aquatic environment, is mainly influenced by their size, density, and shape. Due to the difficulty in accurately predicting the terminal settling velocity of microplastics with various shapes, this study focuses on establishing high-performance prediction models and understanding the importance and effect of each feature parameter using machine learning. Based on the number of principal dimensions, the shapes of microplastics are classified into fiber, film, and fragment, and their thresholds are identified. The microplastics of different shape categories have different optimal shape parameters for predicting the terminal settling velocity: Corey shape factor, flatness, elongation, and sphericity for the fragment, film, fiber, and mixed-shape MPs, respectively. By including the dimensionless diameter, relative density and optimal shape parameter in the input parameter combination, the machine learning models can well predict the terminal settling velocity for the microplastics of different shape categories and mixed-shape with $R^2 > 0.867$, achieving significantly higher performance than the existing theoretical and regression models. The interpretable analysis of machine learning reveals the highest importance of the microplastic size and its marginal effect when the dimensionless diameter $D_* = d_n (g/\nu^2)^{1/3} > 80$, where $d_n$ is the equivalent diameter, $g$ is the gravitational acceleration, and $\nu$ is the fluid kinematic viscosity. The effect of shape is weak for small microplastics and becomes significant when $D_*$ exceeds 65.
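The dimensionless diameter defined in this abstract, the equivalent diameter multiplied by the cube root of gravitational acceleration over kinematic viscosity squared, can be evaluated directly. The sketch below does so in SI units; the kinematic viscosity of 1e-6 m²/s (roughly water at room temperature) and the 1 mm particle are assumed example values, not data from the study.

```python
def dimensionless_diameter(d_n, nu, g=9.81):
    """Dimensionless diameter D* = d_n * (g / nu**2) ** (1/3).

    d_n: equivalent diameter in metres
    nu:  fluid kinematic viscosity in m^2/s
    g:   gravitational acceleration in m/s^2
    """
    return d_n * (g / nu ** 2) ** (1.0 / 3.0)


def diameter_for(d_star, nu, g=9.81):
    """Invert the definition: equivalent diameter (m) for a given D*."""
    return d_star / (g / nu ** 2) ** (1.0 / 3.0)


NU_WATER = 1.0e-6  # m^2/s, assumed value for water near 20 degrees C

d_star_1mm = dimensionless_diameter(1.0e-3, NU_WATER)
print(f"D* for a 1 mm particle: {d_star_1mm:.2f}")

# Equivalent diameters (in mm) at the two thresholds quoted in the abstract
for threshold in (65, 80):
    print(threshold, f"{diameter_for(threshold, NU_WATER) * 1e3:.2f} mm")
```

Under these assumed conditions the thresholds of 65 and 80 correspond to equivalent diameters of roughly 3 to 4 mm; a different fluid or temperature would shift these values.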
Affiliation(s)
- Shangtuo Qian
- National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, Jiangsu 210024, China; College of Agricultural Science and Engineering, Hohai University, Nanjing 211100, China
- Xuyang Qiao
- National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, Jiangsu 210024, China; College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098, China
- Wenming Zhang
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton AB T6G 1H9, Canada
- Zijian Yu
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton AB T6G 1H9, Canada
- Shunan Dong
- College of Agricultural Science and Engineering, Hohai University, Nanjing 211100, China
- Jiangang Feng
- National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, Jiangsu 210024, China; College of Agricultural Science and Engineering, Hohai University, Nanjing 211100, China.
12
Feng S, Wang J. Prediction of Organic-Inorganic Hybrid Perovskite Band Gap by Multiple Machine Learning Algorithms. Molecules 2024; 29:499. [PMID: 38276577 PMCID: PMC10820808 DOI: 10.3390/molecules29020499] [Received: 12/24/2023] [Revised: 01/13/2024] [Accepted: 01/16/2024] [Indexed: 01/27/2024] Open
Abstract
As an indicator of the optical characteristics of perovskite materials, the band gap is a crucial parameter that impacts the functionality of a wide range of optoelectronic devices. The band gap of a material can be obtained through high-throughput first-principles calculations, but these are labor-intensive and time-consuming and do not yield the most accurate results. Machine learning techniques have emerged as a viable and effective substitute for conventional approaches in band gap prediction. This paper collected 201 data points from the literature and open-source databases. By separating the features related to the A, B, and X sites, a dataset of 1208 data points containing 30 feature descriptors was established. The dataset underwent preprocessing, and the Pearson correlation coefficient method was employed to eliminate non-essential features and select a feature subset. The band gap was predicted using the gradient boosting regression (GBR) algorithm, the random forest algorithm, the LightGBM algorithm, and the XGBoost algorithm, in that order, to construct a prediction model for organic-inorganic hybrid perovskites. The outcomes demonstrate that the XGBoost algorithm yielded an MAE value of 0.0901, an MSE value of 0.0173, and an $R^2$ value of 0.991310. These values suggest that, compared to the other models, the XGBoost model exhibits the lowest prediction error, suggesting that the input features may better fit the prediction model. Finally, analysis of the XGBoost-based model's predictions using the SHAP model interpretation method reveals that the occupancy rate of the A-site ion has the greatest impact on the prediction of the band gap and is negatively correlated with the predicted band gap. The findings provide valuable insights into the relationship between the prediction of band gaps and significant characteristics of organic-inorganic hybrid perovskites.
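The three figures quoted for the XGBoost model (MAE, MSE, and the coefficient of determination) all follow from the residuals between measured and predicted band gaps. A minimal sketch of the standard definitions is given below; the function name and the four example band gap values are hypothetical and are not taken from the study's dataset.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Hypothetical band gaps in eV
measured = [1.5, 2.0, 2.5, 3.0]
predicted = [1.6, 1.9, 2.5, 3.2]
mae, mse, r2 = regression_metrics(measured, predicted)
print(f"MAE={mae:.4f}  MSE={mse:.4f}  R2={r2:.4f}")
```

MAE is in the same unit as the target while MSE is in squared units, so the two are not directly comparable in magnitude.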
Affiliation(s)
- Shun Feng
- Xi’an Key Laboratory of Advanced Photo-Electronics Materials and Energy Conversion Device, School of Electronic Information, Xijing University, Xi’an 710123, China
- Juan Wang
- Xi’an Key Laboratory of Advanced Photo-Electronics Materials and Energy Conversion Device, School of Electronic Information, Xijing University, Xi’an 710123, China
- Shaanxi Engineering Research Center of Controllable Neutron Source, School of Electronic Information, Xijing University, Xi’an 710123, China
|
13
|
Munshi RM. Novel ensemble learning approach with SVM-imputed ADASYN features for enhanced cervical cancer prediction. PLoS One 2024; 19:e0296107. [PMID: 38198475 PMCID: PMC10781159 DOI: 10.1371/journal.pone.0296107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 12/06/2023] [Indexed: 01/12/2024] Open
Abstract
Cervical cancer remains a leading cause of female mortality, particularly in developing regions, underscoring the critical need for early detection and intervention guided by skilled medical professionals. While Pap smear images serve as valuable diagnostic tools, many available datasets for automated cervical cancer detection contain missing data, posing challenges for machine learning models' efficacy. To address these hurdles, this study presents an automated system adept at managing missing information using ADASYN characteristics, resulting in exceptional accuracy. The proposed methodology integrates a voting classifier model harnessing the predictive capacity of three distinct machine learning models. It further incorporates SVM Imputer and ADASYN up-sampled features to mitigate missing value concerns, while leveraging CNN-generated features to augment the model's capabilities. Notably, this model achieves remarkable performance metrics, boasting a 99.99% accuracy, precision, recall, and F1 score. A comprehensive comparative analysis evaluates the proposed model against various machine learning algorithms across four scenarios: original dataset usage, SVM imputation, ADASYN feature utilization, and CNN-generated features. Results indicate the superior efficacy of the proposed model over existing state-of-the-art techniques. This research not only introduces a novel approach but also offers actionable suggestions for refining automated cervical cancer detection systems. Its impact extends to benefiting medical practitioners by enabling earlier detection and improved patient care. Furthermore, the study's findings have substantial societal implications, potentially reducing the burden of cervical cancer through enhanced diagnostic accuracy and timely intervention.
Affiliation(s)
- Raafat M. Munshi
- Department of Medical Laboratory Technology (MLT), Faculty of Applied Medical Sciences, King Abdulaziz University, Rabigh, Saudi Arabia
|
14
|
Ma J, Zhang S, Liu X, Wang J. Machine learning prediction of biochar yield based on biomass characteristics. Bioresour Technol 2023; 389:129820. [PMID: 37805089 DOI: 10.1016/j.biortech.2023.129820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/09/2023]
Abstract
Slow pyrolysis is a widely used thermochemical pathway that converts organic waste into biochar. We employed six machine learning models to predict biochar yield from 13 variables chosen by Pearson feature selection. Partial dependence analysis was also used to reveal the relationships between feature variables. Both the gradient boosting decision tree and the Levenberg-Marquardt backpropagation neural network achieved R² > 0.9 on the training set and R² > 0.8 on the testing set, whereas the other models performed worse on the testing set, with R² < 0.8. The partial dependence plots show that pyrolysis conditions have a greater impact on biochar yield than biomass composition. Furthermore, the highest treatment temperature, being the only feature that changes consistently, can serve as a guiding factor for regulating biochar yield. This study highlights the potential of machine learning in experimental prediction, providing a scientific reference for reducing time and economic costs in pyrolysis experiments and process development.
Affiliation(s)
- Jingjing Ma
- School of Human Settlements and Civil Engineering, Xi'an Jiaotong University, 710049, China
- Shuai Zhang
- School of Human Settlements and Civil Engineering, Xi'an Jiaotong University, 710049, China
- Xiangjun Liu
- School of Human Settlements and Civil Engineering, Xi'an Jiaotong University, 710049, China
- Junqi Wang
- School of Human Settlements and Civil Engineering, Xi'an Jiaotong University, 710049, China
|
15
|
Mavaie P, Holder L, Skinner MK. Hybrid deep learning approach to improve classification of low-volume high-dimensional data. BMC Bioinformatics 2023; 24:419. [PMID: 37936066 PMCID: PMC10631218 DOI: 10.1186/s12859-023-05557-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 11/01/2023] [Indexed: 11/09/2023] Open
Abstract
BACKGROUND The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low volume). METHOD A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. RESULTS The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume that prefers this approach. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. CONCLUSIONS The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low volume learning tasks that are typical in biology domains.
Affiliation(s)
- Pegah Mavaie
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Lawrence Holder
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Michael K Skinner
- School of Biological Sciences, Center for Reproductive Biology, Washington State University, Pullman, WA, 99164-4236, USA
|
16
|
Meng F, Wang J, Chen Z, Qiao F, Yang D. Shaping the concentration of petroleum hydrocarbon pollution in soil: A machine learning and resistivity-based prediction method. J Environ Manage 2023; 345:118817. [PMID: 37597372 DOI: 10.1016/j.jenvman.2023.118817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/03/2023] [Accepted: 08/12/2023] [Indexed: 08/21/2023]
Abstract
A new method relying on machine learning and resistivity to predict concentrations of petroleum hydrocarbon pollution in soil was proposed as a means of investigation and monitoring. Currently, determining pollutant concentrations in soil is primarily achieved through costly sampling and testing of numerous borehole samples, which carries the risk of further contamination by penetrating the aquifer. Additionally, conventional petroleum hydrocarbon geophysical surveys struggle to establish a correlation between survey results and pollutant concentration. To overcome these limitations, three machine learning models (KNN, RF, and XGBoost) were combined with the geoelectrical method to predict petroleum hydrocarbon concentrations in the source area. The results demonstrate that the resistivity-based prediction method utilizing machine learning is effective, as validated by R-squared values of 0.91 and 0.94 for the test and validation sets, respectively, and a root mean squared error of 0.19. Furthermore, this study confirmed the feasibility of the approach using actual site data, along with a discussion of its advantages and limitations, establishing it as an inexpensive option to investigate and monitor changes in petroleum hydrocarbon concentration in soil.
Affiliation(s)
- Fansong Meng
- School of Earth Science and Engineering, Hohai University, Nanjing, 210098, China
- Jinguo Wang
- School of Earth Science and Engineering, Hohai University, Nanjing, 210098, China
- Zhou Chen
- School of Earth Science and Engineering, Hohai University, Nanjing, 210098, China
- Fei Qiao
- School of Earth Science and Engineering, Hohai University, Nanjing, 210098, China
- Dong Yang
- School of Earth Science and Engineering, Hohai University, Nanjing, 210098, China
|
17
|
Bakasa W, Viriri S. Stacked ensemble deep learning for pancreas cancer classification using extreme gradient boosting. Front Artif Intell 2023; 6:1232640. [PMID: 37876961 PMCID: PMC10591225 DOI: 10.3389/frai.2023.1232640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 09/04/2023] [Indexed: 10/26/2023] Open
Abstract
Ensemble learning aims to improve prediction performance by combining several models or forecasts. However, it remains unclear which ensemble learning techniques are useful, and to what extent, in deep learning-based pipelines for pancreas computed tomography (CT) image classification. Ensemble approaches are among the most advanced solutions to many machine learning problems: they train multiple models and combine their predictions to improve on the predictive performance of a single model. This article introduces Stacked Ensemble Deep Learning (SEDL), a pipeline for classifying pancreas CT medical images. The weak learners are Inception V3, VGG16, and ResNet34, combined in a stacking ensemble. The first-level predictions are combined to form the training input for Extreme Gradient Boosting (XGBoost), the second-level ensemble model, which acts as a strong learner and makes the final classification. Our findings showed that SEDL performed better, with a 98.8% ensemble accuracy, after some adjustments to the hyperparameters. The Cancer Imaging Archive (TCIA) public access dataset consists of 80 pancreas CT scans with a resolution of 512 × 512 pixels, from 53 male and 27 female subjects. A sample of 222 images was used for training and testing. We concluded that implementing the SEDL technique is an effective way to strengthen the robustness and increase the performance of the pipeline for classifying pancreas CT medical images. Interestingly, grouping like-minded or talented learners does not make a difference.
Affiliation(s)
- Serestina Viriri
- School of Mathematics Statistics & Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
|
18
|
Tomita K, Yamasaki A, Katou R, Ikeuchi T, Touge H, Sano H, Tohda Y. Construction of a Diagnostic Algorithm for Diagnosis of Adult Asthma Using Machine Learning with Random Forest and XGBoost. Diagnostics (Basel) 2023; 13:3069. [PMID: 37835811 PMCID: PMC10572917 DOI: 10.3390/diagnostics13193069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/25/2023] [Accepted: 09/26/2023] [Indexed: 10/15/2023] Open
Abstract
An evidence-based diagnostic algorithm for adult asthma is necessary for effective treatment and management. We present a diagnostic algorithm that uses a random forest (RF) classifier and an optimized eXtreme Gradient Boosting (XGBoost) classifier as an auxiliary tool for diagnosing adult asthma. Data were gathered from the medical records of 566 adult outpatients who visited Kindai University Hospital with complaints of nonspecific respiratory symptoms. Specialists made a thorough diagnosis of asthma based on symptoms, physical indicators, and objective testing, including airway hyperresponsiveness. We used two decision-tree classifiers to identify the diagnostic algorithms: RF and XGBoost. Bayesian optimization was used to optimize the hyperparameters of RF and XGBoost. Accuracy and area under the curve (AUC) were used as evaluation metrics. The XGBoost classifier outperformed the RF classifier with an accuracy of 81% and an AUC of 85%. A combination of symptoms, physical signs, and lung function tests was successfully used to construct a diagnostic algorithm based on important features for diagnosing adult asthma. These results indicate that the proposed model can be reliably used to construct diagnostic algorithms with selected features from objective tests in different settings.
Affiliation(s)
- Katsuyuki Tomita
- Department of Respiratory Medicine, Yonago Medical Center, National Hospital Organization, Yonago 683-0006, Japan
- Akira Yamasaki
- Division of Respiratory Medicine and Rheumatology, Department of Multidisciplinary Internal Medicine, School of Medicine, Tottori University, Yonago 683-8503, Japan
- Ryohei Katou
- Department of Respiratory Medicine, Yonago Medical Center, National Hospital Organization, Yonago 683-0006, Japan
- Tomoyuki Ikeuchi
- Department of Respiratory Medicine, Yonago Medical Center, National Hospital Organization, Yonago 683-0006, Japan
- Hirokazu Touge
- Department of Respiratory Medicine, Yonago Medical Center, National Hospital Organization, Yonago 683-0006, Japan
- Hiroyuki Sano
- Allergy Center, Kindai University Hospital, Osakasayama 589-8511, Japan
- Yuji Tohda
- Department of Respiratory and Allergology, Kindai University, Osakasayama 589-8511, Japan
|
19
|
Yan Y, Shi Z, Wei H. ROSes-FINDER: a multi-task deep learning framework for accurate prediction of microorganism reactive oxygen species scavenging enzymes. Front Microbiol 2023; 14:1245805. [PMID: 37744924 PMCID: PMC10513406 DOI: 10.3389/fmicb.2023.1245805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Accepted: 08/21/2023] [Indexed: 09/26/2023] Open
Abstract
Reactive oxygen species (ROS) are highly reactive molecules that play important roles in microbial biological processes. However, excessive accumulation of ROS can lead to oxidative stress and cellular damage. Microorganisms have evolved a diverse suite of enzymes to mitigate the harmful effects of ROS. Accurate prediction of ROS-scavenging enzyme classes (ROSes) is crucial for understanding the mechanisms of oxidative stress and developing strategies to combat related diseases. Nevertheless, existing approaches for categorizing ROS-related proteins have drawbacks with regard to their precision and inclusiveness. To address this, we propose a new multi-task deep learning framework called ROSes-FINDER. This framework integrates three component methods using a voting-based approach to predict multiple ROSes properties simultaneously. It can identify whether a given protein sequence is a ROSes and determine its type. The three component methods used in the framework are ROSes-CNN, which extracts raw sequence encoding features; ROSes-NN, which predicts protein functions based on sequence information; and ROSes-XGBoost, which performs functional classification using ensemble machine learning. Comprehensive experiments demonstrate the superior performance and robustness of our method. ROSes-FINDER is freely available at https://github.com/alienn233/ROSes-Finder for predicting ROSes classes.
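The abstract does not specify the voting rule that combines the three components. As one plausible illustration, the sketch below assumes plain hard majority voting, with ties broken in favor of the earliest component. The function `majority_vote` and the enzyme class labels are hypothetical and are not taken from the ROSes-FINDER repository.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the earliest-listed prediction."""
    if not predictions:
        raise ValueError("at least one component prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the predictions in component order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of the three components for one protein sequence.
component_outputs = {
    "ROSes-CNN": "catalase",
    "ROSes-NN": "catalase",
    "ROSes-XGBoost": "peroxidase",
}
final_label = majority_vote(list(component_outputs.values()))
print(final_label)
```

A weighted or probability-averaging (soft) vote would be an equally reasonable reading of "voting-based"; the hard vote is shown only because it is the simplest variant.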
Affiliation(s)
- Yueyang Yan
- College of Veterinary Medicine, Jilin University, Changchun, China
- Zhanpeng Shi
- College of Veterinary Medicine, Jilin University, Changchun, China
- Haijian Wei
- Department of Organ Transplantation, The Affiliated Yantai Yuhuangding Hospital of Qingdao University, Yantai City, China
|
20
|
Luo G, Zou F, Guo F, Liu J, Cai X, Cai Q, Xia C. An over-the-horizon potential safety threat vehicle identification method based on ETC big data. Heliyon 2023; 9:e20050. [PMID: 37810065 PMCID: PMC10559829 DOI: 10.1016/j.heliyon.2023.e20050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 09/08/2023] [Accepted: 09/09/2023] [Indexed: 10/10/2023] Open
Abstract
Smart cars rely on sensors such as LIDAR and high-precision map-based perception to sense the driving environment. However, they cannot detect low-speed vehicles beyond visual range, which affects safety and comfort. Manually driven vehicles face similar challenges. Low-speed driving contributes to expressway accidents because of limited visibility, road design, and equipment performance. To enhance safety, an over-the-horizon method for identifying potential safety threat vehicles using ETC big data is proposed. It consists of three layers. The first is the vehicle section travel speed sensing layer, based on the wlp-XGBoost algorithm. The second is the in-transit vehicle position estimation layer, based on the DR-HMM algorithm. The third is the potential safety threat vehicle identification layer, based on multi-information fusion. Dynamic real-time detection and identification of potential safety threats in expressway sections were achieved, and simulations were conducted using real-time ETC data from the Quanxia section on an ETC platform. Results show accurate prediction of vehicle speed and position in different road sections and traffic situations, with over 95% accuracy and recall in identifying potential safety threat vehicles. The method perceives changes in the traffic conditions of road sections in real time based on the changing trend of potential safety threat vehicle numbers, providing a vital reference for speed planning and risk avoidance.
Affiliation(s)
- Guanghao Luo
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
- Fumin Zou
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
- Feng Guo
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
- Jishun Liu
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
- Xinjian Cai
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
- Qiqin Cai
- School of Mechanical Engineering and Automation, Huaqiao University, Xiamen, 361021, China
- Chenxi Xia
- Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou, 350108, China
- Renewable Energy Technology Research Institute of Fujian University of Technology, Ningde, 352101, China
|
21
|
Unal M, Bostanci E, Ozkul C, Acici K, Asuroglu T, Guzel MS. Crohn's Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome. Diagnostics (Basel) 2023; 13:2835. [PMID: 37685376 PMCID: PMC10486516 DOI: 10.3390/diagnostics13172835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 08/24/2023] [Accepted: 08/31/2023] [Indexed: 09/10/2023] Open
Abstract
Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been found to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI preprocessed graph representations and converted into a structured form. Seven well-known Machine Learning methods were used: Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using various metrics such as accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar's test was conducted to assess the statistical significance of the experiment. The data was constructed using k-mer lengths of 3, 4 and 5. The Light Gradient Boosting Machine (LightGBM) model outperformed the other models, with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results in predicting disease from raw sequence data. Finally, McNemar's test found statistically significant differences between different Machine Learning approaches.
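To make the k-mer representation concrete, the sketch below turns a raw DNA sequence into a fixed-length vector of normalized k-mer frequencies, which a classifier could then consume. It is an illustrative assumption about how such features may be built, not the authors' pipeline. The function name `kmer_features` and the example sequence are hypothetical, and windows containing characters outside A, C, G, T are simply ignored.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence: str, k: int) -> list[float]:
    """Return normalized frequencies of all 4**k DNA k-mers in lexicographic order."""
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only windows made of A, C, G, T contribute to the normalization.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Hypothetical read; the vector length grows as 4**k (64, 256, 1024 for k = 3, 4, 5).
read = "ACGTAC"
for k in (3, 4, 5):
    print(k, len(kmer_features(read, k)))
```

Larger k captures longer sequence motifs at the cost of a sparser, higher-dimensional feature vector, which is one possible reason accuracy can change with k as reported in the abstract.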
Affiliation(s)
- Metehan Unal
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
- Erkan Bostanci
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
- Ceren Ozkul
- Department of Pharmaceutical Microbiology, Faculty of Pharmacy, Hacettepe University, 06230 Ankara, Turkey
- Koray Acici
- Department of Artificial Intelligence and Data Engineering, Ankara University, 06830 Ankara, Turkey
- Tunc Asuroglu
- Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland
- Mehmet Serdar Guzel
- Department of Computer Engineering, Ankara University, 06830 Ankara, Turkey
|
22
|
Faraz A, Tırınk C, Önder H, Şen U, Ishaq HM, Tauqir NA, Waheed A, Nabeel MS. Usage of the XGBoost and MARS algorithms for predicting body weight in Kajli sheep breed. Trop Anim Health Prod 2023; 55:276. [PMID: 37500805 DOI: 10.1007/s11250-023-03700-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 07/19/2023] [Indexed: 07/29/2023]
Abstract
This study aimed to use the XGBoost and MARS algorithms to predict present weight from body measurements. Both algorithms can model nonlinear relationships between body measurements and weight, and this study sought the model that provided the most accurate predictions of present weight. The study was conducted with 152 animals. To compare model performance, goodness-of-fit criteria such as R², r, RMSE, CV, SDratio, PI, MAPE, and AIC were used. According to the results, the XGBoost algorithm was the most reliable model for predicting present weight from body measurements. Although XGBoost was the most accurate model, the MARS algorithm was also reliable for the same purpose. In addition, it is hoped that the results of this study will help researchers and breeders better understand the relationship between body measurements and weight and ultimately support better weight management. In conclusion, the XGBoost algorithm is an effective, efficient, and reliable tool for accurately estimating present weight from body measurements. This makes it a valuable tool in rural areas, where traditional weighing scales may not be available or reliable.
Affiliation(s)
- Asim Faraz
- Department of Livestock and Poultry Production, Bahauddin Zakariya University, Multan, Pakistan
- Cem Tırınk
- Department of Animal Science, Faculty of Agriculture, Igdir University, Igdir, Turkey
- Hasan Önder
- Department of Animal Science, Faculty of Agriculture, Ondokuz Mayis University, Samsun, Turkey
- Uğur Şen
- Department of Agricultural Biotechnology, Faculty of Agriculture, Ondokuz Mayis University, Samsun, Turkey
- Hafiz Muhammad Ishaq
- Department of Livestock and Poultry Production, Bahauddin Zakariya University, Multan, Pakistan
- Nasir Ali Tauqir
- Department of Animal Nutrition, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Abdul Waheed
- Department of Livestock and Poultry Production, Bahauddin Zakariya University, Multan, Pakistan
|
23
|
Xiang T, Li T, Li J, Li X, Wang J. Using machine learning to realize genetic site screening and genomic prediction of productive traits in pigs. FASEB J 2023; 37:e22961. [PMID: 37178007 DOI: 10.1096/fj.202300245r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Revised: 03/30/2023] [Accepted: 04/25/2023] [Indexed: 05/15/2023]
Abstract
Genomic prediction, which is based on solving linear mixed-model (LMM) equations, is the most popular method for predicting breeding values or phenotypic performance for economic traits in livestock. With the need to further improve the performance of genomic prediction, nonlinear methods have been considered as an alternative and promising approach. The excellent ability to predict phenotypes in animal husbandry has been demonstrated by machine learning (ML) approaches, which have been rapidly developed. To investigate the feasibility and reliability of implementing genomic prediction using nonlinear models, the performances of genomic predictions for pig productive traits using the linear genomic selection model and nonlinear machine learning models were compared. Then, to reduce the high-dimensional features of genome sequence data, different machine learning algorithms, including the random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost) and convolutional neural network (CNN) algorithms, were used to perform genomic feature selection as well as genomic prediction on reduced feature genome data. All of the analyses were processed on two real pig datasets: the published PIC pig dataset and a dataset comprising data from a national pig nucleus herd in Chifeng, North China. Overall, the accuracies of predicted phenotypic performance for traits T1, T2, T3 and T5 in the PIC dataset and average daily gain (ADG) in the Chifeng dataset were higher using the ML methods than the LMM method, while those for trait T4 in the PIC dataset and total number of piglets born (TNB) in the Chifeng dataset were slightly lower using the ML methods than the LMM method. Among all the different ML algorithms, SVM was the most appropriate for genomic prediction. For the genomic feature selection experiment, the most stable and most accurate results across different algorithms were achieved using XGBoost in combination with the SVM algorithm. Through feature selection, the number of genomic markers can be reduced to 1 in 20, while the predictive performance on some traits can even be improved compared to using the full genome data. Finally, we developed a new tool that can be used to execute combined XGBoost and SVM algorithms to realize genomic feature selection and phenotypic prediction.
Affiliation(s)
- Tao Xiang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan, China
- Tao Li
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Jielin Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Laboratory of Swine Genetics and Breeding of Ministry of Agriculture, Huazhong Agricultural University, Wuhan, China
- Xin Li
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Jia Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
|
24
|
Li Z, Zhao Y, Duan T, Dai J. Configurational patterns for COVID-19 related social media rumor refutation effectiveness enhancement based on machine learning and fsQCA. Inf Process Manag 2023; 60:103303. [PMID: 36741251 PMCID: PMC9889264 DOI: 10.1016/j.ipm.2023.103303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 02/04/2023]
Abstract
Infodemics are intertwined with the COVID-19 pandemic, affecting people's perception and social order. To curb the spread of COVID-19 related false rumors, fuzzy-set qualitative comparative analysis (fsQCA) is used to find configurational pathways that enhance rumor refutation effectiveness. In this paper, a total of 1,903 COVID-19 related false rumor refutation microblogs posted on Sina Weibo between January 1, 2022 and April 20, 2022 are collected by a web crawler, and 10 main conditions affecting the rumor refutation effectiveness index (REI) are identified based on the "three rules of epidemics". To reduce data redundancy, five ensemble machine learning models are established and tuned, among which the Light Gradient Boosting Machine (LGBM) regression model performs best. Five core conditions are then extracted from the LGBM feature importance ranking. Based on fsQCA with the five core conditions, REI enhancement can be achieved through three different configurational pathways: "Highly influential microblogger * high followers' stickiness microblogger", "high followers' stickiness microblogger * highly active microblogger * concise information description" and "high followers' stickiness microblogger * the sentiment tendency of the topic * concise information description". Finally, decision-making suggestions for false rumor refutation platforms and new ideas for improving false rumor refutation effectiveness are proposed. The innovation of this paper lies in exploring the REI enhancement strategy from the perspective of configuration for the first time.
Affiliation(s)
- Zongmin Li
- Business School, Sichuan University, Chengdu 610065, China
| | - Ye Zhao
- Business School, Sichuan University, Chengdu 610065, China
| | - Tie Duan
- Business School, Sichuan University, Chengdu 610065, China
| | - Jingqi Dai
- School of Economics and Management, Civil Aviation Flight University of China, Guanghan 618300, China (corresponding author)
| |
|
25
|
Pezoa R, Basso F, Quilodrán P, Varas M. Estimation of trip purposes in public transport during the COVID-19 pandemic: The case of Santiago, Chile. JOURNAL OF TRANSPORT GEOGRAPHY 2023; 109:103594. [PMID: 37123884 PMCID: PMC10121142 DOI: 10.1016/j.jtrangeo.2023.103594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/11/2022] [Revised: 03/15/2023] [Accepted: 04/17/2023] [Indexed: 05/03/2023]
Abstract
The COVID-19 pandemic strongly affected the mobility of people. Several studies have quantified these changes, for example, measuring the effectiveness of quarantine measures and calculating the decrease in the use of public transport. Regarding the latter, however, a low level of understanding persists as to how the pandemic affected the distribution of trip purposes, hindering the design of policies aimed at increasing the demand for public transport in a post-pandemic era. To address this gap, in this article, we study how the purposes of trips made by public transport evolved during the COVID-19 pandemic in the city of Santiago, Chile. For this, we develop an XGBoost model using the latest available origin-destination survey as input. The calibrated model is applied to the information from smart payment cards during one week in 2018, 2020, and 2021. The results show that during the week of maximum restriction, that is, during 2020, the distribution of trips by purpose varied considerably, with the proportion of trips to work increasing, recreational trips decreasing, and trips for health purposes remaining unchanged. In sociodemographic terms, in the higher-income communes, the decrease in the proportion of trips for work purposes was much greater than that in the communes with lower income. Finally, with the gradual return to in-person activities in 2021, the distribution of trip purposes returned to values similar to those before the pandemic, although with a lower total amount, which suggests that unless relevant measures are taken, the low use of public transportation could be permanent.
Affiliation(s)
- Raúl Pezoa
- Escuela de Ingeniería Industrial, Universidad Diego Portales, Santiago, Chile
| | - Franco Basso
- School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
- Instituto Sistemas Complejos de Ingeniería, Chile
| | - Paulina Quilodrán
- Escuela de Ingeniería Industrial, Universidad Diego Portales, Santiago, Chile
| | - Mauricio Varas
- Centro de Investigación en Sustentabilidad y Gestión Estratégica de Recursos, Facultad de Ingeniería, Universidad del Desarrollo, Santiago, Chile
| |
|
26
|
Shao X, Wang H, Zhu X, Xiong F, Mu T, Zhang Y. EFFECT: Explainable framework for meta-learning in automatic classification algorithm selection. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.11.144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
27
|
Chen X, Lin S, Zheng Y, He L, Fang Y. Long-term trajectories of depressive symptoms and machine learning techniques for fall prediction in older adults: Evidence from the China Health and Retirement Longitudinal Study (CHARLS). Arch Gerontol Geriatr 2023; 111:105012. [PMID: 37030148 DOI: 10.1016/j.archger.2023.105012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 04/01/2023]
Abstract
BACKGROUND Falls are the most common adverse outcome of depression in older adults, yet an accurate risk prediction model for falls stratified by distinct long-term trajectories of depressive symptoms is still lacking. METHODS We collected the data of 1617 participants from the China Health and Retirement Longitudinal Study register, spanning 2011 to 2018. The 36 input variables included in the baseline survey were regarded as candidate features. The trajectories of depressive symptoms were classified by the latent class growth model and growth mixture model. Three data balancing techniques and four machine learning algorithms were utilized to develop predictive models for fall classification of depressive prognosis. RESULTS Depressive symptom trajectories were divided into four categories, i.e., non-symptoms, new-onset increasing symptoms, slowly decreasing symptoms, and persistent high symptoms. The random forest-TomekLinks model achieved the best performance among the case and incident models with an AUC-ROC of 0.844 and 0.731, respectively. In the chronic model, the gradient boosting decision tree-synthetic minority oversampling technique obtained an AUC-ROC of 0.783. In the three models, the depressive symptom score was the most crucial component. The lung function was a common and significant feature in both the case and the chronic models. CONCLUSIONS This study suggests that the ideal model has a good chance of identifying older persons with a high risk of falling stratified by long-term trajectories of depressive symptoms. Baseline depressive symptom score, lung function, income, and injury experience are influential factors associated with falls of depression evolution.
|
28
|
Wang X, Sheng Y, Ning J, Xi J, Xi L, Qiu D, Yang J, Ke X. A Critical Review of Machine Learning Techniques on Thermoelectric Materials. J Phys Chem Lett 2023; 14:1808-1822. [PMID: 36763950 DOI: 10.1021/acs.jpclett.2c03073] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Thermoelectric (TE) materials can directly convert heat to electricity and vice versa and have broad application potential for solid-state power generation and refrigeration. Over the past few decades, efforts have been made to develop new TE materials with high performance. However, traditional experiments and simulations are expensive and time-consuming, limiting the development of new materials. Machine learning (ML) has been increasingly applied to study TE materials in recent years. This paper reviews the recent progress in ML-based TE material research. The application of ML in predicting and optimizing the properties of TE materials, including electrical and thermal transport properties and optimization of functional materials with targeted TE properties, is reviewed. Finally, future research directions are discussed.
Affiliation(s)
- Xiangdong Wang
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
| | - Ye Sheng
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
| | - Jinyan Ning
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
| | - Jinyang Xi
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou, Zhejiang 311100, China
| | - Lili Xi
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou, Zhejiang 311100, China
| | - Di Qiu
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou, Zhejiang 311100, China
| | - Jiong Yang
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou, Zhejiang 311100, China
| | - Xuezhi Ke
- School of Physics and Electronic Science, East China Normal University, Shanghai 200241, China
| |
|
29
|
Huang C, Gao W, Zheng Y, Wang W, Zhang Y, Liu K. Universal machine-learning algorithm for predicting adsorption performance of organic molecules based on limited data set: Importance of feature description. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 859:160228. [PMID: 36402319 DOI: 10.1016/j.scitotenv.2022.160228] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Revised: 11/09/2022] [Accepted: 11/12/2022] [Indexed: 06/16/2023]
Abstract
Adsorption of organic molecules from aqueous solution offers a simple and effective method for their removal. Recently, there have been several attempts to apply machine learning (ML) to this problem. To this end, polyparameter linear free energy relationships (pp-LFERs) were employed, and poor prediction results were observed outside the model applicability domain of pp-LFERs. In this study, we improved the applicability of ML methods by adopting a chemical-structure (CS) based approach. We used the prediction of adsorption of organic molecules on carbon-based adsorbents as an example. Our results show that this approach can fully differentiate the structural differences between any organic molecules, while providing significant information that is relevant to their interaction with the adsorbents. We compared two CS feature descriptors: 3D-coordination and simplified molecular-input line-entry system (SMILES). We then built CS-ML models based on neural networks (NN) and extreme gradient boosting (XGB). They all outperformed pp-LFER-based models and are capable of accurately predicting the adsorption isotherms of isomers with similar physicochemical properties, such as chiral molecules, even though they are trained with achiral molecules and racemates. We found that, for predicting adsorption isotherms, XGB performs better than NN, and that 3D-coordination allows effective differentiation between organic molecules.
Affiliation(s)
- Chaoyi Huang
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Wenyang Gao
- Division of Artificial Intelligence and Data Science, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Yingdie Zheng
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Wei Wang
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Yue Zhang
- Division of Artificial Intelligence and Data Science, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China
| | - Kai Liu
- Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China.
| |
|
30
|
Fei Z, Liang S, Cai Y, Shen Y. Ensemble Machine-Learning-Based Prediction Models for the Compressive Strength of Recycled Powder Mortar. MATERIALS (BASEL, SWITZERLAND) 2023; 16:583. [PMID: 36676320 PMCID: PMC9862350 DOI: 10.3390/ma16020583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/27/2022] [Accepted: 01/04/2023] [Indexed: 06/17/2023]
Abstract
Recycled powder (RP) serves as a potential and prospective substitute for cementitious materials in concrete. The compressive strength of RP mortar is a pivotal factor affecting the mechanical properties of RP concrete. The application of machine learning (ML) approaches to engineering problems, particularly for predicting the mechanical properties of construction materials, leads to high prediction accuracy and low experimental costs. In this study, 204 groups of RP mortar compression experimental data are collected from the literature to establish a dataset for ML, including 163 groups in the training set and 41 groups in the test set. Four ensemble ML models, namely eXtreme Gradient-Boosting (XGBoost), Random Forest (RF), Light Gradient-Boosting Machine (LightGBM) and Adaptive Boosting (AdaBoost), were selected to predict the compressive strength of RP mortar. The comparative results demonstrate that XGBoost has the highest prediction accuracy: the a10-index, MAE, RMSE and R² of the training set are 0.926, 1.596, 2.155 and 0.950, and those of the test set are 0.659, 3.182, 4.285 and 0.842, respectively. SHapley Additive exPlanation (SHAP) is adopted to interpret the prediction process of XGBoost and to explain how each factor influences the compressive strength of RP mortar. In order of importance, the influencing factors are the mass replacement rate of RP, the size of RP, the kind of RP and the water binder ratio of RP. The compressive strength of RP mortar decreases with the increase in the RP mass replacement rate. The compressive strength of RBP mortar is slightly higher than that of RCP mortar. Machine learning technologies will benefit the construction industry by facilitating the rapid and cost-effective evaluation of RP material properties.
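The a10-index reported in this abstract can be illustrated with a short sketch. It assumes the commonly used definition, namely the fraction of samples whose measured-to-predicted ratio falls within [0.90, 1.10]; the abstract itself does not spell out the definition, and the function name and strength values below are hypothetical.

```python
def a10_index(measured, predicted):
    """Fraction of samples whose measured/predicted ratio lies in [0.90, 1.10].

    Assumed definition; the abstract reports the metric without defining it.
    """
    within = sum(1 for m, p in zip(measured, predicted) if 0.90 <= m / p <= 1.10)
    return within / len(measured)


# Hypothetical compressive strengths (MPa), not data from the cited study.
measured = [30.0, 40.0, 50.0, 20.0]
predicted = [31.0, 46.0, 49.0, 25.0]
score = a10_index(measured, predicted)
print(score)
```

A value of 1.0 would mean every prediction is within 10% of the measurement, which is why the metric is often reported alongside MAE and RMSE rather than instead of them.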
|
31
|
Interpretable Machine Learning Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics (Basel) 2022; 13:diagnostics13010111. [PMID: 36611403 PMCID: PMC9818170 DOI: 10.3390/diagnostics13010111] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/31/2022] Open
Abstract
Heart disease is one of the leading causes of mortality throughout the world. Among the different heart diagnosis techniques, an electrocardiogram (ECG) is the least expensive non-invasive procedure. However, the following are challenges: the scarcity of medical experts, the complexity of ECG interpretations, the manifestation similarities of heart disease in ECG signals, and heart disease comorbidity. Machine learning algorithms are viable alternatives to the traditional diagnoses of heart disease from ECG signals. However, the black box nature of complex machine learning algorithms and the difficulty in explaining a model's outcomes are obstacles for medical practitioners in having confidence in machine learning models. This observation paves the way for interpretable machine learning (IML) models as diagnostic tools that can build a physician's trust and provide evidence-based diagnoses. Therefore, in this systematic literature review, we studied and analyzed the research landscape in interpretable machine learning techniques by focusing on heart disease diagnosis from an ECG signal. In this regard, the contribution of our work is manifold; first, we present an elaborate discussion on interpretable machine learning techniques. In addition, we identify and characterize ECG signal recording datasets that are readily available for machine learning-based tasks. Furthermore, we identify the progress that has been achieved in ECG signal interpretation using IML techniques. Finally, we discuss the limitations and challenges of IML techniques in interpreting ECG signals.
|
32
|
Osman SMI, Sabit A. Predictors of COVID-19 vaccination rate in USA: A machine learning approach. MACHINE LEARNING WITH APPLICATIONS 2022; 10:100408. [PMID: 36128042 PMCID: PMC9479385 DOI: 10.1016/j.mlwa.2022.100408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2021] [Revised: 09/02/2022] [Accepted: 09/02/2022] [Indexed: 12/14/2022] Open
Abstract
In this study, we examine state-level features and policies that are most important in achieving a threshold level vaccination rate to curb the effects of the COVID-19 pandemic. We employ CHAID, a decision tree algorithm, on three different model specifications to answer this question based on a dataset that includes all the states in the United States. Workplace travel emerges as the most important predictor; however, the governors' political affiliation (PA) replaces it in a more conservative feature set that includes economic features and the growth rate of COVID-19 cases. We also employ several alternative algorithms as a robustness check. Results from these checks confirm our original findings regarding workplace travel and political affiliation. The accuracy under different model specifications ranges from 80% to 88%, whereas the sensitivity is between 92.5% and 100%. Our findings provide actionable policy insights to increase vaccination rates and combat the COVID-19 pandemic.
Affiliation(s)
- Syed Muhammad Ishraque Osman
- Jack Welch College of Business & Technology, Sacred Heart University, West Campus, East Building - 1st Floor, 3135 Easton Turnpike, Fairfield, CT 06825, United States of America
| | - Ahmed Sabit
- Department of Biostatistics, The Johns Hopkins University, 615 North Wolfe Street, Baltimore, MD 21244, United States of America (corresponding author)
| |
|
33
|
Li Z, Du X, Zhao Y, Tu Y, Lev B, Gan L. Lifecycle research of social media rumor refutation effectiveness based on machine learning and visualization technology. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
34
|
Liu L, Qiao C, Zha JR, Qin H, Wang XR, Zhang XY, Wang YO, Yang XM, Zhang SL, Qin J. Early prediction of clinical scores for left ventricular reverse remodeling using extreme gradient random forest, boosting, and logistic regression algorithm representations. Front Cardiovasc Med 2022; 9:864312. [PMID: 36061535 PMCID: PMC9428443 DOI: 10.3389/fcvm.2022.864312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 07/13/2022] [Indexed: 11/13/2022] Open
Abstract
Objective At present, there is no early prediction model of left ventricular reverse remodeling (LVRR) for people who are in cardiac arrest with an ejection fraction (EF) of ≤35% at first diagnosis; thus, the purpose of this article is to provide a supplement to existing research. Materials and methods A total of 109 patients suffering from heart attack with an EF of ≤35% at first diagnosis were involved in this single-center research study. LVRR was defined as an absolute increase in left ventricular ejection fraction (LVEF) of ≥10% to a final value of >35%, with analysis features including demographic characteristics, diseases, biochemical data, echocardiography, and drug therapy. Extreme gradient boosting (XGBoost), random forest, and logistic regression algorithm models were used to distinguish between LVRR and non-LVRR cases and to obtain the most important features. Results There were 47 cases (42%) of LVRR in patients suffering from heart failure with an EF of ≤35% at first diagnosis after optimal drug therapy. General statistical analysis and machine learning methods were combined to exclude a number of significant feature groups. The median duration of disease in the LVRR group was significantly lower than that in the non-LVRR group (7 vs. 48 months); the mean values of creatine kinase (CK) and MB isoenzyme of creatine kinase (CK-MB) in the LVRR group were lower than those in the non-LVRR group (80.11 vs. 94.23 U/L; 2.61 vs. 2.99 ng/ml; 27.19 vs. 28.54 mm). Moreover, the AUC values for our feature combinations were 97%, 94%, and 87% when using the XGBoost, random forest, and logistic regression techniques, respectively. The ablation test revealed that beats per minute (BPM) and disease duration had a greater impact on the model’s ability to accurately forecast outcomes. Conclusion Shorter disease duration, slightly lower CK and CK-MB levels, slightly smaller right and left ventricular and left atrial dimensions, and lower mean heart rates (BPM) were found to be most strongly predictive of LVRR development.
Affiliation(s)
- Lu Liu
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Cen Qiao
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Jun-Ren Zha
- School of Software Engineering, Dalian University, Dalian, China
| | - Huan Qin
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Xiao-Rui Wang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Xin-Yu Zhang
- Medical College, Dalian University, Dalian, China
| | - Yi-Ou Wang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Xiu-Mei Yang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
| | - Shu-Long Zhang
- Heart Centre, Affiliated Zhongshan Hospital of Dalian University, Dalian, China
- *Correspondence: Shu-Long Zhang,
| | - Jing Qin
- School of Software Engineering, Dalian University, Dalian, China
- Jing Qin,
| |
|
35
|
Wang S, Jia Z, Cao N. Research on optimization and application of Spark decision tree algorithm under cloud‐edge collaboration. INT J INTELL SYST 2022. [DOI: 10.1002/int.22970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Suzhen Wang
- Hebei University of Economics and Business Shijiazhuang Hebei China
| | - Zhiting Jia
- Hebei University of Economics and Business Shijiazhuang Hebei China
| | - Ning Cao
- School of Artificial Intelligence Wuxi Vocational College of Science and Technology Jiangsu China
| |
|
36
|
Using Deep Learning Networks to Identify Cyber Attacks on Intrusion Detection for In-Vehicle Networks. ELECTRONICS 2022. [DOI: 10.3390/electronics11142180] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
With rapid advancements in in-vehicle network (IVN) technology, the demand for multiple advanced functions and networking in electric vehicles (EVs) has recently increased. To enable various intelligent functions, the electrical system of existing vehicles incorporates a controller area network (CAN) bus system that enables communication among electrical control units (ECUs). In practice, traditional network-based intrusion detection systems (NIDSs) cannot easily identify threats to the CAN bus system. Therefore, it is necessary to develop a new type of NIDS—namely, on-the-move Intrusion Detection System (OMIDS)—to categorise these threats. Accordingly, this paper proposes an intrusion detection model for IVNs, based on the VGG16 classifier deep learning model, to learn attack behaviour characteristics and classify threats. The experimental dataset was provided by the Hacking and Countermeasure Research Lab (HCRL) to validate classification performance for denial of service (DoS), fuzzy attacks, spoofing gear, and RPM in vehicle communications. The proposed classifier’s performance was compared with that of the XBoost ensemble learning scheme to identify threats from in-vehicle networks. In particular, the test cases can detect anomalies in terms of accuracy, precision, recall, and F1-score to ensure detection accuracy and identify false alarm threats. The experimental results show that the classification accuracy of the dataset for HCRL Car-Hacking by the VGG16 and XBoost classifiers (n = 50) reached 97.8241% and 99.9995% for the 5-subcategory classification results on the testing data, respectively.
|
37
|
Ding Y, Liu C, Zhu H, Chen Q, Liu J. Visualizing Deep Networks using Segmentation Recognition and Interpretation Algorithm. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
38
|
Short- and Medium-Term Power Demand Forecasting with Multiple Factors Based on Multi-Model Fusion. MATHEMATICS 2022. [DOI: 10.3390/math10122148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
With the continuous development of the economy and society, power demand forecasting has become an important task of the power industry. Accurate power demand forecasting can promote the operation and development of the power supply industry. However, since power consumption is affected by a number of factors, it is difficult to accurately predict the power demand data. With the accumulation of data in the power industry, machine learning technology has shown great potential in power demand forecasting. In this study, gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) are integrated by stacking to build an XLG-LR fusion model to predict power demand. First, preprocessing was carried out on 13 months of electricity and meteorological data. Next, the hyperparameters of each model were adjusted and optimized. Then, based on the optimal hyperparameter configuration, a prediction model was built using the training set (70% of the data). Finally, the test set (30% of the data) was used to evaluate the performance of each model. Mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and goodness-of-fit coefficient (R^2) were utilized to analyze each model at different lengths of time, including their seasonal, weekly, and monthly forecast performance. Furthermore, the proposed fusion model was compared with neural network models such as the GRU, LSTM and TCN models. The results showed that the XLG-LR model achieved the best prediction results at different time lengths, and at the same time consumed less time than the neural network models. This method can provide a more reliable reference for the operation and dispatch of power enterprises and future power construction and planning.
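The four evaluation metrics named in this abstract follow standard definitions, and a minimal sketch of them is given here in plain Python. The function name and the demand values are illustrative assumptions and are not taken from the cited study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (percent) and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined if any true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical demand values, not data from the cited study.
actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, forecast)
print(metrics)
```

MAE and RMSE are expressed in the units of the target, while MAPE and R^2 are scale-free, which is one reason studies like this one tend to report all four together.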
|
39
|
Xu Q, Peng Y, Tan J, Zhao W, Yang M, Tian J. Prediction of Atrial Fibrillation in Hospitalized Elderly Patients With Coronary Heart Disease and Type 2 Diabetes Mellitus Using Machine Learning: A Multicenter Retrospective Study. Front Public Health 2022; 10:842104. [PMID: 35309227 PMCID: PMC8931193 DOI: 10.3389/fpubh.2022.842104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2021] [Accepted: 02/09/2022] [Indexed: 12/01/2022] Open
Abstract
Background The objective of this study was to use machine learning algorithms to construct predictive models for atrial fibrillation (AF) in elderly patients with coronary heart disease (CHD) and type 2 diabetes mellitus (T2DM). Methods The diagnosis and treatment data of elderly patients with CHD and T2DM, who were treated in four tertiary hospitals in Chongqing, China from 2015 to 2021, were collected. Five machine learning algorithms were used to construct the prediction models: logistic regression, logistic regression + least absolute shrinkage and selection operator, classification and regression tree (CART), random forest (RF) and extreme gradient boosting (XGBoost). The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy were used as the comparison measures between different models. Results A total of 3,858 elderly patients with CHD and T2DM were included. In the internal validation cohort, XGBoost had the highest AUC (0.743) and sensitivity (0.833), and RF had the highest specificity (0.753) and accuracy (0.735). In the external validation, RF had the highest AUC (0.726) and sensitivity (0.686), and CART had the highest specificity (0.925) and accuracy (0.841). Total bilirubin, triglycerides and uric acid were the three most important predictors of AF. Conclusion The risk prediction models of AF in elderly patients with CHD and T2DM based on machine learning algorithms had high diagnostic value. The prediction models constructed by RF and XGBoost were more effective. The results of this study can provide reference for the clinical prevention and treatment of AF.
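Sensitivity, specificity, and accuracy, the comparison measures used in this abstract, all derive from the same confusion-matrix counts. The sketch below shows that relationship under the assumption of binary labels coded as 1 (AF) and 0 (no AF); the function name and the labels are hypothetical and unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that were detected
        "specificity": tn / (tn + fp),  # share of true negatives that were detected
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical labels, not data from the cited study.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 0, 1, 0, 1]
result = classification_metrics(labels, predictions)
print(result)
```

Because the three measures weight errors differently, one model can lead on sensitivity while another leads on specificity and accuracy, which matches the mixed ranking of XGBoost, RF, and CART reported above.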
Affiliation(s)
- Qian Xu
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Collection Development Department of Library, Chongqing Medical University, Chongqing, China
| | - Yan Peng
- Department of Cardiology, University-Town Hospital of Chongqing Medical University, Chongqing, China
| | - Juntao Tan
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
| | - Wenlong Zhao
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
| | - Meijie Yang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
| | - Jie Tian
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Department of Cardiology, Ministry of Education Key Laboratory of Child Development and Disorders, National Clinical Research Center for Child Health and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Children's Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Key Laboratory of Pediatrics, Chongqing, China
- *Correspondence: Jie Tian
| |
|
40
|
Customer Churn in Retail E-Commerce Business: Spatial and Machine Learning Approach. JOURNAL OF THEORETICAL AND APPLIED ELECTRONIC COMMERCE RESEARCH 2022. [DOI: 10.3390/jtaer17010009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/07/2022]
Abstract
This study presents a comprehensive and modern approach to predicting customer churn, using the example of an e-commerce retail store operating in Brazil. Our approach consists of three stages in which we combine and use three different datasets: numerical data on orders, textual after-purchase reviews and socio-geo-demographic data from the census. At the pre-processing stage, we find topics from text reviews using Latent Dirichlet Allocation, Dirichlet Multinomial Mixture and Gibbs sampling. In the spatial analysis, we apply DBSCAN to get rural/urban locations and analyse neighbourhoods of customers located with zip codes. At the modelling stage, we apply extreme gradient boosting and logistic regression. The quality of the models is verified with area-under-curve and lift metrics. Explainable artificial intelligence, represented by permutation-based variable importance and partial dependence profiles, helps to discover the determinants of churn. We show that customers’ propensity to churn depends on: (i) payment value for the first order, number of items bought and shipping cost; (ii) categories of the products bought; (iii) demographic environment of the customer; and (iv) customer location. At the same time, customers’ propensity to churn is not influenced by: (i) population density in the customer’s area and division into rural and urban areas; (ii) quantitative review of the first purchase; and (iii) qualitative review summarised as a topic.
|
41
|
Understanding Query Combination Behavior in Exploratory Searches. Applied Sciences-Basel 2022. [DOI: 10.3390/app12020706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
In exploratory search, users sometimes combine two or more previously issued queries into new queries. We refer to this kind of search behavior as query combination behavior. We find that the combined queries usually meet users’ information needs better. We also observe that users combine queries for different motivations, which leads to different types of query combination behavior. Previous work on understanding exploratory search behavior has focused on how people reformulate queries, but not on how and why they combine queries. Answering these questions is important for exploring how users search and learn during information retrieval and for developing further support to assist searchers. In this paper, we first describe a two-layer hierarchical structure for understanding the space of query combination behavior types. We manually classify query combination behavior sessions from the AOL and Sogou search engines and explain the relationship between combining queries and search success. We then characterize some key aspects of this behavior and propose a classifier that automatically classifies types of query combination behavior using behavioral features. Finally, we summarize our findings and show how search engines can better assist searchers.
|
42
|
Rostami M, Oussalah M. A novel explainable COVID-19 diagnosis method by integration of feature selection with random forest. Informatics in Medicine Unlocked 2022; 30:100941. [PMID: 35399333 PMCID: PMC8985417 DOI: 10.1016/j.imu.2022.100941] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/24/2022] [Revised: 04/01/2022] [Accepted: 04/01/2022] [Indexed: 12/12/2022] Open
Abstract
Several artificial intelligence-based models have been developed for COVID-19 diagnosis. Despite the promise of artificial intelligence, very few models bridge the gap between traditional human-centered diagnosis and the potential future of machine-centered disease diagnosis. Under the concept of human-computer interaction design, this study proposes a new explainable artificial intelligence method that exploits graph analysis for feature visualization and optimization for the purpose of COVID-19 diagnosis from blood test samples. In the developed model, an explainable decision forest classifier is employed for COVID-19 classification based on routinely available patient blood test data. The approach enables the clinician to use the decision tree and feature visualization to guide the explainability and interpretability of the prediction model. By utilizing this novel feature selection phase, the proposed diagnosis model will not only improve diagnosis accuracy but also decrease the execution time.
Affiliation(s)
- Mehrdad Rostami
- Centre for Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland
- Mourad Oussalah
- Centre for Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland
- Research Unit of Medical Imaging, Physics, and Technology, Faculty of Medicine, University of Oulu, Finland
|
43
|
Fu X, Wang Y, Cates RS, Li N, Liu J, Ke D, Liu J, Liu H, Yan S. Implementation of five machine learning methods to predict the 52-week blood glucose level in patients with type 2 diabetes. Front Endocrinol (Lausanne) 2022; 13:1061507. [PMID: 36743935 PMCID: PMC9895792 DOI: 10.3389/fendo.2022.1061507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 12/30/2022] [Indexed: 01/22/2023] Open
Abstract
OBJECTIVE In patients with type 2 diabetes, blood glucose level can be affected by multiple factors. An accurate estimate of the blood glucose trajectory is crucial in clinical decision making. Frequent glucose measurement serves as a good source of data to train machine learning models for prediction purposes. This study aimed to use machine learning methods to predict blood glucose in patients with type 2 diabetes. We investigated various parameters influencing blood glucose and determined the most effective machine learning algorithm for predicting blood glucose. PATIENTS AND METHODS 273 patients were recruited in this research. Several parameters, such as age, diet, family history, BMI, alcohol intake, and smoking status, were analyzed. Patients who had glycosylated hemoglobin less than 6.5% after 52 weeks were considered to have achieved glycemic control, and the rest were considered not to have achieved it. Five machine learning methods (KNN algorithm, logistic regression algorithm, random forest algorithm, support vector machine, and XGBoost algorithm) were compared to evaluate their prediction accuracy. R 3.6.3 and Python 3.12 were used in data analysis. RESULTS The variables for which p < 0.05 was obtained were BMI, pulse, Na, Cl, and AKP. Compared with the other four algorithms, the XGBoost algorithm had the highest accuracy (99.54% in the training set and 78.18% in the testing set) and AUC values (1.0 in the training set and 0.68 in the testing set); thus it is recommended for prediction in clinical practice. CONCLUSION For predicting future blood glucose level using machine learning methods, the XGBoost algorithm was the most effective. This algorithm could be applied to assist clinical decision making, as well as to guide the lifestyle of diabetic patients, in pursuit of minimizing the risk of hyperglycemic or hypoglycemic events.
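The outcome definition and the reported training/testing figures can be made concrete with a small sketch. Only the 6.5% threshold and the four reported metric values come from the abstract; the function names `achieved_glycemic_control` and `train_test_gap` are hypothetical helpers introduced here for illustration and are not the authors' code.

```python
HBA1C_CONTROL_THRESHOLD = 6.5  # percent, as stated in the abstract


def achieved_glycemic_control(hba1c_percent):
    """Binary outcome label: True when 52-week glycosylated hemoglobin
    is strictly below the 6.5% threshold (hypothetical helper)."""
    return hba1c_percent < HBA1C_CONTROL_THRESHOLD


def train_test_gap(train_value, test_value):
    """Difference between a training-set metric and the same metric on the
    testing set, rounded to four decimals to avoid float noise."""
    return round(train_value - test_value, 4)


# Values reported in the abstract for the XGBoost model.
accuracy_gap = train_test_gap(99.54, 78.18)  # percentage points
auc_gap = train_test_gap(1.0, 0.68)

print(achieved_glycemic_control(6.4), achieved_glycemic_control(6.5))
print(accuracy_gap, auc_gap)
```

Computing the gap alongside the headline metrics makes it easier to compare how each of the five algorithms behaves on data it has not seen.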
Affiliation(s)
- Xiaomin Fu
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yuhan Wang
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Ryan S. Cates
- Department of Emergency Medicine Stanford Healthcare TriValley, Stanford University School of Medicine, Stanford, Pleasanton, CA, United States
- Nan Li
- Department of Endocrinology, The Second Medical Center & National Clinical Research Center for Geriatric Diseases, Chinese PLA General Hospital, Beijing, China
- Jing Liu
- Clinics of Cadre, Department of Outpatient, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Dianshan Ke
- Department of Orthopedics, Fujian Provincial Hospital, Fuzhou, China
- Jinghua Liu
- Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Hongzhou Liu
- Department of Endocrinology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- *Correspondence: Hongzhou Liu; Shuangtong Yan
- Shuangtong Yan
- Department of Endocrinology, The Second Medical Center & National Clinical Research Center for Geriatric Diseases, Chinese PLA General Hospital, Beijing, China
- *Correspondence: Hongzhou Liu; Shuangtong Yan
|
44
|
Cui R, Hua W, Qu K, Yang H, Tong Y, Li Q, Wang H, Ma Y, Liu S, Lin T, Zhang J, Sun J, Liu C. An Interpretable Early Dynamic Sequential Predictor for Sepsis-Induced Coagulopathy Progression in the Real-World Using Machine Learning. Front Med (Lausanne) 2021; 8:775047. [PMID: 34926518 PMCID: PMC8678506 DOI: 10.3389/fmed.2021.775047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Accepted: 11/08/2021] [Indexed: 11/17/2022] Open
Abstract
Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remain a major challenge for AI medical applications. To detect and manage sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC) early, we developed an interpretable real-time sequential warning model for real-world irregular data. Eight machine learning models, including novel algorithms, were devised to detect SIC and sepsis-associated DIC 8n (1 ≤ n ≤ 6) hours prior to onset. Models were developed on data from Xi'an Jiaotong University Medical College (XJTUMC) and verified on data from Beth Israel Deaconess Medical Center (BIDMC). A total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated in the training set according to the SIC and ISTH overt-DIC scoring systems. The area under the receiver operating characteristic curve (AUROC) was used as the model evaluation metric. The eXtreme Gradient Boosting (XGBoost) model can predict SIC and sepsis-associated DIC events up to 48 h earlier with an AUROC of 0.929 and 0.910, respectively, and reached 0.973 and 0.955 at 8 h earlier, achieving the highest performance to date. The novel ODE-RNN model achieved continuous prediction at arbitrary time points, with an AUROC of 0.962 and 0.936 for SIC and DIC predicted 8 h earlier, respectively. In conclusion, our model can predict the onset of SIC and sepsis-associated DIC up to 48 h in advance, which helps maximize the time window for early management by physicians.
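The "8n (1 ≤ n ≤ 6) hours" notation expands to six prediction horizons from 8 h to 48 h. The sketch below enumerates them; the labelling helper `onset_within` is a hypothetical illustration of how a sample could be marked positive for a given horizon and is not taken from the paper, whose exact labelling rule the abstract does not describe.

```python
def prediction_horizons(step_hours=8, max_n=6):
    """Horizons of the form step_hours * n for n = 1..max_n."""
    if step_hours <= 0 or max_n < 1:
        raise ValueError("step_hours must be positive and max_n at least 1")
    return [step_hours * n for n in range(1, max_n + 1)]


def onset_within(hours_to_onset, horizon_hours):
    """Hypothetical labelling rule (an assumption, not the authors' method):
    a sample is positive for a horizon when onset lies strictly in the
    future and no later than `horizon_hours` from now."""
    return 0 < hours_to_onset <= horizon_hours


horizons = prediction_horizons()
print(horizons)  # [8, 16, 24, 32, 40, 48]

# A sample taken 20 h before onset, checked against every horizon.
print([onset_within(20, h) for h in horizons])
```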
Affiliation(s)
- Ruixia Cui
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Wenbo Hua
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Kai Qu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Heran Yang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Yingmu Tong
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Qinglin Li
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Hai Wang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Yanfen Ma
- Department of Clinical Laboratory, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Sinan Liu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Ting Lin
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Jingyao Zhang
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Biobank, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Jian Sun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Chang Liu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Department of SICU, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China; Biobank, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
|