1
Chowdhury MNH, Bin Ibne Reaz M, Ali SHM, Crespo ML, Ahmad S, Salim GM, Haque F, Ordóñez LGG, Islam MJ, Mahdee TM, Zaman KS, Hemel MSK, Bhuiyan MAS. Deep learning for early detection of chronic kidney disease stages in diabetes patients: A TabNet approach. Artif Intell Med 2025; 166:103153. [PMID: 40347843 DOI: 10.1016/j.artmed.2025.103153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Revised: 04/03/2025] [Accepted: 05/01/2025] [Indexed: 05/14/2025]
Abstract
Chronic kidney disease (CKD) poses a significant risk for diabetes patients, often leading to severe complications. Early and accurate CKD stage detection is crucial for timely intervention. However, it remains challenging due to its asymptomatic progression, the oversight of routine CKD tests during diabetes checkups, and limited access to nephrologists. This study aimed to address these challenges by developing a multiclass CKD stage prediction model for diabetes patients using longitudinal data from the Chronic Renal Insufficiency Cohort (CRIC) study. A novel iterative backward feature selection strategy was employed to determine key predictors of the CKD stage. TabNet, an attention-based deep learning architecture, was used to build classification models in complete and simplified categories. The complete model used 31 features, including complex kidney biomarkers, while the simplified model used 15 features readily available from routine checkups. The performance of TabNet was compared against traditional tree-based ensemble methods (XGBoost, random forest, AdaBoost) and a multi-layer perceptron. Model-specific and model-agnostic explainable AI (XAI) techniques were applied to interpret model decisions, enhancing the transparency and clinical applicability of the proposed approach. The TabNet models demonstrated superior performance, achieving 94.06 % and 92.71 % accuracy in cross-validation for the complete and simplified models, respectively, and 91.00 % and 88.00 % accuracy on test sets. XAI analysis identified serum creatinine, cystatin C, sex, and age as the most influential factors in CKD stage classification. The proposed TabNet models offer a robust approach for early CKD severity detection in diabetes patients, potentially improving clinical decision-making and patient outcomes.
Affiliation(s)
- Md Nakib Hayat Chowdhury
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), Saidpur 5310, Nilphamari, Bangladesh
- Mamun Bin Ibne Reaz
  - Faculty of Electronic Engineering and Technology, Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Sawal Hamid Md Ali
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- María Liz Crespo
  - Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Shamim Ahmad
  - Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
- Ghassan Maan Salim
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Fahmida Haque
  - Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, Bethesda, MD, USA
- Md Johirul Islam
  - Department of Physics, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Taher Muhammad Mahdee
  - Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), Saidpur 5310, Nilphamari, Bangladesh
- Kh Shahriya Zaman
  - Faculty of Electronic Engineering and Technology, Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, Malaysia
- Md Shahriar Khan Hemel
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Mohammad Arif Sobhan Bhuiyan
  - Department of Electrical and Electronics Engineering, Xiamen University Malaysia, Bandar Sunsuria, Sepang 43900, Selangor, Malaysia

2
Long Z, Tan S, Sun B, Qin Y, Wang S, Han Z, Han T, Lin F, Lei M. Predicting in-hospital mortality in critical orthopedic trauma patients with sepsis using machine learning models. Shock 2025; 63:815-825. [PMID: 39637363 DOI: 10.1097/shk.0000000000002516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2024]
Abstract
Purpose: This study aims to establish and validate machine learning-based models to predict in-hospital death among critical orthopedic trauma patients with sepsis or respiratory failure. Methods: This study included 523 patients from the Medical Information Mart for Intensive Care database. All patients were randomly assigned to a training cohort or a validation cohort. Six algorithms, including logistic regression (LR), extreme gradient boosting machine (eXGBM), support vector machine (SVM), random forest (RF), neural network (NN), and decision tree (DT), were used to develop and optimize models in the training cohort, and internal validation of these models was conducted in the validation cohort. Based on a comprehensive scoring system that incorporated 10 evaluation metrics, the model with the highest score was selected as the optimal model. An artificial intelligence (AI) application was deployed based on the optimal model. Results: The in-hospital mortality was 19.69%. Among all developed models, the eXGBM had the highest area under the curve (AUC) value (0.951, 95% CI: 0.934-0.967), and it also showed the highest accuracy (0.902), precision (0.893), recall (0.915), and F1 score (0.904). Based on the scoring system, the eXGBM had the highest score of 53, followed by the RF model (43) and the NN model (39). The scores for the LR, SVM, and DT were 22, 36, and 17, respectively. The decision curve analysis confirmed that both the eXGBM and RF models provided substantial clinical net benefits. However, the eXGBM model consistently outperformed the RF model across multiple evaluation metrics, establishing itself as the superior option for predictive modeling in this scenario, with the RF model as a strong secondary choice. The Shapley Additive Explanation analysis revealed that Simplified Acute Physiology Score II, age, respiratory rate, Oxford Acute Severity of Illness Score, and temperature were the five most important features contributing to the outcome. Conclusions: This study develops an artificial intelligence application to predict in-hospital mortality among critical orthopedic trauma patients with sepsis or respiratory failure.
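The F1 score reported for the eXGBM model can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below is only an arithmetic illustration using the three figures quoted in the abstract; the function name `f1_score` is chosen here for illustration and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the eXGBM model.
reported_precision = 0.893
reported_recall = 0.915

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.904, matching the F1 score reported in the abstract
```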
Affiliation(s)
- Ze Long
  - Department of Orthopedics, The Second Xiangya Hospital of Central South University, Changsha, China
- Shengzhi Tan
  - Secondary Department of Spinal Surgery, The 9th Medical Centre of Chinese PLA General Hospital, Beijing, China
- Baisheng Sun
  - Department of Critical Care Medicine, The First Medical Centre, PLA General Hospital, Beijing, China
- Yong Qin
  - Department of Joint and Sports Medicine Surgery, The Second affiliated Hospital of Harbin Medical University, Harbin, China
- Shengjie Wang
  - Department of Orthopaedic Surgery, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University, Shanghai, China
- Zhencan Han
  - Department of Orthopedics, Peking University Third Hospital, Beijing, China
- Tao Han
  - Department of Orthopedic Surgery, Hainan Hospital of PLA General Hospital, Sanya, China
- Feng Lin
  - Department of Orthopedic Surgery, Hainan Hospital of PLA General Hospital, Sanya, China

3
Du Z, Liu X, Li J, Min H, Ma Y, Hua W, Zhang L, Zhang Y, Shang M, Chen H, Yin H, Tian L. Development and external validation of a machine learning model to predict diabetic nephropathy in T1DM patients in the real-world. Acta Diabetol 2024:10.1007/s00592-024-02404-z. [PMID: 39527297 DOI: 10.1007/s00592-024-02404-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 10/22/2024] [Indexed: 11/16/2024]
Abstract
AIMS Studies on machine learning (ML) for the prediction of diabetic nephropathy (DN) in type 1 diabetes mellitus (T1DM) patients are rare. This study focused on the development and external validation of an explainable ML model to predict the risk of DN among individuals with T1DM. METHODS This was a retrospective, multicenter study conducted across 19 hospitals in Gansu Province, China (No: 2022-473). In total, 1368 patients were eligible for analysis among 1633 collected T1DM patients from January 2016 to December 2023. Recursive feature elimination using random forest and fivefold cross-validation was conducted to identify key features. Among the 12 initial ML algorithms, the optimal ML model was developed and validated externally in a distinct population, and its predictive outcomes were explained via the SHapley additive exPlanations method, which offered personalized decision insights. RESULTS Among the 1368 T1DM patients, 324 had DN. The extreme gradient boosting (XGBoost) model, which achieved optimal performance with an AUC of 83% (95% confidence interval [CI]: 76‒89), was selected to predict the risk of DN among T1DM patients. The DN predictive model included variables such as T1DM duration, postprandial glucose (PPG), systolic blood pressure (SBP), glycated hemoglobin (HbA1c), serum creatinine (Scr) and low-density lipoprotein cholesterol (LDL-C). External validation confirmed the reliability of the model, with an AUC of 76% (95% CI: 70‒82). CONCLUSIONS The ML prediction tool has potential for advancing early and precise identification of the risk of DN among T1DM patients. Although successful external validation indicated that the developed model can provide a promising strategy for clinical adoption and help improve patient outcomes through timely and accurate risk assessment, additional prospective data and further validation in diverse populations are necessary.
Affiliation(s)
- Zouxi Du
  - The First School of Clinical Medicine, Lanzhou University, Lanzhou, Gansu, China
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
  - Clinical Research Center for Metabolic Diseases, Lanzhou, Gansu, China
- Xiaoning Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Lanzhou University, Lanzhou, Gansu, China
- Jiayu Li
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
- Hang Min
  - The First School of Clinical Medicine, Lanzhou University, Lanzhou, Gansu, China
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
  - Clinical Research Center for Metabolic Diseases, Lanzhou, Gansu, China
- Yuhu Ma
  - Department of Anesthesiology, The First Hospital of Lanzhou University, Lanzhou, Gansu, China
- Wenting Hua
  - The First School of Clinical Medicine, Lanzhou University, Lanzhou, Gansu, China
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
  - Clinical Research Center for Metabolic Diseases, Lanzhou, Gansu, China
- Leyuan Zhang
  - The First Clinical Medical College, Gansu University of Traditional Chinese Medicine, Lanzhou, Gansu, China
- Yue Zhang
  - The First School of Clinical Medicine, Lanzhou University, Lanzhou, Gansu, China
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
  - Clinical Research Center for Metabolic Diseases, Lanzhou, Gansu, China
- Mengmeng Shang
  - The First Clinical Medical College, Gansu University of Traditional Chinese Medicine, Lanzhou, Gansu, China
- Hui Chen
  - Department of Endocrinology, The Second Hospital of Lanzhou University, Lanzhou, Gansu, China
- Hong Yin
  - First People's Hospital of Lanzhou, Lanzhou, Gansu, China
- Limin Tian
  - The First School of Clinical Medicine, Lanzhou University, Lanzhou, Gansu, China
  - Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, Gansu, China
  - Clinical Research Center for Metabolic Diseases, Lanzhou, Gansu, China

4
Ferdaus J, Rochy EA, Biswas U, Tiang JJ, Nahid AA. Analyzing Diabetes Detection and Classification: A Bibliometric Review (2000-2023). Sensors (Basel) 2024; 24:5346. [PMID: 39205040 PMCID: PMC11359783 DOI: 10.3390/s24165346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 08/11/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024]
Abstract
Bibliometric analysis is a rigorous method to analyze significant quantities of bibliometric data to assess their impact on a particular field. This study used bibliometric analysis to investigate the academic research on diabetes detection and classification from 2000 to 2023. The PRISMA 2020 framework was followed to identify, filter, and select relevant papers. This study used the Web of Science database to determine relevant publications concerning diabetes detection and classification using the keywords "diabetes detection", "diabetes classification", and "diabetes detection and classification". A total of 863 publications were selected for analysis. The research applied two bibliometric techniques: performance analysis and science mapping. Various bibliometric parameters, including publication analysis, trend analysis, citation analysis, and networking analysis, were used to assess the performance of these articles. The analysis findings showed that India, China, and the United States are the top three countries with the highest number of publications and citations on diabetes detection and classification. The most frequently used keywords are machine learning, diabetic retinopathy, and deep learning. Additionally, the study identified "classification", "diagnosis", and "validation" as the prevailing topics for diabetes identification. This research contributes valuable insights into the academic landscape of diabetes detection and classification.
Affiliation(s)
- Jannatul Ferdaus
  - Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
- Esmay Azam Rochy
  - Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
- Uzzal Biswas
  - Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
- Jun Jiat Tiang
  - Centre for Wireless Technology (CWT), Faculty of Engineering, Multimedia University, Cyberjaya 63100, Malaysia
- Abdullah-Al Nahid
  - Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh

5
Liu X, Chang Y, Xu C, Li Y, Wang Y, Sun Y, Duan M, Li W, Cui J. Association of volatile organic compound levels with chronic obstructive pulmonary diseases in NHANES 2013-2016. Sci Rep 2024; 14:16085. [PMID: 38992113 PMCID: PMC11239907 DOI: 10.1038/s41598-024-67210-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 07/09/2024] [Indexed: 07/13/2024] Open
Abstract
Volatile organic compounds (VOCs) represent a significant component of air pollution. However, studies evaluating the impact of VOC exposure on chronic obstructive pulmonary disease (COPD) have predominantly focused on single pollutant models. This study aims to comprehensively assess the relationship between multiple VOC exposures and COPD. A large cross-sectional study was conducted on 4983 participants from the National Health and Nutrition Examination Survey. Four models, including weighted logistic regression, restricted cubic splines (RCS), weighted quantile sum regression (WQS), and the dual-pollution model, were used to explore the association between blood VOC levels and the prevalence of COPD in the U.S. general population. Additionally, six machine learning algorithms were employed to develop a predictive model for COPD risk, with the model's predictive capacity assessed using the area under the curve (AUC) indices. Elevated blood concentrations of benzene, toluene, ortho-xylene, and para-xylene were significantly associated with the incidence of COPD. RCS analysis further revealed a non-linear and non-monotonic relationship between blood levels of toluene and m-p-xylene with COPD prevalence. WQS regression indicated that different VOCs had varying effects on COPD, with benzene and ortho-xylene having the greatest weights. Among the six models, the Extreme Gradient Boosting (XGBoost) model demonstrated the strongest predictive power, with an AUC value of 0.781. Increased blood concentrations of benzene and toluene are significantly correlated with a higher prevalence of COPD in the U.S. population, demonstrating a non-linear relationship. Exposure to environmental VOCs may represent a new risk factor in the etiology of COPD.
Affiliation(s)
- Xiangliang Liu
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China
- Yu Chang
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China
- Chengyao Xu
  - Jilin Provincial Institute for Drug Control, Changchun, 130022, China
- Yuguang Li
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China
- Yao Wang
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China
- Yao Sun
  - Jilin Provincial Institute for Drug Control, Changchun, 130022, China
- Meilin Duan
  - Jilin Provincial Institute for Drug Control, Changchun, 130022, China
- Wei Li
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China
- Jiuwei Cui
  - The First Hospital of Jilin University, No.1 Xinmin Street, Changchun, 130012, China

6
Mesquita F, Bernardino J, Henriques J, Raposo JF, Ribeiro RT, Paredes S. Machine learning techniques to predict the risk of developing diabetic nephropathy: a literature review. J Diabetes Metab Disord 2024; 23:825-839. [PMID: 38932857 PMCID: PMC11196462 DOI: 10.1007/s40200-023-01357-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 11/20/2023] [Indexed: 06/28/2024]
Abstract
Purpose: Diabetes is a major public health challenge with widespread prevalence, often leading to complications such as diabetic nephropathy (DN), a chronic condition that progressively impairs kidney function. In this context, it is important to evaluate whether machine learning models can exploit the inherent temporal factor in clinical data to predict the risk of developing DN faster and more accurately than current clinical models. Methods: Three different databases were used for this literature review: Scopus, Web of Science, and PubMed. Only articles written in English and published between January 2015 and December 2022 were included. Results: We included 11 studies, from which we discuss a number of algorithms capable of extracting knowledge from clinical data, incorporating dynamic aspects in patient assessment, and exploring their evolution over time. We also present a comparison of the different approaches, their performance, advantages, disadvantages, interpretation, and the value that the time factor can bring to a more successful prediction of diabetic nephropathy. Conclusion: Our analysis showed that some studies ignored the temporal factor, while others partially exploited it. Greater use of the temporal aspect inherent in Electronic Health Records (EHR) data, together with the integration of omics data, could lead to the development of more reliable and powerful predictive models.
Affiliation(s)
- F. Mesquita
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
- J. Bernardino
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- J. Henriques
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal
- JF. Raposo
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- RT. Ribeiro
  - Education and Research Center, APDP Diabetes Portugal, Rua Do Salitre 118-120, 1250-203 Lisbon, Portugal
- S. Paredes
  - Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes - Quinta da Nora, 3030-199 Coimbra, Portugal
  - Center for Informatics and Systems of University of Coimbra, University of Coimbra, Pólo II, 3030-290 Coimbra, Portugal

7
Ghosh SK, Khandoker AH. Investigation on explainable machine learning models to predict chronic kidney diseases. Sci Rep 2024; 14:3687. [PMID: 38355876 PMCID: PMC10866953 DOI: 10.1038/s41598-024-54375-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/12/2024] [Indexed: 02/16/2024] Open
Abstract
Chronic kidney disease (CKD) is a major worldwide health problem, affecting a large proportion of the world's population and leading to higher morbidity and death rates. The early stages of CKD sometimes present without visible symptoms, leaving patients unaware of the disease. Early detection and treatment are critical in reducing complications and improving the overall quality of life of those affected. In this work, we investigate the use of an explainable artificial intelligence (XAI)-based strategy, leveraging clinical characteristics, to predict CKD. This study collected clinical data from 491 patients, comprising 56 with CKD and 435 without CKD, encompassing clinical, laboratory, and demographic variables. To develop the predictive model, five machine learning (ML) methods, namely logistic regression (LR), random forest (RF), decision tree (DT), Naïve Bayes (NB), and extreme gradient boosting (XGBoost), were employed. The optimal model was selected based on accuracy and area under the curve (AUC). Additionally, the SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) algorithms were utilized to demonstrate the influence of the features on the optimal model. Among the five models developed, the XGBoost model achieved the best performance with an AUC of 0.9689 and an accuracy of 93.29%. The analysis of feature importance revealed that creatinine, glycosylated hemoglobin type A1C (HgbA1C), and age were the three most influential features in the XGBoost model. The SHAP force analysis further visualized individualized CKD predictions. For further insights into individual predictions, we also utilized the LIME algorithm. This study presents an interpretable ML-based approach for the early prediction of CKD. The SHAP and LIME methods enhance the interpretability of ML models and help clinicians better understand the rationale behind the predicted outcomes.
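With 56 CKD and 435 non-CKD patients, the classes are imbalanced, so it can help to read the reported accuracy against a majority-class baseline. The sketch below computes that baseline from the counts in the abstract. It assumes, for illustration only, that the evaluation data have the same class ratio as the full cohort; the function name is hypothetical.

```python
def majority_baseline_accuracy(n_positive: int, n_negative: int) -> float:
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


# Class counts from the abstract: 56 patients with CKD, 435 without.
baseline = majority_baseline_accuracy(56, 435)
print(f"{baseline:.2%}")  # 88.59%
```

Under that assumption, the reported accuracy of 93.29% sits a few percentage points above the baseline, which is one reason the abstract's AUC is a useful companion metric.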
Affiliation(s)
- Samit Kumar Ghosh
  - Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates
- Ahsan H Khandoker
  - Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates

8
Qiu B, Shen Z, Wu S, Qin X, Yang D, Wang Q. A machine learning-based model for predicting distant metastasis in patients with rectal cancer. Front Oncol 2023; 13:1235121. [PMID: 37655097 PMCID: PMC10465697 DOI: 10.3389/fonc.2023.1235121] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2023] [Accepted: 07/25/2023] [Indexed: 09/02/2023] Open
Abstract
Background: Distant metastasis from rectal cancer usually results in poorer survival and quality of life, so early identification of patients at high risk of distant metastasis from rectal cancer is essential. Method: The study used eight machine-learning algorithms to construct a machine-learning model for the risk of distant metastasis from rectal cancer. We developed the models using 23,867 patients with rectal cancer from the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2017. Meanwhile, 1,178 rectal cancer patients from Chinese hospitals were selected to validate the model performance and extrapolation. We tuned the hyperparameters by random search and tenfold cross-validation to construct the machine-learning models. We evaluated the models using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis, calibration curves, and the precision and accuracy of the internal test set and external validation cohorts. In addition, SHapley Additive exPlanations (SHAP) were used to interpret the machine-learning models. Finally, the best model was applied to develop a web calculator for predicting the risk of distant metastasis in rectal cancer. Result: The study included 23,867 rectal cancer patients and 2,840 patients with distant metastasis. Multiple logistic regression analysis showed that age, differentiation grade, T-stage, N-stage, preoperative carcinoembryonic antigen (CEA), tumor deposits, perineural invasion, tumor size, radiation, and chemotherapy were independent risk factors for distant metastasis in rectal cancer. The mean AUC value of the extreme gradient boosting (XGB) model in tenfold cross-validation in the training set was 0.859. The XGB model performed best in the internal test set and external validation set. In the internal test set, the XGB model had an AUC of 0.855, an AUPRC of 0.510, an accuracy of 0.900, and a precision of 0.880. In the external validation set, the XGB model had an AUC of 0.814, an AUPRC of 0.609, an accuracy of 0.800, and a precision of 0.810. Finally, we constructed a web calculator using the XGB model for distant metastasis of rectal cancer. Conclusion: The study developed and validated an XGB model based on clinicopathological information for predicting the risk of distant metastasis in patients with rectal cancer, which may help physicians make clinical decisions. Keywords: rectal cancer, distant metastasis, web calculator, machine learning algorithm, external validation.
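An AUPRC value is easier to interpret next to the positive-class prevalence, because a model that ranks patients at random has an expected AUPRC close to that prevalence. The sketch below is a rough illustration only. It assumes that the 2,840 patients with distant metastasis are a subset of the 23,867 SEER patients and that the internal test set has a similar prevalence; neither point is stated explicitly in the abstract, and the function name is hypothetical.

```python
def positive_prevalence(n_positive: int, n_total: int) -> float:
    """Fraction of positive cases, i.e. the approximate AUPRC of random ranking."""
    return n_positive / n_total


# Assumption: 2,840 metastatic cases among the 23,867 SEER patients.
baseline_auprc = positive_prevalence(2840, 23867)
reported_auprc = 0.510  # internal test set value quoted in the abstract

print(round(baseline_auprc, 3))  # 0.119
print(reported_auprc > baseline_auprc)  # True
```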
Affiliation(s)
- Binxu Qiu
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Zixiong Shen
  - Department of Thoracic Surgery, The First Hospital of Jilin University, Changchun, China
- Song Wu
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Xinxin Qin
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Dongliang Yang
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China
- Quan Wang
  - Department of Gastric and Colorectal Surgery, General Surgery Center, The First Hospital of Jilin University, Changchun, China

9
Identifying Complex Emotions in Alexithymia Affected Adolescents Using Machine Learning Techniques. Diagnostics (Basel) 2022; 12:diagnostics12123188. [PMID: 36553197 PMCID: PMC9777297 DOI: 10.3390/diagnostics12123188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 10/30/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Many researchers focus on enhancing automated systems that identify emotions from brain signals. This study examines how brain wave signals can be used to classify human emotional states. Electroencephalography (EEG)-based affective computing predominantly focuses on emotion classification based on facial expression, speech recognition, and text-based recognition through multimodality stimuli. The proposed work aims to implement a methodology to identify and codify discrete complex emotions such as pleasure and grief in a rare psychological disorder known as alexithymia. This type of disorder is frequently observed in unstable, fragile countries such as South Sudan, Lebanon, and Mauritius. These countries are continuously affected by civil wars and disasters and are politically unstable, leading to very poor economies and education systems. This study focuses on an adolescent age group dataset, recording physiological data while emotion is exhibited in a multimodal virtual environment. We extracted time-frequency features and amplitude time-series correlates, including frontal alpha symmetry, using a complex Morlet wavelet. For data visualization, we used the UMAP technique to obtain a clear, distinct view of emotions. We performed 5-fold cross-validation along with 1 s window subjective classification on the dataset. We opted for traditional machine learning techniques to label complex emotions.

10
A Hybrid Risk Factor Evaluation Scheme for Metabolic Syndrome and Stage 3 Chronic Kidney Disease Based on Multiple Machine Learning Techniques. Healthcare (Basel) 2022; 10:healthcare10122496. [PMID: 36554020 PMCID: PMC9778302 DOI: 10.3390/healthcare10122496] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/01/2022] [Revised: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022] Open
Abstract
With the rapid development of medicine and technology, machine learning (ML) techniques are extensively applied to medical informatics and the suboptimal health field to identify critical predictor variables and risk factors. Metabolic syndrome (MetS) and chronic kidney disease (CKD) are important risk factors for many comorbidities and complications. Existing studies that utilize different statistical or ML algorithms to perform CKD data analysis mostly analyze early-stage subjects directly, but few studies have discussed the predictive models and important risk factors for the stage III CKD high-risk health screening population. The middle stages 3a and 3b of CKD indicate moderate renal failure. This study aims to construct an effective hybrid important risk factor evaluation scheme for subjects with MetS and stage III CKD based on ML predictive models. Six well-known ML techniques, namely random forest (RF), logistic regression (LGR), multivariate adaptive regression splines (MARS), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost), and a light gradient boosting machine (LightGBM), were used in the proposed scheme. The data were sourced from the Taiwan health examination indicators and the questionnaire responses of 71,108 members between 2005 and 2017. In total, 375 stage 3a CKD and 50 stage 3b CKD patients were enrolled, and 33 different variables were used to evaluate potential risk factors. Based on the results, the top five important variables, namely BUN, SBP, Right Intraocular Pressure (R-IOP), RBCs, and T-Cho/HDL-C (C/H), were identified as significant variables for evaluating the subjects with MetS and CKD stage 3a or 3b.
|
11
|
Performance Analysis of Conventional Machine Learning Algorithms for Diabetic Sensorimotor Polyneuropathy Severity Classification Using Nerve Conduction Studies. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:9690940. [PMID: 35510061 PMCID: PMC9061035 DOI: 10.1155/2022/9690940] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/09/2021] [Revised: 02/14/2022] [Accepted: 03/18/2022] [Indexed: 02/06/2023]
Abstract
Background: Diabetic sensorimotor polyneuropathy (DSPN) is a major complication that arises in long-term diabetic patients. Although machine learning (ML) is common and well established in disease diagnosis research, its application to DSPN diagnosis using nerve conduction studies (NCS) is very limited in the existing literature. Method: In this study, the NCS data were collected from the Diabetes Control and Complications Trial (DCCT) and its follow-up Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials. The NCS variables are median motor velocity (m/sec), median motor amplitude (mV), median motor F-wave (msec), median sensory velocity (m/sec), median sensory amplitude (μV), peroneal motor velocity (m/sec), peroneal motor amplitude (mV), peroneal motor F-wave (msec), sural sensory velocity (m/sec), and sural sensory amplitude (μV). Three different feature ranking techniques were used to analyze the performance of eight different conventional classifiers. Results: The ensemble classifier outperformed the other classifiers when all the NCS features were used, providing an accuracy of 93.40%, sensitivity of 91.77%, and specificity of 98.44%. The random forest model exhibited the second-best performance using all ten features, with an accuracy of 93.26%, sensitivity of 91.95%, and specificity of 98.95%. Both the ensemble and random forest classifiers had a kappa value of 0.82, which indicates that the models agree well with the data and the variables used and can accurately identify DSPN. Conclusion: This study suggests that the ensemble classifier using all ten NCS variables can predict DSPN severity, which can enhance the management of DSPN patients.
|
12
|
Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:7378307. [PMID: 35399848 PMCID: PMC8993553 DOI: 10.1155/2022/7378307] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2021] [Revised: 03/10/2022] [Accepted: 03/21/2022] [Indexed: 12/17/2022]
Abstract
Background: Diabetic kidney disease (DKD), a complication of diabetes, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes, so screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to identify high-risk patients by building a predictive model. Objective: This study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance using WEKA machine learning software. Methods: The performance of nine different classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data preprocessing was carried out using the PartitionMembershipFilter. A 10-fold cross-validation was performed on the dataset. Performance was assessed on the basis of execution time, accuracy, correctly and incorrectly classified instances, kappa statistics (K), mean absolute error, root mean squared error, and true values of the confusion matrix. Results: With an accuracy of 93.6585% and a higher K value (0.8731), the IBK and random tree classification techniques were found to be the best performing. They also exhibited the lowest root mean squared error (0.2496). There were 15 false-positive instances and 11 false-negative instances with these prediction models. Conclusions: This study identified the IBK and random tree classification techniques as the best performing classifiers and as accurate prediction methods for DKD.
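The reported accuracy can be checked against the reported error counts with a little arithmetic. The sketch below is illustrative only and is not the study's WEKA workflow: the function names `accuracy` and `cohen_kappa` are hypothetical helpers, and a binary (two-class) confusion matrix is assumed. The abstract gives the false-positive and false-negative counts but not how the 384 correct predictions divide into true positives and true negatives, so the kappa helper is shown without being applied to the study's data.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of instances classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen's kappa for a binary confusion matrix.

    Compares observed agreement with the agreement expected by chance,
    where chance agreement is computed from the row and column totals.
    """
    total = tp + tn + fp + fn
    observed = (tp + tn) / total
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    return (observed - expected) / (1 - expected)


# Figures taken from the abstract: 410 instances, 15 false positives,
# 11 false negatives.
total, fp, fn = 410, 15, 11
correct = total - fp - fn            # 384 correctly classified instances
reported_accuracy = correct / total  # 384 / 410
print(f"{reported_accuracy:.4%}")    # 93.6585%
```

The result, 384 of 410 correct, matches the 93.6585% accuracy stated in the abstract. Reproducing the kappa of 0.8731 would additionally require the true-positive and true-negative counts, which the abstract does not report.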
|