1
Wang H, Jia Q, Wang Y, Xue W, Jiang Q, Ning F, Wang J, Zhu Z, Tian L. Stacking learning based on micro-CT radiomics for outcome prediction in the early-stage of silica-induced pulmonary fibrosis model. Heliyon 2024; 10:e30651. [PMID: 38765063 PMCID: PMC11098827 DOI: 10.1016/j.heliyon.2024.e30651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 02/28/2024] [Accepted: 05/01/2024] [Indexed: 05/21/2024] [Open Access]
Abstract
Silicosis is a progressive pulmonary fibrosis caused by long-term inhalation of silica. Early diagnosis and timely intervention are crucial to preventing further deterioration of silicosis. However, the lack of screening and diagnostic measures for early-stage silicosis remains a significant challenge. In this study, silicosis models of varying severity were established through a single exposure to silica at different doses (2.5 mg/mouse or 5 mg/mouse) and durations (4 weeks or 12 weeks). The diagnostic performance of computed tomography (CT) quantitative analysis was assessed using lung density biomarkers and the lung density distribution histogram, with a particular focus on non-aerated lung volume. Subsequently, we developed and evaluated a stacking learning model for early diagnosis of silicosis after extracting and selecting features from CT images. The CT quantitative analysis shows that lung densitometric biomarkers and the lung density distribution histogram, as traditional indicators, effectively differentiate severe fibrosis models but cannot distinguish early-stage silicosis. These findings remained consistent even when non-aerated areas, a more sensitive indicator, were used. A radiomics stacking learning model built on non-aerated areas achieved strong diagnostic performance in distinguishing early-stage silicosis and can serve as a valuable aid to clinical diagnosis. This study reveals the potential of using non-aerated lung areas as a region of interest in stacking learning for early diagnosis of silicosis, providing new insights into early detection of this disease.
Affiliation(s)
- Hongwei Wang
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Qiyue Jia
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Yan Wang
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Wenming Xue
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Qiyue Jiang
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Fuao Ning
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Jiaxin Wang
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Zhonghui Zhu
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
- Lin Tian
- Department of Occupational and Environmental Health, School of Public Health, Capital Medical University, Beijing, 100069, China
- Beijing Key Laboratory of Environmental Toxicology, Capital Medical University, Beijing, 100069, China
2
Kuo DP, Chen YC, Li YT, Cheng SJ, Hsieh KLC, Kuo PC, Ou CY, Chen CY. Estimating the volume of penumbra in rodents using DTI and stack-based ensemble machine learning framework. Eur Radiol Exp 2024; 8:59. [PMID: 38744784 PMCID: PMC11093947 DOI: 10.1186/s41747-024-00455-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/05/2024] [Indexed: 05/16/2024] [Open Access]
Abstract
BACKGROUND This study investigates the potential of diffusion tensor imaging (DTI) in identifying penumbral volume (PV) compared to the standard gadolinium-required perfusion-diffusion mismatch (PDM), utilizing a stack-based ensemble machine learning (ML) approach with enhanced explainability.
METHODS Sixteen male rats were subjected to middle cerebral artery occlusion. The penumbra was identified using PDM at 30 and 90 min after occlusion. We used 11 DTI-derived metrics and 14 distance-based features to train five voxel-wise ML models. The model predictions were integrated using stack-based ensemble techniques. ML-estimated and PDM-defined PVs were compared to evaluate model performance through volume similarity assessment, the Pearson correlation analysis, and Bland-Altman analysis. Feature importance was determined for explainability.
RESULTS In the test rats, the ML-estimated median PV was 106.4 mL (interquartile range 44.6-157.3 mL), whereas the PDM-defined median PV was 102.0 mL (52.1-144.9 mL). These PVs had a volume similarity of 0.88 (0.79-0.96), a Pearson correlation coefficient of 0.93 (p < 0.001), and a Bland-Altman bias of 2.5 mL (2.4% of the mean PDM-defined PV), with 95% limits of agreement ranging from -44.9 to 49.9 mL. Among the features used for PV prediction, the mean diffusivity was the most important feature.
CONCLUSIONS Our study confirmed that PV can be estimated using DTI metrics with a stack-based ensemble ML approach, yielding results comparable to the volume defined by the standard PDM. The model explainability enhanced its clinical relevance. Human studies are warranted to validate our findings.
RELEVANCE STATEMENT The proposed DTI-based ML model can estimate PV without the need for contrast agent administration, offering a valuable option for patients with kidney dysfunction. It also can serve as an alternative if perfusion map interpretation fails in the clinical setting.
KEY POINTS
• Penumbral volume can be estimated by DTI combined with stack-based ensemble ML.
• Mean diffusivity was the most important feature used for predicting penumbral volume.
• The proposed approach can be beneficial for patients with kidney dysfunction.
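The agreement statistics named in the abstract above (volume similarity, Pearson correlation, Bland-Altman bias with 95% limits of agreement) can be sketched in a few lines. This is a minimal illustration and not the authors' code: the volume-similarity formula 1 - |A - B| / (A + B) is one common definition and is assumed here, and all function names and sample volumes are hypothetical.

```python
import math
import statistics


def volume_similarity(a, b):
    """Assumed definition: 1 - |A - B| / (A + B); equals 1.0 for identical volumes."""
    return 1.0 - abs(a - b) / (a + b)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def bland_altman(estimated, reference):
    """Return (bias, lower limit, upper limit) using bias +/- 1.96 * SD of differences."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical per-animal volumes, not data from the study.
    ml_estimated = [100.0, 50.0, 150.0, 120.0]
    pdm_defined = [90.0, 60.0, 140.0, 130.0]
    print(bland_altman(ml_estimated, pdm_defined))
    print(pearson_r(ml_estimated, pdm_defined))
    print([volume_similarity(a, b) for a, b in zip(ml_estimated, pdm_defined)])
```

A bias near zero with wide limits of agreement, as in the abstract, means the two methods agree on average while individual estimates can still differ considerably.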
Affiliation(s)
- Duen-Pang Kuo
- Department of Medical Imaging, Taipei Medical University Hospital, No.250, Wu Hsing Street, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Yung-Chieh Chen
- Department of Medical Imaging, Taipei Medical University Hospital, No.250, Wu Hsing Street, Taipei, Taiwan.
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan.
- Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
- Yi-Tien Li
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Research Center for Neuroscience, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program in Medical Neuroscience, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Sho-Jen Cheng
- Department of Medical Imaging, Taipei Medical University Hospital, No.250, Wu Hsing Street, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Kevin Li-Chun Hsieh
- Department of Medical Imaging, Taipei Medical University Hospital, No.250, Wu Hsing Street, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Po-Chih Kuo
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Chen-Yin Ou
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Cheng-Yu Chen
- Department of Medical Imaging, Taipei Medical University Hospital, No.250, Wu Hsing Street, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
- Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Department of Radiology, National Defense Medical Center, Taipei, Taiwan
3
Idris NF, Ismail MA, Jaya MIM, Ibrahim AO, Abulfaraj AW, Binzagr F. Stacking with Recursive Feature Elimination-Isolation Forest for classification of diabetes mellitus. PLoS One 2024; 19:e0302595. [PMID: 38718024 PMCID: PMC11078423 DOI: 10.1371/journal.pone.0302595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 04/08/2024] [Indexed: 05/12/2024] [Open Access]
Abstract
Diabetes mellitus is one of the oldest diseases known to humankind, dating back to ancient Egypt. The disease is a chronic metabolic disorder that heavily burdens healthcare providers worldwide because the number of patients increases steadily every year. Worryingly, diabetes affects not only the aging population but also children. Controlling this problem is essential, as diabetes can lead to many health complications. Over time, computer technology has been integrated into the healthcare system. Artificial intelligence helps healthcare become more efficient in diagnosing diabetes patients, improves healthcare delivery, and makes care more patient-centric. Among the advanced data mining techniques in artificial intelligence, stacking is one of the most prominent methods applied in the diabetes domain. Hence, this study investigates the potential of stacking ensembles. The aim of this study is to reduce the high complexity inherent in stacking, which contributes to longer training time, and to reduce the outliers in the diabetes data in order to improve classification performance. To address this, a novel machine learning method called the Stacking Recursive Feature Elimination-Isolation Forest is introduced for diabetes prediction. Stacking is combined with Recursive Feature Elimination to design an efficient model for diabetes diagnosis that uses fewer features. The method also uses Isolation Forest for outlier removal. The study uses accuracy, precision, recall, F1 measure, training time, and standard deviation to evaluate classification performance. The proposed method achieved an accuracy of 79.077% on the PIMA Indians Diabetes dataset and 97.446% on the Diabetes Prediction dataset, outperforming many existing methods and demonstrating effectiveness in the diabetes domain.
Affiliation(s)
- Nur Farahaina Idris
- Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
- Mohd Arfian Ismail
- Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
- Centre of Excellence for Artificial Intelligence & Data Science, Universiti, Al-Sultan Pahang, Lebuhraya Tun Razak, Gambang, Malaysia
- Mohd Izham Mohd Jaya
- Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
- Ashraf Osman Ibrahim
- Creative Advanced Machine Intelligence Research Centre, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Sabah, Malaysia
- Anas W. Abulfaraj
- Department of Information Systems, King Abdulaziz University, Rabigh, Saudi Arabia
- Faisal Binzagr
- Department of Computer Science, King Abdulaziz University, Rabigh, Saudi Arabia
4
Gu J, Cao Y, Chai L, Xu E, Liu K, Chong Z, Zhang Y, Zou D, Xu Y, Wang J, Müller O, Cao J, Zhu G, Lu G. Delayed care-seeking in international migrant workers with imported malaria in China. J Travel Med 2024; 31:taae021. [PMID: 38335249 DOI: 10.1093/jtm/taae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/12/2023] [Accepted: 02/08/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND Imported malaria cases continue to pose major challenges in China as well as in other countries that have achieved elimination. Early diagnosis and treatment of each imported malaria case is key to maintaining malaria elimination. This study aimed to build an easy-to-use nomogram to predict, and support intervention against, delayed care-seeking among international migrant workers with imported malaria.
METHODS A prediction model was built based on cases of imported malaria from 2012 to 2019 in Jiangsu Province, China. Routine surveillance information (e.g. sex, age, symptoms, origin country and length of stay abroad), data on the place of initial care-seeking and the gross domestic product (GDP) of the destination city were extracted. Multivariate logistic regression was performed to identify independent predictors, and a nomogram was established to predict the risk of delayed care-seeking. The discrimination and calibration of the nomogram were assessed using the area under the curve and calibration plots. In addition, four machine learning models were used for comparison.
RESULTS Of 2255 patients with imported malaria, 636 (28.2%) sought care within 24 h after symptom onset, and 577 (25.6%) sought care 3 days after symptom onset. Development of symptoms before entry into China, initial care-seeking from superior healthcare facilities and a higher GDP level of the destination city were significantly associated with delayed care-seeking among migrant workers with imported malaria. Based on these independent risk factors, an easy-to-use and intuitive nomogram was established. The calibration curves of the nomogram showed good consistency.
CONCLUSIONS The tool provides public health practitioners with a method for the early detection of delayed care-seeking risk among international migrant workers with imported malaria, which may be of significance in improving post-travel healthcare for labour migrants, reducing the risk of severe malaria, preventing malaria reintroduction and sustaining achievements in malaria elimination.
Affiliation(s)
- Jiyue Gu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Yuanyuan Cao
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, Jiangsu Province, 214064, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province, 211166, China
- Liying Chai
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Enyu Xu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Kaixuan Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Zeyin Chong
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Yuying Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Dandan Zou
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Yuhui Xu
- Center for Disease Control and Prevention, Yangzhou, Jiangsu Province, 225007, China
- Jian Wang
- Yangzhou Schistosomiasis and Parasitic Disease Control Office, Yangzhou, Jiangsu Province, 225007, China
- Olaf Müller
- Institute of Global Health, Medical School, Ruprecht-Karls-University Heidelberg, Heidelberg, 69117, Germany
- Jun Cao
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, Jiangsu Province, 214064, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province, 211166, China
- Guoding Zhu
- National Health Commission Key Laboratory of Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, Jiangsu Province, 214064, China
- Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province, 211166, China
- Guangyu Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Yangzhou University, Yangzhou University, Yangzhou, Jiangsu Province, 225009, China
- Jiangsu Key Laboratory of Zoonosis, Yangzhou, 225009, China
5
Xing M, Zhao Y, Li Z, Zhang L, Yu Q, Zhou W, Huang R, Lv X, Ma Y, Li W. Development and validation of a stacking ensemble model for death prediction in the Chinese Longitudinal Healthy Longevity Survey (CLHLS). Maturitas 2024; 182:107919. [PMID: 38290423 DOI: 10.1016/j.maturitas.2024.107919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 11/12/2023] [Accepted: 01/15/2024] [Indexed: 02/01/2024]
Abstract
OBJECTIVE This study aimed to develop and validate a mortality risk prediction model for older people based on the Chinese Longitudinal Healthy Longevity Survey using the stacking ensemble strategy.
MATERIAL AND METHODS A total of 12,769 participants aged 65 or more at baseline were included. Ensemble machine learning models were applied to develop a mortality prediction model. We selected three base learners, namely logistic regression, eXtreme Gradient Boosting, and Categorical Boosting (CatBoost), and used logistic regression as the meta-learner. The primary outcome was five-year survival. Variable importance was evaluated by the SHapley Additive exPlanations method.
RESULTS The mean age at baseline was 88, and 57.8% of participants were women. The CatBoost model performed best among the three base learners, with an area under the receiver operating characteristic curve (AUC) of 0.8469 (95% CI: 0.8345-0.8593), and the stacking ensemble model further improved the discrimination ability (AUC = 0.8486, 95% CI: 0.8367-0.8612, P = 0.046). Conventional logistic regression had comparable performance (AUC = 0.8470, 95% CI: 0.8346-0.8595). Older age, higher scores for self-care activities of daily living, being male, higher objective physical performance capacity scores, not undertaking housework, and lower scores on the Mini-Mental State Examination contributed to higher risk.
CONCLUSIONS We constructed and validated several mortality risk prediction models for a Chinese population of older adults. While the stacking ensemble approach had the best prediction performance, the improvement over conventional logistic regression was insubstantial.
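The models in the abstract above are compared by AUC. As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical; the study's own evaluation code is not part of the source.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy predicted risks and outcomes (1 = event), purely illustrative.
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    toy_labels = [1, 0, 1, 0]
    print(auc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be the usual choice for a cohort of this size.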
Affiliation(s)
- Muqi Xing
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Yunfeng Zhao
- School of Public Health, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Zihan Li
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Lingzhi Zhang
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Qi Yu
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Wenhui Zhou
- Department of Biostatistics and Epidemiology, School of Public Health, China Medical University, Shenyang 110122, China
- Rong Huang
- Department of Biostatistics and Epidemiology, School of Public Health, China Medical University, Shenyang 110122, China
- Xiaozhen Lv
- Peking University Institute of Mental Health (Sixth Hospital), National Clinical Research Center for Mental Disorders, NHC Key Laboratory of Mental Health, Peking University, 51 Huayuan North Road, Haidian District, Beijing 100191, China.
- Yanan Ma
- Department of Biostatistics and Epidemiology, School of Public Health, China Medical University, Shenyang 110122, China.
- Wenyuan Li
- Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
6
Li J, Wu YJ, Liu MF, Li N, Dang LH, An GS, Lu XJ, Wang LL, Du QX, Cao J, Sun JH. Multi-omics integration strategy in the post-mortem interval of forensic science. Talanta 2024; 268:125249. [PMID: 37839320 DOI: 10.1016/j.talanta.2023.125249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/13/2023] [Accepted: 09/25/2023] [Indexed: 10/17/2023]
Abstract
Estimates of post-mortem interval (PMI), which often serve as pivotal evidence in forensic contexts, are fundamentally based on assessments of variability among diverse molecular markers (including proteins and metabolites), their correlations, and their temporal changes in post-mortem organisms. Nevertheless, the present approach to estimating the PMI is not comprehensive and exhibits poor performance. We developed an innovative approach that integrates multi-omics and artificial intelligence, using multimolecular, multimarker, and multidimensional information to accurately describe the intricate biological processes that occur after death, ultimately enabling inference of the PMI. The approach, called the multi-omics stacking model (MOSM), combines metabolomics, protein microarray electrophoresis, and Fourier transform infrared spectroscopy data. It improves the prediction accuracy of the PMI, which is urgently needed in the forensic field, achieving an accuracy of 0.93, a generalized area under the receiver operating characteristic curve of 0.98, and a minimum mean absolute error of 0.07. The MOSM integration framework not only considers multiple markers but also incorporates machine-learning models with distinct algorithmic principles. The diversity of biological mechanisms and algorithmic models further ensures the generalizability and robustness of PMI estimation.
Affiliation(s)
- Jian Li
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Yan-Juan Wu
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Ming-Feng Liu
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Na Li
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Li-Hong Dang
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Guo-Shuai An
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Xiao-Jun Lu
- Criminal Investigation Detachment, Baotou City Public Security Bureau, No. 191, Jianshe Road, Qingshan District, Baotou City, Inner Mongolia Autonomous Region, 014030, PR China
- Liang-Liang Wang
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Qiu-Xiang Du
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China
- Jie Cao
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China.
- Jun-Hong Sun
- School of Forensic Medicine, Shanxi Medical University, No. 98, University Street, Wujinshan Town, Yuci District, Jinzhong City, Shanxi Province, 030604, PR China; Shanxi Key Laboratory of Forensic Medicine, Jinzhong, 030600, Shanxi, China.
7
Arukonda S, Cheruku R. Nested genetic algorithm-based classifier selection and placement in multi-level ensemble framework for effective disease diagnosis. Comput Methods Biomech Biomed Engin 2023:1-24. [PMID: 38126276 DOI: 10.1080/10255842.2023.2294264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/05/2023] [Indexed: 12/23/2023]
Abstract
Effective disease diagnosis is a critical unmet need on a global scale. The intricacies of the numerous disease mechanisms and underlying symptoms make developing a model for early diagnosis and effective treatment extremely difficult. Machine learning (ML) can help to solve some of these issues. Recently, various ensemble-based ML models have helped clinicians with early diagnosis. However, one of the most difficult challenges in multi-level ensemble approaches is selecting the classifiers and placing them in the ensemble framework, because both choices affect overall performance. Selecting m classifiers from n candidates can be done in $\binom{n}{m}$ ways, and each selection can be arranged in $m!$ ways. Finding the best m classifiers and their positions among all $\binom{n}{m}\,m!$ possibilities is a hard problem. To address this challenge, a dynamic three-level ensemble framework is proposed. A nested Genetic Algorithm (GA) and an ensemble-based fitness function are employed to optimize classifier selection and placement in the three-level ensemble framework. Our approach started from eleven classifiers and chose seven by maximizing the fitness function. The proposed model was evaluated on 12 disease datasets. On the Chronic Kidney Disease (CKD) dataset it achieved an accuracy, F1, and G-measure of 0.987, 0.988, and 0.989, respectively; on the Heart disease dataset (HDD) it achieved an AUC of 0.998; and on the Hypothyroid disease dataset (HyDD) it achieved a recall of 0.988.
In addition, the superiority of the proposed model was statistically evaluated with the Wilcoxon signed-rank (WSR) test against other ensemble models: random forest (RF), bagging classifier (BC), XGBoost (XGB), and gradient boost classifier (GBC). With p < 0.05, the results show that all of these traditional ensemble models differ from the proposed model. Effect size was also evaluated using the matched-pairs rank-biserial correlation coefficient wc; it was large for RF and BC and medium for XGB and GBC. The proposed model outperformed state-of-the-art (SOTA) ensemble and non-ensemble models. Further, the proposed model outperformed in terms of the ROC curve in the majority of the disease datasets. The results suggest the usage of the proposed model for disease diagnosis applications.
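To give a sense of the search space the abstract describes, the snippet below counts the ways to choose m classifiers from n and then order them, using the abstract's own figures of eleven candidates and seven selected. It is a small illustrative sketch; the function name is hypothetical, and it assumes, as the abstract states, that every ordering of the chosen classifiers counts as a distinct placement.

```python
import math


def search_space_size(n, m):
    """Number of ways to choose m of n classifiers and then arrange them: C(n, m) * m!."""
    return math.comb(n, m) * math.factorial(m)


if __name__ == "__main__":
    # Eleven candidate classifiers, seven selected, as stated in the abstract.
    print(search_space_size(11, 7))
    # The same count expressed as ordered selections without repetition.
    print(math.perm(11, 7))
```

For n = 11 and m = 7 this gives 330 × 5040 = 1,663,200 configurations, which illustrates why a heuristic search such as a genetic algorithm is attractive compared with exhaustively training every arrangement.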
Affiliation(s)
- Srinivas Arukonda
- Department of Computer Science and Engineering, National Institute of Technology Warangal, Hanamkonda, India
- Ramalingaswamy Cheruku
- Department of Computer Science and Engineering, National Institute of Technology Warangal, Hanamkonda, India
8
Zheng J, Zhang Z, Wang J, Zhao R, Liu S, Yang G, Liu Z, Deng Z. Metabolic syndrome prediction model using Bayesian optimization and XGBoost based on traditional Chinese medicine features. Heliyon 2023; 9:e22727. [PMID: 38125549 PMCID: PMC10730568 DOI: 10.1016/j.heliyon.2023.e22727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
Metabolic syndrome (MetS) has a high prevalence and is prone to many complications. However, current MetS diagnostic methods require blood tests that are not conducive to self-testing, so a user-friendly and accurate method for predicting MetS is needed to facilitate early detection and treatment. This study investigates a MetS prediction model based on a small number of simple Traditional Chinese Medicine (TCM) clinical indicators and biological indicators combined with machine learning algorithms. Electronic medical record data from 2040 patients who visited outpatient clinics at Guangdong Chinese medicine hospitals from 2020 to 2021 were used to investigate the fusion of Bayesian optimization (BO) and eXtreme gradient boosting (XGBoost), creating a BO-XGBoost model for screening nineteen key features that influence MetS prediction in three categories: individual bio-information, TCM indicators, and TCM habits. Subsequently, the predictive diagnostic model for MetS was developed. The experimental results revealed that the proposed model achieved values of 93.35%, 90.67%, 80.40%, and 0.920 for the F1, sensitivity, FRS, and AUC metrics, respectively. These values outperformed those of the seven other tested machine learning models. Finally, this study developed an intelligent prediction application for MetS based on the proposed model, which ordinary users can use for self-diagnosis through a web-based questionnaire, supporting early detection and intervention for MetS.
Affiliation(s)
- Jianhua Zheng
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
  - Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, 510630, China
- Zihao Zhang
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Jinhe Wang
  - Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, 100091, China
- Ruolin Zhao
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Shuangyin Liu
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
  - Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, 510630, China
- Gaolin Yang
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Zhengjie Liu
  - Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, 510120, China
  - The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, 510120, China
- Zhengyuan Deng
  - College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
  - Network and Educational Technology Center, Jinan University, Guangzhou, 510630, China
|
9
|
Chellappan D, Rajaguru H. Enhancement of Classifier Performance Using Swarm Intelligence in Detection of Diabetes from Pancreatic Microarray Gene Data. Biomimetics (Basel) 2023; 8:503. [PMID: 37887634 PMCID: PMC10604158 DOI: 10.3390/biomimetics8060503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 10/08/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023] Open
Abstract
In this study, we focused on using microarray gene data from pancreatic sources to detect diabetes mellitus. Dimensionality reduction (DR) techniques were used to reduce the high-dimensional microarray gene data. The DR methods used were the Bessel function, Discrete Cosine Transform (DCT), Least Squares Linear Regression (LSLR), and Artificial Algae Algorithm (AAA). Subsequently, we applied meta-heuristic algorithms, namely the Dragonfly Optimization Algorithm (DOA) and Elephant Herding Optimization Algorithm (EHO), for feature selection. Classifiers such as Nonlinear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Classifier (BLDC), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine (SVM) with three types of kernels, Linear, Polynomial, and Radial Basis Function (RBF), were utilized to detect diabetes. Classifier performance was analyzed based on parameters such as accuracy, F1 score, MCC, error rate, FM metric, and Kappa. Without feature selection, the SVM (RBF) classifier achieved an accuracy of 90% using the AAA DR method. With EHO feature selection, the SVM (RBF) classifier using the AAA DR method outperformed the other classifiers with an accuracy of 95.714%. This improvement in classifier accuracy emphasizes the role of feature selection methods.
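As an illustration of several of the performance parameters named in this abstract (accuracy, F1 score, MCC, and Kappa), the following minimal sketch computes them from a binary confusion matrix using their standard textbook definitions. The function names and the counts in the example are hypothetical and are not taken from the paper.

```python
from math import sqrt


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn, tn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Hypothetical counts for illustration only (not data from the study).
counts = dict(tp=45, fp=5, fn=5, tn=45)
print(accuracy(**counts), f1_score(**counts), mcc(**counts), cohen_kappa(**counts))
```

With a balanced confusion matrix like the hypothetical one above, MCC and Kappa coincide; on imbalanced data they generally differ, which is one reason studies report both.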
Affiliation(s)
- Dinesh Chellappan
  - Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India
- Harikumar Rajaguru
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
|
10
|
Jiang L, Xia Z, Zhu R, Gong H, Wang J, Li J, Wang L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev Med Rep 2023; 35:102358. [PMID: 37654514 PMCID: PMC10465943 DOI: 10.1016/j.pmedr.2023.102358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023] Open
Abstract
Diabetes is a chronic metabolic disease characterized by hyperglycemia. The follow-up management of diabetes patients takes place mostly in the community, but the relationship between key lifestyle indicators in community follow-up and the risk of diabetes is unclear. To explore the association between key lifestyle indicators in community follow-up and the risk of diabetes, 252,176 follow-up records of patients with diabetes from 2016 to 2023 were obtained from Haizhu District, Guangzhou. From the follow-up data, the key lifestyle indicators that affect diabetes were determined, and the optimal feature subset was obtained through feature selection to accurately assess the risk of diabetes. A diabetes risk assessment model based on a random forest classifier was designed using optimal feature parameter selection and algorithm model comparison, achieving an accuracy of 91.24% and an area under the ROC curve (AUC) of 97%. To improve the applicability of the model in clinical practice and daily life, a diabetes risk score card was designed and tested on the original data; its accuracy was 95.15%, indicating high model reliability. The diabetes risk prediction model based on community follow-up big data mining can be used by community doctors for large-scale risk screening and early warning based on patient follow-up data, further promoting diabetes prevention and control strategies. It can also be used in wearable devices or intelligent biosensors for individual patient self-examination, in order to improve lifestyle and reduce risk factor levels.
Affiliation(s)
- Liangjun Jiang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zhenhua Xia
  - Electronics & Information School of Yangtze University, Jingzhou, China
- Ronghui Zhu
  - Shenzhen Nanshan Medical Group HQ, Shenzhen, China
- Haimei Gong
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Jing Wang
  - E-link Wisdom Co., Ltd, Shenzhen, China
- Juan Li
  - Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Lei Wang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
|
11
|
Zhou H, Xin Y, Li S. A diabetes prediction model based on Boruta feature selection and ensemble learning. BMC Bioinformatics 2023; 24:224. [PMID: 37264332 DOI: 10.1186/s12859-023-05300-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 04/21/2023] [Indexed: 06/03/2023] Open
Abstract
BACKGROUND AND OBJECTIVE: As a common chronic disease, diabetes is called the "second killer" among modern diseases. Currently, there is no medical cure for diabetes, and treatment relies on medication for auxiliary management. However, many diabetic patients still die each year. In addition, a considerable number of people do not pay attention to their physical health or opt out of treatment due to lack of money, which eventually leads to various complications. Diagnosing diabetes at an early stage and intervening early is therefore necessary, and developing an early detection method for diabetes is essential. METHODS: In this study, a diabetes prediction model based on Boruta feature selection and ensemble learning is proposed. The model uses Boruta feature selection to extract salient features from the dataset, the K-Means++ algorithm for unsupervised clustering of the data, and a stacking ensemble learning method for classification. It has been validated on a diabetes dataset. RESULTS: The experiments were performed on the PIMA Indian diabetes dataset. The model was evaluated by accuracy, precision, and F1 index. The results show that the accuracy of the model reaches 98%. CONCLUSION: Compared with other diabetes prediction models, this model achieved better results, indicating better performance in diabetes prediction.
Affiliation(s)
- Hongfang Zhou
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
  - Shaanxi Key Laboratory of Network Computing and Security Technology, Xi'an, 710048, China
- Yinbo Xin
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
- Suli Li
  - School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
|
12
|
Liu C, Yao Z, Liu P, Tu Y, Chen H, Cheng H, Xie L, Xiao K. Early prediction of MODS interventions in the intensive care unit using machine learning. JOURNAL OF BIG DATA 2023; 10:55. [PMID: 37193361 PMCID: PMC10158675 DOI: 10.1186/s40537-023-00719-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/21/2023] [Indexed: 05/18/2023]
Abstract
Background: Multiple organ dysfunction syndrome (MODS) is one of the leading causes of death in critically ill patients. MODS is the result of a dysregulated inflammatory response that can be triggered by various causes. Owing to the lack of an effective treatment for patients with MODS, early identification and intervention are the most effective strategies. Therefore, we have developed a variety of early warning models whose prediction results can be interpreted by Kernel SHapley Additive exPlanations (Kernel-SHAP) and reversed by diverse counterfactual explanations (DiCE). This allows us to predict the probability of MODS 12 h in advance, quantify the risk factors, and automatically recommend relevant interventions. Methods: We used various machine learning algorithms to complete the early risk assessment of MODS, and used a stacked ensemble to improve the prediction performance. The Kernel-SHAP algorithm was used to quantify the positive and negative factors corresponding to the individual prediction results, and finally, the DiCE method was used to automatically recommend interventions. We completed the model training and testing based on the MIMIC-III and MIMIC-IV databases, in which the sample features in the model training included the patients' vital signs, laboratory test results, test reports, and data related to the use of ventilators. Results: The customizable model called SuperLearner, which integrated multiple machine learning algorithms, had the highest authenticity of screening, and its Youden index (YI), sensitivity, accuracy, and utility_score on the MIMIC-IV test set were 0.813, 0.884, 0.893, and 0.763, respectively, which were all maximum values among the eleven models. The area under the curve of the deep-wide neural network (DWNN) model on the MIMIC-IV test set was 0.960, and the specificity was 0.935, which were both the maximum values among all these models.
The Kernel-SHAP algorithm combined with SuperLearner was used to determine that the minimum value of the Glasgow Coma Scale (GCS) in the current hour (OR = 0.609, 95% CI 0.606-0.612), the maximum value of the MODS score corresponding to GCS in the past 24 h (OR = 2.632, 95% CI 2.588-2.676), and the maximum value of the MODS score corresponding to creatinine in the past 24 h (OR = 3.281, 95% CI 3.267-3.295) were generally the most influential factors. Conclusion: The MODS early warning model based on machine learning algorithms has considerable application value, and the prediction efficiency of SuperLearner is superior to that of SubSuperLearner, DWNN, and eight other common machine learning models. Considering that the attribution analysis of Kernel-SHAP is a static analysis of the prediction results, we introduce the DiCE algorithm to automatically recommend counterfactuals to reverse the prediction results, which will be an important step towards the practical application of automatic MODS early intervention. Supplementary Information: The online version contains supplementary material available at 10.1186/s40537-023-00719-2.
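For readers unfamiliar with the index reported for SuperLearner, the sketch below uses the standard definition of Youden's J statistic (sensitivity + specificity - 1). The abstract does not report SuperLearner's specificity, so the value derived here is only what the reported index and sensitivity would imply under that definition. It is an inference, not a figure from the paper, and the function names are illustrative.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def implied_specificity(youden: float, sensitivity: float) -> float:
    """Rearranged definition: specificity = J - sensitivity + 1."""
    return youden - sensitivity + 1.0


# Reported for SuperLearner on the MIMIC-IV test set: YI = 0.813, sensitivity = 0.884.
spec = implied_specificity(0.813, 0.884)
print(round(spec, 3))  # 0.929
```

An implied specificity of about 0.929 is consistent with the abstract's statement that the DWNN model's specificity of 0.935 was the highest among the compared models.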
Affiliation(s)
- Chang Liu
  - Center of Pulmonary & Critical Care Medicine, Chinese People’s Liberation Army (PLA) General Hospital, Beijing, 100039 China
  - School of Medicine, Nankai University, Tianjin, 300071 China
- Zhenjie Yao
  - Institute of Microelectronics, Chinese Academy of Sciences, Beijing, 100029 China
- Pengfei Liu
  - Center of Pulmonary & Critical Care Medicine, Chinese People’s Liberation Army (PLA) General Hospital, Beijing, 100039 China
- Yanhui Tu
  - Purple Mountain Laboratory: Networking, Communications and Security, Nanjing, 211111 China
- Hu Chen
  - Purple Mountain Laboratory: Networking, Communications and Security, Nanjing, 211111 China
- Haibo Cheng
  - Purple Mountain Laboratory: Networking, Communications and Security, Nanjing, 211111 China
- Lixin Xie
  - Center of Pulmonary & Critical Care Medicine, Chinese People’s Liberation Army (PLA) General Hospital, Beijing, 100039 China
  - School of Medicine, Nankai University, Tianjin, 300071 China
- Kun Xiao
  - Center of Pulmonary & Critical Care Medicine, Chinese People’s Liberation Army (PLA) General Hospital, Beijing, 100039 China
|
13
|
A Novel Proposal for Deep Learning-Based Diabetes Prediction: Converting Clinical Data to Image Data. Diagnostics (Basel) 2023; 13:diagnostics13040796. [PMID: 36832284 PMCID: PMC9955314 DOI: 10.3390/diagnostics13040796] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023] Open
Abstract
Diabetes, one of the most common diseases worldwide, has become an increasing global threat to humans in recent years. However, early detection of diabetes greatly inhibits the progression of the disease. This study proposes a new method based on deep learning for the early detection of diabetes. Like many other medical datasets, the PIMA dataset used in the study contains only numerical values. As a result, the application of popular convolutional neural network (CNN) models to such data is limited. This study converts numerical data into images based on feature importance in order to use the robust representation of CNN models in early diabetes diagnosis. Three different classification strategies are then applied to the resulting diabetes image data. In the first, diabetes images are fed into the ResNet18 and ResNet50 CNN models. In the second, deep features of the ResNet models are fused and classified with support vector machines (SVM). In the last approach, the selected fusion features are classified by SVM. The results demonstrate the robustness of diabetes images in the early diagnosis of diabetes.
|
14
|
Novel Prediction Method Applied to Wound Age Estimation: Developing a Stacking Ensemble Model to Improve Predictive Performance Based on Multi-mRNA. Diagnostics (Basel) 2023; 13:diagnostics13030395. [PMID: 36766500 PMCID: PMC9914838 DOI: 10.3390/diagnostics13030395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 01/13/2023] [Accepted: 01/17/2023] [Indexed: 01/24/2023] Open
Abstract
(1) Background: Accurate diagnosis of wound age is crucial for investigating violent cases in forensic practice. However, effective biomarkers and forecast methods are lacking. (2) Methods: Samples were collected from rats divided randomly into control and contusion groups at 0, 4, 8, 12, 16, 20, and 24 h post-injury. The characteristics of concern were nine mRNA expression levels. Internal validation data were used to train different machine learning algorithms, namely random forest (RF), support vector machine (SVM), multilayer perceptron (MLP), gradient boosting (GB), and stochastic gradient descent (SGD), to predict wound age. These models were considered the base learners, which were then applied to developing 26 stacking ensemble models combining two, three, four, or five base learners. The best-performing stacking model and base learner were evaluated through external validation data. (3) Results: The best results were obtained using a stacking model of RF + SVM + MLP (accuracy = 92.85%, area under the receiver operating characteristic curve (AUROC) = 0.93, root-mean-square-error (RMSE) = 1.06 h). The wound age prediction performance of the stacking models was also confirmed for another independent dataset. (4) Conclusions: We illustrate that machine learning techniques, especially ensemble algorithms, have a high potential to be used to predict wound age. According to the results, the strategy can be applied to other types of forensic forecasts.
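The count of 26 stacking ensemble models follows directly from choosing two, three, four, or five of the five base learners. The short sketch below enumerates those combinations. The learner labels are the abbreviations used in the abstract; the function name is illustrative, and how each combination was actually trained and stacked is not specified here.

```python
from itertools import combinations

# Base learners named in the abstract.
BASE_LEARNERS = ["RF", "SVM", "MLP", "GB", "SGD"]


def stacking_candidates(learners, min_size=2):
    """Every subset of the base learners with at least `min_size` members."""
    return [
        combo
        for size in range(min_size, len(learners) + 1)
        for combo in combinations(learners, size)
    ]


candidates = stacking_candidates(BASE_LEARNERS)
print(len(candidates))  # 10 + 10 + 5 + 1 = 26
```

The best-performing combination reported in the abstract, RF + SVM + MLP, is one of the ten three-learner subsets in this list.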
|
15
|
Joseph LP, Joseph EA, Prasad R. Explainable diabetes classification using hybrid Bayesian-optimized TabNet architecture. Comput Biol Med 2022; 151:106178. [PMID: 36306578 DOI: 10.1016/j.compbiomed.2022.106178] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/13/2022] [Revised: 09/23/2022] [Accepted: 10/01/2022] [Indexed: 12/27/2022]
Abstract
Diabetes is a deadly chronic disease that occurs when the pancreas is not able to produce sufficient insulin or when the body cannot use insulin effectively. If undetected, it may lead to a host of health complications. Hence, accurate and explainable early-stage detection of diabetes is essential for the proper administration of treatment options in leading a healthy and productive life. For this, we developed an interpretable TabNet model tuned via Bayesian optimization (BO). To achieve model-specific interpretability, the attention mechanism of the TabNet architecture was used, which offered local and global model explanations of the influence of the attributes on the outcomes. The model was further explained locally and globally using the more robust model-agnostic LIME and SHAP eXplainable Artificial Intelligence (XAI) tools. The proposed model outperformed all benchmarked models, obtaining high accuracies of 92.2% and 99.4% on the Pima Indians diabetes dataset (PIDD) and the early-stage diabetes risk prediction dataset (ESDRPD), respectively. Based on the XAI results, the most influential attributes for diabetes classification using PIDD and ESDRPD were insulin and polyuria, respectively. The feature importance values registered were 0.301 for insulin (PIDD) and 0.206 for polyuria (ESDRPD). The high accuracy and ancillary interpretability of our objective model are expected to increase end-users' trust and confidence in early-stage detection of diabetes.
Affiliation(s)
- Lionel P Joseph
  - School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, QLD, 4300, Australia
- Erica A Joseph
  - Umanand Prasad School of Medicine and Health Sciences, The University of Fiji, Saweni, Lautoka, Fiji
- Ramendra Prasad
  - Department of Science, School of Science and Technology, The University of Fiji, Saweni, Lautoka, Fiji
|
16
|
Zhu X, Zhang M, Wen Y, Shang D. Machine learning advances the integration of covariates in population pharmacokinetic models: Valproic acid as an example. Front Pharmacol 2022; 13:994665. [PMID: 36324679 PMCID: PMC9621318 DOI: 10.3389/fphar.2022.994665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 10/03/2022] [Indexed: 11/24/2022] Open
Abstract
Background and Aim: Many studies associated with the combination of machine learning (ML) and pharmacometrics have appeared in recent years. ML can be used as an initial step for fast screening of covariates in population pharmacokinetic (popPK) models. The present study aimed to integrate covariates derived from different popPK models using ML. Methods: Two published popPK models of valproic acid (VPA) in Chinese epileptic patients were used, where the population parameters were influenced by some covariates. Based on the covariates and a one-compartment model that describes the pharmacokinetics of VPA, a dataset was constructed using Monte Carlo simulation, to develop an XGBoost model to estimate the steady-state concentrations (Css) of VPA. We utilized SHapley Additive exPlanation (SHAP) values to interpret the prediction model, and calculated estimates of VPA exposure in four assumed scenarios involving different combinations of CYP2C19 genotypes and co-administered antiepileptic drugs. To develop an easy-to-use model in the clinic, we built a simplified model by using CYP2C19 genotypes and some noninvasive clinical parameters, and omitting several features that were infrequently measured or whose clinically available values were inaccurate, and verified it on our independent external dataset. Results: After data preprocessing, the finally generated combined dataset was divided into a derivation cohort and a validation cohort (8:2). The XGBoost model was developed in the derivation cohort and yielded excellent performance in the validation cohort with a mean absolute error of 2.4 mg/L, root-mean-squared error of 3.3 mg/L, mean relative error of 0%, and percentages within ±20% of actual values of 98.85%. The SHAP analysis revealed that daily dose, time, CYP2C19*2 and/or *3 variants, albumin, body weight, single dose, and CYP2C19*1*1 genotype were the top seven confounding factors influencing the Css of VPA. 
Under the simulated dosage regimen of 500 mg/bid, the VPA exposure in patients who had CYP2C19*2 and/or *3 variants and no carbamazepine, phenytoin, or phenobarbital treatment, was approximately 1.74-fold compared to those with CYP2C19*1/*1 genotype and co-administered carbamazepine + phenytoin + phenobarbital. The feasibility of the simplified model was fully illustrated by its performance in our external dataset. Conclusion: This study highlighted the bridging role of ML in big data and pharmacometrics, by integrating covariates derived from different popPK models.
Affiliation(s)
- Xiuqing Zhu
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
- Ming Zhang
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
- Yuguan Wen
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
  - *Correspondence: Yuguan Wen; Dewei Shang
- Dewei Shang
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
  - *Correspondence: Yuguan Wen; Dewei Shang
|
17
|
Zhu X, Hu J, Xiao T, Huang S, Wen Y, Shang D. An interpretable stacking ensemble learning framework based on multi-dimensional data for real-time prediction of drug concentration: The example of olanzapine. Front Pharmacol 2022; 13:975855. [PMID: 36238557 PMCID: PMC9552071 DOI: 10.3389/fphar.2022.975855] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/22/2022] [Accepted: 09/05/2022] [Indexed: 11/13/2022] Open
Abstract
Background and Aim: Therapeutic drug monitoring (TDM) has evolved over the years as an important tool for personalized medicine. Nevertheless, some limitations are associated with traditional TDM. Emerging data-driven model forecasting [e.g., through machine learning (ML)-based approaches] has been used for individualized therapy. This study proposes an interpretable stacking-based ML framework to predict concentrations in real time after olanzapine (OLZ) treatment. Methods: The TDM-OLZ dataset, consisting of 2,142 OLZ measurements and 472 features, was formed by collecting electronic health records during the TDM of 927 patients who had received OLZ treatment. We compared the performance of ML algorithms by using 10-fold cross-validation and the mean absolute error (MAE). The optimal subset of features was analyzed by a random forest-based sequential forward feature selection method in the context of the top five heterogeneous regressors as base models to develop a stacked ensemble regressor, which was then optimized via the grid search method. Its predictions were explained by using local interpretable model-agnostic explanations (LIME) and partial dependence plots (PDPs). Results: A state-of-the-art stacking ensemble learning framework that integrates optimized extra trees, XGBoost, random forest, bagging, and gradient-boosting regressors was developed for nine selected features [i.e., daily dose (OLZ), gender_male, age, valproic acid_yes, ALT, K, BW, MONO#, and time of blood sampling after first administration]. It outperformed other base regressors that were considered, with an MAE of 0.064, R-square value of 0.5355, mean squared error of 0.0089, mean relative error of 13%, and ideal rate (the percentages of predicted TDM within ± 30% of actual TDM) of 63.40%. Predictions at the individual level were illustrated by LIME plots, whereas the global interpretation of associations between features and outcomes was illustrated by PDPs. 
Conclusion: This study highlights the feasibility of the real-time estimation of drug concentrations by using stacking-based ML strategies without losing interpretability, thus facilitating model-informed precision dosing.
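Two of the reported measures can be made concrete with a minimal sketch: the mean absolute error, and the "ideal rate", which the abstract defines as the percentage of predictions within ±30% of the actual value. The function names and the sample concentrations below are hypothetical and for illustration only; they are not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def ideal_rate(actual, predicted, tolerance=0.30):
    """Percentage of predictions within +/- tolerance of the actual value."""
    within = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * abs(a)
    )
    return 100.0 * within / len(actual)


# Hypothetical concentrations (arbitrary units), for illustration only.
actual = [10.0, 20.0, 40.0, 50.0]
predicted = [12.0, 27.0, 38.0, 30.0]

print(mean_absolute_error(actual, predicted))  # 7.75
print(ideal_rate(actual, predicted))           # 50.0
```

The same helper with `tolerance=0.20` corresponds to a "percentage within ±20% of actual values" style of measure, assuming that measure is defined analogously.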
Affiliation(s)
- Xiuqing Zhu
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
- Jinqing Hu
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
- Tao Xiao
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Department of Clinical Research, Guangdong Second Provincial General Hospital, Guangzhou, China
- Shanqing Huang
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
- Yuguan Wen
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
  - *Correspondence: Yuguan Wen; Dewei Shang
- Dewei Shang
  - Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
  - Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China
  - *Correspondence: Yuguan Wen; Dewei Shang
|
18
|
Study of Multidimensional and High-Precision Height Model of Youth Based on Multilayer Perceptron. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:7843455. [PMID: 35761869 PMCID: PMC9233609 DOI: 10.1155/2022/7843455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 03/14/2022] [Accepted: 05/13/2022] [Indexed: 11/17/2022]
Abstract
Predicting the adult height of children accurately has great social value for the selection of outstanding athletes as well as the early detection of children's growth disorders. Currently, the mainstream method used to predict adult height in China has three problems: its standards are not uniform, it is outdated for current Chinese children, and its accuracy is not satisfactory. This article uses the data collected by the Chinese Children and Adolescents' Physical Fitness and Growth Health Project in Zhejiang primary and secondary schools. We put forward a new multidimensional and high-precision youth growth curve prediction model based on a multilayer perceptron. First, this model uses multidimensional growth data of children as predictors and then utilizes a multilayer perceptron to predict the children's adult height. Second, we take the Table of Height Standard Deviation of Chinese Children and fit the zero-standard-deviation data to obtain a curve, which is regarded as the mean growth curve of Chinese children. Third, we use the least-squares method and the mean curve to calculate the individual growth curve. Finally, the individual curve can be used to predict children's state height. Experimental results show that the accuracy of this adult height prediction model (within 2 cm) reached 90.20% for boys and 88.89% for girls, and the state height prediction accuracy reached 77.46% and 74.93%, respectively. Compared with Bayley–Pinneau, the adult height prediction is improved by 19.61% for boys and 13.33% for girls. Compared with BoneXpert, the adult height prediction is improved by 25.49% for boys and 6.67% for girls. Compared with the method based on the bone age growth map, the adult height prediction is improved by 15.69% for boys and 24.45% for girls.
Collapse
|
19
|
Gollapalli M, Alansari A, Alkhorasani H, Alsubaii M, Sakloua R, Alzahrani R, Taha Al-Hariri M, Nasser Alfares M, AlKhafaji D, Jaafar Al Argan R, Albaker W. A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-diabetes, T1DM, and T2DM. Comput Biol Med 2022; 147:105757. [DOI: 10.1016/j.compbiomed.2022.105757] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 05/27/2022] [Accepted: 06/18/2022] [Indexed: 11/29/2022]
|
20
|
Machine learning models for classification and identification of significant attributes to detect type 2 diabetes. Health Inf Sci Syst 2022; 10:2. [PMID: 35178244 PMCID: PMC8828812 DOI: 10.1007/s13755-021-00168-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/15/2022] Open
Abstract
Type 2 Diabetes (T2D) is a chronic disease characterized by abnormally high blood glucose levels due to insulin resistance and reduced pancreatic insulin production. The challenge of this work is to identify T2D-associated features that can distinguish T2D sub-types for prognosis and treatment purposes. We thus employed machine learning (ML) techniques to categorize T2D patients using data from the Pima Indian Diabetes Dataset from the Kaggle ML repository. After data preprocessing, several feature selection techniques were used to extract feature subsets, and a range of classification techniques were used to analyze these. We then compared the derived classification results to identify the best classifiers by considering accuracy, kappa statistics, area under the receiver operating characteristic (AUROC), sensitivity, specificity, and logarithmic loss (logloss). To evaluate the performance of different classifiers, we investigated their outcomes using the summary statistics with a resampling distribution. Generalized Boosted Regression modeling showed the highest accuracy (90.91%), along with a kappa statistic of 78.77% and a specificity of 85.19%. In addition, Sparse Distance Weighted Discrimination, Generalized Additive Model using LOESS, and Boosted Generalized Additive Models gave the maximum sensitivity (100%), highest AUROC (95.26%), and lowest logarithmic loss (30.98%), respectively. Notably, the Generalized Additive Model using LOESS was the top-ranked algorithm according to non-parametric Friedman testing. Of the features identified by these machine learning models, glucose levels, body mass index, diabetes pedigree function, and age were consistently identified as the best and most frequently accurate outcome predictors. These results indicate the utility of ML methods in constructing improved prediction models for T2D and in identifying outcome predictors for this Pima Indian population.
|
21
|
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes Mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address both challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and performance parameters reported were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performances. Deep Neural Networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing the models' efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
  - School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
  - Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
|