1. Hossain S, Hasan MK, Faruk MO, Aktar N, Hossain R, Hossain K. Machine learning approach for predicting cardiovascular disease in Bangladesh: evidence from a cross-sectional study in 2023. BMC Cardiovasc Disord 2024; 24:214. [PMID: 38632519 PMCID: PMC11025260 DOI: 10.1186/s12872-024-03883-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Accepted: 04/08/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND: Cardiovascular disorders (CVDs) are the leading cause of death worldwide. Low- and middle-income countries (LMICs), such as Bangladesh, are also affected by several types of CVDs, such as heart failure and stroke. The leading cause of death in Bangladesh has recently switched from severe infections and parasitic illnesses to CVDs. MATERIALS AND METHODS: The study dataset comprised the medical records of 391 CVD patients, collected between August 2022 and April 2023 using simple random sampling. In addition, 260 data points were collected from individuals with no CVD problems for comparison purposes. Crosstabs and chi-square tests were used to determine the association between CVD and the explanatory variables. Logistic Regression, Naïve Bayes, Decision Tree, AdaBoost, Random Forest, Bagging Tree, and Ensemble learning classifiers were used to predict CVD. The performance evaluations encompassed accuracy, sensitivity, specificity, and area under the receiver operating characteristic (AU-ROC) curve. RESULTS: Random Forest had the highest precision among the techniques considered. The precision rates for the classifiers are as follows: Logistic Regression (93.67%), Naïve Bayes (94.87%), Decision Tree (96.1%), AdaBoost (94.94%), Random Forest (96.15%), and Bagging Tree (94.87%). The Random Forest classifier maintains the best balance between correct and incorrect predictions. With 98.04% accuracy, the Random Forest classifier achieved the best precision (96.15%), robust recall (100%), and a high F1 score (97.7%). In contrast, the Logistic Regression model achieved the lowest accuracy of 95.42%. The Random Forest classifier also achieved the highest AUC value (0.989). CONCLUSION: This research mainly focused on identifying the factors that are critical in impacting patients with CVD and on predicting CVD risk. It is strongly advised that the Random Forest technique be implemented in a system for predicting cardiac diseases. This research may change clinical practice by providing doctors with a new instrument to determine a patient's CVD prognosis.
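The chi-square test of association mentioned in the methods can be illustrated with a minimal sketch. The counts below are hypothetical (they are not taken from the study), and the function name `chi_square_statistic` is an assumption introduced here for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical crosstab: rows = risk factor present/absent, columns = CVD yes/no
table = [[30, 10], [20, 40]]
stat = chi_square_statistic(table)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom
is_associated = stat > 3.841
```

For larger tables the critical value depends on the degrees of freedom, so in practice a statistics library would normally be used to obtain the p-value.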
Affiliation(s)
- Sorif Hossain: Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
- Mohammad Kamrul Hasan: Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
- Mohammad Omar Faruk: Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
- Nelufa Aktar: Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
- Riyadh Hossain: Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
- Kabir Hossain: Department of Statistics, Noakhali Science and Technology University, Noakhali, 3814, Bangladesh
2. Jiang L, Yang Z, Wang D, Gong H, Li J, Wang J, Wang L. Diabetes prediction model for unbalanced community follow-up data set based on optimal feature selection and scorecard. Digit Health 2024; 10:20552076241236370. [PMID: 38449681 PMCID: PMC10915850 DOI: 10.1177/20552076241236370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 03/08/2024] Open
Abstract
Objectives: Diabetes is a metabolic disease, and early detection is crucial to ensuring a healthy life for people with prediabetes. Community care plays an important role in public health, but the association between community follow-up of key life characteristics and diabetes risk remains unclear. Based on optimal feature selection and a risk scorecard, follow-up data of diabetes patients are modeled to assess diabetes risk. Methods: We conducted a study on the diabetes risk assessment model and risk scorecard using follow-up data from diabetes patients in Haizhu District, Guangzhou, from 2016 to 2023. The raw data underwent preprocessing and imbalance handling. Subsequently, features relevant to diabetes were selected and optimized to determine the optimal subset of features associated with community follow-up and diabetes risk. We then established the diabetes risk assessment model. Furthermore, for a comprehensible and interpretable risk expression, the Weight of Evidence transformation was applied to the features. The transformed features were discretized using quantile binning to design the risk scorecard, mapping the model's output to five risk levels. Results: In constructing the diabetes risk assessment model, the Random Forest classifier achieved the highest accuracy. The risk scorecard obtained an accuracy of 85.16%, precision of 87.30%, recall of 80.26%, and an F1 score of 83.27% on the unbalanced research dataset. The performance loss compared to the diabetes risk assessment model was minimal, suggesting that the binning method used for constructing the diabetes risk scorecard is reasonable, with very low feature information loss. Conclusion: The methods provided in this article demonstrate effectiveness and reliability in the assessment of diabetes risk. The assessment model and scorecard can be used directly by community doctors for large-scale risk identification and early warning, and can also be used for individual self-examination to reduce risk factor levels.
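The combination of quantile binning and the Weight of Evidence (WoE) transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sign convention (log of the positive share over the negative share), the additive smoothing term, the function names, and the toy data are all assumptions made here.

```python
import math


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins, by rank.

    Note: tied values may land in different bins in this simple sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def weight_of_evidence(bins, labels, n_bins, smoothing=0.5):
    """WoE per bin, here defined as ln(share of positives / share of negatives).

    The smoothing count (an assumption) avoids division by zero in empty cells.
    """
    pos = [smoothing] * n_bins
    neg = [smoothing] * n_bins
    for b, y in zip(bins, labels):
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    total_pos, total_neg = sum(pos), sum(neg)
    return [math.log((pos[b] / total_pos) / (neg[b] / total_neg))
            for b in range(n_bins)]


# Hypothetical feature values and outcome labels (1 = diabetes, 0 = no diabetes)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
bins = quantile_bins(values, 2)
woe = weight_of_evidence(bins, labels, 2)
```

Under this convention a positive WoE marks a bin in which positives are over-represented. A scorecard would then typically scale the per-bin WoE values into points, a step not shown here.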
Affiliation(s)
- Liangjun Jiang: College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zerui Yang: Electronics & Information School, Yangtze University, Jingzhou, China
- Donghai Wang: Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Haimei Gong: College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Juan Li: Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Jing Wang: Shenzhen E-link Wisdom Co., Ltd, Shenzhen, China
- Lei Wang: College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
3. Kim SH, Park SH, Lee H. Machine learning for predicting hepatitis B or C virus infection in diabetic patients. Sci Rep 2023; 13:21518. [PMID: 38057379 PMCID: PMC10700585 DOI: 10.1038/s41598-023-49046-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 12/04/2023] [Indexed: 12/08/2023] Open
Abstract
Highly prevalent hepatitis B and hepatitis C virus (HBV and HCV) infections have been reported among individuals with diabetes. Given the frequently asymptomatic nature of hepatitis and the challenges of screening in vulnerable populations such as diabetes patients, we investigated the performance of various machine learning models for identifying hepatitis in diabetic patients and evaluated the significance of features. Using NHANES data from 2013 to 2018, we evaluated random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), and least absolute shrinkage and selection operator (LASSO) models, along with a stacked ensemble model. We performed hyperparameter tuning to improve model performance and selected important predictors using the best-performing model. LASSO showed the highest predictive performance of the models evaluated (AUC-ROC = 0.810). Illicit drug use, poverty, and race were highly ranked as predictive factors for developing hepatitis in diabetes patients. Our study demonstrated that a machine-learning-based model performed well in detecting hepatitis among diabetes patients. Furthermore, we expect that the models and predictors evaluated in this study could support the development of screening or treatment methods for hepatitis care in diabetes patients.
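The AUC-ROC metric used to compare the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and the function name `auc_roc` is an assumption introduced for illustration.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hepatitis, 0 = no hepatitis) and model scores
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)  # 3 of the 4 positive-negative pairs are ordered correctly
```

This quadratic-time form is meant only to make the definition concrete; library implementations compute the same quantity more efficiently from sorted scores.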
Affiliation(s)
- Sun-Hwa Kim: Department of Clinical Medicinal Sciences, Konyang University, Nonsan, Republic of Korea
- So-Hyeon Park: Department of Clinical Medicinal Sciences, Konyang University, Nonsan, Republic of Korea
- Heeyoung Lee: College of Pharmacy, Inje University, Gimhae, Republic of Korea
4. Xiao L, Zhang Y, Xu X, Dou Y, Guan X, Guo Y, Wen X, Meng Y, Liao M, Hu Q, Yu J. Predictive model for early death risk in pediatric hemophagocytic lymphohistiocytosis patients based on machine learning. Heliyon 2023; 9:e22202. [PMID: 38045172 PMCID: PMC10692822 DOI: 10.1016/j.heliyon.2023.e22202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 10/23/2023] [Accepted: 11/06/2023] [Indexed: 12/05/2023] Open
Abstract
Background: Hemophagocytic Lymphohistiocytosis (HLH) is a rare and life-threatening disease in children, with a high early mortality rate. This study aimed to construct a machine learning model to predict the risk of early death using clinical indicators at the time of HLH diagnosis. Methods: This observational cohort study was conducted at the National Clinical Research Center for Child Health and Disease. Data were collected from pediatric HLH patients diagnosed by the HLH-2004 protocol between January 2006 and December 2022. Six machine learning models were constructed, using the Least Absolute Shrinkage and Selection Operator (LASSO) to select key clinical indicators for model construction. Results: The study included 587 pediatric HLH patients, and the early mortality rate was 28.45%. The logistic and XGBoost models, which performed best after feature screening, were selected to predict early death of HLH patients. The logistic model had an AUC of 0.915 and an accuracy of 0.863, while the XGBoost model had an AUC of 0.889 and an accuracy of 0.829. The risk factors most associated with early death were the absence of immunochemotherapy, decreased TC levels, increased BUN and total bilirubin, and prolonged TT. We developed an online calculator tool for predicting the probability of early death in children with HLH. Conclusions: We developed the first web-based early mortality prediction tool for pediatric HLH to assist clinicians in risk stratification at diagnosis and in developing personalized treatment protocols. This study is registered on the China Clinical Trials Registry platform (ChiCTR2200061315).
Affiliation(s)
- Li Xiao: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Yang Zhang: College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Ximing Xu: Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Dou: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Xianmin Guan: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Yuxia Guo: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Xianhao Wen: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Yan Meng: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Meiling Liao: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
- Qinshi Hu: Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
- Jie Yu: Department of Hematology and Oncology, Children's Hospital of Chongqing Medical University, National Clinical Research Center for Child Health and Disorders, Chongqing Key Laboratory of Pediatrics, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing, China
5. Hosseini S, Pourmirzaee R, Armaghani DJ, Sabri Sabri MM. Prediction of ground vibration due to mine blasting in a surface lead-zinc mine using machine learning ensemble techniques. Sci Rep 2023; 13:6591. [PMID: 37085660 PMCID: PMC10121721 DOI: 10.1038/s41598-023-33796-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Accepted: 04/19/2023] [Indexed: 04/23/2023] Open
Abstract
Ground vibration due to blasting is a challenging issue in mining and civil activities. Peak particle velocity (PPV), which results from the emission of vibration in the blasted bench, is one of the undesirable consequences of blasting. This study focuses on PPV prediction in surface mines. To this end, two ensemble systems, i.e., an ensemble of artificial neural networks (ANNs) and an ensemble of extreme gradient boosting models (EXGBoosts), were developed for PPV prediction in one of the largest lead-zinc open-pit mines in the Middle East. For ensemble modeling, several ANN and XGBoost base models were designed separately with different architectures. Validation indices, namely the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), variance accounted for (VAF), and Accuracy, were then used to evaluate the performance of the base models. The five most accurate base models were selected to construct an ensemble model for each method, i.e., ANNs and XGBoosts. The stacked generalization technique was employed to combine the outputs of the top base models into a single result. The findings showed that the ensemble models increase the accuracy of PPV prediction compared with the best individual models. EXGBoosts was the superior method for predicting PPV, obtaining R2, RMSE, MAE, VAF, and Accuracy values of (0.990, 0.391, 0.257, 99.013%, 98.216) and (0.968, 0.295, 0.427, 96.674%, 96.059) for the training and testing datasets, respectively. The sensitivity analysis indicated that the spacing (r = 0.917) and the number of blast-holes (r = 0.839) had the highest and lowest impact on PPV intensity, respectively.
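The validation indices named above can be computed as in the following sketch, which uses the common textbook definitions of R2, RMSE, MAE, and VAF. The abstract does not define its "Accuracy" index, so it is omitted here. The function name and the toy values are assumptions for illustration, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and VAF (in percent) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "vaf": (1.0 - var_err / var_true) * 100.0,
    }


# Hypothetical measured and predicted PPV values with a constant offset of 0.5
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
metrics = regression_metrics(measured, predicted)
```

In this toy case VAF is 100% while R2 is 0.8, because VAF depends only on the variance of the errors and so does not penalize a constant bias. That is one reason to report several indices side by side.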
Affiliation(s)
- Shahab Hosseini: Faculty of Engineering, Tarbiat Modares University, Tehran, Iran
- Rashed Pourmirzaee: Department of Mining Engineering, Urmia University of Technology, Urmia, Iran
- Danial Jahed Armaghani: Faculty of Civil Engineering, Centre of Tropical Geoengineering (GEOTROPIK), Institute of Smart Infrastructure and Innovative Engineering (ISIIC), Universiti Teknologi Malaysia, 81310, Johor Bahru, Malaysia
6. Mamdouh Farghaly H, Shams MY, Abd El-Hafeez T. Hepatitis C Virus prediction based on machine learning framework: a real-world case study in Egypt. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-023-01851-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/06/2023]
Abstract
Prediction and classification of diseases are essential in medical science, as they help to limit the spread of disease and to identify infected regions at an early stage. Machine learning (ML) approaches are commonly used for predicting and classifying diseases and serve as an efficient tool for doctors and specialists. This paper proposes a prediction framework based on ML approaches to predict Hepatitis C Virus among healthcare workers in Egypt. We utilized real-world data from the National Liver Institute, founded at Menoufiya University (Menoufiya, Egypt). The collected dataset consists of 859 patients with 12 different features. To ensure the robustness and reliability of the proposed framework, we evaluated two scenarios: the first without feature selection and the second after the features are selected based on sequential forward selection (SFS). Furthermore, the feature subset selected based on the generated features from SFS is evaluated. Naïve Bayes, random forest (RF), K-nearest neighbor, and logistic regression are utilized as induction algorithms and classifiers for model evaluation. Then, the effect of parameter tuning on learning techniques is measured. The experimental results indicated that the proposed framework achieved higher accuracies after SFS selection than without feature selection. Moreover, the RF classifier achieved 94.06% accuracy with a minimum learning elapsed time of 0.54 s. Finally, after adjusting the hyperparameter values of the RF classifier, the classification accuracy improved to 94.88% using only four features.
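Sequential forward selection (SFS), as used in the second scenario, greedily adds the single feature that most improves a score until the desired subset size is reached. The sketch below shows only that greedy loop. The feature names, the gain values, and the toy scoring function are hypothetical stand-ins; in a study like this one the score would typically be a classifier's cross-validated accuracy.

```python
def sequential_forward_selection(features, score_fn, k):
    """Greedily pick k features, adding the best-scoring candidate each round."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature gains; "alt" and "ast" are assumed to be redundant
GAINS = {"age": 0.30, "alt": 0.25, "ast": 0.24, "sex": 0.05}


def toy_score(subset):
    """Stand-in for a model score: sum of gains minus a redundancy penalty."""
    score = sum(GAINS[f] for f in subset)
    if "alt" in subset and "ast" in subset:
        score -= 0.20
    return score


chosen = sequential_forward_selection(list(GAINS), toy_score, 3)
```

In the toy example the third pick is "sex" rather than "ast": once "alt" is selected, adding the redundant "ast" raises the score less than adding a weaker but independent feature. This is the kind of effect SFS captures and a ranking of individual features would miss.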