1
Atimbire SA, Appati JK, Owusu E. Empirical exploration of whale optimisation algorithm for heart disease prediction. Sci Rep 2024; 14:4530. [PMID: 38402276 PMCID: PMC10894250 DOI: 10.1038/s41598-024-54990-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 02/19/2024] [Indexed: 02/26/2024] Open
Abstract
Heart diseases have the highest mortality worldwide, necessitating precise predictive models for early risk assessment. Much existing research has focused on improving model accuracy on single datasets, often neglecting comprehensive evaluation metrics and the use of different datasets from the same domain (heart disease). This research introduces a heart disease risk prediction approach that uses the whale optimization algorithm (WOA) for feature selection and implements a comprehensive evaluation framework. The study leverages five distinct datasets: a combined dataset comprising the Cleveland, Long Beach VA, Switzerland, and Hungarian heart disease datasets, plus the Z-AlizadehSani, Framingham, South African, and Cleveland heart datasets. The WOA-guided feature selection identifies optimal features, which are then fed into ten classification models. Comprehensive model evaluation reveals significant improvements across critical performance metrics, including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve. These results consistently outperform state-of-the-art methods on the same datasets, validating the effectiveness of our methodology. The comprehensive evaluation framework provides a robust assessment of the model's adaptability, underscoring the WOA's effectiveness in identifying optimal features across multiple datasets in the same domain.
Affiliation(s)
- Ebenezer Owusu
- Department of Computer Science, University of Ghana, Accra, Ghana
2
Nilashi M, Abumalloh RA, Alyami S, Alghamdi A, Alrizq M. Parkinson’s Disease Diagnosis Using Laplacian Score, Gaussian Process Regression and Self-Organizing Maps. Brain Sci 2023; 13:brainsci13040543. [PMID: 37190508 DOI: 10.3390/brainsci13040543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 03/10/2023] [Accepted: 03/18/2023] [Indexed: 03/29/2023] Open
Abstract
Parkinson’s disease (PD) is a complex degenerative brain disease that affects nerve cells in the brain responsible for body movement. Machine learning is widely used to track the progression of PD in its early stages by predicting unified Parkinson’s disease rating scale (UPDRS) scores. In this paper, we aim to develop a new method for PD diagnosis with the aid of supervised and unsupervised learning techniques. Our method is developed using the Laplacian score, Gaussian process regression (GPR) and self-organizing maps (SOM). SOM is used to segment the data to handle large PD datasets. The models are then constructed using GPR for the prediction of the UPDRS scores. To select the important features in the PD dataset, we use the Laplacian score in the method. We evaluate the developed approach on a PD dataset including a set of speech signals. The method was evaluated through root-mean-square error (RMSE) and adjusted R-squared (adjusted R²). Our findings reveal that the proposed method is efficient in the prediction of UPDRS scores through a set of speech signals (dysphonia measures). The method evaluation showed that SOM combined with the Laplacian score and Gaussian process regression with the exponential kernel provides the best results for R-squared (Motor-UPDRS = 0.9489; Total-UPDRS = 0.9516) and RMSE (Motor-UPDRS = 0.5144; Total-UPDRS = 0.5105) in predicting UPDRS compared with the other kernels in Gaussian process regression.
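The two evaluation measures named in this abstract, RMSE and adjusted R², can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not taken from the paper; the function names and the `n_features` parameter are assumptions made for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R-squared: R-squared penalised for the number of predictors.

    n_features is the number of input features used by the model
    (a hypothetical parameter name; the abstract does not give one).
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
```

A lower RMSE and an adjusted R² closer to 1 both indicate predictions that track the observed UPDRS scores more closely.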
3
Fajri YAZA, Wiharto W, Suryani E. Hybrid Model Feature Selection with the Bee Swarm Optimization Method and Q-Learning on the Diagnosis of Coronary Heart Disease. Information 2022; 14:15. [DOI: 10.3390/info14010015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022] Open
Abstract
Coronary heart disease is a type of cardiovascular disease characterized by atherosclerotic plaque, which causes myocardial infarction or sudden cardiac death. Because such sudden heart attacks have no apparent symptoms, early detection of the risk factors for coronary heart disease is required. Many studies have been conducted to diagnose heart disease, including studies that tested various classifiers, feature selection and detection models on several coronary heart disease datasets. This research therefore examines the effect of the bee swarm optimization algorithm combined with Q-learning on feature selection for improving the prediction of heart disease. The detection model was tested against various classification methods and evaluated against multiple performance measures, such as accuracy, precision, recall and the area under curve (AUC), to identify the best model for heart disease prediction for the benefit of the medical community. The test results show that the proposed method outperforms existing methods in feature selection.
4
Mohammedqasim H, Mohammedqasem R, Ata O, Alyasin EI. Diagnosing Coronary Artery Disease on the Basis of Hard Ensemble Voting Optimization. Medicina (Kaunas) 2022; 58:medicina58121745. [PMID: 36556946 PMCID: PMC9783937 DOI: 10.3390/medicina58121745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 11/30/2022]
Abstract
Background and Objectives: Recently, many studies have focused on the early diagnosis of coronary artery disease (CAD), which is one of the leading causes of cardiac-associated death worldwide. The effectiveness of the most important features influencing disease diagnosis determines the performance of machine learning systems that can allow for timely and accurate treatment. We developed a hybrid ML framework based on hard ensemble voting optimization (HEVO) to classify patients with CAD using the Z-Alizadeh Sani dataset. All categorical features were converted to numerical form, the synthetic minority oversampling technique (SMOTE) was employed to overcome the imbalanced distribution between the two classes in the dataset, and recursive feature elimination (RFE) with random forest (RF) was then used to obtain the best subset of features. Materials and Methods: After resolving the biased distribution in the CAD dataset using the SMOTE method and finding the highly correlated features that affected the classification of CAD patients, the performance of the proposed model was evaluated using grid search optimization, and the best hyperparameters were identified for developing four models, namely RF, AdaBoost, gradient boosting, and extra trees, combined in an HEV classifier. Results: Five-fold cross-validation experiments with the HEV classifier showed excellent prediction performance with the 10 best balanced features obtained using SMOTE and feature selection. All evaluation metrics exceeded 98% with the HEV classifier, and the gradient-boosting model was the second-best classification model with accuracy = 97% and F1-score = 98%. Conclusions: Compared with modern methods, the proposed method performs well in diagnosing coronary artery disease and can therefore be used by medical personnel as a supplementary aid for the timely, accurate, and efficient identification of CAD cases in suspected patients.
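Hard voting, the combination rule this abstract names, can be sketched in a few lines: each base model casts one vote per sample and the majority label wins. This is a minimal illustration of the general technique, not the authors' implementation; the function name and the toy predictions are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote.

    predictions: list of equal-length label lists, one per base model.
    With an even number of models a tie is possible; here a tie falls to
    the label seen first among the models (an arbitrary choice made for
    this sketch).
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical predictions from four base models on three patients
rf = [1, 0, 1]
ada = [1, 1, 0]
gb = [1, 0, 0]
et = [0, 0, 0]
ensemble = hard_vote([rf, ada, gb, et])
```

Hard voting uses only the predicted labels, so the base models do not need to produce calibrated probabilities.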
5
Zhang S, Yuan Y, Yao Z, Yang J, Wang X, Tian J. Coronary Artery Disease Detection Model Based on Class Balancing Methods and LightGBM Algorithm. Electronics 2022; 11:1495. [DOI: 10.3390/electronics11091495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Coronary artery disease (CAD) is a disease with high mortality and disability. By 2019, there were 197 million CAD patients in the world, and the number of disability-adjusted life years (DALYs) owing to CAD reached 182 million. It is widely known that early and accurate diagnosis is the most efficient way to reduce the damage caused by CAD. In medical practice, coronary angiography is considered the most reliable basis for CAD diagnosis. However, owing to limited inspection equipment and expert resources, many low- and middle-income countries are unable to perform coronary angiography, which has led to a large loss of life and a heavy medical burden. Therefore, many researchers hope to achieve accurate diagnosis of CAD from conventional medical examination data with the help of machine learning and data mining technology. The goal of this study is to propose a model for early, accurate and rapid detection of CAD based on common medical test data. The model uses the classical logistic regression algorithm, the most commonly used algorithm in medical model research, as the classifier. The feature selection and feature combination strengths of tree models were used to address the problem of manual feature engineering in logistic regression. To solve the class imbalance problem in the Z-Alizadeh Sani dataset, five different class balancing methods were applied to balance the dataset. In addition, we adopted preprocessing methods suited to the characteristics of the dataset. These methods significantly improved the classification performance of the logistic regression classifier in terms of accuracy, recall, precision, F1 score, specificity and AUC when used for CAD detection. The best accuracy, recall, F1 score, precision, specificity and AUC were 94.7%, 94.8%, 94.8%, 95.3%, 94.5% and 0.98, respectively. Experiments and results have confirmed that, using common medical examination data, our proposed model can accurately identify CAD patients in the early stage of CAD. Our proposed model can be used to help clinicians make diagnostic decisions in clinical practice.
6
Jin Z, Li N. Diagnosis of each main coronary artery stenosis based on whale optimization algorithm and stacking model. Math Biosci Eng 2022; 19:4568-4591. [PMID: 35430828 DOI: 10.3934/mbe.2022211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Cardiovascular disease is currently one of the diseases with high morbidity and mortality worldwide. One of the main types is coronary artery disease (CAD), which occurs when one or more of the three main arteries, the left anterior descending (LAD) artery, the left circumflex (LCX) artery, and the right coronary artery (RCA), are narrowed. In this paper, we introduce a computer-aided diagnosis model that uses the k-nearest neighbor (KNN)-based whale optimization algorithm (WOA) for feature selection and combines it with a stacking model for CAD diagnosis and prediction. In WOA, the values in the solution vectors are all continuous, and a threshold is set for binary conversion to obtain the optimal feature subsets for each main coronary artery. We then develop a two-layer stacking model based on the selected feature subsets to diagnose LAD, LCX and RCA. With the proposed method, we select 17 features for each main artery diagnosis, and the classification accuracy on the LAD, LCX, and RCA test sets is 89.68%, 88.71% and 85.81%, respectively. On the Z-Alizadeh Sani dataset, we compare the proposed feature selection method with other metaheuristics and compare the performance of WOA based on different wrappers. The experimental results show that the KNN-based WOA method selects the optimal feature subsets, and the classification performance of the stacking model is better than that of other machine learning algorithms.
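The binary-conversion step mentioned in this abstract can be sketched as follows: a continuous WOA position vector is thresholded into a 0/1 mask, and the features with mask value 1 form the candidate subset. This is a minimal sketch under assumptions; the threshold value of 0.5, the function names, and the feature names are hypothetical and are not given in the abstract.

```python
def binarize_position(position, threshold=0.5):
    """Map a continuous position vector to a 0/1 feature mask.

    threshold=0.5 is an assumed default; the abstract does not state
    the value the authors used.
    """
    return [1 if x > threshold else 0 for x in position]


def select_features(feature_names, position, threshold=0.5):
    """Return the names of the features whose mask entry is 1."""
    mask = binarize_position(position, threshold)
    return [name for name, keep in zip(feature_names, mask) if keep]


# Hypothetical feature names and one whale's continuous position
names = ["age", "bp", "ldl", "typical_chest_pain"]
position = [0.91, 0.20, 0.51, 0.50]
subset = select_features(names, position)
```

In a wrapper setup such as the KNN-based one described above, each candidate subset would then be scored by training the wrapper classifier on those features only.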
Affiliation(s)
- Ziyu Jin
- College of Sciences, Northeastern University, Shenyang 110819, China
- Ning Li
- College of Sciences, Northeastern University, Shenyang 110819, China
7
Hassannataj Joloudari J, Azizi F, Nematollahi MA, Alizadehsani R, Hassannatajjeloudari E, Nodehi I, Mosavi A. GSVMA: A Genetic Support Vector Machine ANOVA Method for CAD Diagnosis. Front Cardiovasc Med 2022; 8:760178. [PMID: 35187099 PMCID: PMC8855497 DOI: 10.3389/fcvm.2021.760178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/17/2021] [Accepted: 12/22/2021] [Indexed: 11/13/2022] Open
Abstract
Background: Coronary artery disease (CAD) is one of the major causes of cardiovascular mortality in middle-aged people worldwide. Angiography is the most typical tool for diagnosing CAD, but it is costly and has side effects. One alternative is the use of machine learning-based methods for CAD diagnosis. Methods: This paper provides a new hybrid machine learning model called genetic support vector machine and analysis of variance (GSVMA). Analysis of variance (ANOVA) serves as the kernel function for the SVM algorithm. The proposed model is applied to the Z-Alizadeh Sani dataset, with a genetic optimization algorithm used to select crucial features. In addition, SVM with ANOVA, linear SVM (LSVM), and library for support vector machine (LIBSVM) with radial basis function (RBF) methods were applied to classify the dataset. Results: The GSVMA hybrid method performs better than the other methods. The proposed method achieves the highest accuracy of 89.45% under 10-fold cross-validation with 31 selected features on the Z-Alizadeh Sani dataset. Conclusion: We demonstrated that SVM combined with a genetic optimization algorithm could lead to higher accuracy. Therefore, our study confirms that the GSVMA method outperforms other methods and can facilitate CAD diagnosis.
Affiliation(s)
- Faezeh Azizi
- Department of Computer Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC, Australia
- Edris Hassannatajjeloudari
- Department of Nursing, School of Nursing and Allied Medical Sciences, Maragheh Faculty of Medical Sciences, Maragheh, Iran
- Issa Nodehi
- Department of Computer Engineering, University of Qom, Qom, Iran
- Amir Mosavi
- Faculty of Informatics, Technische Universität Dresden, Dresden, Germany
- Faculty of Civil Engineering, TU-Dresden, Dresden, Germany
- John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
- Institute of Information Society, University of Public Service, Budapest, Hungary
- Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia
8
Zhang S, Yuan Y, Yao Z, Wang X, Lei Z. Improvement of the Performance of Models for Predicting Coronary Artery Disease Based on XGBoost Algorithm and Feature Processing Technology. Electronics 2022; 11:315. [DOI: 10.3390/electronics11030315] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Coronary artery disease (CAD) is one of the diseases with the highest morbidity and mortality in the world. In 2019, the number of deaths caused by CAD reached 9.14 million. Detecting and treating CAD at an early stage is crucial to saving lives and improving prognosis. Therefore, the purpose of this research is to develop a machine-learning system that can help diagnose CAD accurately at an early stage. In this paper, two classical ensemble learning algorithms, namely the XGBoost algorithm and the Random Forest algorithm, were used as the classification models. To improve the classification accuracy and performance of the models, we applied four feature processing techniques to the features. In addition, the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) were used to balance the dataset, which included 71.29% CAD samples and 28.71% normal samples. The four feature processing techniques improved the performance of the classification models in terms of classification accuracy, precision, recall, F1 score and specificity. In particular, the XGBoost algorithm achieved the best prediction performance on the dataset processed by feature construction and the SMOTE method. The best classification accuracy, recall, specificity, precision, F1 score and AUC were 94.7%, 96.1%, 93.2%, 93.4%, 94.6% and 98.0%, respectively. The experimental results indicate that the proposed method can accurately and reliably identify CAD patients among suspected patients at an early stage and can be used by medical staff for auxiliary diagnosis.
9
Nejadeh M, Bayat P, Kheirkhah J, Moladoust H. Predicting the response to cardiac resynchronization therapy (CRT) using the deep learning approach. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/03/2023]