1
Taştan M. Machine Learning-Based Calibration and Performance Evaluation of Low-Cost Internet of Things Air Quality Sensors. Sensors (Basel, Switzerland) 2025; 25:3183. [PMID: 40431975 PMCID: PMC12115728 DOI: 10.3390/s25103183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2025] [Revised: 04/25/2025] [Accepted: 04/29/2025] [Indexed: 05/29/2025]
Abstract
Low-cost air quality sensors (LCSs) are increasingly being used in environmental monitoring due to their affordability and portability. However, their sensitivity to environmental factors can lead to measurement inaccuracies, necessitating effective calibration methods to enhance their reliability. In this study, an Internet of Things (IoT)-based air quality monitoring system was developed and tested using the most commonly preferred sensor types for air quality measurement: fine particulate matter (PM2.5), carbon dioxide (CO₂), temperature, and humidity sensors. To improve sensor accuracy, eight different machine learning (ML) algorithms were applied: Decision Tree (DT), Linear Regression (LR), Random Forest (RF), k-Nearest Neighbors (kNN), AdaBoost (AB), Gradient Boosting (GB), Support Vector Machines (SVM), and Stochastic Gradient Descent (SGD). Sensor performance was evaluated by comparing measurements with a reference device, and the best-performing ML model was determined for each sensor. The results indicate that GB and kNN achieved the highest accuracy. For CO₂ sensor calibration, GB achieved R² = 0.970, RMSE = 0.442, and MAE = 0.282, providing the lowest error rates. For the PM2.5 sensor, kNN delivered the most successful results, with R² = 0.970, RMSE = 2.123, and MAE = 0.842. Additionally, for temperature and humidity sensors, GB demonstrated the highest accuracy with the lowest error values (R² = 0.976, RMSE = 2.284). These findings demonstrate that, by identifying suitable ML methods, ML-based calibration techniques can significantly enhance the accuracy of LCSs. Consequently, they offer a viable and cost-effective alternative to traditional high-cost air quality monitoring systems. Future studies should focus on long-term data collection, testing under diverse environmental conditions, and integrating additional sensor types to further advance this field.
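The abstract above reports calibration quality as a coefficient of determination, RMSE, and MAE against a reference device. As a rough illustration of how those three figures are usually computed, here is a minimal sketch in plain Python. The function name `calibration_metrics` and the sample readings are hypothetical and are not taken from the study.

```python
import math


def calibration_metrics(reference, predicted):
    """Return (r2, rmse, mae) for calibrated readings versus a reference device."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    errors = [p - r for r, p in zip(reference, predicted)]
    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n
    # Root mean squared error: penalizes large residuals more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination: 1 - residual variance / total variance.
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return r2, rmse, mae


# Made-up readings, for illustration only (not data from the paper).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r2, rmse, mae = calibration_metrics(reference, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

RMSE and MAE carry the unit of the measured quantity, so values from different sensors (for example CO₂ and PM2.5) are not directly comparable, whereas the coefficient of determination is unitless.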
Affiliation(s)
- Mehmet Taştan
- Department of Electronics and Automation, Manisa Celal Bayar University, 45030 Manisa, Turkey
2
Pant D, Nytrø Ø, Leventhal BL, Clausen C, Koochakpour K, Stien L, Westbye OS, Koposov R, Røst TB, Frodl T, Skokauskas N. Secondary use of health records for prediction, detection, and treatment planning in the clinical decision support system: a systematic review. BMC Med Inform Decis Mak 2025; 25:190. [PMID: 40380138 DOI: 10.1186/s12911-025-03021-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 05/05/2025] [Indexed: 05/19/2025] Open
Abstract
BACKGROUND This study aims to understand how secondary use of health records can be done for prediction, detection, treatment recommendations, and related tasks in clinical decision support systems. METHODS Articles mentioning the secondary use of EHRs for clinical utility, specifically in prediction, detection, treatment recommendations, and related tasks in decision support were reviewed. We extracted study details, methods, tools, technologies, utility, and performance. RESULTS We found that secondary uses of EHRs are primarily retrospective, mostly conducted using records from hospital EHRs, EHR data networks, and warehouses. EHRs vary in type and quality, making it critical to ensure their completeness and quality for clinical utility. Widely used methods include machine learning, statistics, simulation, and analytics. Secondary use of health records can be applied in any area of medicine. The selection of data, cohorts, tools, technology, and methods depends on the specific clinical utility. CONCLUSION The process for secondary use of health records should include three key steps: 1. Validation of the quality of EHRs, 2. Use of methods, tools, and technologies with proactive training, and 3. Multidimensional assessment of the results and their usefulness. TRIAL REGISTRATION PROSPERO registration number CRD42023409582.
Affiliation(s)
- Dipendra Pant
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway.
- Øystein Nytrø
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
- Carolyn Clausen
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Kaban Koochakpour
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Line Stien
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Odd Sverre Westbye
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Child and Adolescent Psychiatry, Clinic of Mental Health Care, St. Olav University Hospital, Trondheim, Norway
- Roman Koposov
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU North), UiT The Arctic University of Norway, Tromsø, Norway
- Thomas Brox Røst
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Vivit AS, Trondheim, Norway
- Norbert Skokauskas
- Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway), Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
3
Kocamış OG, Çiçekcibaşı AE, Açar G, Ayaş BD, Aydoğdu D. The role of the sacroiliac joint in sex estimation: Analysis of morphometry and variation types using machine learning techniques. Leg Med (Tokyo) 2025; 75:102630. [PMID: 40349523 DOI: 10.1016/j.legalmed.2025.102630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 03/28/2025] [Accepted: 05/03/2025] [Indexed: 05/14/2025]
Abstract
This study aimed to evaluate the potential of machine learning algorithms in sex estimation by going beyond the traditional two-dimensional (2D) measurements of the pelvic bone, which predominate in sex prediction, and including measurements from three-dimensional (3D) images. Measurements were performed on abdominal multidetector computed tomography (MDCT) images of 152 individuals (77 females, 75 males) aged 18-85. 3D-Slicer software was used for measurements on 2D and 3D images. In 2D images, sacroiliac joint surface area, the angle of the joint surface, the distance between the right and left joints, and joint space were measured, and joint variations were typed. The distances from the sacroiliac joint to the apex of the contralateral linea terminalis and from the joint to the superior and inferior pubic symphysis were measured on 3D images. Sacroiliac joint space measurements were significantly higher in males than in females. Among the sacroiliac joint variations, the standard joint was the most common type, found in 46% of males and 28% of females. A strong positive correlation was found between sacroiliac joint distance and both the distance from the sacroiliac joint to the contralateral linea terminalis apex and the distances from the sacroiliac joint to the superior and inferior pubic symphysis. In this study, the support vector machine algorithm gave the most successful result compared to other algorithms, reaching 88% accuracy in sex estimation. We hope our study will guide the use of artificial intelligence on 3D images for forensic identification, especially in sex estimation.
Affiliation(s)
- Orhan Gazi Kocamış
- Department of Anatomy, Faculty of Medicine, Necmettin Erbakan University, Meram, Konya, Turkey.
- Aynur Emine Çiçekcibaşı
- Department of Anatomy, Faculty of Medicine, Necmettin Erbakan University, Meram, Konya, Turkey.
- Gülay Açar
- Department of Anatomy, Faculty of Medicine, Necmettin Erbakan University, Meram, Konya, Turkey.
- Betül Digilli Ayaş
- Department of Anatomy, Faculty of Medicine, Necmettin Erbakan University, Meram, Konya, Turkey.
- Demet Aydoğdu
- Department of Radiology, Faculty of Medicine, Necmettin Erbakan University, Meram, Konya, Turkey.
4
Zecconi A, Zaghen F, Meroni G, Sommariva F, Ferrari S, Sora V. Machine Learning Approach for Early Lactation Mastitis Diagnosis Using Total and Differential Somatic Cell Counts. Animals (Basel) 2025; 15:1125. [PMID: 40281959 PMCID: PMC12024274 DOI: 10.3390/ani15081125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2025] [Revised: 04/08/2025] [Accepted: 04/11/2025] [Indexed: 04/29/2025] Open
Abstract
Dairy herds around the world are undergoing several changes. Herd sizes are increasing, as are both milk yield and quality. The implementation of new technologies in various domains of dairy production is leading to an increase in the quantity of data available. This, in turn, creates a need to extract useful information from these data to improve production efficiency. This paper presents the findings of a preliminary study that utilizes a machine learning (ML) approach to assess the accuracy of somatic cell count (SCC) and neutrophils + lymphocytes count/mL (PLCC) in identifying cows at risk of developing intramammary infection (IMI) due to major pathogens. These pathogens (MajPs) include S. aureus, S. agalactiae, S. uberis, and S. dysgalactiae. This study identified these pathogens after calving, either by real-time PCR (qPCR) methods or by conventional bacteriology. This study encompassed a total of 424 cows and 1696 quarter milk samples. A comparison of the two methods revealed significant disparities in the prevalence of MajPs, with the qPCR method demonstrating a higher prevalence than conventional bacteriology. However, the prevalence of negative results was comparable, at approximately 71.0% and 72.1%, respectively. The overall results showed that all the cellular markers were most accurate when MajP IMI was diagnosed using quarter milk samples, but this result is mainly due to the very high specificity. The cellular markers exhibited nearly equivalent performance, irrespective of the ML algorithm employed. The findings indicate that approaches based on SCC or PLCC may be useful for identifying healthy cows or quarters. However, it is essential to confirm all "non-negative" results through subsequent analysis within 7-15 days to ensure accuracy. Further studies are necessary to enhance diagnostic accuracy.
Affiliation(s)
- Alfonso Zecconi
- Department of Biomedical, Surgical and Dental Sciences, School of Medicine, University of Milano, Via Pascal 36, 20133 Milan, Italy
- Francesca Zaghen
- Department of Biomedical, Surgical and Dental Sciences, School of Medicine, University of Milano, Via Pascal 36, 20133 Milan, Italy
- Department of Clinical and Community Sciences, School of Medicine, University of Milan, Via Celoria 22, 20133 Milan, Italy
- Gabriele Meroni
- Department of Biomedical, Surgical and Dental Sciences, School of Medicine, University of Milano, Via Pascal 36, 20133 Milan, Italy
- Flavio Sommariva
- Associazione Regionale Allevatori Lombardia, Via Kennedy 30, 26013 Crema, Italy
- Silvio Ferrari
- Associazione Regionale Allevatori Lombardia, Via Kennedy 30, 26013 Crema, Italy
- Valerio Sora
- Department of Biomedical, Surgical and Dental Sciences, School of Medicine, University of Milano, Via Pascal 36, 20133 Milan, Italy
- Associazione Regionale Allevatori Lombardia, Via Kennedy 30, 26013 Crema, Italy
5
Chulde-Fernández B, Enríquez-Ortega D, Guevara C, Navas P, Tirado-Espín A, Vizcaíno-Imacaña P, Villalba-Meneses F, Cadena-Morejon C, Almeida-Galarraga D, Acosta-Vargas P. Classification of Heart Failure Using Machine Learning: A Comparative Study. Life (Basel) 2025; 15:496. [PMID: 40141840 PMCID: PMC11944183 DOI: 10.3390/life15030496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2025] [Revised: 03/08/2025] [Accepted: 03/12/2025] [Indexed: 03/28/2025] Open
Abstract
Several machine learning classification algorithms were evaluated using a dataset focused on heart failure. Results obtained from logistic regression, random forest, decision tree, K-nearest neighbors, and multilayer perceptron (MLP) were compared to obtain the best model. The random forest method obtained specificity = 0.93, AUC = 0.97, and Matthews correlation coefficient (MCC) = 0.83. The accuracy was high; therefore, it was considered the best model. On the other hand, K-nearest neighbors and MLP showed lower accuracy rates. These results confirm the effectiveness of the random forest method in identifying heart failure cases. This study underlines that the number of features, feature selection and quality, model type, and hyperparameter tuning are critical in such studies, and it highlights the importance of using machine learning techniques.
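The abstract above compares classifiers by specificity and the Matthews correlation coefficient (MCC). For readers unfamiliar with these metrics, the sketch below computes both from the four cells of a binary confusion matrix using the standard definitions. The function names and the counts are hypothetical and do not come from the study.

```python
import math


def specificity(tn, fp):
    """True negative rate: share of actual negatives classified as negative."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts, for illustration only (not results from the paper).
tp, tn, fp, fn = 45, 40, 5, 10
spec = specificity(tn, fp)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"specificity={spec:.3f} MCC={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the classes are imbalanced.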
Affiliation(s)
- Bryan Chulde-Fernández
- School of Biological Sciences and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador
- Denisse Enríquez-Ortega
- School of Biological Sciences and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador
- Cesar Guevara
- Quantitative Methods Department, CUNEF University, 28040 Madrid, Spain
- Paulo Navas
- School of Biological Sciences and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador
- Andrés Tirado-Espín
- School of Mathematical and Computational Sciences, Universidad Yachay Tech, San Miguel de Urcuquí 100119, Ecuador
- Paulina Vizcaíno-Imacaña
- Faculty of Technical Sciences, School of Computer Science, UIDE-International University of Ecuador, Quito 170501, Ecuador
- Fernando Villalba-Meneses
- School of Biological Sciences and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador
- Carolina Cadena-Morejon
- School of Mathematical and Computational Sciences, Universidad Yachay Tech, San Miguel de Urcuquí 100119, Ecuador
- Diego Almeida-Galarraga
- School of Biological Sciences and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador
- Patricia Acosta-Vargas
- Intelligent and Interactive Systems Laboratory, Universidad de Las Américas, Quito 170125, Ecuador
6
Alwakid G, Ul Haq F, Tariq N, Humayun M, Shaheen M, Alsadun M. Optimized machine learning framework for cardiovascular disease diagnosis: a novel ethical perspective. BMC Cardiovasc Disord 2025; 25:123. [PMID: 39979842 PMCID: PMC11844188 DOI: 10.1186/s12872-025-04550-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 02/05/2025] [Indexed: 02/22/2025] Open
Abstract
Alignment of advanced cutting-edge technologies such as Artificial Intelligence (AI) has emerged as a significant driving force to achieve greater precision and timeliness in identifying cardiovascular diseases (CVDs). However, it is difficult to achieve high accuracy and reliability in CVD diagnostics due to complex clinical data and the selection and modeling process of useful features. Therefore, this paper studies advanced AI-based feature selection techniques and the application of AI technologies in CVD classification. It uses methodologies such as Chi-square, Info Gain, Forward Selection, and Backward Elimination to distill cardiovascular health indicators into a refined eight-feature subset. This study emphasizes ethical considerations, including transparency, interpretability, and bias mitigation. This is achieved by employing unbiased datasets, fair feature selection techniques, and rigorous validation metrics to ensure fairness and trustworthiness in the AI-based diagnostic process. In addition, the integration of various Machine Learning (ML) models, encompassing Random Forest (RF), XGBoost, Decision Trees (DT), and Logistic Regression (LR), facilitates a comprehensive exploration of predictive performance. Among this diverse range of models, XGBoost stands out as the top performer, achieving exceptional scores with a 99% accuracy rate, 100% recall, 99% F1-measure, and 99% precision. Furthermore, we venture into dimensionality reduction, applying Principal Component Analysis (PCA) to the eight-feature subset, effectively refining it to a compact six-attribute feature subset. Once again, XGBoost shines as the model of choice, yielding outstanding results. It achieves accuracy, recall, F1-measure, and precision scores of 98%, 100%, 98%, and 97%, respectively, when applied to the feature subset derived from the combination of Chi-square and Forward Selection methods.
Affiliation(s)
- Ghadah Alwakid
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
- Farman Ul Haq
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad, Pakistan
- Noshina Tariq
- Department of Artificial Intelligence and Data Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Mamoona Humayun
- Department of Computing, School of Arts Humanities and Social Sciences, University of Roehampton, London, UK.
- Momina Shaheen
- Department of Computing, School of Arts Humanities and Social Sciences, University of Roehampton, London, UK
- Marwa Alsadun
- Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
7
Iacobescu P, Marina V, Anghel C, Anghele AD. Evaluating Binary Classifiers for Cardiovascular Disease Prediction: Enhancing Early Diagnostic Capabilities. J Cardiovasc Dev Dis 2024; 11:396. [PMID: 39728286 DOI: 10.3390/jcdd11120396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 12/02/2024] [Accepted: 12/07/2024] [Indexed: 12/28/2024] Open
Abstract
Cardiovascular disease (CVD) is a significant global health concern and the leading cause of death in many countries. Early detection and diagnosis of CVD can significantly reduce the risk of complications and mortality. Machine learning methods, particularly classification algorithms, have demonstrated their potential to accurately predict the risk of cardiovascular disease (CVD) by analyzing patient data. This study evaluates seven binary classification algorithms, including Random Forests, Logistic Regression, Naive Bayes, K-Nearest Neighbors (kNN), Support Vector Machines, Gradient Boosting, and Artificial Neural Networks, to understand their effectiveness in predicting CVD. Advanced preprocessing techniques, such as SMOTE-ENN for addressing class imbalance and hyperparameter optimization through Grid Search Cross-Validation, were applied to enhance the reliability and performance of these models. Standard evaluation metrics, including accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (ROC-AUC), were used to assess predictive capabilities. The results show that kNN achieved the highest accuracy (99%) and AUC (0.99), surpassing traditional models like Logistic Regression and Gradient Boosting. The study examines the challenges encountered when working with datasets related to cardiovascular diseases, such as class imbalance and feature selection. It demonstrates how addressing these issues enhances the reliability and applicability of predictive models. These findings emphasize the potential of kNN as a reliable tool for early CVD prediction, offering significant improvements over previous studies. This research highlights the value of advanced machine learning techniques in healthcare, addressing key challenges and laying a foundation for future studies aimed at improving predictive models for CVD prevention.
Affiliation(s)
- Paul Iacobescu
- Department of Computer Science and Information Technology, "Dunărea de Jos" University of Galati, 800201 Galati, Romania
- Virginia Marina
- Medical Department of Occupational Health, Faculty of Medicine and Pharmacy, "Dunărea de Jos" University of Galati, 800201 Galati, Romania
- Catalin Anghel
- Department of Computer Science and Information Technology, "Dunărea de Jos" University of Galati, 800201 Galati, Romania
8
Vincent ACSR, Sengan S. Edge computing-based ensemble learning model for health care decision systems. Sci Rep 2024; 14:26997. [PMID: 39506092 PMCID: PMC11541999 DOI: 10.1038/s41598-024-78225-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 10/29/2024] [Indexed: 11/08/2024] Open
Abstract
A growing number of humans have suffered severe chronic illnesses, which has caused a boost in the requirement for diagnostic and medical treatment procedures that are both accurate and fast. Improved patient conditions and enhanced Decision-Making Systems (DMS) for healthcare professionals are the primary objectives of the Clinical Decision Support System (CDSS) recommended in this research article. The main drawback of traditional Machine Learning (ML) techniques is their failure to predict reliably. To solve this problem, the proposed model creates an Ensemble Extreme Learning Machine (EN-ELM) algorithm that combines predictors trained on several different data sets. This lowers the chance of overfitting. The suggested CDSS uses many different data processing methods, including Adaptive Synthetic (ADASYN) and isolation Forest (iForest), which fix problems like outliers and class imbalance. This approach significantly enhances the framework's classification performance. Also, the CDSS is compatible with an EC model, which enables real-time computation while minimizing the requirement for integrated systems. The recommended CDSS applies iForest and ADASYN to execute large-scale trials validating high standards of accuracy across numerous datasets. Researchers concluded that a suitable ELM classification threshold of 85% is the most effective, which substantially boosts the accuracy of the predictive model. When applied to various medical datasets, such as Hepatocellular Carcinoma (HCC), Cervical Cancer, Chronic Kidney Disease (CKD), Heart Disease, and Arrhythmia, the EN-ELM achieved accuracy rates of 99.36%, 98.15%, 97.85%, 97.06%, and 96.72%, respectively. By measuring this progress, the CDSS could dramatically improve the accuracy of chronic illness diagnosis and treatment, which similarly affects clinicians.
Affiliation(s)
- Sudhakar Sengan
- Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, 627451, India.
9
Bouqentar MA, Terrada O, Hamida S, Saleh S, Lamrani D, Cherradi B, Raihani A. Early heart disease prediction using feature engineering and machine learning algorithms. Heliyon 2024; 10:e38731. [PMID: 39397946 PMCID: PMC11471268 DOI: 10.1016/j.heliyon.2024.e38731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 09/28/2024] [Accepted: 09/28/2024] [Indexed: 10/15/2024] Open
Abstract
Heart disease is one of the most widespread global health issues; it is the reason behind around 32% of deaths worldwide every year. The early prediction and diagnosis of heart diseases are critical for effective treatment and sickness management. Despite the efforts of healthcare professionals, cardiovascular surgeons, and cardiologists, misdiagnosis and misinterpretation of test results may happen every day. This study addresses the growing global health challenge raised by Cardiovascular Diseases (CVDs), which account for 32% of all deaths worldwide, according to the World Health Organization (WHO). With the progress of Machine Learning (ML) and Deep Learning (DL) techniques as part of Artificial Intelligence (AI), these technologies have become crucial for predicting and diagnosing CVDs. This research aims to develop an ML system for the early prediction of cardiovascular diseases by choosing one of the powerful existing ML algorithms after a deep comparative analysis of several. To this end, the Cleveland and Statlog heart datasets from international platforms are used to evaluate and validate the system's performance. The Cleveland dataset is categorized and used to train various ML algorithms, including decision tree, random forest, support vector machine, logistic regression, adaptive boosting, and K-nearest neighbors. The performance of each algorithm is assessed based on accuracy, precision, recall, F1 score, and the Area Under the Curve metrics. Hyperparameter tuning approaches have been employed to find the best hyperparameters that reflect the optimal performance of the used algorithms based on different evaluation approaches including 10-fold cross-validation with a 95% confidence interval. The study's findings highlight the potential of ML in improving the early prediction and diagnosis of cardiovascular diseases. By comparing and analyzing the performance of the applied algorithms on both the Cleveland and Statlog heart datasets, this research contributes to the advancement of ML techniques in the medical field. The developed ML system offers a valuable tool for healthcare professionals in the early prediction and diagnosis of cardiovascular diseases, with implications for the prediction and diagnosis of other diseases as well.
Affiliation(s)
- Oumaima Terrada
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
- Soufiane Hamida
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
- 2IACS Laboratory, ENSET, University Hassan II of Casablanca, Mohammedia, Morocco
- GENIUS Laboratory, SupMTI of Rabat, Rabat, Morocco
- Shawki Saleh
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
- Driss Lamrani
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
- Bouchaib Cherradi
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
- 2IACS Laboratory, ENSET, University Hassan II of Casablanca, Mohammedia, Morocco
- STIE Team, CRMEF Casablanca-Settat. Provincial Section of El Jadida, El Jadida, 24000, Morocco
- Abdelhadi Raihani
- EEIS Laboratory, ENSET of Mohammedia, Hassan II University of Casablanca, Mohammedia, Morocco
10
Sadr H, Salari A, Ashoobi MT, Nazari M. Cardiovascular disease diagnosis: a holistic approach using the integration of machine learning and deep learning models. Eur J Med Res 2024; 29:455. [PMID: 39261891 PMCID: PMC11389500 DOI: 10.1186/s40001-024-02044-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 08/27/2024] [Indexed: 09/13/2024] Open
Abstract
BACKGROUND The incidence and mortality rates of cardiovascular disease worldwide are a major concern in the healthcare industry. Precise prediction of cardiovascular disease is essential, and the use of machine learning and deep learning can aid in decision-making and enhance predictive abilities. OBJECTIVE The goal of this paper is to introduce a model for precise cardiovascular disease prediction by combining machine learning and deep learning. METHOD Two public heart disease classification datasets with 70,000 and 1190 records besides a locally collected dataset with 600 records were used in our experiments. Then, a model which makes use of both machine learning and deep learning was proposed in this paper. The proposed model employed CNN and LSTM, as the representatives of deep learning models, besides KNN and XGB, as the representatives of machine learning models. As each classifier defined the output classes, majority voting was then used as an ensemble learner to predict the final output class. RESULT The proposed model obtained the highest classification performance based on all evaluation metrics on all datasets, demonstrating its suitability and reliability in forecasting the probability of cardiovascular disease.
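The abstract above combines four classifiers (CNN, LSTM, KNN, XGB) by majority voting over their predicted classes. The sketch below shows hard majority voting in plain Python under stated assumptions: the per-model predictions are made up, the function name `majority_vote` is hypothetical, and the tie-breaking rule (smallest label wins) is an assumption, since the abstract does not say how ties among four voters are resolved.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model class predictions into one label per sample."""
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")
    final = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label among the tied classes.
        final.append(min(label for label, c in counts.items() if c == top))
    return final


# Made-up binary predictions for four samples (not outputs from the paper).
cnn = [1, 0, 1, 1]
lstm = [1, 0, 0, 1]
knn = [0, 0, 1, 0]
xgb = [1, 1, 1, 0]
ensemble = majority_vote([cnn, lstm, knn, xgb])
print(ensemble)
```

With an even number of voters, ties like the one on the last sample are possible, so any real implementation needs an explicit rule such as this one, a designated tie-breaking model, or averaged class probabilities.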
Affiliation(s)
- Hossein Sadr
- Department of Health Informatics and Intelligent Systems, Guilan Road Trauma Research Center, Trauma Institute, Guilan University of Medical Sciences, Rasht, Iran.
- Arsalan Salari
- Cardiovascular Disease Research Center, Department of Cardiology, School of Medicine, Heshmat Hospital, Guilan University of Medical Sciences, Rasht, Iran
- Mohammad Taghi Ashoobi
- Department of Surgery, School of Medicine, Razi Hospital, Guilan University of Medical Sciences, Rasht, Iran
- Mojdeh Nazari
- Cardiovascular Disease Research Center, Department of Cardiology, School of Medicine, Heshmat Hospital, Guilan University of Medical Sciences, Rasht, Iran.
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
11
Kumar S, Gola KK, Jee N, Singh BM. Optimized feature fusion-based modified cascaded kernel extreme learning machine for heart disease prediction in E-healthcare. Comput Methods Biomech Biomed Engin 2024; 27:980-993. [PMID: 37272059 DOI: 10.1080/10255842.2023.2218520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 05/19/2023] [Indexed: 06/06/2023]
Abstract
In recent years, medical technology innovators have focused on diverse clinical therapies to find innovative ways to overcome clinical challenges. However, certain drawbacks remain, such as high computational cost, increased error, limited training ability, high storage requirements, and degraded accuracy. To overcome these drawbacks, the proposed research article presents an innovative cascaded extreme learning machine for effective heart disease (HD) prediction. Missing data filtering and normalization are carried out for data pre-processing. From the pre-processed data, features are extracted using the Framingham risk factor extraction module, and the extracted features are fused to generate a feature vector. The most significant features are selected using the Rhino Satin Herd optimization algorithm. Using a linear weight assignment approach, feature weighting allocates higher weights to significant features and lower weights to unwanted features. Finally, classification is performed through the Cascaded kernel soft plus extreme learning machine with a stacked autoencoder model. The performance is analyzed using Python to evaluate the superiority of the proposed model. The proposed model obtained an overall accuracy of 90%, precision of 94%, recall of 91.3%, and F1 measure of 92.6% on the Cleveland-Hungarian dataset, which is superior to the existing methods. An accuracy of 92.6% is attained for predicting HD on the heart patient dataset. The proposed model attains better performance because of reduced overfitting, lower error rates, and better convergence and training ability.
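As a quick consistency check on the reported figures, the F1 measure is the harmonic mean of precision and recall. The sketch below plugs in the precision and recall quoted in the abstract; the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the Cleveland-Hungarian dataset.
reported_f1 = f1_score(0.94, 0.913)
print(round(reported_f1, 3))
```

The result rounds to 0.926, which agrees with the 92.6% F1 measure stated in the abstract.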
Affiliation(s)
- Sumit Kumar
  - COER University, Roorkee, Uttarakhand, 247667, India
- Narayan Jee
  - COER University, Roorkee, Uttarakhand, 247667, India
12
Saputra J, Lawrencya C, Saini JM, Suharjito S. Hyperparameter optimization for cardiovascular disease data-driven prognostic system. Vis Comput Ind Biomed Art 2023; 6:16. [PMID: 37524951 PMCID: PMC10390457 DOI: 10.1186/s42492-023-00143-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 07/04/2023] [Indexed: 08/02/2023] Open
Abstract
Prediction and diagnosis of cardiovascular diseases (CVDs) based, among other things, on medical examinations and patient symptoms are the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of the patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control this life-threatening disease. In this study, we used various patient datasets from a major hospital in the United States as prognostic factors for CVD. The data were obtained by monitoring a total of 918 adult patients aged 28-77 years. In this study, we present a data mining modeling approach to analyze the performance, classification accuracy and number of clusters on Cardiovascular Disease Prognostic datasets in unsupervised machine learning (ML) using the Orange data mining software. Various techniques are then used to classify the model parameters, such as k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, various unsupervised ML clustering methods were used, such as k-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). The results showed that SGD and ANN achieved the best model performance and classification accuracy, both with a score of 0.900 on Cardiovascular Disease Prognostic datasets. Based on the results of most clustering methods, such as k-means and hierarchical clustering, Cardiovascular Disease Prognostic datasets can be divided into two clusters. The prognostic accuracy of CVD depends on the accuracy of the proposed model in determining the diagnostic model. The more accurate the model, the better it can predict which patients are at risk for CVD.
Affiliation(s)
- Jayson Saputra
  - Industrial Engineering Department, BINUS Graduate Program - Master of Industrial Engineering, Bina Nusantara University, Jakarta 11480, Indonesia
- Cindy Lawrencya
  - Industrial Engineering Department, BINUS Graduate Program - Master of Industrial Engineering, Bina Nusantara University, Jakarta 11480, Indonesia
- Jecky Mitra Saini
  - Industrial Engineering Department, BINUS Graduate Program - Master of Industrial Engineering, Bina Nusantara University, Jakarta 11480, Indonesia
- Suharjito Suharjito
  - Industrial Engineering Department, BINUS Graduate Program - Master of Industrial Engineering, Bina Nusantara University, Jakarta 11480, Indonesia
13
An intelligent heart disease prediction system using hybrid deep dense Aquila network. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
14
Ay Ş, Ekinci E, Garip Z. A comparative analysis of meta-heuristic optimization algorithms for feature selection on ML-based classification of heart-related diseases. THE JOURNAL OF SUPERCOMPUTING 2023; 79:11797-11826. [PMID: 37304052 PMCID: PMC9983547 DOI: 10.1007/s11227-023-05132-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Accepted: 02/21/2023] [Indexed: 06/13/2023]
Abstract
This study aims to use a machine learning (ML)-based enhanced diagnosis and survival model to predict heart disease and survival in heart failure by combining the cuckoo search (CS), flower pollination algorithm (FPA), whale optimization algorithm (WOA), and Harris hawks optimization (HHO) algorithms, which are meta-heuristic feature selection algorithms. To achieve this, experiments are conducted on the Cleveland heart disease dataset and the heart failure dataset collected from the Faisalabad Institute of Cardiology published at UCI. CS, FPA, WOA, and HHO algorithms for feature selection are applied for different population sizes and are realized based on the best fitness values. For the original dataset of heart disease, the maximum prediction F-score of 88% is obtained using K-nearest neighbour (KNN) when compared to logistic regression (LR), support vector machine (SVM), Gaussian Naive Bayes (GNB), and random forest (RF). With the proposed approach, the heart disease prediction F-score of 99.72% is obtained using KNN for population sizes 60 with FPA by selecting eight features. For the original dataset of heart failure, the maximum prediction F-score of 70% is obtained using LR and RF compared to SVM, GNB, and KNN. With the proposed approach, the heart failure prediction F-score of 97.45% is obtained using KNN for population sizes 10 with HHO by selecting five features. Experimental findings show that the applied meta-heuristic algorithms with ML algorithms significantly improve prediction performances compared to performances obtained from the original datasets. The motivation of this paper is to select the most critical and informative feature subset through meta-heuristic algorithms to improve classification accuracy.
Affiliation(s)
- Şevket Ay
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
- Ekin Ekinci
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
- Zeynep Garip
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
15
Mahoto NA, Shaikh A, Sulaiman A, Reshan MSA, Rajab A, Rajab K. A machine learning based data modeling for medical diagnosis. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
16
Erdaş ÇB, Sümer E, Kibaroğlu S. Neurodegenerative diseases detection and grading using gait dynamics. MULTIMEDIA TOOLS AND APPLICATIONS 2023; 82:22925-22942. [PMID: 36846529 PMCID: PMC9938350 DOI: 10.1007/s11042-023-14461-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 04/27/2022] [Accepted: 01/31/2023] [Indexed: 06/01/2023]
Abstract
Detection of neurodegenerative diseases such as Parkinson's disease, Huntington's disease, and Amyotrophic Lateral Sclerosis, and grading of these diseases' severity, have high clinical significance. Approaches to these tasks based on walking analysis stand out from other methods because of their simplicity and non-invasiveness. This study aims to realize an artificial intelligence-based disease detection and severity prediction system for neurodegenerative diseases using gait features obtained from gait signals. For disease detection, the problem is divided into subgroups: a 4-class subgroup consisting of Parkinson's disease, Huntington's disease, Amyotrophic Lateral Sclerosis, and the control group; a disease vs. control subgroup in which all diseases are collected under a single label; and subgroups in which each disease is compared separately against the control group. For disease severity grading, each disease was divided into subgroups, and the prediction problem was addressed with various machine learning and deep learning methods separately for each group. In this context, detection performance was measured by Accuracy, F1 Score, Precision, and Recall, while prediction performance was measured by metrics such as R, R2, MAE, MedAE, MSE, and RMSE.
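The regression metrics named at the end of this abstract have standard definitions, sketched below in plain Python. The severity scores in the example are invented for illustration and do not come from the study. R is taken here to mean the Pearson correlation between true and predicted values, which is an assumption about the authors' usage.

```python
import math
from statistics import mean, median

def regression_metrics(y_true, y_pred):
    """Return R, R2, MAE, MedAE, MSE and RMSE for paired true/predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = mean([e * e for e in errors])
    t_bar, p_bar = mean(y_true), mean(y_pred)
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - t_bar) ** 2 for t in y_true)     # total sum of squares
    ss_pred = sum((p - p_bar) ** 2 for p in y_pred)
    cov = sum((t - t_bar) * (p - p_bar) for t, p in zip(y_true, y_pred))
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),        # Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,                   # coefficient of determination
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

# Hypothetical severity scores: three exact predictions and one miss of 2 points.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

The example shows why MedAE and MAE are reported together: a single large miss leaves the median absolute error at zero while the mean absolute error and RMSE both rise.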
Affiliation(s)
- Çağatay Berke Erdaş
  - Department of Computer Engineering, Faculty of Engineering, Başkent University, Ankara, Turkey
- Emre Sümer
  - Department of Computer Engineering, Faculty of Engineering, Başkent University, Ankara, Turkey
- Seda Kibaroğlu
  - Department of Neurology, Faculty of Medicine, Başkent University, Ankara, Turkey
17
Cesarelli G, Donisi L, Amato F, Romano M, Cesarelli M, D'Addio G, Ponsiglione AM, Ricciardi C. Using Features Extracted From Upper Limb Reaching Tasks to Detect Parkinson's Disease by Means of Machine Learning Models. IEEE Trans Neural Syst Rehabil Eng 2023; 31:1056-1063. [PMID: 37021918 DOI: 10.1109/tnsre.2023.3236834] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/19/2023]
Abstract
While there is much interest in the literature in investigating the lower-limb gait of patients affected by neurological diseases such as Parkinson's Disease (PD), fewer publications involving upper-limb movements are available. In previous studies, 24 motion signals (the so-called reaching tasks) of the upper limbs of PD patients and Healthy Controls (HCs) were used to extract several kinematic features through custom-made software; the aim of our paper, in turn, is to investigate the possibility of building models, using these features, for distinguishing PD patients from HCs. First a binary logistic regression and then a Machine Learning (ML) analysis were performed, the latter by implementing five algorithms through the Knime Analytics Platform. The ML analysis was performed twice: first, a leave-one-out cross-validation was applied; then, a wrapper feature selection method was implemented to identify the subset of features that maximized accuracy. The binary logistic regression achieved an accuracy of 90.5%, demonstrating the importance of the maximum jerk during the subjects' upper limb motion; the Hosmer-Lemeshow test supported the validity of this model (p-value = 0.408). The first ML analysis achieved high evaluation metrics, exceeding 95% accuracy; the second ML analysis achieved a perfect classification, with 100% for both accuracy and area under the receiver operating characteristic curve. The top five features in terms of importance were the maximum acceleration, smoothness, duration, maximum jerk and kurtosis. The investigation carried out in our work has shown the predictive power of the features extracted from the reaching tasks involving the upper limbs to distinguish HCs and PD patients.
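Leave-one-out cross-validation, the first validation scheme mentioned in this abstract, can be sketched as follows. The study ran its models in the Knime Analytics Platform; the plain-Python 1-nearest-neighbour classifier below is only a stand-in, and the two-feature samples are invented for illustration.

```python
import math

def nearest_neighbor_predict(train_X, train_y, x):
    """Label x with the class of its closest training sample (1-NN stand-in)."""
    distances = [math.dist(row, x) for row in train_X]
    return train_y[distances.index(min(distances))]

def leave_one_out_accuracy(X, y, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and average the hits."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical kinematic features (e.g. maximum jerk, duration) per subject.
X = [[0.9, 1.2], [1.0, 1.0], [1.1, 1.3], [3.0, 2.9], [3.2, 3.1], [2.9, 3.3]]
y = ["HC", "HC", "HC", "PD", "PD", "PD"]

accuracy = leave_one_out_accuracy(X, y)
print(accuracy)
```

Because every sample serves as the test set exactly once, this scheme suits small cohorts, at the cost of fitting the model once per sample.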
18
Taha MA, Alsaidi SAAA, Hussein RA. Machine Learning Techniques for Predicting Heart Diseases. 2022 INTERNATIONAL SYMPOSIUM ON INNOVATIVE INFORMATICS OF BISKRA (ISNIB) 2022. [DOI: 10.1109/isnib57382.2022.10076238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Affiliation(s)
- Mohammed A. Taha
  - Ministry of Education, Babylon Education Directorates, Babylon, Iraq
- Reem Ali Hussein
  - University of Technology, Laser and Optoelectronics Eng. Dep., Baghdad, Iraq
19
20
Mohammedqasim H, Mohammedqasem R, Ata O, Alyasin EI. Diagnosing Coronary Artery Disease on the Basis of Hard Ensemble Voting Optimization. MEDICINA (KAUNAS, LITHUANIA) 2022; 58:medicina58121745. [PMID: 36556946 PMCID: PMC9783937 DOI: 10.3390/medicina58121745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 11/30/2022]
Abstract
Background and Objectives: Recently, many studies have focused on the early diagnosis of coronary artery disease (CAD), which is one of the leading causes of cardiac-associated death worldwide. The effectiveness of the most important features influencing disease diagnosis determines the performance of machine learning systems that can allow for timely and accurate treatment. We implemented a hybrid ML framework based on hard ensemble voting optimization (HEVO) to classify patients with CAD using the Z-Alizadeh Sani dataset. All categorical features were converted to numerical forms, the synthetic minority oversampling technique (SMOTE) was employed to overcome the imbalanced distribution between the two classes in the dataset, and then recursive feature elimination (RFE) with random forest (RF) was used to obtain the best subset of features. Materials and Methods: After the biased distribution in the CAD dataset was resolved using the SMOTE method and the highly correlated features that affected the classification of CAD patients were identified, the performance of the proposed model was evaluated using grid search optimization, and the best hyperparameters were identified for developing four applications, namely, RF, AdaBoost, gradient-boosting, and extra trees based on an HEV classifier. Results: Five-fold cross-validation experiments with the HEV classifier showed excellent prediction performance with the 10 best balanced features obtained using SMOTE and feature selection. All evaluation metrics reached > 98% with the HEV classifier, and the gradient-boosting model was the second-best classification model with accuracy = 97% and F1-score = 98%. Conclusions: When compared to modern methods, the proposed method performs well in diagnosing coronary artery disease, and therefore, the proposed method can be used by medical personnel for supplementary therapy for timely, accurate, and efficient identification of CAD cases in suspected patients.
21
Machine learning-based risk prediction model for cardiovascular disease using a hybrid dataset. DATA KNOWL ENG 2022. [DOI: 10.1016/j.datak.2022.102042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
22
Russo M, Amboni M, Volzone A, Ricciardelli G, Cesarelli G, Ponsiglione AM, Barone P, Romano M, Ricciardi C. Interplay between gait and neuropsychiatric symptoms in Parkinson’s Disease. Eur J Transl Myol 2022; 32. [PMID: 35678506 PMCID: PMC9295172 DOI: 10.4081/ejtm.2022.10463] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2022] [Accepted: 05/05/2022] [Indexed: 12/13/2022] Open
Abstract
Parkinson’s Disease (PD) is a neurodegenerative disease which involves both motor and non-motor symptoms. Non-motor mental symptoms are very common among patients with PD from the earliest stage. In this context, gait analysis makes it possible to detect quantitative gait variables that distinguish patients affected by non-motor mental symptoms from patients without these symptoms. A cohort of 68 PD subjects (divided into two groups) was assessed through gait analysis (single and dual task), and spatial temporal parameters were analysed, first with a statistical analysis and then with a machine learning (ML) approach. Single-task variables showed that 9 out of 16 spatial temporal features were statistically significant in the univariate statistical analysis (p-value < 0.05). In addition, a statistically significant difference was found in stance phase (p-value = 0.032), swing phase (p-value = 0.042) and cycle length (p-value = 0.03) of the dual task. The ML results confirmed the statistical analysis; in particular, the Decision Tree classifier showed the highest accuracy (80.9%) and also the highest scores in terms of specificity and precision. Our findings indicate that patients with non-motor mental symptoms display a worse gait pattern, mainly dominated by increased slowness and dynamic instability.
23
Bidimensional and Tridimensional Poincaré Maps in Cardiology: A Multiclass Machine Learning Study. ELECTRONICS 2022. [DOI: 10.3390/electronics11030448] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Heart rate is a nonstationary signal and its variation may contain indicators of current disease or warnings about impending cardiac diseases. Hence, heart rate variation analysis has become a noninvasive tool to further study the activities of the autonomic nervous system. In this scenario, the Poincaré plot analysis has proven to be a valuable tool to support the diagnosis of cardiac diseases. The study’s aim is a preliminary exploration of the feasibility of machine learning to classify subjects belonging to five cardiac states (healthy, hypertension, myocardial infarction, congestive heart failure and heart transplanted) using ten unconventional quantitative parameters extracted from bidimensional and three-dimensional Poincaré maps. Knime Analytic Platform was used to implement several machine learning algorithms: Gradient Boosting, Adaptive Boosting, k-Nearest Neighbor and Naïve Bayes. Accuracy, sensitivity and specificity were computed to assess the performance of the predictive models using leave-one-out cross-validation. The Synthetic Minority Oversampling technique was applied beforehand for data augmentation, considering the small size of the dataset and the number of features. Feature importance, ranked on the basis of the Information Gain values, was computed. Preliminarily, a univariate statistical analysis was performed through a one-way Kruskal-Wallis test plus post-hoc analysis for all the features. Machine learning analysis achieved interesting results in terms of evaluation metrics, as demonstrated by Adaptive Boosting and k-Nearest Neighbor (accuracies greater than 90%). Gradient Boosting and k-Nearest Neighbor even reached a 100% score in sensitivity and specificity, respectively. The most important features according to information gain are in line with the results obtained from the statistical analysis, confirming their predictive power.
The study shows that the proposed combination of unconventional features extracted from Poincaré maps and well-known machine learning algorithms represents a valuable approach to automatically classify patients with different cardiac diseases. Future investigations on enriched datasets will further confirm the potential application of this methodology in diagnostics.
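For readers unfamiliar with Poincaré maps, the sketch below builds a bidimensional map from a series of RR intervals by pairing each interval with the next one, and computes the conventional SD1 and SD2 descriptors. These are the textbook descriptors, not the ten unconventional parameters used in the study, which the abstract does not specify. The RR values are invented, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import pstdev

def poincare_points(rr):
    """Pair each RR interval with its successor: (RR_n, RR_{n+1})."""
    return list(zip(rr[:-1], rr[1:]))

def sd1_sd2(rr):
    """Dispersion across (SD1) and along (SD2) the identity line of the map."""
    points = poincare_points(rr)
    # Rotating the cloud by 45 degrees separates short-term variability
    # (difference of successive intervals) from long-term variability (sum).
    across = [(b - a) / math.sqrt(2) for a, b in points]
    along = [(b + a) / math.sqrt(2) for a, b in points]
    return pstdev(across), pstdev(along)

# Hypothetical RR intervals in milliseconds with a steady 10 ms drift.
rr_intervals = [800, 810, 820, 830]
sd1, sd2 = sd1_sd2(rr_intervals)
print(poincare_points(rr_intervals), sd1, sd2)
```

A steadily drifting series like this one has no beat-to-beat scatter around its trend, so SD1 is zero while SD2 captures the drift.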
24
Classification Comparison of Machine Learning Algorithms Using Two Independent CAD Datasets. MATHEMATICS 2022. [DOI: 10.3390/math10030311] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
In the last few decades, statistical methods and machine learning (ML) algorithms have become efficient in medical decision-making. Coronary artery disease (CAD) is a common type of cardiovascular disease that causes many deaths each year. In this study, two CAD datasets from different countries (TRNC and Iran) are tested to understand the classification efficiency of different supervised machine learning algorithms. The Z-Alizadeh Sani dataset contained 303 individuals (216 patients, 87 controls), while the Near East University (NEU) Hospital dataset contained 475 individuals (305 patients, 170 controls). This study was conducted in three stages: (1) Each dataset, as well as their merged version, was evaluated separately, using a random sampling method to obtain train-test subsets. (2) The NEU Hospital dataset was assigned as the training data, while the Z-Alizadeh Sani dataset was the test data. (3) The Z-Alizadeh Sani dataset was assigned as the training data, while the NEU Hospital dataset was the test data. Among all ML algorithms, Random Forest showed successful classification performance at each stage. The least successful ML method was kNN, which underperformed at all stages. Other methods, including logistic regression, had varying classification performance across the stages.
25
Nadakinamani RG, Reyana A, Kautish S, Vibith AS, Gupta Y, Abdelwahab SF, Mohamed AW. Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2973324. [PMID: 35069715 PMCID: PMC8767405 DOI: 10.1155/2022/2973324] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/15/2021] [Revised: 12/03/2021] [Accepted: 12/15/2021] [Indexed: 02/08/2023]
Abstract
Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry's clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a highly accurate machine learning-based cardiovascular disease prediction system (CDPS). To this end, machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS's performance was evaluated using a variety of metrics to identify the most suitable machine learning model. In predicting cardiovascular disease patients, the Random Tree model performed best, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.
Affiliation(s)
- A. Reyana
  - Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, Tamil Nadu, India
- Sandeep Kautish
  - Department of Computer Science and Engineering, LBEF Campus, Kathmandu, Nepal, India
- A. S. Vibith
  - Department of Computer Science and Engineering, RMK College of Engineering and Technology, Tiruvallur, Tamil Nadu, India
- Yogita Gupta
  - Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, India
- Sayed F. Abdelwahab
  - Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, PO Box 11099, Taif 21944, Saudi Arabia
- Ali Wagdy Mohamed
  - Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
  - Department of Mathematics and Actuarial Science, School of Science and Engineering, The American University in Cairo, New Cairo, Egypt
26
Okagbue HI, Nzeadibe CA, Teixeira da Silva JA. Predicting access mode of multidisciplinary and library and information sciences journals using machine learning. COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT 2022. [DOI: 10.1080/09737766.2021.2009745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Affiliation(s)
- Hilary I. Okagbue
  - Department of Mathematics, College of Science and Technology, Covenant University, KM 10 Idiroko Rd 112104, Ota, Nigeria
27
Abdollahi J, Nouri-Moghaddam B. A hybrid method for heart disease diagnosis utilizing feature selection based ensemble classifier model generation. IRAN JOURNAL OF COMPUTER SCIENCE 2022; 5:229-246. [PMCID: PMC9081959 DOI: 10.1007/s42044-022-00104-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/03/2021] [Accepted: 04/19/2022] [Indexed: 09/29/2023]
Abstract
Heart disease is one of the most complicated diseases, and it affects a large number of individuals throughout the world. In healthcare, particularly cardiology, early and accurate detection of cardiac disease is critical. The Heart Disease Data Set-UCI repository collects data on heart disease. The search space and complexity of the classification models are increased by this raw dataset, which contains redundant and inconsistent data. We need to eliminate the redundant and unnecessary elements from the data to improve classification accuracy. As a consequence, feature selection approaches might be useful for reducing the cost of diagnosis by identifying the most important qualities. This research developed an ensemble classification model based on a feature selection approach in which selected features play a role in classification. Accordingly, a classification approach was introduced using ensemble learning with a genetic algorithm, feature selection, and biomedical test values to diagnose heart disease. Based on the results, it is deduced that the benefits of using the feature selection method vary depending on the utilized machine learning technique. However, the best-proposed model based on the combination of genetic algorithm and the ensemble learning model has achieved an accuracy of 97.57% on the considered datasets. The suggested diagnosis system achieved better accuracy than previously proposed methods and can easily be implemented in healthcare to identify heart disease.
Affiliation(s)
- Jafar Abdollahi
  - Department of Computer Engineering, Ardabil Branch, Islamic Azad University, Ardabil, Iran
- Babak Nouri-Moghaddam
  - Department of Computer Engineering, Ardabil Branch, Islamic Azad University, Ardabil, Iran
28
Computational Learning Model for Prediction of Heart Disease Using Machine Learning Based on a New Regularizer. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:8628335. [PMID: 34804150 PMCID: PMC8601816 DOI: 10.1155/2021/8628335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/19/2021] [Accepted: 10/25/2021] [Indexed: 11/17/2022]
Abstract
Heart diseases are characterized as heterogeneous diseases comprising multiple subtypes. Early diagnosis and prognosis of heart disease are essential to facilitate the clinical management of patients. In this research, a new computational model for predicting early heart disease is proposed. The predictive model is embedded in a new regularization based on decaying the weights according to the weight matrices' standard deviation and comparing the results against its parents (RSD-ANN). The performance of RSD-ANN is far better than that of the existing methods. Based on our experiments, the average validation accuracy computed was 96.30% using either the tenfold cross-validation or holdout method.
29
Velswamy K, Velswamy R, Swamidason ITJ, Chinnaiyan S. Classification model for heart disease prediction with feature selection through modified bee algorithm. Soft comput 2021. [DOI: 10.1007/s00500-021-06330-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
30
Caruso M, Ricciardi C, Delli Paoli G, Di Dato F, Donisi L, Romeo V, Petretta M, Iorio R, Cesarelli G, Brunetti A, Maurea S. Machine Learning Evaluation of Biliary Atresia Patients to Predict Long-Term Outcome after the Kasai Procedure. Bioengineering (Basel) 2021; 8:152. [PMID: 34821718 PMCID: PMC8615125 DOI: 10.3390/bioengineering8110152] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/23/2021] [Revised: 10/16/2021] [Accepted: 10/19/2021] [Indexed: 11/17/2022] Open
Abstract
Kasai portoenterostomy (KP) represents the first-line treatment for biliary atresia (BA). The purpose was to compare the accuracy of quantitative parameters extracted from laboratory tests, US imaging, and MR imaging studies using machine learning (ML) algorithms to predict the long-term medical outcome in native liver survivor BA patients after KP. Twenty-four patients were evaluated according to clinical and laboratory data at initial evaluation (median follow-up = 9.7 years) after KP as having ideal (n = 15) or non-ideal (n = 9) medical outcomes. Patients were re-evaluated after an additional 4 years and classified in group 1 (n = 12) as stable and group 2 (n = 12) as non-stable in the disease course. Laboratory and quantitative imaging parameters were merged to test ML algorithms. Total and direct bilirubin (TB and DB), as laboratory parameters, and US stiffness, as an imaging parameter, were the only statistically significant parameters between the groups. The best algorithm in terms of accuracy, sensitivity, specificity, and AUCROC was naive Bayes algorithm, selecting only laboratory parameters (TB and DB). This preliminary ML analysis confirms the fundamental role of TB and DB values in predicting the long-term medical outcome for BA patients after KP, even though their values may be within the normal range. Physicians should be alert when TB and DB values change slightly.
Affiliation(s)
- Martina Caruso
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Carlo Ricciardi
  - Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, 80125 Naples, Italy
  - Bioengineering Unit, Institute of Care and Scientific Research Maugeri, 82037 Telese Terme, Italy
- Gregorio Delli Paoli
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Fabiola Di Dato
  - Department of Translational Medical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Leandro Donisi
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
  - Bioengineering Unit, Institute of Care and Scientific Research Maugeri, 82037 Telese Terme, Italy
- Valeria Romeo
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Mario Petretta
  - Department of Translational Medical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Raffaele Iorio
  - Department of Translational Medical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Giuseppe Cesarelli
  - Bioengineering Unit, Institute of Care and Scientific Research Maugeri, 82037 Telese Terme, Italy
  - Department of Chemical, Materials and Production Engineering, University of Naples “Federico II”, 80125 Naples, Italy
- Arturo Brunetti
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
- Simone Maurea
  - Department of Advanced Biomedical Sciences, University of Naples “Federico II”, 80131 Naples, Italy
Collapse
|
31
|
Extracting Features from Poincaré Plots to Distinguish Congestive Heart Failure Patients According to NYHA Classes. Bioengineering (Basel) 2021; 8:138. [PMID: 34677211 PMCID: PMC8533203 DOI: 10.3390/bioengineering8100138] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/11/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 11/30/2022]
Abstract
Heart-rate variability has proved a valid tool in prognosis definition of patients with congestive heart failure (CHF). Previous research has documented Poincaré plot analysis as a valuable approach to study heart-rate variability performance among different subjects. In this paper, we explored the possibility of feeding machine-learning (ML) algorithms with unconventional quantitative parameters extracted from Poincaré plots (generated from 24-h electrocardiogram recordings) to classify patients with CHF belonging to different New York Heart Association (NYHA) classes. We performed the following investigations in sequence. First, a statistical analysis was carried out on 9 morphological parameters, automatically measured from Poincaré plots. Subsequently, a feature selection through a wrapper with a 10-fold cross-validation method was performed to find the subset of features that maximized the classification accuracy for each considered ML algorithm. Finally, patient classification was assessed through an ML analysis using AdaBoost of Decision Tree, k-Nearest Neighbors, and Naive Bayes algorithms. A univariate statistical analysis showed that 5 of the 9 parameters presented statistically significant differences among patients of distinct NYHA classes; similarly, a multivariate logistic regression confirmed the importance of the parameter ρy in the separability between low-risk and high-risk classes. The ML analysis achieved promising results in terms of evaluation metrics (especially the Naive Bayes algorithm), with accuracies greater than 80% and Area Under the Receiver Operating Characteristic Curve indices greater than 0.7 for all three algorithms. The study indicates that the proposed features have predictive power to discriminate the NYHA classes, to which the features seem evenly correlated. Although the NYHA classification is subjective and easily recognized by cardiologists, the potential relevance of the proposed features in clinical cardiology and the promising ML results imply that the methodology could be a valuable approach to automatically classify CHF. Future investigations on enriched datasets may further confirm the presented evidence.
|
32
|
Reddy KVV, Elamvazuthi I, Aziz AA, Paramasivam S, Chua HN, Pranavanand S. Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators. APPLIED SCIENCES 2021; 11:8352. [DOI: 10.3390/app11188352] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 09/02/2023]
Abstract
Cardiovascular diseases (CVDs) kill about 20.5 million people every year. Early prediction can help people to change their lifestyles and to ensure proper medical treatment if necessary. In this research, ten machine learning (ML) classifiers from different categories, such as Bayes, functions, lazy, meta, rules, and trees, were trained for efficient heart disease risk prediction using the full set of attributes of the Cleveland heart dataset and the optimal attribute sets obtained from three attribute evaluators. The performance of the algorithms was appraised using a 10-fold cross-validation testing option. Finally, we performed tuning of the hyperparameter number of nearest neighbors, namely, ‘k’ in the instance-based (IBk) classifier. The sequential minimal optimization (SMO) achieved an accuracy of 85.148% using the full set of attributes and 86.468% was the highest accuracy value using the optimal attribute set obtained from the chi-squared attribute evaluator. Meanwhile, the meta classifier bagging with logistic regression (LR) provided the highest ROC area of 0.91 using both the full and optimal attribute sets obtained from the ReliefF attribute evaluator. Overall, the SMO classifier stood as the best prediction method compared to other techniques, and IBk achieved an 8.25% accuracy improvement by tuning the hyperparameter ‘k’ to 9 with the chi-squared attribute set.
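The hyperparameter tuning described in this abstract (choosing the number of neighbors 'k' under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' setup: the toy two-cluster data, the function names `knn_predict` and `cv_accuracy`, the candidate values of k, and the use of pooled accuracy across folds are all assumptions made for illustration, and plain Python stands in for the toolkit used in the study.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, k, n_folds=10, seed=0):
    """Pooled accuracy of k-NN: each sample is predicted once, by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint, near-equal folds
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)

# Hypothetical toy data: two Gaussian clusters standing in for a real clinical dataset.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
    [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50

# Evaluate a few odd values of k (odd avoids ties with two classes) and keep the best.
scores = {k: cv_accuracy(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Keeping the fold assignment fixed (same `seed`) across candidate values of k means the scores differ only because of k, not because of a different split.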
|
33
|
Feature Selection and Classification of Clinical Datasets Using Bioinspired Algorithms and Super Learner. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:6662420. [PMID: 34055041 PMCID: PMC8149240 DOI: 10.1155/2021/6662420] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/19/2020] [Revised: 04/10/2021] [Accepted: 04/23/2021] [Indexed: 11/23/2022]
Abstract
A computer-aided diagnosis (CAD) system that employs a super learner to diagnose the presence or absence of a disease has been developed. Each clinical dataset is preprocessed and split into a training set (60%) and a testing set (40%). A wrapper approach that uses three bioinspired algorithms, namely, cat swarm optimization (CSO), krill herd (KH), and bacterial foraging optimization (BFO), with the classification accuracy of a support vector machine (SVM) as the fitness function, has been used for feature selection. The selected features of each bioinspired algorithm are stored in three separate databases. The features selected by each bioinspired algorithm are used to train three back propagation neural networks (BPNN) independently using the conjugate gradient algorithm (CGA). Classifier testing is performed by using the testing set on each trained classifier, and the diagnostic results obtained are used to evaluate the performance of each classifier. The classification results obtained from the three classifiers for each instance of the testing set, together with the class label associated with that instance, form the candidate instances for training and testing the super learner. The training set comprises 80% of the instances, and the testing set comprises 20% of the instances. Experimentation has been carried out using seven clinical datasets from the University of California Irvine (UCI) machine learning repository. The super learner has achieved a classification accuracy of 96.83% for the Wisconsin diagnostic breast cancer dataset (WDBC), 86.36% for the Statlog heart disease dataset (SHD), 94.74% for the hepatocellular carcinoma dataset (HCC), 90.48% for the hepatitis dataset (HD), 81.82% for the vertebral column dataset (VCD), 84% for the Cleveland heart disease dataset (CHD), and 70% for the Indian liver patient dataset (ILP).
|
34
|
Jothi Prakash V, Karthikeyan NK. Enhanced Evolutionary Feature Selection and Ensemble Method for Cardiovascular Disease Prediction. Interdiscip Sci 2021; 13:389-412. [PMID: 33988832 DOI: 10.1007/s12539-021-00430-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/25/2020] [Revised: 04/01/2021] [Accepted: 04/09/2021] [Indexed: 11/26/2022]
Abstract
Cardiovascular Disease (CVD) is one of the main factors behind the increase in mortality rate worldwide. The analysis and prediction of this disease remains a formidable task in medical data analysis. Recent advancements in technology such as Big Data and Artificial Intelligence, together with the need for automated models, have paved the way for developing a more reliable and efficient model for predicting heart disease. Several studies have addressed heart disease prediction, but too little attention has been paid to choosing the attributes that play a significant role in predicting CVD. Hence, choosing the right features for the classification and diagnosis of heart disease is important. The core aim of this work is to identify and select the important features and machine learning methodologies that can enhance the prediction capability of classification models for accurately predicting CVD. The results show that the proposed enhanced evolutionary feature selection with the hybrid ensemble model outperforms the existing approaches in terms of precision, recall and accuracy. The experimental outcomes show that the proposed approach attains a maximum classification accuracy of 93.65% for the Statlog dataset, 82.81% for the SPECTF dataset and 84.95% for the coronary heart disease dataset. The performance of the proposed classification model is demonstrated using ROC curves against state-of-the-art machine learning methods.
Affiliation(s)
- V Jothi Prakash
  - Department of Information Technology, Karpagam College of Engineering, Coimbatore, Tamil Nadu, India
- N K Karthikeyan
  - Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
|
35
|
Ricciardi C, Cuocolo R, Megna R, Cesarelli M, Petretta M. Machine learning analysis: general features, requirements and cardiovascular applications. Minerva Cardiol Angiol 2021; 70:67-74. [PMID: 33944533 DOI: 10.23736/s2724-5683.21.05637-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Artificial intelligence is a science that will probably change the future of medicine by solving issues that are currently challenging. In this special article, the general features of machine learning are discussed. First, background is given on the distinction between artificial intelligence, machine learning, and deep learning, with a focus on the structure of machine learning subgroups. The traditional process of a machine learning analysis is described, from data collection through feature engineering and modelling to the validation and deployment phases. Given the many applications of machine learning reported in the literature in recent decades and the lack of guidelines, a need has emerged for standardized reporting of machine learning analysis results. Some possible standards for reporting machine learning results are identified and discussed in depth; these relate to the study population (number of subjects), repeatability of the analysis, validation, results, and comparison with current practice. The way to the use of machine learning in clinical practice is open, and the hope is that, with emerging technology and advanced digital and computational tools, available during hospitalization and after discharge, and with the help of increasingly powerful hardware, it will also be possible to build assistance strategies useful in clinical practice.
Affiliation(s)
- Carlo Ricciardi
  - Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy
- Renato Cuocolo
  - Department of Clinical Medicine and Surgery, University of Naples Federico II, Naples, Italy
- Rosario Megna
  - Institute of Biostructure and Bioimaging, National Council of Research, Naples, Italy
- Mario Cesarelli
  - Department of Information Technology and Electrical Engineering, University of Naples Federico II, Naples, Italy
  - Bioengineering Unit, Institute of Care and Scientific Research Maugeri, Pavia, Italy
|
36
|
Jia HP, Zhao JL, given-names SGNJL, given-names SGNMZ, Sun WX. Accurate Heart Disease Prediction via Improved Stacking Integration Algorithm. J Imaging Sci Technol 2021. [DOI: 10.2352/j.imagingsci.technol.2021.65.3.030408] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/01/2022]
|
37
|
Donisi L, Cesarelli G, Coccia A, Panigazzi M, Capodaglio EM, D’Addio G. Work-Related Risk Assessment According to the Revised NIOSH Lifting Equation: A Preliminary Study Using a Wearable Inertial Sensor and Machine Learning. SENSORS (BASEL, SWITZERLAND) 2021; 21:2593. [PMID: 33917206 PMCID: PMC8068056 DOI: 10.3390/s21082593] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/17/2021] [Revised: 04/02/2021] [Accepted: 04/05/2021] [Indexed: 02/08/2023]
Abstract
Many activities may elicit a biomechanical overload. Among these, lifting loads can cause work-related musculoskeletal disorders. To improve risk prevention, the National Institute for Occupational Safety and Health (NIOSH) established a methodology for assessing lifting actions by means of a quantitative method based on the intensity, duration, frequency, and other geometrical characteristics of lifting. In this paper, we explored the feasibility of using machine learning (ML) to classify biomechanical risk according to the revised NIOSH lifting equation. Acceleration and angular velocity signals were collected using a wearable sensor during lifting tasks performed by seven subjects and then segmented to extract time-domain features: root mean square, minimum, maximum, and standard deviation. The features were fed to several ML algorithms. Interesting results were obtained in terms of evaluation metrics for a binary risk/no-risk classification; specifically, the tree-based algorithms reached accuracies greater than 90% and areas under the receiver operating characteristic curve greater than 0.9. In conclusion, this study indicates that the proposed combination of features and algorithms represents a valuable approach to automatically classify work activities into two NIOSH risk groups. These data confirm the potential of this methodology to assess the biomechanical risk to which subjects are exposed during their work activity.
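The four time-domain features named in this abstract (root mean square, minimum, maximum, and standard deviation of a signal segment) can be computed as in the sketch below. The function name `time_domain_features`, the sample segment, and the choice of the population (divide-by-n) standard deviation are assumptions for illustration; the abstract does not say which variant the authors used or how segments were cut.

```python
import math

def time_domain_features(segment):
    """Return RMS, minimum, maximum, and standard deviation of one signal segment."""
    n = len(segment)
    mean = sum(segment) / n
    rms = math.sqrt(sum(v * v for v in segment) / n)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    return {"rms": rms, "min": min(segment), "max": max(segment), "std": std}

# Hypothetical segment of an acceleration signal (arbitrary units).
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

For a zero-mean segment such as this one, the RMS and the population standard deviation coincide; they differ as soon as the segment has a non-zero mean, which is one reason to keep both as separate features.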
Affiliation(s)
- Leandro Donisi
  - Department of Advanced Biomedical Sciences, University of Naples Federico II, 80131 Naples, Italy
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
- Giuseppe Cesarelli
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
  - Department of Chemical, Materials and Production Engineering, University of Naples Federico II, 80125 Naples, Italy
- Armando Coccia
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
  - Department of Information Technologies and Electrical Engineering, University of Naples Federico II, 80125 Naples, Italy
- Monica Panigazzi
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
- Edda Maria Capodaglio
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
- Giovanni D’Addio
  - Scientific Clinical Institutes ICS Maugeri, 27100 Pavia, Italy
|
38
|
Recenti M, Ricciardi C, Aubonnet R, Picone I, Jacob D, Svansson HÁR, Agnarsdóttir S, Karlsson GH, Baeringsdóttir V, Petersen H, Gargiulo P. Toward Predicting Motion Sickness Using Virtual Reality and a Moving Platform Assessing Brain, Muscles, and Heart Signals. Front Bioeng Biotechnol 2021; 9:635661. [PMID: 33869153 PMCID: PMC8047066 DOI: 10.3389/fbioe.2021.635661] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/30/2020] [Accepted: 02/05/2021] [Indexed: 01/15/2023]
Abstract
Motion sickness (MS) and postural control (PC) conditions are common complaints among those who passively travel. Many theories explaining a probable cause for MS have been proposed, but the most prominent is the sensory conflict theory, stating that a mismatch between vestibular and visual signals causes MS. Few measurements have been made to understand and quantify the interplay between muscle activation, brain activity, and heart behavior during this condition. We introduce here a novel multimetric system called BioVRSea based on virtual reality (VR), a mechanical platform and several biomedical sensors to study the physiology associated with MS and seasickness. This study reports the results from 28 individuals: the subjects stand on the platform wearing VR goggles, a 64-channel EEG dry-electrode cap, two EMG sensors on the gastrocnemius muscles, and a sensor on the chest that captures the heart rate (HR). The virtual environment shows a boat surrounded by waves whose frequency and amplitude are synchronized with the platform movement. Three measurement protocols are performed by each subject, after each of which they answer the Motion Sickness Susceptibility Questionnaire. Nineteen parameters are extracted from the biomedical sensors (5 from EEG, 12 from EMG, and 2 from HR) and 13 from the questionnaire. Eight binary indexes are computed to quantify the symptoms, combining all of them in the Motion Sickness Index (I_MS). These parameters create the MS database composed of 83 measurements. All indexes undergo univariate statistical analysis, with EMG parameters being most significant, in contrast to EEG parameters. Machine learning (ML) gives good results in the classification of the binary indexes, finding random forest to be the best algorithm (accuracy of 74.7 for I_MS). The feature importance analysis showed that muscle parameters are the most relevant, and for EEG analysis, beta wave results were the most important. The present work serves as the first step in identifying the key physiological factors that differentiate those who suffer from MS from those who do not using the novel BioVRSea system. Coupled with ML, BioVRSea is of value in the evaluation of PC disruptions, which are among the most disturbing and costly health conditions affecting humans.
Affiliation(s)
- Marco Recenti
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Carlo Ricciardi
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
  - Department of Advanced Biomedical Sciences, University Hospital of Naples "Federico II", Naples, Italy
- Romain Aubonnet
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Ilaria Picone
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
  - Department of Advanced Biomedical Sciences, University Hospital of Naples "Federico II", Naples, Italy
- Deborah Jacob
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Halldór Á R Svansson
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Sólveig Agnarsdóttir
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Gunnar H Karlsson
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Valdís Baeringsdóttir
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
- Hannes Petersen
  - Department of Anatomy, University of Iceland, Reykjavík, Iceland
  - Akureyri Hospital, Akureyri, Iceland
- Paolo Gargiulo
  - Institute of Biomedical and Neural Engineering, Reykjavik University, Reykjavík, Iceland
  - Department of Science, Landspitali University Hospital, Reykjavík, Iceland
|
39
|
Scrutinio D, Ricciardi C, Donisi L, Losavio E, Battista P, Guida P, Cesarelli M, Pagano G, D'Addio G. Machine learning to predict mortality after rehabilitation among patients with severe stroke. Sci Rep 2020; 10:20127. [PMID: 33208913 PMCID: PMC7674405 DOI: 10.1038/s41598-020-77243-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 06/24/2020] [Accepted: 11/02/2020] [Indexed: 12/23/2022]
Abstract
Stroke is among the leading causes of death and disability worldwide. Approximately 20–25% of stroke survivors present severe disability, which is associated with increased mortality risk. Prognostication is inherent in the process of clinical decision-making. Machine learning (ML) methods have gained increasing popularity in the setting of biomedical research. The aim of this study was twofold: assessing the performance of ML tree-based algorithms for predicting three-year mortality in 1207 stroke patients with severe disability who completed rehabilitation, and comparing the performance of ML algorithms to that of a standard logistic regression. The logistic regression model achieved an area under the Receiver Operating Characteristic curve (AUC) of 0.745 and was well calibrated. At the optimal risk threshold, the model had an accuracy of 75.7%, a positive predictive value (PPV) of 33.9%, and a negative predictive value (NPV) of 91.0%. The ML algorithm outperformed the logistic regression model through the use of the synthetic minority oversampling technique and Random Forests, achieving an AUC of 0.928 and an accuracy of 86.3%. The PPV was 84.6% and the NPV 87.5%. This study introduced a step forward in the creation of standardisable tools for predicting health outcomes in individuals affected by stroke.
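The accuracy, PPV, and NPV figures quoted in this abstract all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic; the function name `classification_summary` and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_summary(tp, fp, tn, fn):
    """Accuracy, positive predictive value, and negative predictive value from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # fraction of all predictions that are correct
        "ppv": tp / (tp + fp),          # of those predicted positive, fraction truly positive
        "npv": tn / (tn + fn),          # of those predicted negative, fraction truly negative
    }

# Hypothetical counts for an imbalanced outcome (30 events among 200 patients).
summary = classification_summary(tp=20, fp=30, tn=140, fn=10)
print(summary)
```

With these made-up counts the accuracy is 0.80 and the NPV about 0.93, yet the PPV is only 0.40, which illustrates how a model can look accurate on a rare outcome while most of its positive calls are wrong.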
Affiliation(s)
- Carlo Ricciardi
  - Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
  - Department of Advanced Biomedical Sciences, University Hospital of Naples "Federico II", Naples, Italy
- Leandro Donisi
  - Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
  - Department of Advanced Biomedical Sciences, University Hospital of Naples "Federico II", Naples, Italy
- Pietro Guida
  - Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
- Mario Cesarelli
  - Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
  - Department of Electrical Engineering and Information Technology, University of Naples "Federico II", Naples, Italy
- Gaetano Pagano
  - Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
|
40
|
Ricciardi C, Improta G, Amato F, Cesarelli G, Romano M. Classifying the type of delivery from cardiotocographic signals: A machine learning approach. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 196:105712. [PMID: 32877811 DOI: 10.1016/j.cmpb.2020.105712] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/28/2020] [Accepted: 08/12/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND AND OBJECTIVE: Cardiotocography (CTG) is the most widely employed methodology to monitor the foetus in the prenatal phase. Since the evaluation of CTG is often visual, and hence qualitative and too subjective, some automated methods have been introduced for its assessment. METHODS: In this paper, custom-made software is used to extract 17 features from the available CTG. A preliminary univariate statistical analysis is performed; then, five machine learning algorithms exploiting ensemble learning were implemented (J48, Random Forests (RF), Ada-boosting of decision tree (ADA-B), Gradient Boosting and Decorate) through the Knime analytics platform to classify patients according to their delivery: vaginal or caesarean section. The dataset is composed of 370 signals collected between 2000 and 2009 in both public and private hospitals. The performance of the algorithms was evaluated using 10-fold cross-validation with different evaluation metrics: accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic curve (AUCROC). RESULTS: While only two features were significantly different from the statistical point of view (gestation week and power expressed by the high frequency band of the FHR power spectrum), the machine learning results were very good. The RF obtained the best results: accuracy (91.1%), sensitivity (90.0%) and AUCROC (96.7%). The ADA-B achieved the highest precision (92.6%) and specificity (93.1%). As expected, the lowest scores were obtained by J48, which was the base classifier employed in all the other boosted implementations. Excluding the J48 results, the AUCROC of all the algorithms was greater than 94.9%. CONCLUSION: In light of the obtained results, which are better than those found in the literature from comparable studies, it can be stated that the machine learning approach can actually help physicians in their decision process when evaluating foetal well-being.
Affiliation(s)
- C Ricciardi
  - Department of Advanced Biomedical Sciences, University Hospital of Naples Federico II, Naples, Italy
- G Improta
  - Department of Public Health, University Hospital of Naples Federico II, Naples, Italy
  - Centro Interdipartimentale di Ricerca in Management Sanitario e Innovazione in Sanità (CIRMIS)
- F Amato
  - Centro Interdipartimentale di Ricerca in Management Sanitario e Innovazione in Sanità (CIRMIS)
  - Department of Electrical Engineering and Information Technology, DIETI, University of Naples Federico II, Naples 80125, Italy
- G Cesarelli
  - Department of Chemical, Materials and Production Engineering, University of Naples "Federico II", Naples, Italy
  - Istituto Italiano di Tecnologia, Naples, Italy
- M Romano
  - Department of Experimental and Clinical Medicine (DMSC), University "Magna Graecia" of Catanzaro, Italy
|
41
|
Ricciardi C, Jónsson H, Jacob D, Improta G, Recenti M, Gíslason MK, Cesarelli G, Esposito L, Minutolo V, Bifulco P, Gargiulo P. Improving Prosthetic Selection and Predicting BMD from Biometric Measurements in Patients Receiving Total Hip Arthroplasty. Diagnostics (Basel) 2020; 10:815. [PMID: 33066350 PMCID: PMC7602076 DOI: 10.3390/diagnostics10100815] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/10/2020] [Revised: 10/08/2020] [Accepted: 10/12/2020] [Indexed: 12/11/2022]
Abstract
There are two surgical approaches to performing total hip arthroplasty (THA): a cemented or uncemented type of prosthesis. The choice is usually based on the experience of the orthopaedic surgeon and on parameters such as the age and gender of the patient. Using machine learning (ML) techniques on quantitative biomechanical and bone quality data extracted from computed tomography, electromyography and gait analysis, the aim of this paper was, firstly, to help clinicians use patient-specific biomarkers from diagnostic exams in the prosthetic decision-making process. The second aim was to evaluate patient long-term outcomes by predicting the bone mineral density (BMD) of the proximal and distal parts of the femur using advanced image processing analysis techniques and ML. The ML analyses were performed on diagnostic patient data extracted from a national database of 51 THA patients using the Knime analytics platform. The classification analysis achieved 93% accuracy in choosing the type of prosthesis; the regression analysis on the BMD data showed a coefficient of determination of about 0.6. The start and stop of the electromyographic signals were identified as the best predictors. This study shows a patient-specific approach could be helpful in the decision-making process and provide clinicians with information regarding the follow up of patients.
Affiliation(s)
- Carlo Ricciardi
  - Department of Advanced Biomedical Sciences, University Hospital of Naples ‘Federico II’, 80131 Naples, Italy
  - Institute for Biomedical and Neural Engineering, Reykjavík University, 102 Reykjavík, Iceland
- Halldór Jónsson
  - Faculty of Medicine, University of Iceland, 102 Reykjavík, Iceland
  - Landspítali Hospital, Orthopaedic Clinic, 102 Reykjavík, Iceland
- Deborah Jacob
  - Institute for Biomedical and Neural Engineering, Reykjavík University, 102 Reykjavík, Iceland
- Giovanni Improta
  - Department of Public Health, University Hospital of Naples ‘Federico II’, 80125 Naples, Italy
- Marco Recenti
  - Institute for Biomedical and Neural Engineering, Reykjavík University, 102 Reykjavík, Iceland
- Magnús Kjartan Gíslason
  - Institute for Biomedical and Neural Engineering, Reykjavík University, 102 Reykjavík, Iceland
- Giuseppe Cesarelli
  - Department of Chemical, Materials and Production Engineering, University of Naples “Federico II”, 80125 Naples, Italy
  - Istituto Italiano di Tecnologia, 80125 Naples, Italy
- Luca Esposito
  - Department Engineering, University of Campania Luigi Vanvitelli, 81100 Aversa (CE), Italy
- Vincenzo Minutolo
  - Department Engineering, University of Campania Luigi Vanvitelli, 81100 Aversa (CE), Italy
- Paolo Bifulco
  - Department of Electrical Engineering and Information Technologies, University Hospital of Naples ‘Federico II’, 80125 Naples, Italy
- Paolo Gargiulo
  - Institute for Biomedical and Neural Engineering, Reykjavík University, 102 Reykjavík, Iceland
  - Department of Science, Landspítali Hospital, 102 Reykjavík, Iceland
|