1
Jamel L, Umer M, Saidani O, Alabduallah B, Alsubai S, Ishmanov F, Kim TH, Ashraf I. Improving prediction of maternal health risks using PCA features and TreeNet model. PeerJ Comput Sci 2024; 10:e1982. [PMID: 38660162 PMCID: PMC11042025 DOI: 10.7717/peerj-cs.1982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 03/15/2024] [Indexed: 04/26/2024]
Abstract
Maternal healthcare is a critical aspect of public health that focuses on the well-being of pregnant women before, during, and after childbirth. It encompasses a range of services aimed at ensuring the optimal health of both the mother and the developing fetus. During pregnancy and in the postpartum period, the mother's health is susceptible to several complications and risks, and timely detection of such risks can play a vital role in women's safety. This study proposes an approach to predict risks associated with maternal health. The first step of the approach uses principal component analysis (PCA) to extract significant features from the dataset. The study then employs a stacked ensemble voting classifier that combines one machine learning model and one deep learning model to achieve high performance. The performance of the proposed approach is compared to six machine learning algorithms and one deep learning algorithm. Two scenarios are considered for the experiments: one using all features and the other using PCA features. With PCA-based features, the proposed model achieves an accuracy of 98.25%, precision of 99.17%, recall of 99.16%, and an F1 score of 99.16%. The effectiveness of the proposed model is further confirmed by comparing it to existing state-of-the-art approaches.
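The abstract does not say how the two models' outputs are combined. One common reading of a voting classifier is soft voting, in which the class probabilities from each model are averaged and the most probable class is chosen. The sketch below illustrates that idea only; the function name `soft_vote` and the probability values are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_a, prob_b):
    """Average two models' class-probability vectors and pick the top class.

    prob_a, prob_b: one probability vector per sample from two hypothetical
    models (for example, a tree-based model and a neural network).
    Returns the predicted class index for each sample.
    """
    predictions = []
    for pa, pb in zip(prob_a, prob_b):
        averaged = [(x + y) / 2 for x, y in zip(pa, pb)]
        # Index of the largest averaged probability.
        predictions.append(max(range(len(averaged)), key=averaged.__getitem__))
    return predictions


# Illustrative probabilities only (two samples, two classes).
model_a = [[0.9, 0.1], [0.6, 0.4]]
model_b = [[0.6, 0.4], [0.3, 0.7]]
labels = soft_vote(model_a, model_b)
```

In the second sample the two models disagree, and the averaged probabilities decide the outcome; this is the behavior that distinguishes soft voting from a simple majority vote.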
Affiliation(s)
- Leila Jamel
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Bayan Alabduallah
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Farruh Ishmanov
- Department of Electronics and Communication Engineering, Kwangwoon University, Seoul, Republic of South Korea
- Tai-hoon Kim
- School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, Daehak-ro, Yeosu-si, Jeollanam-do, Republic of South Korea
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of South Korea
2
Aljrees T. Improving prediction of cervical cancer using KNN imputer and multi-model ensemble learning. PLoS One 2024; 19:e0295632. [PMID: 38170713 PMCID: PMC10763959 DOI: 10.1371/journal.pone.0295632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 11/23/2023] [Indexed: 01/05/2024]
Abstract
Cervical cancer is a leading cause of women's mortality, emphasizing the need for early diagnosis and effective treatment. In line with the imperative of early intervention, the automated identification of cervical cancer has emerged as a promising avenue, leveraging machine learning techniques to enhance both the speed and accuracy of diagnosis. However, an inherent challenge in the development of these automated systems is the presence of missing values in the datasets commonly used for cervical cancer detection. Missing data can significantly impact the performance of machine learning models, potentially leading to inaccurate or unreliable results. This study addresses a critical challenge in automated cervical cancer identification: handling missing data in datasets. The study presents a novel approach that combines three machine learning models into a stacked ensemble voting classifier, complemented by the use of a KNN Imputer to manage missing values. The proposed model achieves remarkable results with an accuracy of 0.9941, precision of 0.98, recall of 0.96, and an F1 score of 0.97. This study examines three distinct scenarios: one involving the deletion of missing values, another utilizing KNN imputation, and a third employing PCA for imputing missing values. This research has significant implications for the medical field, offering medical experts a powerful tool for more accurate cervical cancer therapy and enhancing the overall effectiveness of testing procedures. By addressing missing data challenges and achieving high accuracy, this work represents a valuable contribution to cervical cancer detection, ultimately aiming to reduce the impact of this disease on women's health and healthcare systems.
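To make the imputation step concrete, the following is a simplified sketch of k-nearest-neighbor imputation: each missing entry is filled with the mean of that feature over the k rows closest to the incomplete row. It is not the paper's code and is not a reimplementation of any library's KNN Imputer; the function name `knn_impute`, the distance definition, and the toy data are assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Simplified sketch: distance is the root mean squared difference over the
    columns observed in both rows, and only rows that have the target column
    observed can act as donors.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy data (hypothetical): the second row is missing its second feature.
data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 4.0]]
completed = knn_impute(data, k=2)
```

Here the two rows closest to the incomplete one supply the values 2.0 and 4.0, so the gap is filled with their mean rather than being influenced by the distant third row.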
Affiliation(s)
- Turki Aljrees
- College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
3
Karamti H, Alharthi R, Umer M, Shaiba H, Ishaq A, Abuzinadah N, Alsubai S, Ashraf I. Breast cancer detection employing stacked ensemble model with convolutional features. Cancer Biomark 2023:CBM230294. [PMID: 38160347 DOI: 10.3233/cbm-230294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
Breast cancer is a major cause of female deaths, especially in underdeveloped countries. It can be treated if diagnosed early, and chances of survival are high with appropriate and timely treatment. For timely and accurate automated diagnosis, machine learning approaches tend to show better results than traditional methods; however, their accuracy falls short of the desired level. This study proposes the use of an ensemble model to provide accurate detection of breast cancer. The proposed model uses the random forest and support vector classifier along with automatic feature extraction using an optimized convolutional neural network (CNN). Extensive experiments are performed using the original as well as CNN-based features to analyze the performance of the deployed models. Experimental results on the Wisconsin dataset reveal that CNN-based features provide better results than the original features. The proposed model achieves an accuracy of 99.99% for breast cancer detection. A performance comparison with existing state-of-the-art models is also carried out, showing the superior performance of the proposed model.
Affiliation(s)
- Hanen Karamti
- Department of Computer Sciences, College of Computer and Information Sciences Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Raed Alharthi
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Hadil Shaiba
- Department of Computer Sciences, College of Computer and Information Sciences Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Abid Ishaq
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Nihal Abuzinadah
- Faculty of Computer Science and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Korea
4
Umer M, Aljrees T, Ullah S, Bashir AK. Novel approach for quantitative and qualitative authors research profiling using feature fusion and tree-based learning approach. PeerJ Comput Sci 2023; 9:e1752. [PMID: 38192451 PMCID: PMC10773922 DOI: 10.7717/peerj-cs.1752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/22/2023] [Indexed: 01/10/2024]
Abstract
Article citation creates a link between the cited and citing articles and is used as a basis for several measures of scientific achievement, such as author and journal impact factor, H-index, and i10 index. Citations also include self-citation, which refers to the citation of an article by its own author. Self-citation is important to evaluate an author's research profile and has gained popularity recently. Although different criteria are found in the literature regarding appropriate self-citation, self-citation does have a huge impact on a researcher's scientific profile. This study carries out two cases in this regard. In case 1, the qualitative aspect of the author's profile is analyzed using hand-crafted feature engineering techniques. The sentiments conveyed through citations are integral in assessing research quality, as they can signify appreciation, critique, or serve as a foundation for further research. Analyzing sentiments within in-text citations remains a formidable challenge, even with the utilization of automated sentiment annotations. For this purpose, this study employs machine learning models using term frequency (TF) and term frequency-inverse document frequency (TF-IDF). Random forest using TF with Synthetic Minority Oversampling Technique (SMOTE) achieved an accuracy of 0.9727. Case 2 deals with quantitative analysis and investigates direct and indirect self-citation. In this study, the top 2% of researchers in 2020 are considered as a baseline. For this purpose, the data of the top 25 Pakistani researchers are manually retrieved from this dataset, in addition to the citation information from the Web of Science (WoS). The self-citation is estimated using the proposed model and results are compared with those obtained from WoS. Experimental results show a substantial difference between the two, as the ratio of self-citation from the proposed approach is higher than WoS. It is observed that the citations from the WoS for authors are overstated. For a comprehensive evaluation of the researcher's profile, both direct and indirect self-citation must be included.
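As a reminder of what the TF and TF-IDF features mentioned in this abstract measure, here is a minimal sketch using the textbook unsmoothed definition: TF is a term's share of the tokens in one document, and IDF is the logarithm of the number of documents divided by the number of documents containing the term. Library implementations often use smoothed variants, so their values will differ. The function name `tf_idf` and the example citation sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical citation sentences, for illustration only.
sentences = ["good work", "good method"]
features = tf_idf(sentences)
```

A term that occurs in every document, such as "good" here, receives a weight of zero, while terms specific to one sentence keep a positive weight; this is the property that makes TF-IDF useful for separating sentiment-bearing vocabulary from common words.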
Affiliation(s)
- Muhammad Umer
- Department of Computer Science, Khwaja Fareed University of Engineering & IT, Rahim Yar Khan, Punjab, Pakistan
- Turki Aljrees
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
- Saleem Ullah
- Department of Computer Science, Khwaja Fareed University of Engineering & IT, Rahim Yar Khan, Punjab, Pakistan
- Ali Kashif Bashir
- Department of Computing and Mathematics, The Manchester Metropolitan University, Manchester, United Kingdom
5
Abuzinadah N, Kumar Posa S, Alarfaj AA, Alabdulqader EA, Umer M, Kim TH, Alsubai S, Ashraf I. Improved Prediction of Ovarian Cancer Using Ensemble Classifier and Shaply Explainable AI. Cancers (Basel) 2023; 15:5793. [PMID: 38136346 PMCID: PMC10742117 DOI: 10.3390/cancers15245793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023]
Abstract
Detecting and preventing ovarian cancer is of utmost importance for women's overall health and wellness. Referred to as the "silent killer," ovarian cancer exhibits inconspicuous symptoms during its initial phases, posing a challenge for timely identification. Identification of ovarian cancer during its advanced stages significantly diminishes the likelihood of effective treatment and survival. Regular screenings, such as pelvic exams, ultrasound, and blood tests for specific biomarkers, are essential tools for detecting the disease in its early, more treatable stages. This research makes use of the Soochow University ovarian cancer dataset, containing 50 features, for the accurate detection of ovarian cancer. The proposed predictive model is a stacked ensemble that merges the strengths of bagging and boosting classifiers and aims to enhance predictive accuracy and reliability. This combination harnesses the benefits of variance reduction and improved generalization, contributing to superior ovarian cancer prediction outcomes. The proposed model gives 96.87% accuracy, which is the highest result obtained on this dataset so far using all features. Moreover, the outcomes are explained using the explainable artificial intelligence method referred to as SHAPly. The quality of the proposed model is demonstrated through a comparison of its performance with that of other cutting-edge models.
Affiliation(s)
- Nihal Abuzinadah
- Faculty of Computer Science and Information Technology, King Abdulaziz University, P.O. Box 80200, Jeddah 21589, Saudi Arabia
- Sarath Kumar Posa
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Aisha Ahmed Alarfaj
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Ebtisam Abdullah Alabdulqader
- Department of Information Technology, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Tai-Hoon Kim
- School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, 50, Daehak-ro, Yeosu-si 59626, Jeollanam-do, Republic of Korea
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, Al-Kharj 11942, Saudi Arabia
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
6
Jamali AA, Berger C, Spiteri RJ. Momentary Depressive Feeling Detection Using X (Formerly Twitter) Data: Contextual Language Approach. JMIR AI 2023; 2:e49531. [PMID: 38875532 DOI: 10.2196/49531] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/31/2023] [Revised: 09/06/2023] [Accepted: 10/27/2023] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Depression and momentary depressive feelings are major public health concerns imposing a substantial burden on both individuals and society. Early detection of momentary depressive feelings is highly beneficial in reducing this burden and improving the quality of life for affected individuals. To this end, the abundance of data exemplified by X (formerly Twitter) presents an invaluable resource for discerning insights into individuals' mental states and enabling timely detection of these transitory depressive feelings. OBJECTIVE: The objective of this study was to automate the detection of momentary depressive feelings in posts using contextual language approaches. METHODS: First, we identified terms expressing momentary depressive feelings and depression, scaled their relevance to depression, and constructed a lexicon. Then, we scraped posts using this lexicon and labeled them manually. Finally, we assessed the performance of the Bidirectional Encoder Representations From Transformers (BERT), A Lite BERT (ALBERT), Robustly Optimized BERT Approach (RoBERTa), Distilled BERT (DistilBERT), convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), and machine learning (ML) algorithms in detecting momentary depressive feelings in posts. RESULTS: This study demonstrates a notable distinction in performance between binary classification, aimed at identifying posts conveying depressive sentiments, and multilabel classification, designed to categorize such posts across multiple emotional nuances. Specifically, binary classification emerges as the more adept approach in this context, outperforming multilabel classification. This outcome stems from several critical factors that underscore the nuanced nature of depressive expressions within social media. Our results show that when using binary classification, BERT and DistilBERT (pretrained transfer learning algorithms) may outperform traditional ML algorithms. Particularly, DistilBERT achieved the best performance in terms of area under the curve (96.71%), accuracy (97.4%), sensitivity (97.57%), specificity (97.22%), precision (97.30%), and F1-score (97.44%). DistilBERT obtained an area under the curve nearly 12 percentage points higher than that of the best-performing traditional ML algorithm, convolutional neural network. This study showed that transfer learning algorithms are highly effective in extracting knowledge from posts, detecting momentary depressive feelings, and highlighting their superiority in contextual analysis. CONCLUSIONS: Our findings suggest that contextual language approaches, particularly those rooted in transfer learning, are reliable approaches to automate the early detection of momentary depressive feelings and can be used to develop social media monitoring tools for identifying individuals who may be at risk of depression. The implications are far-reaching because these approaches stand poised to inform the creation of social media monitoring tools and are pivotal for identifying individuals susceptible to depression. By intervening proactively, these tools possess the potential to slow the progression of depressive feelings, effectively mitigating the societal load of depression and fostering improved mental health. In addition to highlighting the capabilities of automated sentiment analysis, this study illuminates its pivotal role in advancing global public health.
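For readers less familiar with the metrics reported in this abstract, the sketch below shows how sensitivity, specificity, precision, accuracy, and F1-score follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical; they are not the study's data and do not reproduce its reported percentages.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp: true/false positives, tn/fn: true/false negatives.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for a small labeled sample of posts.
scores = binary_metrics(tp=8, fp=2, tn=9, fn=1)
```

Because F1 is the harmonic mean of precision and sensitivity, it always lies between the two, which is a quick sanity check when reading reported results.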
Affiliation(s)
- Ali Akbar Jamali
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Corinne Berger
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Raymond J Spiteri
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
7
Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi AA, Alsubai S, Umer M. Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach. Cancers (Basel) 2023; 15:4412. [PMID: 37686692 PMCID: PMC10486648 DOI: 10.3390/cancers15174412] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/16/2023] [Revised: 08/02/2023] [Accepted: 08/09/2023] [Indexed: 09/10/2023]
Abstract
Objective: Cervical cancer ranks among the top causes of death among females in developing countries. The most important steps for minimizing the aftereffects of cervical cancer are early identification and treatment under the finest medical guidance. One of the best methods to find this sort of malignancy is by looking at a Pap smear image. For automated detection of cervical cancer, the available datasets often have missing values, which can significantly affect the performance of machine learning models. Methods: To address these challenges, this study proposes an automated system for predicting cervical cancer that handles missing values and uses SMOTE up-sampled features to achieve high accuracy. The proposed system employs a stacked ensemble voting classifier model that combines three machine learning models, along with a KNN Imputer for handling missing values and SMOTE up-sampling. Results: The proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and 99.99% F1 score when using KNN imputed SMOTE features. The study compares the performance of the proposed model with multiple other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN imputed SMOTE features. The study validates the efficacy of the proposed model against existing state-of-the-art approaches. Conclusions: This study investigates the issue of missing values and class imbalance in the data collected for cervical cancer detection and might aid medical practitioners in timely detection and providing cervical cancer patients with better care.
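The core idea of SMOTE up-sampling can be shown in a few lines: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified illustration, not the study's code or a library implementation; the function name `smote_like`, its parameters, and the toy points are assumptions.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified sketch: assumes at least two minority samples and uses squared
    Euclidean distance to find each base sample's k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority-class points (hypothetical).
minority_points = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
new_points = smote_like(minority_points, n_new=5)
```

Every synthetic point lies between two existing minority samples, so up-sampling adds no information from outside the region the minority class already occupies. When imputation and up-sampling are combined, missing values generally need to be filled first, since interpolation requires complete feature vectors.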
Affiliation(s)
- Hanen Karamti
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Raed Alharthi
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin 39524, Saudi Arabia
- Amira Al Anizi
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Reemah M. Alhebshi
- Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Ala’ Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah 23218, Saudi Arabia
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, Al-Kharj 11942, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
8
Saidani O, Aljrees T, Umer M, Alturki N, Alshardan A, Khan SW, Alsubai S, Ashraf I. Enhancing Prediction of Brain Tumor Classification Using Images and Numerical Data Features. Diagnostics (Basel) 2023; 13:2544. [PMID: 37568907 PMCID: PMC10417332 DOI: 10.3390/diagnostics13152544] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/24/2023] [Revised: 07/23/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023]
Abstract
Brain tumors, along with other diseases that harm the neurological system, are a significant contributor to global mortality. Early diagnosis plays a crucial role in effectively treating brain tumors. To distinguish individuals with tumors from those without, this study employs a combination of image-based and data-based features. In the initial phase, the image dataset is enhanced, followed by the application of a UNet transfer-learning-based model to classify patients as either having tumors or being normal. In the second phase, this research utilizes 13 features in conjunction with a voting classifier. The voting classifier incorporates features extracted from deep convolutional layers and combines stochastic gradient descent with logistic regression to achieve better classification results. The reported accuracy score of 0.99 achieved by both proposed models shows their superior performance. A comparison of the results with other supervised learning algorithms and state-of-the-art models further validates this performance.
Affiliation(s)
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Turki Aljrees
- Department College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin 39524, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Nazik Alturki
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Amal Alshardan
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Sardar Waqar Khan
- Department of Computer Science & Information Technology, The University of Lahore, Lahore 54000, Pakistan
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
9
Alturki N, Umer M, Ishaq A, Abuzinadah N, Alnowaiser K, Mohamed A, Saidani O, Ashraf I. Combining CNN Features with Voting Classifiers for Optimizing Performance of Brain Tumor Classification. Cancers (Basel) 2023; 15:1767. [PMID: 36980653 PMCID: PMC10046217 DOI: 10.3390/cancers15061767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 02/20/2023] [Accepted: 03/04/2023] [Indexed: 03/17/2023]
Abstract
Brain tumors and other nervous system cancers are among the top ten leading fatal diseases. The effective treatment of brain tumors depends on their early detection. This research work makes use of 13 features with a voting classifier that combines logistic regression with stochastic gradient descent, using features extracted by deep convolutional layers to efficiently distinguish patients with tumors from those without. From the first- and second-order brain tumor features, deep convolutional features are extracted for model training. Using deep convolutional features helps to increase the precision of tumor and non-tumor patient classification. The proposed voting classifier with convolutional features achieves the highest accuracy of 99.9%. Compared to cutting-edge methods, the proposed approach has demonstrated improved accuracy.
Affiliation(s)
- Nazik Alturki
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Abid Ishaq
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Nihal Abuzinadah
- Faculty of Computer Science and Information Technology, King Abdulaziz University, P.O. Box. 80200, Jeddah 21589, Saudi Arabia
- Khaled Alnowaiser
- Department of Computer Engineering, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
- Abdullah Mohamed
- Research Centre, Future University in Egypt, New Cairo 11745, Egypt
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
10
Chen X, Aljrees T, Umer M, Saidani O, Almuqren L, Mzoughi O, Ishaq A, Ashraf I. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learning model. Digit Health 2023; 9:20552076231203802. [PMID: 37799501 PMCID: PMC10548812 DOI: 10.1177/20552076231203802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 09/08/2023] [Indexed: 10/07/2023]
Abstract
Objective: Cervical cancer stands as a leading cause of mortality among women in developing nations. To ensure the reduction of its adverse consequences, the primary protocols to be adhered to involve early detection and treatment under the guidance of expert medical professionals. An effective approach for identifying this form of malignancy involves the examination of Pap smear images. However, in the context of automating cervical cancer detection, many of the existing datasets frequently exhibit missing data points, a factor that can substantially impact the effectiveness of machine learning models. Methods: In response to these hurdles, this research introduces an automated system designed to predict cervical cancer with a dual focus: adeptly managing missing data while attaining remarkable accuracy. The system's core is built upon a stacked ensemble voting classifier model, which amalgamates three distinct machine learning models, all harmoniously integrated with the KNN Imputer to address the issue of missing values. Results: The model put forth attains an accuracy of 99.41%, precision of 97.63%, recall of 95.96%, and an F1 score of 96.76% when incorporating the KNN imputation method. The investigation conducts a comparative analysis, contrasting the performance of this model with seven alternative machine learning algorithms in two scenarios: one where missing values are eliminated, and another employing KNN imputation. This study offers validation of the effectiveness of the proposed model in comparison to current state-of-the-art methodologies. Conclusions: This research delves into the challenge of handling missing data in the dataset utilized for cervical cancer detection. The findings have the potential to assist healthcare professionals in achieving early detection and enhancing the quality of care provided to individuals affected by cervical cancer.
Affiliation(s)
- Xiaoyuan Chen
- Huzhou Key Laboratory of Green Energy Materials and Battery Cascade Utilization, School of Intelligent Manufacturing, Huzhou College, Huzhou, P.R. China
- Turki Aljrees
- Department College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
- Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Latifah Almuqren
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Olfa Mzoughi
- Department of Computer Science, College of Sciences and Humanities-Aflaj, Prince Sattam bin Abdulaziz University, Aflaj, Saudi Arabia
- Abid Ishaq
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, South Korea