Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 87 (from Reference Citation Analysis)
Article PDFs: 27
Cited by > 0: 61
Searched Name: SMOTE

Ahsan MM, Ali MS, Siddique Z. Enhancing and improving the performance of imbalanced class data using novel GBO and SSG: A comparative analysis. Neural Netw 2024;173:106157. [PMID: 38335796 DOI: 10.1016/j.neunet.2024.106157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/01/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]

Chen CC, Ting WC, Lee HC, Chang CC, Lin TC, Yang SF. A Cost-Effective Model for Predicting Recurrent Gastric Cancer Using Clinical Features. Diagnostics (Basel) 2024;14:842. [PMID: 38667487 PMCID: PMC11049390 DOI: 10.3390/diagnostics14080842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024]

Khan MM, Alkhathami M. Anomaly detection in IoT-based healthcare: machine learning for enhanced security. Sci Rep 2024;14:5872. [PMID: 38467709 PMCID: PMC10928137 DOI: 10.1038/s41598-024-56126-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 02/29/2024] [Indexed: 03/13/2024]

Li J, Dai Y, Mu Z, Wang Z, Meng J, Meng T, Wang J. Choice of refractive surgery types for myopia assisted by machine learning based on doctors' surgical selection data. BMC Med Inform Decis Mak 2024;24:41. [PMID: 38331788 PMCID: PMC10854042 DOI: 10.1186/s12911-024-02451-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 02/02/2024] [Indexed: 02/10/2024]

Abstract

In recent years, corneal refractive surgery has been widely used in clinics as an effective means to restore vision and improve quality of life. Choosing a myopia-refractive surgery requires comprehensive consideration of the differences in equipment and technology as well as the specifics of individual patients, and therefore depends heavily on the experience of ophthalmologists. In our study, we used machine learning to learn from ophthalmologists' decision-making experience and to assist them in choosing a corneal refractive surgery for a new case. Our study was based on the clinical data of 7,081 patients who underwent corneal refractive surgery between 2000 and 2017 at the Department of Ophthalmology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences. Because of the long data period, the dataset contained missing and erroneous data. First, we cleaned the data and deleted samples with missing key data. Then, patients were divided into three groups according to the type of surgery, after which we used SMOTE to eliminate the imbalance between groups. Six statistical machine learning models (NBM, RF, AdaBoost, XGBoost, BP neural network, and DBN) were selected, and ten-fold cross-validation with grid search was used to determine the optimal hyperparameters. When tested on the dataset, the multi-class RF model showed the best performance, with agreement with ophthalmologist decisions as high as 0.8775 and Macro F1 as high as 0.8019. Furthermore, the results of the feature importance analysis based on the SHAP technique were consistent with ophthalmologists' practical experience. Our research will assist ophthalmologists in choosing appropriate types of refractive surgery and will have beneficial clinical effects.
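The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes purely numeric features, Euclidean distance, and at least two minority samples, and the sample values are made up. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()                  # position along base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two numeric features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the bounding box of the minority class.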

Mohseni-Takalloo S, Mohseni H, Mozaffari-Khosravi H, Mirzaei M, Hosseinzadeh M. The effect of data balancing approaches on the prediction of metabolic syndrome using non-invasive parameters based on random forest. BMC Bioinformatics 2024;25:18. [PMID: 38212697 PMCID: PMC10782700 DOI: 10.1186/s12859-024-05633-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]

Museru ML, Nazari R, Giglou AN, Opare K, Karimi M. Advancing flood damage modeling for coastal Alabama residential properties: A multivariable machine learning approach. Sci Total Environ 2024;907:167872. [PMID: 37852490 DOI: 10.1016/j.scitotenv.2023.167872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 10/13/2023] [Accepted: 10/14/2023] [Indexed: 10/20/2023]

Nath A, Chaube R. Mining Chemogenomic Spaces for Prediction of Drug-Target Interactions. Methods Mol Biol 2024;2714:155-169. [PMID: 37676598 DOI: 10.1007/978-1-0716-3441-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]

Xu Y, Park Y, Park JD, Sun B. Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms. Healthcare (Basel) 2023;11:3173. [PMID: 38132063 PMCID: PMC10742910 DOI: 10.3390/healthcare11243173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/11/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]

Semary NA, Ahmed W, Amin K, Pławiak P, Hammad M. Improving sentiment classification using a RoBERTa-based hybrid model. Front Hum Neurosci 2023;17:1292010. [PMID: 38130432 PMCID: PMC10733963 DOI: 10.3389/fnhum.2023.1292010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 11/23/2023] [Indexed: 12/23/2023]

Chen J, Qi TD, Vu J, Wen Y. A deep learning approach for inpatient length of stay and mortality prediction. J Biomed Inform 2023;147:104526. [PMID: 37852346 DOI: 10.1016/j.jbi.2023.104526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/11/2023] [Accepted: 10/15/2023] [Indexed: 10/20/2023]

Tang X, Wu Z, Liu W, Tian J, Liu L. Exploring effective ways to increase reliable positive samples for machine learning-based urban waterlogging susceptibility assessments. J Environ Manage 2023;344:118682. [PMID: 37567005 DOI: 10.1016/j.jenvman.2023.118682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Revised: 07/10/2023] [Accepted: 07/25/2023] [Indexed: 08/13/2023]

Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi AA, Alsubai S, Umer M. Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach. Cancers (Basel) 2023;15:4412. [PMID: 37686692 PMCID: PMC10486648 DOI: 10.3390/cancers15174412] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/16/2023] [Revised: 08/02/2023] [Accepted: 08/09/2023] [Indexed: 09/10/2023]

Angaitkar P, Janghel RR, Sahu TP. DL-TCNN: Deep Learning-based Temporal Convolutional Neural Network for prediction of conformational B-cell epitopes. 3 Biotech 2023;13:297. [PMID: 37575599 PMCID: PMC10412510 DOI: 10.1007/s13205-023-03716-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 07/24/2023] [Indexed: 08/15/2023]

Zhou T, Jiao H. Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment. Educ Psychol Meas 2023;83:831-854. [PMID: 37398846 PMCID: PMC10311957 DOI: 10.1177/00131644221117193] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/04/2023]

Ma F, Li H. Online painting image clustering for the mental health of college art students based on improved CNN and SMOTE. PeerJ Comput Sci 2023;9:e1462. [PMID: 37547389 PMCID: PMC10403178 DOI: 10.7717/peerj-cs.1462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 06/06/2023] [Indexed: 08/08/2023]

Welvaars K, Oosterhoff JHF, van den Bekerom MPJ, Doornberg JN, van Haarst EP. Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data. JAMIA Open 2023;6:ooad033. [PMID: 37266187 PMCID: PMC10232287 DOI: 10.1093/jamiaopen/ooad033] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/03/2023] [Revised: 04/04/2023] [Accepted: 05/11/2023] [Indexed: 06/03/2023]

Işık Ü, Güven A, Batbat T. Evaluation of Emotions from Brain Signals on 3D VAD Space via Artificial Intelligence Techniques. Diagnostics (Basel) 2023;13:2141. [PMID: 37443535 DOI: 10.3390/diagnostics13132141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/12/2023] [Accepted: 06/14/2023] [Indexed: 07/15/2023]

Saminathan S, Malathy C. Ensemble-based classification approach for PM2.5 concentration forecasting using meteorological data. Front Big Data 2023;6:1175259. [PMID: 37360751 PMCID: PMC10289837 DOI: 10.3389/fdata.2023.1175259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 05/09/2023] [Indexed: 06/28/2023]

Chhabra D, Juneja M, Chutani G. An efficient ensemble based machine learning approach for predicting Chronic Kidney Disease. Curr Med Imaging 2023:CMIR-EPUB-131580. [PMID: 37157217 DOI: 10.2174/1573405620666230508104538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 02/01/2023] [Accepted: 03/16/2023] [Indexed: 05/10/2023]

Fatlawi HK, Kiss A. An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream. Sensors (Basel) 2023;23:s23042061. [PMID: 36850659 PMCID: PMC9963940 DOI: 10.3390/s23042061] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 02/08/2023] [Accepted: 02/10/2023] [Indexed: 06/12/2023]

Mafarja M, Thaher T, Al-Betar MA, Too J, Awadallah MA, Abu Doush I, Turabieh H. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. Appl Intell 2023;53:1-43. [PMID: 36785593 PMCID: PMC9909674 DOI: 10.1007/s10489-022-04427-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/23/2022] [Indexed: 02/11/2023]

Abstract

Software Fault Prediction (SFP) is an important process for detecting faulty classes or modules early in the software development life cycle. In this paper, a machine learning framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. Sixteen SFP datasets for software projects of different sizes and complexity are selected from the PROMISE repository. A comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces significantly superior results on several of the evaluated datasets. The algorithms' performance is compared in terms of accuracy, number of features, and fitness function, and the comparison is supported by the two-tailed P-values of the Wilcoxon signed-rank test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
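The Wilcoxon signed-rank test named in this abstract compares two methods on paired results, for example the accuracy of two feature selectors on the same datasets. The sketch below computes only the two rank sums, with average ranks for ties and zero differences dropped; turning them into a P-value needs a critical-value table or a normal approximation, which is not shown. The scores are invented for illustration and do not come from the paper.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical accuracy (%) of two methods on the same six datasets.
method_a = [82, 79, 91, 88, 75, 84]
method_b = [80, 80, 87, 85, 75, 79]
w_plus, w_minus = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, w_minus)  # 14.0 1.0
```

The two sums always add up to n(n+1)/2, where n is the number of non-zero differences (here 5, giving 15).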

Azlim Khan AK, Ahamed Hassain Malim NH. Comparative Studies on Resampling Techniques in Machine Learning and Deep Learning Models for Drug-Target Interaction Prediction. Molecules 2023;28. [PMID: 36838652 DOI: 10.3390/molecules28041663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/12/2023]

Din NU, Zhang L, Yang Y. Automated Battery Making Fault Classification Using Over-Sampled Image Data CNN Features. Sensors (Basel) 2023;23:1927. [PMID: 36850526 PMCID: PMC9965985 DOI: 10.3390/s23041927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Revised: 01/26/2023] [Accepted: 02/03/2023] [Indexed: 06/18/2023]

Chandrashekar K, Setlur AS, Sabhapathi C A, Raiker SS, Singh S, Niranjan V. Decision Support System and Web-Application Using Supervised Machine Learning Algorithms for Easy Cancer Classifications. Cancer Inform 2023;22:11769351221147244. [PMID: 36714384 PMCID: PMC9880585 DOI: 10.1177/11769351221147244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 12/06/2022] [Indexed: 01/24/2023]

Abstract

Using a decision support system (DSS) that classifies various cancers helps clinicians and researchers make better decisions that can aid early cancer diagnosis, thereby reducing the chance of incorrect diagnosis. This work therefore aimed to design a classification model that can accurately predict 5 different cancer types, comprising 20 cancer exomes, using the mutations identified from whole exome cancer analysis. Initially, a basic model was designed using supervised machine learning classification algorithms such as K-nearest neighbor (KNN), support vector machine (SVM), decision tree, naïve Bayes and random forest (RF), among which decision tree and random forest performed better in terms of preliminary model accuracy. However, output predictions were incorrect because of low training scores. Thus, 16 essential features were selected for model improvement using 2 approaches. All imbalanced datasets were balanced using SMOTE. In the first approach, all features from the 20 cancer exome datasets were used for training, and models were designed using decision tree and random forest. On the balanced datasets, the decision tree model showed an accuracy of 77%, while the RF model improved accuracy to 82%, with all 5 cancer types predicted correctly. The area under the curve for the RF model was closer to 1 than that of the decision tree model. In the second approach, 15 datasets were used for training and 5 for testing. However, only 2 cancer types were predicted correctly. To cross-validate the RF model, the Matthews correlation coefficient (MCC) was computed. For the first approach, the MCC test and MCC cross-validation values were 0.7796 and 0.9356, respectively. Likewise, for the second approach, the MCC was 0.9365, corroborating the accuracy of the designed model. The model was successfully deployed using Streamlit as a web application for easy use. This study presents insights for allowing easy cancer classifications.
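As an illustration of the Matthews correlation coefficient used in this abstract, the sketch below computes the multi-class form of MCC directly from true and predicted labels; for two classes it reduces to the usual binary formula. The labels are hypothetical and the code is not taken from the study.

```python
from collections import Counter
from math import sqrt

def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from two label lists."""
    s = len(y_true)                                    # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))    # correctly predicted
    t_k = Counter(y_true)                              # true count per class
    p_k = Counter(y_pred)                              # predicted count per class
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in labels)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0 when one of the variances is zero.
    return cov_tp / denom if denom else 0.0

# Hypothetical labels for three cancer types.
y_true = ["lung", "lung", "breast", "breast", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "breast", "colon", "colon"]
score = mcc(y_true, y_pred)
print(round(score, 4))
```

Unlike plain accuracy, MCC uses the whole confusion matrix, so a model that always predicts the majority class scores 0 rather than a misleadingly high value.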

Sachdeva RK, Bathla P, Rani P, Solanki V, Ahuja R. A systematic method for diagnosis of hepatitis disease using machine learning. Innov Syst Softw Eng 2023;19:71-80. [PMID: 36628173 PMCID: PMC9818056 DOI: 10.1007/s11334-022-00509-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]

Lu M, Wang M, Zhang Q, Yu M, He C, Zhang Y, Li Y. A vision transformer for lightning intensity estimation using 3D weather radar. Sci Total Environ 2022;853:158496. [PMID: 36063932 DOI: 10.1016/j.scitotenv.2022.158496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 08/11/2022] [Accepted: 08/30/2022] [Indexed: 06/15/2023]

Karim M, Saad Missen MM, Umer M, Fida A, Eshmawi AA, Mohamed A, Ashraf I. Comprehension of polarity of articles by citation sentiment analysis using TF-IDF and ML classifiers. PeerJ Comput Sci 2022;8:e1107. [PMID: 37346319 PMCID: PMC10280177 DOI: 10.7717/peerj-cs.1107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/15/2022] [Accepted: 08/29/2022] [Indexed: 06/23/2023]

Shah SMA, Usman SM, Khalid S, Rehman IU, Anwar A, Hussain S, Ullah SS, Elmannai H, Algarni AD, Manzoor W. An Ensemble Model for Consumer Emotion Prediction Using EEG Signals for Neuromarketing Applications. Sensors (Basel) 2022;22:9744. [PMID: 36560113 PMCID: PMC9782208 DOI: 10.3390/s22249744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 11/21/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]

Abstract

Traditional advertising techniques seek to govern the consumer's opinion toward a product, which may not reflect their actual behavior at the time of purchase. Advertisers are likely to misjudge consumer behavior because predicted opinions do not always correspond to consumers' actual purchase behaviors. Neuromarketing is a new paradigm for understanding customer buying behavior and decision making, as well as for predicting their gestures for product utilization through an unconscious process. Existing methods do not focus on effective preprocessing and classification techniques for electroencephalogram (EEG) signals, so in this study, an effective method for preprocessing and classification of EEG signals is proposed. The proposed method involves preprocessing of EEG signals by removing noise and applying the synthetic minority oversampling technique (SMOTE) to deal with the class imbalance problem. The dataset employed in this study is a publicly available neuromarketing dataset. Automated features were extracted using a long short-term memory network (LSTM) and then concatenated with handcrafted features such as power spectral density (PSD) and discrete wavelet transform (DWT) to create a complete feature set. Classification was done using the proposed hybrid classifier, which optimizes the weights of two machine learning classifiers and one deep learning classifier and classifies the data as like or dislike. The machine learning classifiers are a support vector machine (SVM) and a random forest (RF); the deep learning classifier is a deep neural network (DNN). The proposed hybrid model outperforms the individual RF, SVM, and DNN classifiers and achieves an accuracy of 96.89%. Accuracy, sensitivity, specificity, precision, and F1 score were computed to evaluate the proposed method and compare it with recent state-of-the-art methods.
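The abstract does not say how the hybrid classifier optimizes its weights. The sketch below shows one simple possibility, stated here as an assumption: a weighted average of each model's predicted probability of "like", with the weights chosen by a small grid search on validation data. All probabilities and labels are made up, and the three lists merely stand in for SVM, RF, and DNN outputs.

```python
from itertools import product

def weighted_vote(prob_sets, weights):
    """Blend per-model P(like) scores with the given weights; return 0/1 labels."""
    total = sum(weights)
    n = len(prob_sets[0])
    blended = [
        sum(w * probs[i] for w, probs in zip(weights, prob_sets)) / total
        for i in range(n)
    ]
    return [1 if p >= 0.5 else 0 for p in blended]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def search_weights(prob_sets, y_true, grid=(0, 1, 2, 3)):
    """Try every weight combination on the grid and keep the most accurate one."""
    best_acc, best_weights = -1.0, None
    for weights in product(grid, repeat=len(prob_sets)):
        if sum(weights) == 0:
            continue  # all-zero weights cannot be normalised
        acc = accuracy(y_true, weighted_vote(prob_sets, weights))
        if acc > best_acc:
            best_acc, best_weights = acc, weights
    return best_acc, best_weights

# Hypothetical validation labels (1 = like, 0 = dislike) and model outputs.
y_val = [1, 0, 1, 1, 0, 0]
svm_p = [0.9, 0.4, 0.4, 0.7, 0.6, 0.2]
rf_p = [0.6, 0.3, 0.7, 0.4, 0.3, 0.4]
dnn_p = [0.8, 0.6, 0.6, 0.8, 0.4, 0.1]
best_acc, best_weights = search_weights([svm_p, rf_p, dnn_p], y_val)
print(best_acc, best_weights)
```

Because the grid includes weight vectors that switch on a single model, the blended result is never worse on the validation data than the best individual model; whether that carries over to unseen data has to be checked on a separate test set.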

Ali Z, Hayat MF, Shaukat K, Alam TM, Hameed IA, Luo S, Basheer S, Ayadi M, Ksibi A. A Proposed Framework for Early Prediction of Schistosomiasis. Diagnostics (Basel) 2022;12. [PMID: 36553145 DOI: 10.3390/diagnostics12123138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 12/08/2022] [Accepted: 12/08/2022] [Indexed: 12/15/2022]

Abstract

Schistosomiasis is a neglected tropical disease that continues to be a leading cause of illness and mortality around the globe. The causative parasites attach to the skin through contact with contaminated water and enter the human body. Failure to diagnose Schistosomiasis can result in various medical complications, such as ascites, portal hypertension, esophageal varices, splenomegaly, and growth retardation. Early prediction and identification of risk factors may aid in treating the disease before it becomes incurable. We aimed to create a framework that incorporates the most significant features to predict Schistosomiasis using machine learning techniques. A dataset of advanced Schistosomiasis containing recovery and death cases was employed, covering a total of 4316 individuals. The dataset contains demographic, socioeconomic, and clinical factors along with lab reports. Data preprocessing techniques (missing value imputation, outlier removal, data normalisation, and data transformation) were employed for better results. Feature selection techniques, including correlation-based feature selection, information gain, gain ratio, ReliefF, and OneR, were used to reduce the large number of features. Data resampling algorithms, including random undersampling, random oversampling, Cluster Centroid, Near Miss, and SMOTE, were applied to address the data imbalance problem. We applied four machine learning algorithms to construct the model: Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting and CatBoost. The performance of the proposed framework was evaluated based on accuracy, precision, recall and F1-score. The CatBoost model showed the best performance, with the highest accuracy (87.1%) compared with Gradient Boosting (86%), Light Gradient Boosting (86.7%) and Extreme Gradient Boosting (86.9%). Our proposed framework will assist doctors and healthcare professionals in the early diagnosis of Schistosomiasis.
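Two of the resampling methods listed in this abstract, random oversampling and random undersampling, are simple enough to sketch in a few lines. This is a generic illustration with invented data, not the study's pipeline: oversampling duplicates minority samples until every class matches the largest one, and undersampling discards majority samples until every class matches the smallest one.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Hypothetical records: 8 recovery cases and 2 death cases, one feature each.
X = [[v] for v in range(10)]
y = ["recovery"] * 8 + ["death"] * 2
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Resampling is normally applied to the training portion only, so that duplicated or synthetic records cannot leak into the data used for evaluation.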

Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022;658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]

Abstract

Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a kind of post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. Firstly, six features based on the protein sequence are extracted with the amino acid index database (AAindex), dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Secondly, the synthetic minority over-sampling technique (SMOTE) is used to process the imbalanced data set. Then a new classifier named DG is presented, which includes a Dense block, a Residual block and a Gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is integrated into the DG model, and an ensemble learning strategy is used to obtain the final prediction result. Experimental results show that PrUb-EL has good predictive ability, with accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70%, respectively, using 5-fold cross-validation. In the independent test, the values of ACC and auROC are 88.58% and 96.09%, respectively. Compared with previous studies, our model has significantly improved performance; thus, it is an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
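Of the six sequence encodings listed in this abstract, dipeptide composition (DPC) is the most direct to illustrate: it is the relative frequency of each of the 400 ordered amino-acid pairs in a sequence. The sketch below is a generic version of that encoding and is not taken from the PrUb-EL repository; the peptide fragment is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(seq):
    """Return a 400-dimensional vector of dipeptide frequencies for seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)  # number of overlapping pairs in the sequence
    return [counts[d] / total for d in DIPEPTIDES]

# Hypothetical peptide fragment around a candidate lysine site.
features = dipeptide_composition("MKKLAK")
print(len(features), features[DIPEPTIDES.index("KK")])
```

The resulting fixed-length numeric vector is the kind of input on which an oversampler such as SMOTE and a downstream classifier can operate.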

Gu X, Ding Y, Xiao P, He T. A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins. Front Genet 2022;13:935717. [PMID: 36506312 PMCID: PMC9727185 DOI: 10.3389/fgene.2022.935717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 11/02/2022] [Indexed: 11/24/2022]

Jiang L, Jiang J, Wang X, Zhang Y, Zheng B, Liu S, Zhang Y, Liu C, Wan Y, Xiang D, Lv Z. IUP-BERT: Identification of Umami Peptides Based on BERT Features. Foods 2022;11. [PMID: 36429332 DOI: 10.3390/foods11223742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022]

El Barakaz F, Boutkhoum O, Hanine M, El Moutaouakkil A, Rustam F, Din S, Ashraf I. Optimization of Imbalanced and Multidimensional Learning Under Bayes Minimum Risk and Savings Measure. Big Data 2022;10:425-439. [PMID: 35723636 DOI: 10.1089/big.2021.0225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Okey OD, Maidin SS, Adasme P, Lopes Rosa R, Saadi M, Carrillo Melgarejo D, Zegarra Rodríguez D. BoostedEnML: Efficient Technique for Detecting Cyberattacks in IoT Systems Using Boosted Ensemble Machine Learning. Sensors (Basel) 2022;22:7409. [PMID: 36236506 PMCID: PMC9572777 DOI: 10.3390/s22197409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 06/16/2023]

Abstract

Following the recent advances in wireless communication leading to increased Internet of Things (IoT) systems, many security threats are currently ravaging IoT systems, causing harm to information. Considering the vast application areas of IoT systems, ensuring that cyberattacks are holistically detected to avoid harm is paramount. Machine learning (ML) algorithms have demonstrated high capacity in helping to mitigate attacks on IoT devices and other edge systems with reasonable accuracy. However, the dynamics of operation of intruders in IoT networks require improved IDS models capable of detecting multiple attacks with a higher detection rate and lower computational resource requirement, which is one of the challenges of IoT systems. Many ensemble methods have been used with different ML classifiers, including decision trees and random forests, to propose IDS models for IoT environments. The boosting method is one of the approaches used to design an ensemble classifier. This paper proposes an efficient method for detecting cyberattacks and network intrusions based on boosted ML classifiers. Our proposed model is named BoostedEnML. First, we train six different ML classifiers (DT, RF, ET, LGBM, AD, and XGB) and obtain an ensemble using the stacking method and another with a majority voting approach. Two different datasets containing high-profile attacks, including distributed denial of service (DDoS), denial of service (DoS), botnets, infiltration, web attacks, heartbleed, and portscan, were used to train, evaluate, and test the IDS model. To ensure that we obtained a holistic and efficient model, we performed data balancing with synthetic minority oversampling technique (SMOTE) and adaptive synthetic (ADASYN) techniques; after that, we used stratified K-fold to split the data into training, validation, and testing sets. Based on the best two models, we construct our proposed BoostedEnML model using LightGBM and XGBoost, as the combination of the two classifiers gives a lightweight yet efficient model, which is part of the target of this research. Experimental results show that BoostedEnML outperformed existing ensemble models in terms of accuracy, precision, recall, F-score, and area under the curve (AUC), reaching 100% in each case on the selected datasets for multiclass classification.
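
The SMOTE balancing step mentioned in this abstract can be illustrated with a minimal sketch of the technique's core idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority neighbours. This is an illustration only, not the authors' implementation; the function name `smote_sketch`, its parameters, and the toy data are hypothetical, and only the Python standard library is assumed.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    `minority` is a list of equal-length numeric feature vectors (hypothetical input).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies between two real minority samples, the generated data stays inside the region the minority class already occupies. In practice, oversampling is commonly applied only to the training portion of a split so that synthetic points do not leak into the evaluation data.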

Yan Y, Bao X, Chen B, Li Y, Yin J, Zhu G, Li Q. Interpretable machine learning framework reveals microbiome features of oral disease. Microbiol Res 2022;265:127198. [PMID: 36126491 DOI: 10.1016/j.micres.2022.127198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022]

Tang M, Meng C, Wu H, Zhu H, Yi J, Tang J, Wang Y. Fault Detection for Wind Turbine Blade Bolts Based on GSG Combined with CS-LightGBM. Sensors (Basel) 2022;22:s22186763. [PMID: 36146110 PMCID: PMC9505918 DOI: 10.3390/s22186763] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/10/2022] [Revised: 08/24/2022] [Accepted: 08/25/2022] [Indexed: 05/27/2023]

Prasetiyowati MI, Maulidevi NU, Surendro K. The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy. PeerJ Comput Sci 2022;8:e1041. [PMID: 35875646 PMCID: PMC9299283 DOI: 10.7717/peerj-cs.1041] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/29/2021] [Accepted: 06/22/2022] [Indexed: 06/12/2023]

Kogut T, Tomczak A, Słowik A, Oberski T. Seabed Modelling by Means of Airborne Laser Bathymetry Data and Imbalanced Learning for Offshore Mapping. Sensors (Basel) 2022;22:s22093121. [PMID: 35590809 PMCID: PMC9100212 DOI: 10.3390/s22093121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2022] [Revised: 04/15/2022] [Accepted: 04/18/2022] [Indexed: 11/16/2022]

Kumari M, Subbarao N. A hybrid resampling algorithms SMOTE and ENN based deep learning models for identification of Marburg virus inhibitors. Future Med Chem 2022. [PMID: 35393862 DOI: 10.4155/fmc-2021-0290] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023] Open

Kim J, Mun S, Lee S, Jeong K, Baek Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health 2022;22:664. [PMID: 35387629 PMCID: PMC8985311 DOI: 10.1186/s12889-022-13131-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/26/2021] [Accepted: 03/30/2022] [Indexed: 01/10/2023] Open

Abstract

Background

Metabolic syndrome (MetS) is a complex condition that appears as a cluster of metabolic abnormalities, and is closely associated with the prevalence of various diseases. Early prediction of the risk of MetS in the middle-aged population provides greater benefits for cardiovascular disease-related health outcomes. This study aimed to apply the latest machine learning techniques to find the optimal MetS prediction model for the middle-aged Korean population.

Methods

We retrieved 20 data types from the Korean Medicine Daejeon Citizen Cohort, a cohort study on a community-based population of adults aged 30–55 years. The data included sex, age, anthropometric data, lifestyle-related data, and blood indicators of 1991 individuals. Participants satisfying two (pre-MetS) or ≥ 3 (MetS) of the five NCEP-ATP III criteria were included in the MetS group. MetS prediction used nine machine learning models based on the following algorithms: decision tree, Gaussian Naïve Bayes, K-nearest neighbor, eXtreme gradient boosting (XGBoost), random forest, logistic regression, support vector machine, multi-layer perceptron, and 1D convolutional neural network. All analyses were performed by sequentially inputting the features in three steps according to their characteristics. The models’ performances were compared after applying the synthetic minority oversampling technique (SMOTE) to resolve data imbalance.

Results

MetS was detected in 33.85% of the subjects. Among the MetS prediction models, the tree-based random forest and XGBoost models showed the best performance, which improved with the number of features used. As a measure of the models’ performance, the area under the receiver operating characteristic curve (AUC) increased by up to 0.091 when the SMOTE was applied, with XGBoost showing the highest AUC of 0.851. Body mass index and waist-to-hip ratio were identified as the most important features in the MetS prediction models for this population.

Conclusions

Tree-based machine learning models were useful in identifying MetS with high accuracy in middle-aged Koreans. Early diagnosis of MetS is important and requires a multidimensional approach that includes self-administered questionnaire, anthropometric, and biochemical measurements.
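
The AUC figures reported in the Results above can be read through the metric's ranking interpretation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that definition using only the Python standard library; the function name `roc_auc_sketch` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc_sketch(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 0/1 class labels and `scores` the model scores (hypothetical inputs).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based formulations give the same value more efficiently on large datasets.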

Liu X, Fu L, Chun-Wei Lin J, Liu S. SRAS-net: Low-resolution chromosome image classification based on deep learning. IET Syst Biol 2022;16:85-97. [PMID: 35373918 PMCID: PMC9290780 DOI: 10.1049/syb2.12042] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/12/2022] [Revised: 02/14/2022] [Accepted: 03/15/2022] [Indexed: 12/03/2022] Open

Anuntakarun S, Lertampaiporn S, Laomettachit T, Wattanapornprom W, Ruengjitchatchawalya M. mSRFR: a machine learning model using microalgal signature features for ncRNA classification. BioData Min 2022;15:8. [PMID: 35313925 PMCID: PMC8935802 DOI: 10.1186/s13040-022-00291-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Accepted: 02/06/2022] [Indexed: 11/10/2022] Open

Vu BN, Bi J, Wang W, Huff A, Kondragunta S, Liu Y. Application of geostationary satellite and high-resolution meteorology data in estimating hourly PM2.5 levels during the Camp Fire episode in California. Remote Sens Environ 2022;271:112890. [PMID: 37033879 PMCID: PMC10081518 DOI: 10.1016/j.rse.2022.112890] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 05/04/2023]

Chen PN, Lee CC, Liang CM, Pao SI, Huang KH, Lin KF. General deep learning model for detecting diabetic retinopathy. BMC Bioinformatics 2021;22:84. [PMID: 34749634 PMCID: PMC8576963 DOI: 10.1186/s12859-021-04005-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/04/2021] [Accepted: 02/08/2021] [Indexed: 01/04/2023] Open

Qasim HM, Ata O, Ansari MA, Alomary MN, Alghamdi S, Almehmadi M. Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem. Medicina (Kaunas) 2021;57:1217. [PMID: 34833435 DOI: 10.3390/medicina57111217] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/12/2021] [Revised: 10/29/2021] [Accepted: 11/05/2021] [Indexed: 11/16/2022]

Abstract

Background and Objectives: Recently, many studies have focused on the early detection of Parkinson's disease (PD). This disease belongs to a group of neurological problems that immediately affect brain cells and influence movement, hearing, and various cognitive functions. Medical datasets are often not equally distributed across their classes, which biases the classification of patients. We propose a hybrid feature selection framework that can deal with imbalanced datasets such as PD. The SMOTE algorithm is used to deal with the unbalanced dataset, while Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) are used to remove contradictions among the features in the dataset and to decrease the processing time. Materials and Methods: PD acoustic datasets and the characteristics of control subjects were used to construct classification models such as Bagging, K-nearest neighbour (KNN), multilayer perceptron, and the support vector machine (SVM). In the preprocessing stage, the synthetic minority over-sampling technique (SMOTE) was used together with two feature selection methods, RFE and PCA. The PD dataset contains a large difference between the numbers of infected and uninfected patients, which causes the classification bias problem. Therefore, SMOTE was used to resolve this problem. Results: For model evaluation, the train-test split technique was used. All the models were tuned by grid search; the SVM model showed the highest accuracy of 98.2%, and the KNN model exhibited the highest specificity of 99%. Conclusions: The proposed method was compared with current methods for detecting Parkinson's disease and with methods for other medical diseases. The developed system could treat data bias and reach a high prediction performance for PD, which can help health organizations to properly prioritize assets.

Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021;14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]

Abstract

Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded as a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, pseudo-position specific score matrix, pseudo amino acid composition and SPIDER3 are used to extract the different feature information of the target protein. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to the deep-stacked ensemble classifier composed of gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation and compared with existing methods, the proposed method achieves higher prediction accuracy on the gold standard dataset. To evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods, and provides novel insights for the prediction of DTIs. A novel method, DeepStack-DTIs, for drug-target interaction prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into digital information, respectively. The SMOTE algorithm is used to balance the dataset and the LightGBM feature selection algorithm is employed to remove redundant and irrelevant features to select the optimal feature subset. This optimal feature subset is input into the deep-stacked ensemble classifier to predict drug-target interactions. The experimental results show that the DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.

Venkata Vara Prasad D, Senthil Kumar P, Venkataramana LY, Prasannamedha G, Harshana S, Jahnavi Srividya S, Harrinei K, Indraganti S. Automating water quality analysis using ML and auto ML techniques. Environ Res 2021;202:111720. [PMID: 34297938 DOI: 10.1016/j.envres.2021.111720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 07/02/2021] [Accepted: 07/09/2021] [Indexed: 06/13/2023]

Garrafa E, Vezzoli M, Ravanelli M, Farina D, Borghesi A, Calza S, Maroldi R. Early prediction of in-hospital death of COVID-19 patients: a machine-learning model based on age, blood analyses, and chest x-ray score. eLife 2021;10:70640. [PMID: 34661530 PMCID: PMC8550757 DOI: 10.7554/elife.70640] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/24/2021] [Accepted: 10/17/2021] [Indexed: 12/15/2022] Open

Hatzidaki E, Iliopoulos A, Papasotiriou I. A Novel Method for Colorectal Cancer Screening Based on Circulating Tumor Cells and Machine Learning. Entropy (Basel) 2021;23:e23101248. [PMID: 34681972 PMCID: PMC8534570 DOI: 10.3390/e23101248] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/12/2021] [Revised: 09/20/2021] [Accepted: 09/21/2021] [Indexed: 02/07/2023]

Aldraimli M, Soria D, Grishchuck D, Ingram S, Lyon R, Mistry A, Oliveira J, Samuel R, Shelley LEA, Osman S, Dwek MV, Azria D, Chang-Claude J, Gutiérrez-Enríquez S, De Santis MC, Rosenstein BS, De Ruysscher D, Sperk E, Symonds RP, Stobart H, Vega A, Veldeman L, Webb A, Talbot CJ, West CM, Rattay T, Chaussalet TJ. A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy. Comput Biol Med 2021;135:104624. [PMID: 34247131 DOI: 10.1016/j.compbiomed.2021.104624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/20/2022]

Abstract

The prediction by classification of side effects incidence in a given medical treatment is a common challenge in medical research. Machine Learning (ML) methods are widely used in the areas of risk prediction and classification. The primary objective of such algorithms is to use several features to predict dichotomous responses (e.g., disease positive/negative). Similar to statistical inference modelling, ML modelling is subject to the class imbalance problem and is affected by the majority class, increasing the false-negative rate. In this study, seventy-nine ML models were built and evaluated to classify approximately 2000 participants from 26 hospitals in eight different countries into two groups of radiotherapy (RT) side effects incidence based on recorded observations from the international study of RT related toxicity "REQUITE". We also examined the effect of sampling techniques and cost-sensitive learning methods on the models when dealing with class imbalance. The combinations of such techniques used had a significant impact on the classification. They resulted in an improvement in incidence status prediction by shifting classifiers' attention to the minority group. The best classification model for RT acute toxicity prediction was identified based on domain experts' success criteria. The Area Under Receiver Operator Characteristic curve of the models tested with an isolated dataset ranged from 0.50 to 0.77. The scale of improved results is promising and will guide further development of models to predict RT acute toxicities. One model was optimised and found to be beneficial to identify patients who are at risk of developing acute RT early-stage toxicities as a result of undergoing breast RT ensuring relevant treatment interventions can be appropriately targeted. The design of the approach presented in this paper resulted in producing a preclinical-valid prediction model. The study was developed by a multi-disciplinary collaboration of data scientists, medical physicists, oncologists and surgeons in the UK Radiotherapy Machine Learning Network.

Affiliation(s)

Mahmoud Aldraimli: The Health Innovation Ecosystem, University of Westminster, London, UK
Daniele Soria: School of Computing, University of Kent, Canterbury, UK
Diana Grishchuck: Imperial College Healthcare NHS Trust, London, UK
Samuel Ingram: Division of Cancer Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, UK
Robert Lyon: Department of Computer Science, Edge Hill University, Ormskirk, Lancashire, UK
Anil Mistry: Guy's and St Thomas' NHS Foundation Trust, London, UK
Jorge Oliveira: Mirada Medical, Oxford, UK
Robert Samuel: University of Leeds, Leeds Cancer Centre, St. James's University Hospital, Leeds, UK
Leila E A Shelley: Edinburgh Cancer Centre, Western General Hospital, Crewe Road South, Edinburgh, UK
Sarah Osman: Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK
Miriam V Dwek: School of Life Sciences, University of Westminster, London, UK
David Azria: University of Montpellier, France
Jenny Chang-Claude: German Cancer Research Center (DKFZ) Division of Cancer Epidemiology, Unit of Genetic Epidemiology, Heidelberg, Germany
Sara Gutiérrez-Enríquez: Vall d'Hebron Institute of Oncology, Barcelona, Spain
Maria Carmen De Santis: Dept of Radiation Oncology 1, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
Barry S Rosenstein: Prostate Cancer Program, Mount Sinai School of Medicine, New York, USA
Dirk De Ruysscher: Maastricht Radiation Oncology (MAASTRO Clinic) University Hospital Maastricht, the Netherlands
Elena Sperk: Department of Radiation Oncology, University Medical Center Mannheim, Medical Faculty Mannheim, Heidelberg University, Germany
R Paul Symonds: Department of Oncology, Leicester Royal Infirmary, UK
Hilary Stobart: Independent Cancer Patients' Voice, London, UK
Ana Vega: Fundación Publica Galega Medicina Xenomica, Santiago de Compostela, Spain
Liv Veldeman: Department of Basic Medical Sciences, University Hospital Ghent, Belgium
Adam Webb: Department of Genetics and Genome Biology, University of Leicester, UK
Christopher J Talbot: Cancer Research Centre, University of Leicester, Leicester, UK
Catharine M West: Institute of Cancer Sciences, Christie Hospital, Wilmslow Road, Manchester, UK
Tim Rattay: Cancer Research Centre, University of Leicester, Leicester, UK
Thierry J Chaussalet: The Health Innovation Ecosystem, University of Westminster, London, UK
