1
Nourian R, Motamedi SA, Pourfard M. BHBA-GRNet: Cancer detection through improved gene expression profiling using Binary Honey Badger Algorithm and Gene Residual-based Network. Comput Biol Med 2025; 184:109348. [PMID: 39615230 DOI: 10.1016/j.compbiomed.2024.109348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 10/29/2024] [Accepted: 10/30/2024] [Indexed: 12/22/2024]
Abstract
Cancer, a pervasive and devastating disease, remains a leading global cause of mortality, emphasizing the growing urgency for effective detection methods. Gene Expression Microarray (GEM) data has emerged as a crucial tool in this context, offering insights into early cancer detection and treatment. While deep learning methods offer promise in detecting various cancers through GEM analysis, they suffer from the high dimensionality inherent in gene sequences, which prevents optimal detection performance across diverse cancer types. Additionally, existing methods often resort to synthetic features and data augmentation to enhance performance. To address these challenges and enhance accuracy, a novel Binary Honey Badger Algorithm (BHBA) integrated with the Gene Residual Network (GRNet) method is proposed. Our approach capitalizes on BHBA's feature reduction mechanism, eliminating the need for additional preprocessing steps. Comprehensive evaluations on three well-established datasets representing lung and blood-type cancers demonstrate that our method reduces GEM data size by approximately 40% and improves accuracy by around 1% in lung cancer types compared to state-of-the-art methods.
Affiliation(s)
- Reza Nourian
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Seyed Ahmad Motamedi
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Mohammadreza Pourfard
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
2
Lončar B, Pezo L, Knežević V, Nićetin M, Filipović J, Petković M, Filipović V. Enhancing Cookie Formulations with Combined Dehydrated Peach: A Machine Learning Approach for Technological Quality Assessment and Optimization. Foods 2024; 13:782. [PMID: 38472895 DOI: 10.3390/foods13050782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 02/27/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024] Open
Abstract
This study focuses on predicting and optimizing the quality parameters of cookies enriched with dehydrated peach through the application of Support Vector Machine (SVM) and Artificial Neural Network (ANN) models. The purpose of the study is to employ advanced machine learning techniques to understand the intricate relationships between input parameters, such as the presence of dehydrated peach and treatment methods (lyophilization and lyophilization with osmotic pretreatment), and output variables representing various quality aspects of cookies. For each of the 32 outputs, including the parameters of the basic chemical compositions of the cookie samples, selected mineral contents, moisture contents, baking characteristics, color properties, sensorial attributes, and antioxidant properties, separate models were constructed using SVMs and ANNs. Results showcase the efficiency of ANN models in predicting a diverse set of quality parameters with r² up to 1.000, with SVM models exhibiting slightly higher coefficients of determination for specific variables, with r² reaching 0.981. The sensitivity analysis underscores the pivotal role of dehydrated peach and the positive influence of osmotic pretreatment on specific compositional attributes. Utilizing the established Artificial Neural Network models, multi-objective optimization was conducted, revealing optimal formulation and factor values in cookie quality optimization. The optimal quantity of lyophilized peach with osmotic pretreatment for the cookie formulation was identified as 15%.
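The r² values quoted above are coefficients of determination. As a rough illustration only, and assuming the usual definition 1 - SS_res / SS_tot (the abstract does not state which variant the authors used), the quantity can be computed as in the sketch below. The function name `r_squared` is hypothetical and not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far the model predictions are from the data.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: how far the data are from their own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

A model that reproduces every observation gives 1.0, and a model that always predicts the mean of the observations gives 0.0.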
Affiliation(s)
- Biljana Lončar
  - Faculty of Technology Novi Sad, University of Novi Sad, Bulevar Cara Lazara 1, 21000 Novi Sad, Serbia.
- Lato Pezo
  - Institute of General and Physical Chemistry, Studentski trg 12/V, 11000 Belgrade, Serbia.
- Violeta Knežević
  - Faculty of Technology Novi Sad, University of Novi Sad, Bulevar Cara Lazara 1, 21000 Novi Sad, Serbia.
- Milica Nićetin
  - Faculty of Technology Novi Sad, University of Novi Sad, Bulevar Cara Lazara 1, 21000 Novi Sad, Serbia.
- Jelena Filipović
  - Institute of Food Technology in Novi Sad, University of Novi Sad, Bulevar Cara Lazara 1, 21000 Novi Sad, Serbia.
- Marko Petković
  - Faculty of Agronomy, University of Kragujevac, Cara Dušana 34, 32102 Čačak, Serbia.
- Vladimir Filipović
  - Faculty of Technology Novi Sad, University of Novi Sad, Bulevar Cara Lazara 1, 21000 Novi Sad, Serbia.
3
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Ikotun AM, Mohammed M. A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data. Sci Rep 2023; 13:14644. [PMID: 37670037 PMCID: PMC10480180 DOI: 10.1038/s41598-023-41731-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/19/2023] [Accepted: 08/30/2023] [Indexed: 09/07/2023] Open
Abstract
Breast cancer is considered one of the significant health challenges and ranks among the most prevalent and dangerous cancer types affecting women globally. Early breast cancer detection and diagnosis are crucial for effective treatment and personalized therapy. Early detection and diagnosis can help patients and physicians discover new treatment options, provide a more suitable quality of life, and ensure increased survival rates. Breast cancer detection using gene expression involves many difficulties, such as the high dimensionality and complexity of gene expression data. This paper proposes a bio-inspired CNN model for breast cancer detection using gene expression data downloaded from The Cancer Genome Atlas (TCGA). The data contains 1208 clinical samples of 19,948 genes, with 113 normal and 1095 cancerous samples. In the proposed model, Array-Array Intensity Correlation (AAIC) is used at the pre-processing stage for outlier removal, followed by a normalization process to avoid biases in the expression measures. Filtration is used for gene reduction with a threshold value of 0.25. Thereafter, the pre-processed gene expression dataset was converted into images, which were then converted to grayscale to meet the requirements of the model. The model also uses a hybrid of a CNN architecture with a metaheuristic algorithm, namely the Ebola Optimization Search Algorithm (EOSA), to enhance the detection of breast cancer. The traditional CNN and five hybrid algorithms were compared with the classification result of the proposed model. The competing hybrid algorithms include the Whale Optimization Algorithm (WOA-CNN), the Genetic Algorithm (GA-CNN), the Satin Bowerbird Optimization (SBO-CNN), the Life Choice-Based Optimization (LCBO-CNN), and the Multi-Verse Optimizer (MVO-CNN). The results show that the proposed model determined the classes with high-performance measurements with an accuracy of 98.3%, a precision of 99%, a recall of 99%, an f1-score of 99%, a kappa of 90.3%, a specificity of 92.8%, and a sensitivity of 98.9% for the cancerous class. The results suggest that the proposed method has the potential to be a reliable and precise approach to breast cancer detection, which is crucial for early diagnosis and personalized therapy.
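The measures listed in this abstract (accuracy, precision, recall/sensitivity, specificity, f1-score, kappa) all derive from a binary confusion matrix. The sketch below shows the standard formulas, with kappa taken to be Cohen's kappa, which is an assumption since the abstract does not name the variant. The function name `binary_metrics` is hypothetical, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }
```

With imbalanced classes such as the 113 normal versus 1095 cancerous samples described above, kappa and specificity can be noticeably lower than accuracy, which is consistent with the pattern of the reported figures.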
Affiliation(s)
- Tehnan I A Mohamed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
- Absalom E Ezugwu
  - Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
- Jean Vincent Fonou-Dombeu
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
- Abiodun M Ikotun
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
- Mohanad Mohammed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
4
Houssein EH, Samee NA, Mahmoud NF, Hussain K. Dynamic Coati Optimization Algorithm for Biomedical Classification Tasks. Comput Biol Med 2023; 164:107237. [PMID: 37467535 DOI: 10.1016/j.compbiomed.2023.107237] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Medical datasets largely consist of numerous irrelevant and redundant features in a collection of patient records. These features are not needed for medical decision-making. In addition, a large amount of data leads to increased dimensionality and decreased classifier performance in machine learning. Numerous approaches have recently been put forward to address this issue, and the results indicate that feature selection can be a successful remedy. To meet the various needs of input patterns, medical diagnostic tasks typically involve learning a suitable classification model. The classification performance of the k-Nearest Neighbors (kNN) classifier is typically decreased by an abundance of irrelevant features among the input variables. To simplify the kNN classifier, the essential attributes of the input variables are searched for using a feature selection approach. This paper presents a dynamic form of the Coati Optimization Algorithm (DCOA) as a feature selection technique in which each iteration of the optimization process involves the introduction of a different feature. We enhance the exploration and exploitation capability of DCOA by employing dynamic opposing candidate solutions. The most impressive feature of DCOA is that it does not require any preparatory parameter fine-tuning, compared with the most popular metaheuristic algorithms. The CEC'22 test suite and nine medical datasets with various dimension sizes were used to evaluate the performance of the original COA and the proposed dynamic version. The statistical results were validated using the Bonferroni-Dunn test and Kendall's W test and showed the superiority of DCOA over seven well-known metaheuristic algorithms, with an overall accuracy of 89.7%, a feature selection of 24%, a sensitivity of 93.35%, a specificity of 96.81%, and a precision of 93.90%.
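Wrapper-style feature selection for a kNN classifier, as described in this abstract, scores each candidate feature subset by how well kNN performs on it. The sketch below is a generic illustration and not the DCOA method: it assumes a leave-one-out kNN accuracy and a weighted fitness (classification error plus fraction of features kept) that is common in the feature selection literature. The names `loo_knn_accuracy` and `fitness` and the weight `alpha` are hypothetical.

```python
import math


def loo_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out kNN accuracy using only the features where mask is truthy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Euclidean distance to every other sample on the selected features.
        dists = []
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.sqrt(sum((xi[c] - xj[c]) ** 2 for c in cols))
            dists.append((d, y[j]))
        dists.sort(key=lambda pair: pair[0])
        # Majority vote among the k nearest neighbours.
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted sum of kNN error and fraction of features kept."""
    error = 1.0 - loo_knn_accuracy(X, y, mask)
    ratio = sum(1 for keep in mask if keep) / len(mask)
    return alpha * error + (1 - alpha) * ratio
```

A metaheuristic such as the one in the abstract would search over binary masks to minimise a fitness of this kind; the search procedure itself is not reproduced here.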
Affiliation(s)
- Essam H Houssein
  - Faculty of Computers and Information, Minia University, Minia, Egypt.
- Nagwan Abdel Samee
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Noha F Mahmoud
  - Rehabilitation Sciences Department, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Kashif Hussain
  - Department of Science and Engineering, Solent University, East Park Terrace, Southampton, SO14 0YN, United Kingdom.
5
M S K, Rajaguru H, Nair AR. Evaluation and Exploration of Machine Learning and Convolutional Neural Network Classifiers in Detection of Lung Cancer from Microarray Gene-A Paradigm Shift. Bioengineering (Basel) 2023; 10:933. [PMID: 37627818 PMCID: PMC10451477 DOI: 10.3390/bioengineering10080933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 08/27/2023] Open
Abstract
Microarray gene expression-based detection and classification of medical conditions have been prominent in research studies over the past few decades. However, extracting relevant data from high-volume microarray gene expression data with inherent nonlinearity and inseparable noise components raises significant challenges during data classification and disease detection. The dataset used for the research is the Lung Harvard 2 Dataset (LH2), which consists of 150 Adenocarcinoma subjects and 31 Mesothelioma subjects. The paper proposes a two-level strategy involving feature extraction and selection methods before the classification step. The feature extraction step utilizes the Short-Time Fourier Transform (STFT), and the feature selection step employs the Particle Swarm Optimization (PSO) and Harmonic Search (HS) metaheuristic methods. The classifiers employed are Nonlinear Regression, Gaussian Mixture Model, Softmax Discriminant, Naive Bayes, SVM (Linear), SVM (Polynomial), and SVM (RBF). The classification results for the two-level extracted features are compared with those for the raw data, including a Convolutional Neural Network (CNN) methodology. Among the methods, STFT with PSO feature selection and the SVM (RBF) classifier produced the highest accuracy of 94.47%.
Affiliation(s)
- Karthika M S
  - Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam 638401, India.
- Harikumar Rajaguru
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India.
- Ajin R. Nair
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India.
6
Wang Z, Zhou Y, Takagi T, Song J, Tian YS, Shibuya T. Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC Bioinformatics 2023; 24:139. [PMID: 37031189 PMCID: PMC10082986 DOI: 10.1186/s12859-023-05267-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/18/2022] [Accepted: 04/02/2023] [Indexed: 04/10/2023] Open
Abstract
BACKGROUND: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n", in that the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification.

RESULTS: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected.

CONCLUSIONS: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
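The Davies-Bouldin index mentioned in this abstract scores a grouping of points without training a classifier, which is why it sidesteps classifier dependency. The sketch below implements the standard definition with Euclidean distances. How Iso-GA applies it to Isomap embeddings is not detailed in the abstract, so the assumption here is simply that `points` are low-dimensional coordinates and `labels` are group assignments. The function name `davies_bouldin` is hypothetical, and the sketch requires Python 3.8+ for `math.dist`.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values mean tighter, better separated groups."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two groups are required")
    # Centroid of each group (coordinate-wise mean).
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each group: mean distance of its members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of group i to any other group.
        # Note: two groups with identical centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

For two groups with scatter 1 each and centroids 10 apart, the index is (1 + 1) / 10 = 0.2.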
Affiliation(s)
- Zixuan Wang
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
- Yi Zhou
  - Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China.
- Tatsuya Takagi
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Jiangning Song
  - Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC, 3800, Australia.
- Yu-Shi Tian
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Tetsuo Shibuya
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
7
A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2022.110249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
8
Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06775-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023]
9
A Modified Memetic Algorithm with an Application to Gene Selection in a Sheep Body Weight Study. Animals (Basel) 2022; 12:ani12020201. [PMID: 35049823 PMCID: PMC8772977 DOI: 10.3390/ani12020201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/26/2021] [Revised: 01/06/2022] [Accepted: 01/14/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary: Because it lacks exploitation capability, the traditional genetic algorithm (GA) cannot accurately identify the minimal best gene subset. Thus, the improved splicing method is introduced into a genetic algorithm to enhance exploitation capability and to balance the exploitation and exploration of the GA. It can effectively identify true gene subsets with high probability. Furthermore, a dataset of the body weight of Hu sheep has been used to show that the proposed method can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the genetic algorithm and the adaptive best-subset selection algorithm.

Abstract: Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily results in over-fitting, missing an important gene can have a more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the problems of exploitation capability in the traditional genetic algorithm and of dimension reduction in the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement is also due to two further novel aspects: (a) updating subsets of genes iteratively until splicing yields no further reduction in the loss function, which increases the probability of selecting the true subsets of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. Moreover, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and the initial individuals are improved by it to enhance search efficiency. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA over the genetic algorithm. According to our experimental results, the proposed optimizer can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
10
Cheng F, Chu F, Zhang L. A multi-objective evolutionary algorithm based on length reduction for large-scale instance selection. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.052] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
11
Qu C, Zhang L, Li J, Deng F, Tang Y, Zeng X, Peng X. Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning. Brief Bioinform 2021; 22:6238587. [PMID: 33876181 DOI: 10.1093/bib/bbab097] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/29/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022] Open
Abstract
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.
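The F-score pre-filtering step described in this abstract ranks genes by how well each one separates the classes before the metaheuristic search begins. The sketch below assumes the Fisher-style F-score commonly used for feature ranking in the two-class case, which may differ from the exact form used by the authors (their datasets also include multi-class problems). The names `f_score` and `top_genes` and the gene identifiers in the usage note are hypothetical.

```python
def f_score(values, labels):
    """Fisher-style F-score of one gene for binary labels (1 = positive, 0 = negative)."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("each class needs at least two samples")
    mean_all = sum(values) / len(values)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Sample variances within each class (n - 1 in the denominator).
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    within = var_pos + var_neg
    if within == 0:
        raise ValueError("zero within-class variance; score is undefined")
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    return between / within


def top_genes(expression, labels, n_keep):
    """Return the n_keep gene names with the highest F-score."""
    scores = {gene: f_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]
```

For example, with labels `[0, 0, 1, 1]`, a gene with values `[1, 2, 5, 6]` scores 8.0, while a gene with values `[1, 5, 2, 6]` scores 0.03125, so only the first would survive a top-1 filter.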
Affiliation(s)
- Chiwen Qu
  - College of Mathematics and Statistics, Hunan Normal University, China.
- Lupeng Zhang
  - Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China.
- Jinlong Li
  - Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China.
- Fang Deng
  - Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China.
- Yifan Tang
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China.
- Xiaomin Zeng
  - Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China.
- Xiaoning Peng
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China.
12
Hameed SS, Hassan WH, Latiff LA, Muhammadsharif FF. A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets. Soft Comput 2021. [DOI: 10.1007/s00500-021-05726-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
13
Tripathi D, Edla DR, Bablani A, Shukla AK, Reddy BR. Experimental analysis of machine learning methods for credit score classification. Progress in Artificial Intelligence 2021. [DOI: 10.1007/s13748-021-00238-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
14
|
|