1
Li N, Ma L, Yu G, Xue B, Zhang M, Jin Y. Survey on Evolutionary Deep Learning: Principles, Algorithms, Applications, and Open Issues. ACM Computing Surveys 2024; 56:1-34. [DOI: 10.1145/3603704] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/21/2022] [Accepted: 05/31/2023] [Indexed: 01/04/2025]
Abstract
In recent years, deep learning (DL) has developed rapidly in both industry and academia. However, finding the optimal hyperparameters of a DL model often requires high computational cost and human expertise. To mitigate this issue, evolutionary computation (EC), a powerful heuristic search approach, has shown significant merits in the automated design of DL models, known as evolutionary deep learning (EDL). This article aims to analyze EDL from the perspective of automated machine learning (AutoML). Specifically, we first illuminate EDL from the DL and EC perspectives and regard EDL as an optimization problem. Following the DL pipeline, we systematically introduce EDL methods ranging from data preparation and model generation to model deployment with a new taxonomy (i.e., what and how to evolve/optimize), and focus on the discussion of solution representation and search paradigm in handling the optimization problem by EC. Finally, key applications, open issues, and potentially promising lines of future research are suggested. This survey reviews recent developments of EDL and offers insightful guidelines for the development of EDL.
Affiliation(s)
- Nan Li
  - Northeastern University, China
- Guo Yu
  - Nanjing Tech University, China
- Bing Xue
  - Victoria University of Wellington, New Zealand
2
Yaqoob A, Verma NK, Aziz RM. Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm. J Med Syst 2024; 48:10. [PMID: 38193948 DOI: 10.1007/s10916-023-02031-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/28/2023] [Indexed: 01/10/2024]
Abstract
Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes among the high-dimensional biological data due to the existence of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been created to overcome this obstacle. Improving the efficacy and precision of FS methodologies is crucial in order to identify significant genes amongst complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known Support Vector Machine (SVM) classifier. Using a dataset on breast cancer, the hybrid gene selection algorithm's performance is carefully assessed and compared to other feature selection methods. To improve the quality of the feature set, we use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy in the first step. The hybrid SCACSA method is then used to enhance and optimize the gene selection procedure. Lastly, we classify the dataset according to the chosen genes by using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
Affiliation(s)
- Abrar Yaqoob
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Navneet Kumar Verma
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Rabia Musheer Aziz
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
3
Mahto R, Ahmed SU, Rahman RU, Aziz RM, Roy P, Mallik S, Li A, Shah MA. A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection. BMC Bioinformatics 2023; 24:479. [PMID: 38102551 PMCID: PMC10724960 DOI: 10.1186/s12859-023-05605-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/06/2023] [Indexed: 12/17/2023] Open
Abstract
Cancer prediction at an early stage is a topic of major interest in medicine since it allows accurate and efficient actions for successful medical treatment of cancer. Most cancer datasets contain many gene expression levels as features but few samples, so similar features must first be eliminated to allow classification algorithms to converge faster. These features (genes) enable us to identify cancer, choose the best prescription to prevent cancer, and discover deviations amid different techniques. To resolve this problem, we propose a novel hybrid technique, CSSMO-based gene selection, for cancer classification. First, we alter the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), yielding CSSMO for feature selection, which combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps to predict cancer at an early stage. Further, to enhance the accuracy of the CSSMO algorithm, we apply a cleaning process, minimum redundancy maximum relevance (mRMR), to reduce the gene expression data of the cancer datasets. Next, these subsets of genes are classified using deep learning (DL) to identify different groups or classes related to a particular cancer. Eight different benchmark microarray gene expression datasets of cancer have been utilized to analyze the performance of the proposed approach with different evaluation metrics such as recall, precision, F1-score, and confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all the large cancer gene expression datasets.
Affiliation(s)
- Rajul Mahto
  - School of Computing Science and Engineering, VIT Bhopal University, Kothrikalan, Sehore, Madhya Pradesh, 46611, India
- Saboor Uddin Ahmed
  - School of Computing Science and Engineering, VIT Bhopal University, Kothrikalan, Sehore, Madhya Pradesh, 46611, India
- Rizwan Ur Rahman
  - School of Computing Science and Engineering, VIT Bhopal University, Kothrikalan, Sehore, Madhya Pradesh, 46611, India
- Rabia Musheer Aziz
  - School of Advanced Sciences and Language, VIT Bhopal University, Kothrikalan, Sehore, Madhya Pradesh, 46611, India
- Priyanka Roy
  - School of Advanced Sciences and Language, VIT Bhopal University, Kothrikalan, Sehore, Madhya Pradesh, 46611, India
- Saurav Mallik
  - Molecular and Integrative Physiological Sciences, Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, MA, 02115, USA
  - Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, 85721, USA
- Aimin Li
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
  - School of Computer Science and Engineering, Xi'an University of Technology, Shaanxi, 710048, China
- Mohd Asif Shah
  - Department of Economics, Kebri Dehar University, Kebri Dehar, 250, Somali, Ethiopia
  - Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144001, India
  - Centre for Research Impact & Outcome, Chitkara University, Rajpura, Punjab, 140401, India
4
Traditional machine learning algorithms for breast cancer image classification with optimized deep features. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
5
Chamlal H, Ouaderhman T, Aaboub F. A graph based preordonnances theoretic supervised feature selection in high dimensional data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
6
A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis. ScientificWorldJournal 2022; 2022:1056490. [PMID: 35983572 PMCID: PMC9381276 DOI: 10.1155/2022/1056490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 07/20/2022] [Indexed: 11/17/2022] Open
Abstract
Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancer diseases from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker is initiated to combine the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI). The features are then ordered according to this combination. In the second stage, the modified wrapper-based sequential forward selection is utilized to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, many tests have been carried out on four cancerous microarray datasets, employing in the process 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by calculating the diagnostic accuracy. The results indicate that for the leukemia dataset, both SVM and KNN models register the highest accuracy at 100% using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy at 100% using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy at 100% using only 8 features. For the lung cancer dataset, the SVM model also achieves the highest accuracy at 99.57% using 19 features. By comparing with other algorithms, the results obtained from the proposed algorithm are superior in terms of the number of selected features and diagnostic accuracy.
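The two-stage idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter scores are assumed to be precomputed, the mean-rank aggregation in `combine_rankings` is an assumption (the abstract does not say how the three filter results are combined), plain sequential forward selection stands in for the "modified" wrapper, and `toy_evaluate` is a hypothetical placeholder for a cross-validated classifier accuracy.

```python
from typing import Callable, List, Sequence


def combine_rankings(score_lists: Sequence[Sequence[float]]) -> List[int]:
    """Stage 1: order feature indices by their summed rank across filters.

    Higher scores are treated as better. Summed-rank (mean-rank) aggregation
    is an assumption, not a detail taken from the abstract.
    """
    n_features = len(score_lists[0])
    rank_sums = [0] * n_features
    for scores in score_lists:
        order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    # Ties are broken by feature index so the result is deterministic.
    return sorted(range(n_features), key=lambda j: (rank_sums[j], j))


def sequential_forward_selection(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],
    max_features: int,
) -> List[int]:
    """Stage 2: plain greedy SFS; stops when no candidate improves the score."""
    selected: List[int] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        scored = [(evaluate(selected + [j]), j) for j in remaining]
        top_score, top_j = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best_score = top_score
    return selected


# Hypothetical filter scores for four features (chi-squared, F-statistic, MI).
chi2_scores = [0.9, 0.1, 0.5, 0.3]
f_scores = [0.8, 0.2, 0.6, 0.1]
mi_scores = [0.7, 0.3, 0.9, 0.2]


def toy_evaluate(subset: List[int]) -> float:
    """Placeholder for cross-validated accuracy: rewards features 0 and 2."""
    useful = {0: 0.6, 2: 0.3}
    return sum(useful.get(j, 0.0) for j in subset) - 0.01 * len(subset)


ranked = combine_rankings([chi2_scores, f_scores, mi_scores])
selected = sequential_forward_selection(ranked[:3], toy_evaluate, max_features=3)
print(ranked, selected)
```

In practice `evaluate` would wrap a classifier such as an SVM scored by 10-fold cross-validation, as the abstract describes; restricting the wrapper to the top-ranked candidates keeps the number of evaluations manageable on high-dimensional data.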
7
EGFAFS: A Novel Feature Selection Algorithm Based on Explosion Gravitation Field Algorithm. Entropy 2022; 24:e24070873. [PMID: 35885095 PMCID: PMC9322764 DOI: 10.3390/e24070873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 02/04/2023]
Abstract
Feature selection (FS) is a vital step in data mining and machine learning, especially for analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not suitable to be processed by simple methods, such as the filter-based method. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the dimensions of the feature space to acceptable dimensions, we constructed a recommended feature pool by a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested EGFAFS on eight gene expression datasets and compared it with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that EGFAFS performs better for FS on gene expression data, in terms of evaluation metrics, than the other eight FS algorithms. The genes selected by EGFAFS play an essential role in the differential co-expression network and some biological functions further demonstrate the success of EGFAFS for solving FS problems on gene expression data.
8
Aziz RM. Cuckoo Search-Based Optimization for Cancer Classification: A New Hybrid Approach. J Comput Biol 2022; 29:565-584. [DOI: 10.1089/cmb.2021.0410] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/26/2023] Open
9
Kundu R, Chattopadhyay S, Cuevas E, Sarkar R. AltWOA: Altruistic Whale Optimization Algorithm for feature selection on microarray datasets. Comput Biol Med 2022; 144:105349. [PMID: 35303580 DOI: 10.1016/j.compbiomed.2022.105349] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/02/2021] [Revised: 02/22/2022] [Accepted: 02/22/2022] [Indexed: 12/15/2022]
Abstract
The data-driven modern era has enabled the collection of large amounts of biomedical and clinical data. DNA microarray gene expression datasets have gained significant attention from the research community owing to their ability to identify diseases through the "bio-markers" or specific alterations in the gene sequence that represent that particular disease (for example, different types of cancer). However, gene expression datasets are very high-dimensional, while only a few of the genes are "bio-markers". Meta-heuristic-based feature selection efficiently filters out only the relevant genes from a large set of attributes to reduce data storage and computation requirements. To this end, in this paper, we propose an Altruistic Whale Optimization Algorithm (AltWOA) for the feature selection problem in high-dimensional microarray data. AltWOA is an improvement on the basic Whale Optimization Algorithm. We embed the concept of altruism in the whale population to help efficient propagation of candidate solutions that can reach the global optima over the iterations. Evaluation of the proposed method on eight high-dimensional microarray datasets reveals the superiority of AltWOA compared to popular and classical techniques in the literature on the same datasets, both in terms of accuracy and the final number of features selected. The relevant codes for the proposed approach are available publicly at https://github.com/Rohit-Kundu/AltWOA.
Affiliation(s)
- Rohit Kundu
  - Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India
- Soham Chattopadhyay
  - Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India
- Erik Cuevas
  - Departamento de Electrónica, Universidad de Guadalajara, CUCEI, Av. Revolución 1500, Guadalajara, Jal, Mexico
- Ram Sarkar
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, 700032, India
10
Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06775-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023]
11
Khoder A, Dornaika F. Ensemble learning via feature selection and multiple transformed subsets: Application to image classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.108006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]