1
Esfandiari A, Nasiri N. Gene selection and cancer classification using interaction-based feature clustering and improved-binary Bat algorithm. Comput Biol Med 2024; 181:109071. [PMID: 39205342 DOI: 10.1016/j.compbiomed.2024.109071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 08/13/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
In high-dimensional gene expression data, selecting an optimal subset of genes is crucial for achieving high classification accuracy and reliable diagnosis of diseases. This paper proposes a two-stage hybrid model for gene selection based on clustering and a swarm intelligence algorithm to identify the most informative genes with high accuracy. First, a clustering-based multivariate filter approach is performed to explore the interactions between the features and eliminate any redundant or irrelevant ones. Then, by controlling for the problem of premature convergence in the binary Bat algorithm, the optimal gene subset is determined using different classifiers with the Monte Carlo cross-validation data partitioning model. The effectiveness of our proposed framework is evaluated using eight gene expression datasets, by comparison with other recently published algorithms in the literature. Experiments confirm that in seven out of eight datasets, the proposed method can achieve superior results in terms of classification accuracy and gene subset size. In particular, it achieves a classification accuracy of 100% in Lymphoma and Ovarian datasets and above 97.4% in the rest with a minimum number of genes. The results demonstrate that our proposed algorithm has the potential to solve the feature selection problem in different applications with high-dimensional datasets.
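The wrapper stage described in this abstract scores candidate gene subsets with a classifier under Monte Carlo cross-validation (MCCV), meaning repeated random train/test splits. The sketch below shows MCCV-based subset scoring only. It assumes a nearest-centroid classifier and a hypothetical accuracy/size trade-off weight `alpha`; neither is taken from the paper, and the improved binary Bat search itself is not shown.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    # One centroid per class present in the training split; predict the closest.
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def mccv_accuracy(X, y, subset, n_splits=20, test_frac=0.3, seed=0):
    """Monte Carlo CV: average accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    Xs = [[row[j] for j in subset] for row in X]  # keep only the candidate genes
    idx = list(range(len(y)))
    n_test = max(1, round(test_frac * len(y)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        tr_X = [Xs[i] for i in train]
        tr_y = [y[i] for i in train]
        hits = sum(nearest_centroid_predict(tr_X, tr_y, Xs[i]) == y[i] for i in test)
        scores.append(hits / n_test)
    return mean(scores)


def wrapper_fitness(acc, n_selected, n_total, alpha=0.99):
    # Hypothetical trade-off: mostly accuracy, small bonus for fewer genes.
    return alpha * acc + (1 - alpha) * (1 - n_selected / n_total)


# Toy data: gene 0 separates the classes, gene 1 is pure noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(30)]
X = [[5.0 * lab + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1)] for lab in y]

acc_informative = mccv_accuracy(X, y, [0])
acc_noise = mccv_accuracy(X, y, [1])
print(acc_informative, acc_noise)
print(wrapper_fitness(acc_informative, 1, 2))
```

A metaheuristic such as a binary Bat algorithm would call a fitness like `wrapper_fitness` on each candidate 0/1 gene mask; averaging over many random splits makes the score less sensitive to any single partition of a small sample.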
Affiliation(s)
- Ahmad Esfandiari
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran.
- Niki Nasiri
- Pediatric Infectious Diseases Research Center, Communicable Diseases Institute, Mazandaran University of Medical Sciences, Sari, Iran

2
M S K, Rajaguru H, Nair AR. Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data-In Pursuit of Precision. Bioengineering (Basel) 2024; 11:314. [PMID: 38671736 PMCID: PMC11047746 DOI: 10.3390/bioengineering11040314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 04/28/2024]
Abstract
Microarray gene expression analysis is a powerful technique used in cancer classification and research to identify and understand gene expression patterns that can differentiate between different cancer types, subtypes, and stages. However, microarray databases are highly redundant, inherently nonlinear, and noisy. Therefore, extracting meaningful information from such a huge database is challenging. The paper adopts the Fast Fourier Transform (FFT) and Mixture Model (MM) for dimensionality reduction and utilises the Dragonfly optimisation algorithm as the feature selection technique. The classifiers employed in this research are Nonlinear Regression, Naïve Bayes, Decision Tree, Random Forest and SVM (RBF). The classifiers' performances are analysed with and without feature selection methods. Finally, Adaptive Moment Estimation (Adam) and Random Adaptive Moment Estimation (RanAdam) hyper-parameter tuning techniques are used to further improve the classifiers. The SVM (RBF) classifier with the Fast Fourier Transform dimensionality reduction method and Dragonfly feature selection achieved the highest accuracy among the classifiers, 98.343%, with RanAdam hyper-parameter tuning.
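For reference, the standard Adam update rule (first- and second-moment estimates with bias correction) can be sketched as follows. The abstract does not say how Adam or its RanAdam variant is wired into the hyper-parameter tuning, so this shows only the generic optimizer on a toy one-dimensional quadratic; the defaults `beta1=0.9`, `beta2=0.999`, `eps=1e-8` are the usual ones, while `lr`, `steps` and the objective are illustrative assumptions.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Standard Adam (Kingma & Ba) on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment (uncentred) estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_star, 4))
```

Because the step is normalised by the running root-mean-square gradient, its size stays roughly bounded by `lr` regardless of the gradient's scale, which is one reason Adam is popular for tuning models with poorly scaled parameters.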
Affiliation(s)
- Karthika M S
- Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Ajin R. Nair
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India

3
An ensemble framework for microarray data classification based on feature subspace partitioning. Comput Biol Med 2022; 148:105820. [PMID: 35872409 DOI: 10.1016/j.compbiomed.2022.105820] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/20/2022] [Revised: 06/05/2022] [Accepted: 07/03/2022] [Indexed: 12/14/2022]
Abstract
Feature selection is exposed to the curse of dimensionality, a risk that is further exacerbated by high-dimensional data such as microarrays. Moreover, the low-instance/high-feature (LIHF) property of microarray data requires considerable processing time for the calculations and comparisons among features needed to choose the best subset of them, which has motivated many efforts to overcome the LIHF property of such genomic medicine data. Given the promising results of ensemble models in machine learning problems, this paper presents a novel framework, named feature-level aggregation-based ensemble based on overlapped feature subspace partitioning (FLAE-OFSP), for microarray data classification. The proposed ensemble has three main steps: several subsets are generated by the proposed partitioning approach, a feature selection algorithm (i.e., a feature ranker) is applied to each subset, and finally their results are combined into a single ranked list using six defined aggregation functions. Evaluation of the presented framework on seven microarray datasets using four measures, namely stability, classification accuracy, runtime, and Modscore, shows a substantial runtime improvement and also high-quality results on the other measures compared to individual methods.
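The partition, rank and aggregate pipeline can be illustrated with a small sketch. Everything concrete here is a hypothetical stand-in: the contiguous overlapping blocks, the absolute class-mean-difference ranker, and mean normalised rank as the single aggregation function are our assumptions, not the paper's partitioning approach or its six aggregation functions.

```python
from statistics import mean


def overlapped_partitions(n_features, n_parts, overlap):
    """Contiguous blocks of feature indices, each extended by `overlap`
    indices into the next block (a hypothetical partitioning scheme)."""
    size = -(-n_features // n_parts)  # ceiling division
    parts = []
    for p in range(n_parts):
        start = p * size
        if start < n_features:
            parts.append(list(range(start, min(n_features, start + size + overlap))))
    return parts


def mean_diff_score(X, y, j):
    # Simple two-class ranker: absolute difference of class means for feature j.
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    return abs(mean(a) - mean(b))


def aggregate_ranking(X, y, n_parts=3, overlap=1):
    n = len(X[0])
    positions = {j: [] for j in range(n)}
    for part in overlapped_partitions(n, n_parts, overlap):
        ranked = sorted(part, key=lambda j: mean_diff_score(X, y, j), reverse=True)
        for pos, j in enumerate(ranked):
            # Normalise to [0, 1] so blocks of different sizes are comparable.
            positions[j].append(pos / max(1, len(ranked) - 1))
    # Aggregate: mean normalised position across blocks (lower is better).
    return sorted(range(n), key=lambda j: mean(positions[j]))


y = [0, 0, 0, 1, 1, 1]
X = [[1, 0, 2, 5, 0, 7]] * 3 + [[1, 3, 2, 5, 10, 7]] * 3
print(overlapped_partitions(6, 3, 1))  # [[0, 1, 2], [2, 3, 4], [4, 5]]
print(aggregate_ranking(X, y))         # [1, 4, 0, 2, 3, 5]
```

Feature 4 separates the classes far better than feature 1, yet both tie at the top because each leads its own block. This shows why the choice of aggregation function matters when rankers only see part of the feature space.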

4
Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data. Med Biol Eng Comput 2022; 60:1627-1646. [PMID: 35399141 DOI: 10.1007/s11517-022-02555-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/18/2021] [Accepted: 03/16/2022] [Indexed: 12/19/2022]
Abstract
Identifying a small subset of informative genes from a gene expression dataset is an important process for sample classification in the fields of bioinformatics and machine learning. This process has two objectives: first, to minimize the number of selected genes, and second, to maximize the classification accuracy of the classifier used. In this paper, a hybrid machine learning framework based on the nature-inspired cuckoo search (CS) algorithm is proposed to resolve this problem. The proposed framework is obtained by incorporating the CS algorithm with an artificial bee colony (ABC) in the exploitation and exploration of the genetic algorithm (GA). These strategies are used to maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms in the search process. In preprocessing, the independent component analysis (ICA) method extracts the important genes from the dataset. Then, the proposed gene selection algorithms, along with the Naive Bayes (NB) classifier and leave-one-out cross-validation (LOOCV), are applied to find a small set of informative genes that maximize the classification accuracy. To conduct a comprehensive performance study, the proposed algorithms have been applied to six benchmark gene expression datasets. The experimental comparison shows that the proposed framework (ICA and CS-based hybrid algorithm with NB classifier) performs a deeper search in the iterative process, which can avoid premature convergence and produce better results than previously published feature selection algorithms for the NB classifier.
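Cuckoo search explores the space with heavy-tailed Lévy-flight steps and periodically abandons a fraction of its nests. The following is a minimal continuous sketch, assuming Mantegna's algorithm for the step length, greedy replacement, and a sphere test function in place of a classifier-based fitness. For gene selection, the continuous positions would still need a mapping to 0/1 gene masks, and the ABC/GA hybridisation from the abstract is not reproduced.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Levy-distributed step length via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma_u)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(f, dim, n_nests=15, pa=0.25, alpha=0.01, iters=200,
                  lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)

    def clip(z):
        return max(lo, min(hi, z))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(n) for n in nests]
    initial_best = min(fit)

    def try_replace(i, cand):
        fc = f(cand)
        if fc < fit[i]:              # greedy: keep the better solution
            nests[i], fit[i] = cand, fc

    for _ in range(iters):
        best_x = nests[min(range(n_nests), key=fit.__getitem__)]
        # Phase 1: Levy flights scaled by the distance to the current best nest.
        for i in range(n_nests):
            try_replace(i, [clip(x + alpha * levy_step(rng) * (x - b))
                            for x, b in zip(nests[i], best_x)])
        # Phase 2: each coordinate is "discovered" with probability pa and
        # rebuilt by a biased random walk between two randomly chosen nests.
        for i in range(n_nests):
            p, q = rng.randrange(n_nests), rng.randrange(n_nests)
            try_replace(i, [clip(x + rng.random() * (nests[p][d] - nests[q][d]))
                            if rng.random() < pa else x
                            for d, x in enumerate(nests[i])])

    k = min(range(n_nests), key=fit.__getitem__)
    return nests[k], fit[k], initial_best


def sphere(v):
    return sum(z * z for z in v)


best_x, best_f, start_f = cuckoo_search(sphere, dim=3)
print(best_f, start_f)
```

The occasional very long Lévy steps are what give cuckoo search its resistance to premature convergence. The greedy acceptance guarantees that the best fitness never gets worse.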

5
Aziz RM. Cuckoo Search-Based Optimization for Cancer Classification: A New Hybrid Approach. J Comput Biol 2022; 29:565-584. [DOI: 10.1089/cmb.2021.0410] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/26/2023]

6
Guney H, Oztoprak H. A robust ensemble feature selection technique for high-dimensional datasets based on minimum weight threshold method. Comput Intell 2022. [DOI: 10.1111/coin.12524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Huseyin Guney
- Computer Engineering Department, Bahçeşehir Cyprus University, Nicosia, North Cyprus, Turkey
- Huseyin Oztoprak
- Electrical and Electronics Engineering Department, Cyprus International University, Nicosia, North Cyprus, Turkey

7
Aziz RM. Application of nature inspired soft computing techniques for gene selection: a novel frame work for classification of cancer. Soft Comput 2022. [DOI: 10.1007/s00500-022-07032-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]

8
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Feature selection methods on gene expression microarray data for cancer classification: A systematic review. Comput Biol Med 2022; 140:105051. [PMID: 34839186 DOI: 10.1016/j.compbiomed.2021.105051] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/09/2021] [Revised: 11/01/2021] [Accepted: 11/15/2021] [Indexed: 11/29/2022]
Abstract
This systematic review provides researchers interested in feature selection (FS) for processing microarray data with comprehensive information about the main research directions in gene expression classification pursued over the past seven years. A set of 132 studies published by three different publishers is reviewed. The studied papers are categorized into nine directions based on their objectives, and the FS directions that received various levels of attention are then summarized. The review revealed that 'propose hybrid FS methods' was the most popular research direction, at 34.9%, while the other directions had lower percentages, ranging from 13.6% down to 3%. This guides researchers in selecting the most competitive research direction. Papers in each category are thoroughly reviewed from six perspectives, namely: method(s), classifier(s), dataset(s), dataset dimension range, performance metric(s), and result(s) achieved.
Affiliation(s)
- Esra'a Alhenawi
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
- Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006, QLD, Australia; Yonsei Frontier Lab, Yonsei University, Seoul, South Korea.

9
Mishra P, Bhoi N. Cancer gene recognition from microarray data with manta ray based enhanced ANFIS technique. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.06.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]

10
Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021; 627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]
Abstract
This paper introduces a new hybrid approach (DBH) for solving the gene selection problem that incorporates the strengths of two existing metaheuristics: the binary dragonfly algorithm (BDF) and the binary black hole algorithm (BBHA). This hybridization aims to identify a limited and stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods struggle to extract disease-related information from a vast number of redundant genes. The proposed approach first applies the minimum redundancy maximum relevance (MRMR) filter method to reduce the dimensionality of the feature space and then uses the suggested hybrid DBH algorithm to determine a smaller set of significant genes. The proposed approach was evaluated on eight benchmark gene expression datasets and then compared against the latest state-of-the-art techniques to demonstrate its efficiency. The comparative study shows that the proposed approach achieves a significant improvement over existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the suggested method was examined on real RNA-Seq coronavirus-related gene expression data of asthmatic patients, selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to distinguish infected from uninfected patients in order to identify subgroups at risk for COVID-19. The results indicate that the suggested MRMR-DBH approach is a very promising framework for finding new combinations of the most discriminative genes with high classification accuracy.
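The MRMR filter stage ranks genes by their relevance to the class minus their redundancy with genes already chosen. The following is a minimal sketch of greedy mRMR in its difference form. It assumes the expression values have already been discretised (the abstract does not give the discretisation scheme) and measures mutual information in nats; the toy features are illustrative only.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """I(A;B) in nats for two equal-length sequences of discrete values."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s])
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]   # strongly relevant
f1 = list(f0)                   # exact duplicate of f0 (fully redundant)
f2 = [0, 0, 0, 0, 1, 1, 0, 0]   # weaker, but carries different information
print(mrmr([f0, f1, f2], labels, k=2))  # [0, 2]
```

The duplicate `f1` is as relevant as `f0`, but its redundancy penalty outweighs that relevance, so the weaker yet complementary `f2` is chosen second. Filtering this way shrinks the search space before a wrapper such as DBH runs.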
Affiliation(s)
- Elnaz Pashaei
- Department of Software Engineering, Istanbul Aydin University, Istanbul, Turkey.
- Elham Pashaei
- Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.

11
Qu C, Zhang L, Li J, Deng F, Tang Y, Zeng X, Peng X. Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning. Brief Bioinform 2021; 22:6238587. [PMID: 33876181 DOI: 10.1093/bib/bbab097] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/29/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022]
Abstract
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.
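Two building blocks named in this abstract are an F-score prefilter and a function that converts continuous optimizer positions into binary gene masks. The sketch below assumes a two-class problem and uses the classic two-class F-score together with a plain S-shaped (sigmoid) transfer function; the paper's own "novel activation function" and its multi-class handling are not reproduced. The function names are hypothetical.

```python
import math
import random
from statistics import mean


def f_score(pos, neg):
    """Two-class F-score of one gene: between-class spread of the class means
    divided by the summed within-class sample variances."""
    m, mp, mn = mean(pos + neg), mean(pos), mean(neg)
    between = (mp - m) ** 2 + (mn - m) ** 2
    within = (sum((x - mp) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - mn) ** 2 for x in neg) / (len(neg) - 1))
    return between / within if within > 0 else math.inf


def top_k_genes(genes_pos, genes_neg, k):
    # Prefilter: keep the k genes with the largest F-score.
    scores = [f_score(p, q) for p, q in zip(genes_pos, genes_neg)]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def s_shaped_binarize(position, rng):
    """Map a continuous position vector to a 0/1 gene mask: bit d is set with
    probability sigmoid(x_d). Inputs are clamped to avoid math.exp overflow."""
    mask = []
    for x in position:
        z = max(-50.0, min(50.0, x))
        mask.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0)
    return mask


genes_pos = [[5.0, 5.2, 4.8], [1.0, 2.0, 3.0], [2.0, 2.5, 3.0]]
genes_neg = [[1.0, 1.1, 0.9], [1.5, 2.0, 2.5], [1.0, 1.5, 2.0]]
print(top_k_genes(genes_pos, genes_neg, k=2))              # [0, 2]
print(s_shaped_binarize([50.0, -50.0], random.Random(0)))  # [1, 0]
```

Gene 1 has identical class means and therefore an F-score of zero, so the prefilter drops it before the swarm search ever sees it.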
Affiliation(s)
- Chiwen Qu
- College of Mathematics and Statistics, Hunan Normal University, China
- Lupeng Zhang
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
- Jinlong Li
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
- Fang Deng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
- Yifan Tang
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
- Xiaomin Zeng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
- Xiaoning Peng
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China

12
Dagnew G, Shekar B. Ensemble learning-based classification of microarray cancer data on tree-based features. Cognitive Computation and Systems 2021. [DOI: 10.1049/ccs2.12003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/03/2023]
Affiliation(s)
- Guesh Dagnew
- Department of Studies and Research in Computer Science, Mangalore University, Mangalore, Karnataka, India
- B.H. Shekar
- Department of Studies and Research in Computer Science, Mangalore University, Mangalore, Karnataka, India

13
Hassaine A, Salimi-Khorshidi G, Canoy D, Rahimi K. Untangling the complexity of multimorbidity with machine learning. Mech Ageing Dev 2020; 190:111325. [PMID: 32768443 PMCID: PMC7493712 DOI: 10.1016/j.mad.2020.111325] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/19/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/20/2022]
Abstract
The prevalence of multimorbidity has been increasing in recent years, posing a major burden for health care delivery and service. Understanding its determinants and impact is proving to be a challenge, yet it offers new opportunities for research to go beyond the study of diseases in isolation. In this paper, we review how the field of machine learning provides many tools for addressing research challenges in multimorbidity. We highlight recent advances in promising methods such as matrix factorisation, deep learning, and topological data analysis, and how these can take multimorbidity research beyond cross-sectional, expert-driven or confirmatory approaches to gain a better understanding of evolving patterns of multimorbidity. We discuss the challenges and opportunities of using machine learning to identify likely causal links between previously poorly understood disease associations while estimating the uncertainty of such associations. We finally summarise some of the challenges for wider clinical adoption of machine learning research tools and propose some solutions.
Affiliation(s)
- Abdelaali Hassaine
- Deep Medicine, Oxford Martin School, University of Oxford, Oxford, United Kingdom; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom; Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- Gholamreza Salimi-Khorshidi
- Deep Medicine, Oxford Martin School, University of Oxford, Oxford, United Kingdom; Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- Dexter Canoy
- Deep Medicine, Oxford Martin School, University of Oxford, Oxford, United Kingdom; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom; Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- Kazem Rahimi
- Deep Medicine, Oxford Martin School, University of Oxford, Oxford, United Kingdom; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom; Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom.

14
A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020; 107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]

15
Shukla AK. Feature selection inspired by human intelligence for improving classification accuracy of cancer types. Comput Intell 2020. [DOI: 10.1111/coin.12341] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Affiliation(s)
- Alok Kumar Shukla
- Department of Computer Science & Engineering, G.L. Bajaj Institute of Technology and Management, Greater Noida, India

16
Shukla AK, Pippal SK, Gupta S, Ramachandra Reddy B, Tripathi D. Knowledge discovery in medical and biological datasets by integration of Relief-F and correlation feature selection techniques. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-179743] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Affiliation(s)
- Alok Kumar Shukla
- Department of CSE, G.L. Bajaj Institute of Technology & Management, Greater Noida, India
- Sanjeev Kumar Pippal
- Department of CSE, G.L. Bajaj Institute of Technology & Management, Greater Noida, India

17
Heuristic filter feature selection methods for medical datasets. Genomics 2020; 112:1173-1181. [DOI: 10.1016/j.ygeno.2019.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/14/2019] [Revised: 06/19/2019] [Accepted: 07/01/2019] [Indexed: 11/23/2022]

18
Detecting biomarkers from microarray data using distributed correlation based gene selection. Genes Genomics 2020; 42:449-465. [PMID: 32040771 DOI: 10.1007/s13258-020-00916-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/10/2019] [Accepted: 01/23/2020] [Indexed: 01/16/2023]
Abstract
BACKGROUND Over the past few decades, DNA microarray technology has emerged as a prevailing process for early identification of cancer subtypes. Several feature selection (FS) techniques have been widely applied for identifying cancer from microarray gene data, but very few studies have distributed the feature selection process for detecting cancer subtypes. OBJECTIVE Not all gene expressions are needed for prediction; the objective of this research article is to select discriminative biomarkers using a distributed FS method that helps in accurately diagnosing the cancer subtype. Traditional feature selection techniques have several drawbacks; for example, features that seem unrelated individually, but would perform well in terms of classification accuracy within a suitable subset of genes, are left out of the selection. METHOD To overcome this issue, this paper introduces a new filter-based method for gene selection that can select highly relevant genes for distinguishing tissues in the gene expression dataset. The method computes both the gene-gene and gene-class relations and simultaneously identifies a subset of essential genes. It is tested on the Diffuse Large B-cell Lymphoma (DLBCL) dataset using well-known classification techniques such as support vector machine, naïve Bayes, k-nearest neighbor, and decision tree. RESULTS Results on the biological DLBCL dataset demonstrate that the proposed method provides promising tools for the prediction of cancer type, with a prediction accuracy of 97.62%, precision of 94.23%, sensitivity of 94.12%, F-measure of 90.12%, and ROC value of 99.75%. CONCLUSION The experimental results show that the proposed method significantly improves classification accuracy and execution time compared to existing standard algorithms applied to the non-partitioned dataset. Furthermore, the extracted genes are biologically sound and agree with the outcomes of relevant biomedical studies.
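One way to combine gene-class relevance with gene-gene redundancy, and to distribute the search over feature partitions, is sketched below. This illustration rests on several assumptions and is not the paper's method. It plugs absolute Pearson correlation into Hall's CFS merit, `k * r_cf / sqrt(k + k*(k-1) * r_ff)`, runs a greedy forward search on interleaved feature partitions, and then repeats the search over the union of the partial selections. All function names are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def cfs_merit(subset, cols, y):
    """CFS merit with |Pearson r|: k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = mean(abs(pearson(cols[j], y)) for j in subset)   # gene-class relevance
    if k == 1:
        return r_cf
    r_ff = mean(abs(pearson(cols[i], cols[j]))              # gene-gene redundancy
                for n, i in enumerate(subset) for j in subset[n + 1:])
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(candidates, cols, y):
    # Forward selection; stop as soon as adding a gene no longer raises the merit.
    selected, best = [], 0.0
    while True:
        options = [j for j in candidates if j not in selected]
        if not options:
            return selected
        j = max(options, key=lambda c: cfs_merit(selected + [c], cols, y))
        merit = cfs_merit(selected + [j], cols, y)
        if merit <= best:
            return selected
        selected, best = selected + [j], merit


def distributed_cfs(cols, y, n_parts=2):
    # Hypothetical "distribution": interleaved feature partitions searched
    # independently, then one final search over the union of their picks.
    parts = [list(range(p, len(cols), n_parts)) for p in range(n_parts)]
    union = sorted({j for part in parts for j in greedy_cfs(part, cols, y)})
    return greedy_cfs(union, cols, y)


y = [0, 0, 0, 1, 1, 1]
cols = [
    [0.1, 0.0, 0.2, 1.0, 0.9, 1.1],  # gene 0: tracks the class label
    [1, 0, 1, 1, 0, 1],              # gene 1: uncorrelated with the label
    [2, 5, 3, 5, 2, 3],              # gene 2: uncorrelated with the label
]
print(distributed_cfs(cols, y))  # [0]
```

Each partition can be searched on a separate worker; only the small per-partition selections need to be merged for the final pass.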

19
Particle Swarm Optimized Hybrid Kernel-Based Multiclass Support Vector Machine for Microarray Cancer Data Analysis. BioMed Research International 2019; 2019:4085725. [PMID: 31998772 PMCID: PMC6973196 DOI: 10.1155/2019/4085725] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/26/2019] [Revised: 10/26/2019] [Accepted: 11/21/2019] [Indexed: 11/17/2022]
Abstract
Determining an optimal decision model is an important but difficult combinatorial task in imbalanced microarray-based cancer classification. Though the multiclass support vector machine (MCSVM) has already made an important contribution in this field, its performance solely depends on three aspects: the penalty factor C, the type of kernel, and its parameters. To improve the performance of this classifier in microarray-based cancer analysis, this paper proposes the PSO-PCA-LGP-MCSVM model, which is based on particle swarm optimization (PSO), principal component analysis (PCA), and the multiclass support vector machine (MCSVM). The MCSVM uses a hybrid linear-Gaussian-polynomial (LGP) kernel that combines the advantages of three standard kernels (linear, Gaussian, and polynomial) in a novel manner: the linear kernel is linearly combined with a Gaussian kernel that embeds the polynomial kernel. Further, the paper proves that the LGP kernel satisfies the conditions of a valid kernel. To demonstrate the effectiveness of the model, several experiments were conducted and the results were compared with three single-kernel models, namely PSO-PCA-L-MCSVM (linear kernel), PSO-PCA-G-MCSVM (Gaussian kernel), and PSO-PCA-P-MCSVM (polynomial kernel). For the comparison, two binary and two multiclass imbalanced standard microarray datasets were used. Experimental results in terms of three extended assessment metrics (F-score, G-mean, and accuracy) reveal the superior global feature extraction, prediction, and learning abilities of this model compared with the three single-kernel models.
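The requirement that a hybrid kernel be valid has a simple illustration: any non-negative weighted sum of positive semi-definite (PSD) kernels is itself PSD. The sketch below uses such a weighted sum of linear, Gaussian and polynomial kernels. It is NOT the paper's LGP construction, which embeds the polynomial kernel inside the Gaussian one. The weights, `gamma`, `degree` and `coef0` are arbitrary assumptions, and the random quadratic-form check is only a numerical sanity test, not a proof.

```python
import math
import random


def linear_k(x, z):
    return sum(a * b for a, b in zip(x, z))


def gaussian_k(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_k(x, z, degree=2, coef0=1.0):
    return (linear_k(x, z) + coef0) ** degree


def mixed_k(x, z, w=(0.2, 0.5, 0.3)):
    """Hypothetical mixture: non-negative weighted sum of three PSD kernels,
    which is therefore PSD as well (not the paper's LGP formula)."""
    return w[0] * linear_k(x, z) + w[1] * gaussian_k(x, z) + w[2] * poly_k(x, z)


def gram(X, k):
    return [[k(a, b) for b in X] for a in X]


def quad_form(K, v):
    # v^T K v; for a PSD Gram matrix this is >= 0 up to rounding error.
    return sum(v[i] * K[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
K = gram(X, mixed_k)
rng = random.Random(0)
worst = min(quad_form(K, [rng.gauss(0, 1) for _ in X]) for _ in range(100))
print(worst >= -1e-9)  # True
```

In a PSO-tuned SVM, the mixture weights and kernel parameters would become extra particle dimensions alongside the penalty factor C.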

20
Frequency based feature selection method using whale algorithm. Genomics 2019; 111:1946-1955. [DOI: 10.1016/j.ygeno.2019.01.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 07/22/2018] [Revised: 12/18/2018] [Accepted: 01/05/2019] [Indexed: 11/24/2022]

21

22
Shukla AK, Singh P, Vardhan M. A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory. International Journal of Computational Intelligence and Applications 2019. [DOI: 10.1142/s1469026819500202] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
Abstract
The explosion of high-dimensional datasets in scientific repositories has been encouraging interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for the classification model and to detect malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm for classification problems, called Filter-Wrapper Feature Selection (FWFS), which also addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding Binary Genetic Algorithm (BGA) accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without loss of classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and numbers of instances. The experimental results show that the proposed method achieves a significant reduction in the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse large B-cell lymphoma (DLBCL). On the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using support vector machine (SVM).
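The CMIM filter used as the front end can be sketched with Fleuret's greedy rule. The first pick is the feature with the highest I(Y; X). Each later pick is the feature whose worst-case conditional information, the minimum over selected s of I(Y; X | X_s), is largest. The sketch assumes discrete features and omits the BGA wrapper stage; the toy data are illustrative.

```python
import math
from collections import Counter


def entropy(*cols):
    """Joint Shannon entropy (nats) of one or more discrete sequences."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mi(x, y):
    return entropy(x) + entropy(y) - entropy(x, y)


def cmi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim(features, y, k):
    """Greedy CMIM: score_j = min(I(Y;X_j), min over selected s of I(X_j;Y|X_s))."""
    scores = [mi(f, y) for f in features]
    selected = []
    for _ in range(min(k, len(features))):
        best = max((j for j in range(len(features)) if j not in selected),
                   key=scores.__getitem__)
        selected.append(best)
        for j in range(len(features)):
            if j not in selected:
                scores[j] = min(scores[j], cmi(features[j], y, features[best]))
    return selected


y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)                  # duplicate: no information beyond f0
f2 = [0, 0, 0, 0, 1, 1, 0, 0]
print(cmim([f0, f1, f2], y, k=2))  # [0, 2]
```

Once `f0` is selected, the duplicate `f1` adds zero conditional information about the class, so CMIM prefers `f2`. The selected subset would then seed the BGA search.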
Affiliation(s)
- Pradeep Singh
- Department of Computer Science & Engineering, Raipur, India
- Manu Vardhan
- Department of Computer Science & Engineering, Raipur, India

23
Shukla AK, Tripathi D. Identification of potential biomarkers on microarray data using distributed gene selection approach. Math Biosci 2019; 315:108230. [DOI: 10.1016/j.mbs.2019.108230] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/22/2019] [Revised: 06/04/2019] [Accepted: 07/16/2019] [Indexed: 02/09/2023]

24
Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods. Med Biol Eng Comput 2018; 57:159-176. [DOI: 10.1007/s11517-018-1874-4] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 02/09/2018] [Accepted: 07/12/2018] [Indexed: 12/25/2022]

25
Mu H, Xu J, Wang Y, Sun L. Feature genes selection using Fisher transformation method. Journal of Intelligent & Fuzzy Systems 2018. [DOI: 10.3233/jifs-17710] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Affiliation(s)
- Huiyu Mu
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
- Jiucheng Xu
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
- Yun Wang
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
- Lin Sun
- College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China

26
Shukla AK, Singh P, Vardhan M. A hybrid gene selection method for microarray recognition. Biocybern Biomed Eng 2018. [DOI: 10.1016/j.bbe.2018.08.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/12/2022]

27
Gene selection from large-scale gene expression data based on fuzzy interactive multi-objective binary optimization for medical diagnosis. Biocybern Biomed Eng 2018. [DOI: 10.1016/j.bbe.2018.02.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022]

28
Sahu B, Dehuri S, Jagadev AK. Feature selection model based on clustering and ranking in pipeline for microarray data. Informatics in Medicine Unlocked 2017. [DOI: 10.1016/j.imu.2017.07.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]