1
Alkamli S, Alshamlan H. Evaluating the Nuclear Reaction Optimization (NRO) Algorithm for Gene Selection in Cancer Classification. Diagnostics (Basel) 2025; 15:927. [PMID: 40218277 PMCID: PMC11988358 DOI: 10.3390/diagnostics15070927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2025] [Revised: 03/29/2025] [Accepted: 03/31/2025] [Indexed: 04/14/2025] Open
Abstract
Background/Objectives: Cancer classification using microarray datasets presents a significant challenge due to their extremely high dimensionality. This complexity necessitates advanced optimization methods for effective gene selection. Methods: This study introduces and evaluates the Nuclear Reaction Optimization (NRO) algorithm, which draws inspiration from nuclear fission and fusion, for identifying informative gene subsets in six benchmark cancer microarray datasets. Employed as a standalone approach without prior dimensionality reduction, NRO was assessed using both Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN). Leave-One-Out Cross-Validation (LOOCV) was used to rigorously evaluate classification accuracy and the relevance of the selected genes. Results: Experimental results show that NRO achieved high classification accuracy, particularly when used with SVM. In select datasets, it outperformed several state-of-the-art optimization algorithms. However, due to the absence of additional dimensionality reduction techniques, the number of selected genes remains relatively high. Comparative analysis with Harris Hawks Optimization (HHO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), and Firefly Algorithm (FFA) shows that while NRO delivers competitive performance, it does not consistently outperform all methods across datasets. Conclusions: The study concludes that NRO is a promising gene selection approach, particularly effective in certain datasets, and suggests that future work should explore hybrid models and feature reduction techniques to further enhance its accuracy and efficiency.
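To make the evaluation protocol named in this abstract concrete, the following is a minimal sketch of leave-one-out cross-validation for one candidate gene subset, scored with a plain k-NN classifier. The function names (`knn_predict`, `loocv_accuracy`), the Euclidean distance, and the toy data are illustrative assumptions and not the authors' implementation; the NRO search that proposes gene subsets is not shown.

```python
import math

def knn_predict(train_X, train_y, x, k=1):
    # Rank training samples by Euclidean distance to x, then vote among the k closest.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)

def loocv_accuracy(X, y, genes, k=1):
    # Restrict every sample to the candidate gene subset (column indices).
    Xs = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(Xs)):
        # Hold out sample i, train on all the others, and test on sample i.
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return hits / len(Xs)

# Hypothetical toy data: 4 samples, 2 genes; gene 0 separates the classes, gene 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
informative = loocv_accuracy(X, y, genes=[0])
uninformative = loocv_accuracy(X, y, genes=[1])
```

In a wrapper-style gene selection loop, a score of this kind would typically serve as the fitness of each candidate subset.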
Affiliation(s)
- Hala Alshamlan
  - Department of Information Technology, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia;
2
Nagarajan A, Varadhan V, Manikandan MS, Kaliaperumal K, Palaniyandi T, Kaliamoorthy S, Baskar G, Rab SO, Balaramnavar VM, Kumarasamy S. Signature of collagen alpha-1(x) gene expression in human cancers and their therapeutic implications. Pathol Res Pract 2025; 266:155811. [PMID: 39787688 DOI: 10.1016/j.prp.2025.155811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 12/24/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
Cancers are a class of disorders that entail uncontrollably unwanted cell development with dissemination. One in six fatalities globally is attributed to cancer, a global health issue. The analysis of the entire DNA sequence and how it expresses itself in tumor cells is known as cancer genomics. The development of novel cancer treatments has been facilitated because of the genomics method. COL10A1 gene, a short chain collagen, and an interstitial matrix component, acts as a predictive biomarker for cancer prognosis. Recognizing the fundamental consequences of mutations in the COL10A1 gene and its expression in cancer is crucial. Analyzing the COL10A1 gene expression with a data set and gene expression patterns shows the level of display of the tumor. Examining the therapeutic techniques of COL10A1 gene expression leads to early detection, screening, radiation therapy, and advanced developments. This review highlights the value of the COL10A1 gene in breast, gastric, pancreatic, lung, and colorectal cancers, emphasizing its role in gene expression patterns and therapeutic techniques.
Affiliation(s)
- Akshaya Nagarajan
  - Department of Biotechnology, Dr. M. G. R Educational and Research Institute, Chennai, Tamil Nadu 600095, India
- Varsha Varadhan
  - Department of Biotechnology, Dr. M. G. R Educational and Research Institute, Chennai, Tamil Nadu 600095, India
- Monica Shri Manikandan
  - Department of Biotechnology, Dr. M. G. R Educational and Research Institute, Chennai, Tamil Nadu 600095, India
- Kumaravel Kaliaperumal
  - Department of Orthodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India.
- Thirunavukkarasu Palaniyandi
  - Department of Biotechnology, Dr. M. G. R Educational and Research Institute, Chennai, Tamil Nadu 600095, India; ACS-Advanced Medical Research Institute, Dr. M.G.R Educational and Research Institute, Chennai 600077, India.
- Senthilkumar Kaliamoorthy
  - Department of Electronics and Communication Engineering, Dr. M.G.R Educational and Research Institute, Chennai, Tamil Nadu 600095, India
- Gomathy Baskar
  - Department of Biotechnology, Dr. M. G. R Educational and Research Institute, Chennai, Tamil Nadu 600095, India
- Safia Obaidur Rab
  - Central Labs, King Khalid University, AlQura'a, Abha, Saudi Arabia; Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Khalid University, Abha, Saudi Arabia
- Vishal M Balaramnavar
  - School of Pharmacy and Research Centre, Sanskriti University, Chhata, Mathura, Uttar Pradesh 281401, India
- Saravanan Kumarasamy
  - Department of Electric and Electronic Engineering, Dr. M.G.R Educational and Research Institute, Deemed to Be University, Chennai, Tamil Nadu 600 095, India
3
Yaqoob A, Mir MA, Jagannadha Rao GVV, Tejani GG. Transforming Cancer Classification: The Role of Advanced Gene Selection. Diagnostics (Basel) 2024; 14:2632. [PMID: 39682540 DOI: 10.3390/diagnostics14232632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 11/18/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024] Open
Abstract
Background/Objectives: Accurate classification in cancer research is vital for devising effective treatment strategies. Precise cancer classification depends significantly on selecting the most informative genes from high-dimensional datasets, a task made complex by the extensive data involved. This study introduces the Two-stage MI-PSA Gene Selection algorithm, a novel approach designed to enhance cancer classification accuracy through robust gene selection methods. Methods: The proposed method integrates Mutual Information (MI) and Particle Swarm Optimization (PSO) for gene selection. In the first stage, MI acts as an initial filter, identifying genes rich in cancer-related information. In the second stage, PSO refines this selection to pinpoint an optimal subset of genes for accurate classification. Results: The experimental findings reveal that the MI-PSA method achieves a best classification accuracy of 99.01% with a selected subset of 19 genes, substantially outperforming the MI and SVM methods, which attain best accuracies of 93.44% and 91.26%, respectively, for the same gene count. Furthermore, MI-PSA demonstrates superior performance in terms of average and worst-case accuracy, underscoring its robustness and reliability. Conclusions: The MI-PSA algorithm presents a powerful approach for identifying critical genes essential for precise cancer classification, advancing both our understanding and management of this complex disease.
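The first stage described in this abstract, a mutual-information filter that ranks genes before a metaheuristic refines the subset, can be sketched as follows. This is an illustration under stated assumptions rather than the MI-PSA implementation: the names `mutual_information` and `mi_filter` are hypothetical, each gene is binarized at its mean (the abstract does not say how expression values are discretized), and the second-stage PSO refinement is omitted.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) in bits for two discrete sequences, estimated from empirical counts.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mi_filter(X, y, top_k):
    # Score each gene (column) against the class labels and keep the top_k indices.
    scores = []
    for g in range(len(X[0])):
        column = [row[g] for row in X]
        mean = sum(column) / len(column)
        # Assumption: binarize expression at the gene's mean before estimating MI.
        bits = [1 if v > mean else 0 for v in column]
        scores.append((mutual_information(bits, y), g))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest MI first, ties by gene index
    return [g for _, g in scores[:top_k]]

# Hypothetical toy data: gene 0 tracks the class label, gene 1 is independent of it.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
selected = mi_filter(X, y, top_k=1)
```

The surviving gene indices would then form the reduced search space handed to the second-stage optimizer.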
Affiliation(s)
- Abrar Yaqoob
  - School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal 466114, India
- Mushtaq Ahmad Mir
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia
- Ghanshyam G Tejani
  - Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan City 320315, Taiwan
  - Jadara Research Center, Jadara University, Irbid 21110, Jordan
4
Yaqoob A, Verma NK, Aziz RM, Shah MA. RNA-Seq analysis for breast cancer detection: a study on paired tissue samples using hybrid optimization and deep learning techniques. J Cancer Res Clin Oncol 2024; 150:455. [PMID: 39390265 PMCID: PMC11467072 DOI: 10.1007/s00432-024-05968-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 09/21/2024] [Indexed: 10/12/2024]
Abstract
PROBLEM Breast cancer is a leading global health issue, contributing to high mortality rates among women. The challenge of early detection is exacerbated by the high dimensionality and complexity of gene expression data, which complicates the classification process. AIM This study aims to develop an advanced deep learning model that can accurately detect breast cancer using RNA-Seq gene expression data, while effectively addressing the challenges posed by the data's high dimensionality and complexity. METHODS We introduce a novel hybrid gene selection approach that combines the Harris Hawk Optimization (HHO) and Whale Optimization (WO) algorithms with deep learning to improve feature selection and classification accuracy. The model's performance was compared to five conventional optimization algorithms integrated with deep learning: Genetic Algorithm (GA), Artificial Bee Colony (ABC), Cuckoo Search (CS), and Particle Swarm Optimization (PSO). RNA-Seq data was collected from 66 paired samples of normal and cancerous tissues from breast cancer patients at the Jawaharlal Nehru Cancer Hospital & Research Centre, Bhopal, India. Sequencing was performed by Biokart Genomics Lab, Bengaluru, India. RESULTS The proposed model achieved a mean classification accuracy of 99.0%, consistently outperforming the GA, ABC, CS, and PSO methods. The dataset comprised 55 female breast cancer patients, including both early and advanced stages, along with age-matched healthy controls. CONCLUSION Our findings demonstrate that the hybrid gene selection approach using HHO and WO, combined with deep learning, is a powerful and accurate tool for breast cancer detection. This approach shows promise for early detection and could facilitate personalized treatment strategies, ultimately improving patient outcomes.
Affiliation(s)
- Abrar Yaqoob
  - School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India.
- Navneet Kumar Verma
  - School of Advanced Science and Language, VIT Bhopal University, Kothrikalan, Sehore, Bhopal, 466114, India
- Rabia Musheer Aziz
  - Planning Department, State Planning Institute (New Division), Lucknow, Utter Pradesh, 226001, India
- Mohd Asif Shah
  - Department of Economics, Kardan University, Parwane Du, 1001, Kabul, Afghanistan.
  - Division of Research and Development, Lovely Professional University, Phagwara, Punjab, 144001, India.
  - Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India.
5
Borah K, Das HS, Seth S, Mallick K, Rahaman Z, Mallik S. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis. Funct Integr Genomics 2024; 24:139. [PMID: 39158621 DOI: 10.1007/s10142-024-01415-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. The NGS high-dimensional data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, presents significant challenges for reducing feature dimensionality. The high dimensionality of NGS data poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for NGS and microarray data interpretation of humankind. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for microarray, bulk RNA-Seq, and single-cell, single-cell RNA-Seq (scRNA-Seq) technology-based datasets surveyed here. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. 
This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
Affiliation(s)
- Kasmika Borah
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Himanish Shekhar Das
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India.
- Soumita Seth
  - Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India
- Koushik Mallick
  - Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA.
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ, 85721, USA.
6
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
Affiliation(s)
- Morteza Rakhshaninejad
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran.
- Reza Shirkoohi
  - Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
  - University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
7
Yaqoob A, Verma NK, Aziz RM. Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm. J Med Syst 2024; 48:10. [PMID: 38193948 DOI: 10.1007/s10916-023-02031-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/28/2023] [Indexed: 01/10/2024]
Abstract
Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes among the high-dimensional biological data due to the existence of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been created to get beyond this obstacle. Improving the efficacy and precision of FS methodologies is crucial in order to identify significant genes among complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known Support Vector Machine (SVM) classifier. Using a dataset on breast cancer, the hybrid gene selection algorithm's performance is carefully assessed and compared to other feature selection methods. To improve the quality of the feature set, we use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy in the first step. The hybrid SCACSA method is then used to enhance and optimize the gene selection procedure. Lastly, we classify the dataset according to the chosen genes by using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
Affiliation(s)
- Abrar Yaqoob
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India.
- Navneet Kumar Verma
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Rabia Musheer Aziz
  - School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
8
Parhi P, Bisoi R, Kishore Dash P. An improvised nature-inspired algorithm enfolded broad learning system for disease classification. EGYPTIAN INFORMATICS JOURNAL 2023. [DOI: 10.1016/j.eij.2023.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
9
Chen Z, Xuan P, Heidari AA, Liu L, Wu C, Chen H, Escorcia-Gutierrez J, Mansour RF. An artificial bee bare-bone hunger games search for global optimization and high-dimensional feature selection. iScience 2023; 26:106679. [PMID: 37216098 PMCID: PMC10193239 DOI: 10.1016/j.isci.2023.106679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/01/2023] [Accepted: 04/12/2023] [Indexed: 05/24/2023] Open
Abstract
The domains of contemporary medicine and biology have generated substantial high-dimensional genetic data. Identifying representative genes and decreasing the dimensionality of the data can be challenging. The goal of gene selection is to minimize computing costs and enhance classification precision. Therefore, this article designs a new wrapper gene selection algorithm named artificial bee bare-bone hunger games search (ABHGS), which is the hunger games search (HGS) integrated with an artificial bee strategy and a Gaussian bare-bone structure to address this issue. To evaluate and validate the performance of our proposed method, ABHGS is compared to HGS and a single strategy embedded in HGS, six classic algorithms, and ten advanced algorithms on the CEC 2017 functions. The experimental results demonstrate that the bABHGS outperforms the original HGS. Compared to peers, it increases classification accuracy and decreases the number of selected features, indicating its actual engineering utility in spatial search and feature selection.
Affiliation(s)
- Zhiqing Chen
  - School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
- Ali Asghar Heidari
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Lei Liu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Chengwen Wu
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Huiling Chen
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- José Escorcia-Gutierrez
  - Department of Computational Science and Electronics, Universidad de la Costa, CUC, Barranquilla 080002, Colombia
- Romany F. Mansour
  - Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
10
Mowlaei ME, Shi X. FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms. Genes (Basel) 2023; 14:genes14051059. [PMID: 37239419 DOI: 10.3390/genes14051059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023] Open
Abstract
(1) Background: Phenotype prediction is a pivotal task in genetics in order to identify how genetic factors contribute to phenotypic differences. This field has seen extensive research, with numerous methods proposed for predicting phenotypes. Nevertheless, the intricate relationship between genotypes and complex phenotypes, including common diseases, has resulted in an ongoing challenge to accurately decipher the genetic contribution. (2) Results: In this study, we propose a novel feature selection framework for phenotype prediction utilizing a genetic algorithm (FSF-GA) that effectively reduces the feature space to identify genotypes contributing to phenotype prediction. We provide a comprehensive vignette of our method and conduct extensive experiments using a widely used yeast dataset. (3) Conclusions: Our experimental results show that our proposed FSF-GA method delivers comparable phenotype prediction performance as compared to baseline methods, while providing features selected for predicting phenotypes. These selected feature sets can be used to interpret the underlying genetic architecture that contributes to phenotypic variation.
Affiliation(s)
- Mohammad Erfan Mowlaei
  - Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
- Xinghua Shi
  - Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
11
Wang Z, Zhou Y, Takagi T, Song J, Tian YS, Shibuya T. Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC Bioinformatics 2023; 24:139. [PMID: 37031189 PMCID: PMC10082986 DOI: 10.1186/s12859-023-05267-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/18/2022] [Accepted: 04/02/2023] [Indexed: 04/10/2023] Open
Abstract
BACKGROUND Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n" in that data contain a small number of subjects but a large number of genes. It may affect the validity of the classification. Thus, there is a pressing demand of techniques able to select genes relevant to cancer classification. RESULTS This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybrids the manifold learning algorithm, Isomap, in the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
Affiliation(s)
- Zixuan Wang
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan.
- Yi Zhou
  - Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
- Tatsuya Takagi
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Jiangning Song
  - Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC, 3800, Australia
- Yu-Shi Tian
  - Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan.
- Tetsuo Shibuya
  - Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
12
Nekouie N, Romoozi M, Esmaeili M. A New Evolutionary Ensemble Learning of Multimodal Feature Selection from Microarray Data. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11159-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
13
AlMazrua H, AlShamlan H. A New Algorithm for Cancer Biomarker Gene Detection Using Harris Hawks Optimization. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22197273. [PMID: 36236372 PMCID: PMC9572901 DOI: 10.3390/s22197273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2022] [Revised: 09/01/2022] [Accepted: 09/09/2022] [Indexed: 05/29/2023]
Abstract
This paper presents two novel swarm intelligence algorithms for gene selection, HHO-SVM and HHO-KNN. Both of these algorithms are based on Harris Hawks Optimization (HHO), one in conjunction with support vector machines (SVM) and the other in conjunction with k-nearest neighbors (k-NN). In both algorithms, the goal is to determine a small gene subset that can be used to classify samples with a high degree of accuracy. The proposed algorithms are divided into two phases. To obtain an accurate gene set and to deal with the challenge of high-dimensional data, the redundancy analysis and relevance calculation are conducted in the first phase. To solve the gene selection problem, the second phase applies SVM and k-NN with leave-one-out cross-validation. A performance evaluation was performed on six microarray data sets using the two proposed algorithms. A comparison of the two proposed algorithms with several known algorithms indicates that both of them perform quite well in terms of classification accuracy and the number of selected genes.
14
Qiu F, Zheng P, Heidari AA, Liang G, Chen H, Karim FK, Elmannai H, Lin H. Mutational Slime Mould Algorithm for Gene Selection. Biomedicines 2022; 10:2052. [PMID: 36009599 PMCID: PMC9406076 DOI: 10.3390/biomedicines10082052] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Revised: 08/14/2022] [Accepted: 08/16/2022] [Indexed: 02/02/2023] Open
Abstract
A large volume of high-dimensional genetic data has been produced in modern medicine and biology fields. Data-driven decision-making is particularly crucial to clinical practice and relevant procedures. However, high-dimensional data in these fields increase the processing complexity and scale. Identifying representative genes and reducing the data's dimensions is often challenging. The purpose of gene selection is to eliminate irrelevant or redundant features to reduce the computational cost and improve classification accuracy. The wrapper gene selection model is based on a feature set, which can reduce the number of features and improve classification accuracy. This paper proposes a wrapper gene selection method based on the slime mould algorithm (SMA) to solve this problem. SMA is a new algorithm with a lot of application space in the feature selection field. This paper improves the original SMA by combining the Cauchy mutation mechanism with the crossover mutation strategy based on differential evolution (DE). Then, the transfer function converts the continuous optimizer into a binary version to solve the gene selection problem. Firstly, the continuous version of the method, ISMA, is tested on 33 classical continuous optimization problems. Then, the effect of the discrete version, or BISMA, was thoroughly studied by comparing it with other gene selection methods on 14 gene expression datasets. Experimental results show that the continuous version of the algorithm achieves an optimal balance between local exploitation and global search capabilities, and the discrete version of the algorithm has the highest accuracy when selecting the least number of genes.
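The step in this abstract where a transfer function turns a continuous optimizer into a binary gene selector can be illustrated with a small sketch. The abstract does not state which transfer function BISMA uses, so the S-shaped sigmoid below is an assumption, and the names `sigmoid_transfer` and `binarize` are hypothetical.

```python
import math
import random

def sigmoid_transfer(v):
    # Map a continuous position component to a selection probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-v))

def binarize(position, rng):
    # Select gene j (bit 1) with probability sigmoid_transfer(position[j]).
    return [1 if rng.random() < sigmoid_transfer(v) else 0 for v in position]

# Hypothetical continuous position for three genes, as an optimizer might produce.
rng = random.Random(0)  # seeded only so the sketch is reproducible
mask = binarize([50.0, -50.0, 0.0], rng)
chosen = [j for j, bit in enumerate(mask) if bit]
```

Large positive components are almost always selected and large negative ones almost never, while components near zero are selected about half the time, which keeps the binary search stochastic.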
Affiliation(s)
- Feng Qiu
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Pan Zheng
  - Information Systems, University of Canterbury, Christchurch 8014, New Zealand
- Ali Asghar Heidari
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Guoxi Liang
  - Department of Information Technology, Wenzhou Polytechnic, Wenzhou 325035, China
- Huiling Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Faten Khalid Karim
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Hela Elmannai
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Haiping Lin
  - Department of Information Engineering, Hangzhou Vocational & Technical College, Hangzhou 310018, China
|
15
|
IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2650742. [PMID: 35909844 PMCID: PMC9334098 DOI: 10.1155/2022/2650742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/15/2022] [Accepted: 07/04/2022] [Indexed: 11/18/2022]
Abstract
A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses; here, we focus on predicting mitochondrial and multifactorial genetic disorders. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and low white blood cell count. Early detection of genetic diseases is crucial for premature and teenage life development. Although it is difficult to forecast genetic disorders ahead of time, this prediction is critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy utilizing datasets collected and constructed from a large number of patient medical reports. Many recent studies have employed genome sequencing for illness detection, but fewer have used patient medical history, and the accuracy of existing studies that use a patient's history is restricted. The proposed internet of medical things (IoMT)-based model for genetic disease prediction uses two separate machine learning algorithms: support vector machine (SVM) and K-Nearest Neighbor (KNN). Experimental results show that SVM outperformed KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.
|
16
|
Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data. Med Biol Eng Comput 2022; 60:1627-1646. [PMID: 35399141 DOI: 10.1007/s11517-022-02555-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Accepted: 03/16/2022] [Indexed: 12/19/2022]
Abstract
Identifying a small subset of informative genes from a gene expression dataset is an important process for sample classification in the fields of bioinformatics and machine learning. In this process, there are two objectives: first, to minimize the number of selected genes, and second, to maximize the classification accuracy of the used classifier. In this paper, a hybrid machine learning framework based on a nature-inspired cuckoo search (CS) algorithm has been proposed to resolve this problem. The proposed framework is obtained by incorporating the cuckoo search (CS) algorithm with an artificial bee colony (ABC) in the exploitation and exploration of the genetic algorithm (GA). These strategies are used to maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms in the search process. In preprocessing, the independent component analysis (ICA) method extracts the important genes from the dataset. Then, the proposed gene selection algorithms along with the Naive Bayes (NB) classifier and leave-one-out cross-validation (LOOCV) have been applied to find a small set of informative genes that maximize the classification accuracy. To conduct a comprehensive performance study, proposed algorithms have been applied on six benchmark datasets of gene expression. The experimental comparison shows that the proposed framework (ICA and CS-based hybrid algorithm with NB classifier) performs a deeper search in the iterative process, which can avoid premature convergence and produce better results compared to the previously published feature selection algorithm for the NB classifier.
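Leave-one-out cross-validation of a candidate gene subset, as used in the wrapper described above, can be sketched as follows. The abstract pairs LOOCV with a Naive Bayes classifier; to keep the example short, a 1-nearest-neighbour classifier is swapped in here, and the toy data and the names `loocv_accuracy` and `nearest_neighbor_predict` are illustrative assumptions rather than anything taken from the paper.

```python
def nearest_neighbor_predict(train_X, train_y, sample):
    """Label of the training row closest to `sample` (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_X, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(row, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, gene_mask):
    """Leave-one-out accuracy using only the genes where gene_mask is 1."""
    cols = [j for j, keep in enumerate(gene_mask) if keep]
    reduced = [[row[j] for j in cols] for row in X]
    correct = 0
    for i in range(len(reduced)):
        # Hold out sample i and train on everything else.
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, reduced[i]) == y[i]:
            correct += 1
    return correct / len(reduced)


# Toy data (hypothetical): gene 0 separates the classes, gene 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.8], [1.0, 1.2]]
y = ["normal", "normal", "tumor", "tumor"]
acc_informative = loocv_accuracy(X, y, [1, 0])
acc_noise = loocv_accuracy(X, y, [0, 1])
```

A wrapper method would call something like `loocv_accuracy` once per candidate mask, which is why LOOCV is affordable mainly on the small sample sizes typical of microarray data.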
|
17
|
Aziz RM. Cuckoo Search-Based Optimization for Cancer Classification: A New Hybrid Approach. J Comput Biol 2022; 29:565-584. [DOI: 10.1089/cmb.2021.0410] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
|
18
|
Aziz RM. Application of nature inspired soft computing techniques for gene selection: a novel frame work for classification of cancer. Soft comput 2022. [DOI: 10.1007/s00500-022-07032-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
|
19
|
Tahmouresi A, Rashedi E, Yaghoobi MM, Rezaei M. Gene selection using pyramid gravitational search algorithm. PLoS One 2022; 17:e0265351. [PMID: 35290401 PMCID: PMC8923457 DOI: 10.1371/journal.pone.0265351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022] Open
Abstract
Genetics play a prominent role in the development and progression of malignant neoplasms. Identification of the relevant genes is a high-dimensional data processing problem. The pyramid gravitational search algorithm (PGSA), a hybrid method in which the number of genes is cyclically reduced, is proposed to conquer the curse of dimensionality. PGSA consists of two elements, a filter and a wrapper method (inspired by the gravitational search algorithm), which iterate through cycles. The genes selected in each cycle are passed on to the subsequent cycles to further reduce the dimension. PGSA tries to maximize the classification accuracy using the most informative genes while reducing the number of genes. Results are reported on a multi-class microarray gene expression dataset for breast cancer. Several feature selection algorithms have been implemented to allow a fair comparison. PGSA ranked first in terms of accuracy (84.5%) with 73 genes. To check whether the selected genes are meaningful in terms of patients' survival and response to therapy, protein-protein interaction network analysis was applied to the genes. An interesting pattern emerged when examining the genetic network: HSP90AA1, PTK2 and SRC genes were amongst the top-rated bottleneck genes, and DNA damage, cell adhesion and migration pathways are highly enriched in the network.
Affiliation(s)
- Esmat Rashedi
  - Department of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran
- Mohammad Mehdi Yaghoobi
  - Department of Biotechnology, Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman, Iran
- Masoud Rezaei
  - Faculty of Medicine, Kerman University of Medical Sciences, Kerman, Iran
|
20
|
Fan L, Ma X. Maximum power point tracking of PEMFC based on hybrid artificial bee colony algorithm with fuzzy control. Sci Rep 2022; 12:4316. [PMID: 35279691 PMCID: PMC8918329 DOI: 10.1038/s41598-022-08327-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Accepted: 03/07/2022] [Indexed: 11/28/2022] Open
Abstract
Maximum power point tracking (MPPT) is an effective method to improve the power generation efficiency and power supply quality of a proton exchange membrane fuel cell (PEMFC). Due to the inherent nonlinear characteristics of PEMFC, conventional MPPT methods often struggle to achieve a satisfactory control effect. Considering this, an artificial bee colony algorithm combined with fuzzy control (ABC-fuzzy) was proposed to construct an MPPT control scheme for PEMFC. The global optimization ability of the ABC algorithm was used to approach the maximum power point of PEMFC and avoid falling into local optima, and fuzzy control was used to eliminate the large overshoot and slow convergence of the ABC algorithm. The testing results show that, compared with the perturb & observe, conductance increment and ABC methods, the ABC-fuzzy method gives PEMFC greater output power, faster regulation speed, smaller steady-state error, less oscillation and stronger anti-interference ability. The MPPT scheme based on ABC-fuzzy can effectively realize the maximum power output of PEMFC, and plays an important role in improving the service life and power supply efficiency of PEMFC.
Affiliation(s)
- Liping Fan
  - College of Information Engineering, Shenyang University of Chemical Technology, Shenyang, 110142, China
  - Key Laboratory of Collaborative Control and Optimization Technology of Industrial Environment and Resource of Liaoning Province, Shenyang University of Chemical Technology, Shenyang, 110142, China
- Xianyang Ma
  - College of Information Engineering, Shenyang University of Chemical Technology, Shenyang, 110142, China
  - Key Laboratory of Collaborative Control and Optimization Technology of Industrial Environment and Resource of Liaoning Province, Shenyang University of Chemical Technology, Shenyang, 110142, China
|
21
|
Sathya M, Jeyaselvi M, Joshi S, Pandey E, Pareek PK, Jamal SS, Kumar V, Atiglah HK. Cancer Categorization Using Genetic Algorithm to Identify Biomarker Genes. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:5821938. [PMID: 35242297 PMCID: PMC8888099 DOI: 10.1155/2022/5821938] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Microarray gene expression data contain a large number of genes that are expressed at varying levels. Given that there are only a few critically significant genes, it is challenging to analyze and categorize datasets that span the whole gene space. The discovery of biomarker genes is essential to aid the diagnosis of cancer and, as a consequence, the suggestion of individualized treatment. The parallelized minimal redundancy and maximum relevance ensemble (mRMRe) is used to choose the top m informative genes from a huge pool of candidates. A Genetic Algorithm (GA) is used to heuristically compute the ideal set of genes by applying the Mahalanobis Distance (MD) as a distance metric. Once the genes have been identified, they are input into the GA. The proposed approach (mRMRe-GA) is applied to four microarray datasets, with the Support Vector Machine (SVM) serving as the classifier. Leave-One-Out Cross-Validation (LOOCV) is used to assess the performance of the classifier. The proposed mRMRe-GA strategy is then compared with other approaches. It has been shown that the proposed mRMRe-GA approach enhances classification accuracy while employing fewer genes than previous methods. Microarray, Gene Expression Data, GA, Feature Selection, SVM, and Cancer Classification are some of the terms used in this paper.
Affiliation(s)
- M. Sathya
  - Department of Information Science and Engineering, AMC Engineering College, Bengaluru, Karnataka 560083, India
- M. Jeyaselvi
  - Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
- Shubham Joshi
  - Department of Computer Engineering, SVKM'S NMIMS MPSTME Shirpur, Maharashtra 425405, India
- Ekta Pandey
  - Applied Science Department, Bundhelkhand Institute of Engineering and Technology, Jhansi, Uttar Pradesh, India
- Piyush Kumar Pareek
  - Department of Computer Science & Engineering & Head of IPR Cell, Nitte Meenakshi Institute of Technology, Bengaluru, India
- Sajjad Shaukat Jamal
  - Department of Mathematics, College of Science, King Khalid University, Abha, Saudi Arabia
- Vinay Kumar
  - Department of Computer Engineering and Application, GLA University, Mathura, India
- Henry Kwame Atiglah
  - Department of Electrical and Electronics Engineering, Tamale Technical University, Tamale, Ghana
|
22
|
Sarkar T, Salauddin M, Mukherjee A, Shariati MA, Rebezov M, Tretyak L, Pateiro M, Lorenzo JM. Application of bio-inspired optimization algorithms in food processing. Curr Res Food Sci 2022; 5:432-450. [PMID: 35243356 PMCID: PMC8866069 DOI: 10.1016/j.crfs.2022.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2021] [Revised: 02/08/2022] [Accepted: 02/11/2022] [Indexed: 12/23/2022] Open
Abstract
Bio-inspired optimization techniques (BOTs) are part of intelligent computing techniques. Several BOTs are available, and many new ones are evolving in this era of industrial revolution 4.0. Genetic algorithm, particle swarm optimization, artificial bee colony, and grey wolf optimization are the techniques explored by researchers in the field of food processing technology. However, there are other potential methods that may efficiently solve optimization-related problems in food industries. In this review, the mathematical background of the techniques, their applications and the potential microbial-based optimization methods with higher precision have been surveyed for a complete and comprehensive understanding of BOTs along with their mechanism of functioning. These techniques can simulate the process efficiently and are able to find the near-to-optimal value expeditiously.
Affiliation(s)
- Tanmay Sarkar
  - Department of Food Processing Technology, Malda Polytechnic, West Bengal State Council of Technical Education, Malda, 732102, West Bengal, India
- Molla Salauddin
  - Department of Food Processing Technology, Mir Madan Mohanlal Govt. Polytechnic, West Bengal State Council of Technical Education, Nadia 741156, West Bengal, India
- Alok Mukherjee
  - Government College of Engineering and Ceramic Technology, Kolkata, India
- Mohammad Ali Shariati
  - Department of Scientific Research, K.G. Razumovsky Moscow State University of Technologies and Management (The First Cossack University), 109004, Moscow, Russian Federation
- Maksim Rebezov
  - Department of Scientific Research, K.G. Razumovsky Moscow State University of Technologies and Management (The First Cossack University), 109004, Moscow, Russian Federation
  - Biophotonics Center, Prokhorov General Physics Institute of the Russian Academy of Science, 119991, Moscow, Russian Federation
  - Department of Scientific Research, V. M. Gorbatov Federal Research Center for Food Systems, 109316, Moscow, Russian Federation
- Lyudmila Tretyak
  - Department of Metrology, Standardization and Certification, Orenburg State University, 460018, Orenburg, Russian Federation
- Mirian Pateiro
  - Centro Tecnológico de La Carne de Galicia, Rúa Galicia Nº 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900, Ourense, Spain
- José M. Lorenzo
  - Centro Tecnológico de La Carne de Galicia, Rúa Galicia Nº 4, Parque Tecnológico de Galicia, San Cibrao das Viñas, 32900, Ourense, Spain
  - Universidade de Vigo, Área de Tecnoloxía dos Alimentos, Facultade de Ciencias, 32004 Ourense, Spain
|
23
|
Jaddi NS, Saniee Abadeh M. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. INFORM SYST 2022. [DOI: 10.1016/j.is.2021.101906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
24
|
A Two-Stage Method Based on Multiobjective Differential Evolution for Gene Selection. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:5227377. [PMID: 34966420 PMCID: PMC8712129 DOI: 10.1155/2021/5227377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Revised: 08/06/2021] [Accepted: 12/03/2021] [Indexed: 11/17/2022]
Abstract
Microarray gene expression data provide a prospective way to diagnose disease and classify cancer. However, in bioinformatics, the gene selection problem, i.e., how to select the most informative genes from thousands of genes, remains challenging. This problem is a specific feature selection problem with high-dimensional features and small sample sizes. In this paper, a two-stage method combining a filter feature selection method and a wrapper feature selection method is proposed to solve the gene selection problem. In contrast to common methods, the proposed method models the gene selection problem as a multiobjective optimization problem. Both stages employ the same multiobjective differential evolution (MODE) as the search strategy but incorporate different objective functions. The three objective functions of the filter method are mainly based on mutual information. The two objective functions of the wrapper method are the number of selected features and the classification error of a naive Bayes (NB) classifier. Finally, the performance of the proposed method is tested and analyzed on six benchmark gene expression datasets. The experimental results verified that this paper provides a novel and effective way to solve the gene selection problem by applying a multiobjective optimization algorithm.
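The wrapper stage described above optimizes two objectives at once: the number of selected features and the classification error. A minimal sketch of the Pareto-dominance test that any such multiobjective search relies on is shown below; the candidate values are made up for illustration, and the helper names `dominates` and `pareto_front` are assumptions, not part of the MODE method itself.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Both objectives are minimized: (number of selected genes, classification error).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Hypothetical (gene count, error) pairs for five candidate subsets.
candidates = [(5, 0.10), (8, 0.05), (8, 0.12), (20, 0.05), (3, 0.30)]
front = pareto_front(candidates)
```

Here `(8, 0.12)` is dropped because `(5, 0.10)` beats it on both objectives, and `(20, 0.05)` is dropped because `(8, 0.05)` reaches the same error with fewer genes; the remaining three are genuine trade-offs.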
|
25
|
Uzma, Halim Z. An ensemble filter-based heuristic approach for cancerous gene expression classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
26
|
Gene Selection for Microarray Cancer Classification based on Manta Rays Foraging Optimization and Support Vector Machines. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021. [DOI: 10.1007/s13369-021-06102-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
27
|
Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data. PLoS One 2021; 16:e0230164. [PMID: 34613963 PMCID: PMC8494339 DOI: 10.1371/journal.pone.0230164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2020] [Accepted: 09/21/2021] [Indexed: 12/22/2022] Open
Abstract
With the advent of high-throughput technologies, life sciences are generating a huge amount of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or in a tissue under a particular condition. The high dimensionality of such gene expression data (i.e., a very large number of features/genes analyzed with far fewer samples) makes it difficult to identify, de novo, the key genes (biomarkers) that truly contribute to a particular phenotype or condition (such as cancer). Among the criteria in the existing literature for identifying key genes from gene expression data, mutual information (MI) is one of the most successful. However, the finite-sample correction of MI is not taken into account in this regard. It is also important to incorporate dynamic discretization of genes for more relevant gene selection, although this is not considered in the available methods. Besides, current studies usually suggest removing redundant genes, which is particularly inappropriate for biological data, as a group of genes may connect to each other for downstreaming proteins. Thus, genes that provide additional useful information about the disease should be included even if they are redundant. Addressing these issues, we propose the Mutual information based Gene Selection method (MGS) for selecting informative genes. Moreover, to rank the selected genes, we extend MGS and propose two ranking methods: MGSf (based on frequency) and MGSrf (based on Random Forest). The proposed method not only obtained better classification rates on gene expression datasets derived from different gene expression studies compared to recently reported methods but also detected the key genes relevant to pathways with a causal relationship to the disease, which indicates that it will also be able to find the responsible genes in data for an unknown disease.
|
28
|
Asad E, Mollah AF. Biomarker Identification From Gene Expression Based on Symmetrical Uncertainty. INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES 2021. [DOI: 10.4018/ijiit.289966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
In this paper, we present an effective information theoretic feature selection method, Symmetrical Uncertainty, to classify gene expression microarray data and detect biomarkers from it. Here, Information Gain and Symmetrical Uncertainty contribute to ranking the features. Based on computed values of Symmetrical Uncertainty, features were sorted from most informative to least informative. Then, the top features from the sorted list are passed to Random Forest, Logistic Regression and other well-known classifiers with Leave-One-Out cross validation to construct the best classification model(s) and accordingly select the most important genes from microarray datasets. The results obtained in terms of classification accuracy, running time, root mean square error and other parameters computed on Leukemia and Colon cancer datasets demonstrate the effectiveness of the proposed approach. The proposed method is much faster than many other wrapper or ensemble methods.
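The ranking step described above can be sketched with the standard definition of symmetrical uncertainty, SU(X, Y) = 2·IG(X; Y) / (H(X) + H(Y)), where IG is the information gain (mutual information) between a feature and the class label. The sketch assumes expression values have already been discretized into bins; the gene names and data are hypothetical, and the paper's own preprocessing is not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(labels)); 0 = independent, 1 = fully predictive."""
    h_x, h_y = entropy(feature), entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, nothing to measure
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy
    return 2.0 * info_gain / (h_x + h_y)


# Hypothetical discretized expression levels for two genes over four samples.
labels = [0, 0, 1, 1]
genes = {
    "gene_a": ["lo", "lo", "hi", "hi"],  # tracks the label exactly
    "gene_b": ["lo", "hi", "lo", "hi"],  # unrelated to the label
}
ranking = sorted(genes, key=lambda g: symmetrical_uncertainty(genes[g], labels), reverse=True)
```

Because each gene is scored independently of any classifier, this filter step costs one pass over the data per gene, which is consistent with the abstract's claim of being faster than wrapper methods.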
|
29
|
Abinash MJ, Vasudevan V. Boundaries tuned support vector machine (BT-SVM) classifier for cancer prediction from gene selection. Comput Methods Biomech Biomed Engin 2021; 25:794-807. [PMID: 34585639 DOI: 10.1080/10255842.2021.1981300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Recently, identifying the genes involved in cancer-causing diseases has come to play a crucial part in microarray data analysis. A huge volume of data is required since the disease changes often. Conventional data mining techniques fall short in terms of space and time complexity. The proposed work is based on big data. In this study, feature extraction is performed using Improved Supervised Principal Component Analysis (ISPCA). For gene expression, a covariance matrix is generated, and cancer classification is performed through feature selection by ISPCA. Further feature selection is performed by a boundaries tuned support vector machine (BT-SVM) classifier and modified particle swarm optimization with a novel wrapper model algorithm. The experiments use different datasets, such as leukaemia, breast cancer, brain cancer, colon, and lung carcinoma, from the UCI repository. The proposed work is evaluated on six benchmark DNA microarray datasets in terms of accuracy, recall, and precision. To assess its effectiveness, the proposed work is compared with various traditional techniques, and it achieved optimum accuracy, recall, precision and training time both with and without feature selection.
Affiliation(s)
- M J Abinash
  - Department of Computer Science, Sri Kaliswari College (Autonomous), Sivakasi, Tamil Nadu, India
- V Vasudevan
  - Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
|
30
|
Abstract
The problems of gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are some of the current directions of modern bioinformatics. In this manuscript, we present the results of research focused on evaluating the effectiveness of the most used metrics for estimating the proximity of gene expression profiles, which can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of both genes’ and/or proteins’ interaction since it undergirds essentially all interactions between molecular components in the GRN and extraction of gene expression profiles, which allows us to identify how the investigated biological objects (disease, state of patients, etc.) contribute to the further reconstruction of GRN in terms of both the symmetry and understanding the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we have investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson’s χ² test and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion to evaluate the effectiveness of each metric. The random forest classifier (RF) was used during the simulation process. The research results have shown that the results of using various methods of Shannon entropy calculation within the framework of the MIM metric disagree with each other. As a result, we have proposed the modified mutual information maximization (MMIM) proximity metric based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function. The results of the simulation have also shown that the correlation proximity metric is less effective in comparison to both the MMIM metric and Pearson’s χ² test. Finally, we propose the hybrid proximity metric (HPM) that considers both the MMIM metric and Pearson’s χ² test. The proposed metric was investigated within the framework of one-cluster structure effectiveness evaluation. To our mind, the main benefit of the proposed HPM is in increasing the objectivity of the extraction of mutually similar gene expression profiles due to the joint use of various effective proximity metrics that can contradict each other when they are used alone.
|
31
|
Al-Rajab M, Lu J, Xu Q. A framework model using multifilter feature selection to enhance colon cancer classification. PLoS One 2021; 16:e0249094. [PMID: 33861766 PMCID: PMC8691854 DOI: 10.1371/journal.pone.0249094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2020] [Accepted: 03/11/2021] [Indexed: 11/18/2022] Open
Abstract
Gene expression profiles can be utilized in the diagnosis of critical diseases such as cancer. The selection of biomarker genes from these profiles is crucial for cancer detection. This paper presents a framework proposing a two-stage multifilter hybrid model of feature selection for colon cancer classification. Colon cancer is now extremely common among cancer types. There is a need for a fast and accurate method to detect the tissues and to enhance the diagnostic process and drug discovery. This paper reports on a study whose objective has been to improve the diagnosis of cancer of the colon through a two-stage, multifilter model of feature selection. The model described deals with feature selection using a combination of Information Gain and a Genetic Algorithm. The next stage is to filter and rank the genes identified through this method using the minimum Redundancy Maximum Relevance (mRMR) technique. The final phase is to further analyze the data using correlated machine learning algorithms. This two-stage approach, which involves the selection of genes before classification techniques are used, improves success rates for the identification of cancer cells. It is found that Decision Tree, K-Nearest Neighbor, and Naïve Bayes classifiers showed promisingly accurate results using the developed hybrid framework model. It is concluded that our proposed method achieved a higher accuracy than the existing methods reported in the literature. This study can help inform efforts to enhance treatment and drug discovery for colon cancer.
Affiliation(s)
- Murad Al-Rajab
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
|
32
|
Debata PP, Mohapatra P. Selection of informative genes from high-dimensional cancerous data employing an improvised meta-heuristic algorithm. EVOLUTIONARY INTELLIGENCE 2021. [DOI: 10.1007/s12065-021-00593-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
33
|
Hameed SS, Hassan WH, Latiff LA, Muhammadsharif FF. A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets. Soft comput 2021. [DOI: 10.1007/s00500-021-05726-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
34
|
Gao XZ, Nalluri MSR, Kannan K, Sinharoy D. Multi-objective optimization of feature selection using hybrid cat swarm optimization. SCIENCE CHINA TECHNOLOGICAL SCIENCES 2021; 64:508-520. [DOI: 10.1007/s11431-019-1607-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Accepted: 04/17/2020] [Indexed: 01/04/2025]
|
35
|
Peng C, Wu X, Yuan W, Zhang X, Zhang Y, Li Y. MGRFE: Multilayer Recursive Feature Elimination Based on an Embedded Genetic Algorithm for Cancer Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:621-632. [PMID: 31180870 DOI: 10.1109/tcbb.2019.2921961] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/09/2023]
Abstract
Microarray gene expression data have become a topic of great interest for cancer classification and for further research in the field of bioinformatics. Nonetheless, due to the "large p, small n" paradigm of limited biosamples and high-dimensional data, gene selection is becoming a demanding task, which is aimed at selecting a minimal number of discriminatory genes associated closely with a phenotype. Feature or gene selection is still a challenging problem owing to its nondeterministic polynomial time complexity and thus most of the existing feature selection algorithms utilize heuristic rules. A multilayer recursive feature elimination method based on an embedded integer-coded genetic algorithm, MGRFE, is proposed here, which is aimed at selecting the gene combination with minimal size and maximal information. On the basis of 19 benchmark microarray datasets including multiclass and imbalanced datasets, MGRFE outperforms state-of-the-art feature selection algorithms with better cancer classification accuracy and a smaller selected gene number. MGRFE could be regarded as a promising feature selection method for high-dimensional datasets especially gene expression data. Moreover, the genes selected by MGRFE have close biological relevance to cancer phenotypes. The source code of our proposed algorithm and all the 19 datasets used in this paper are available at https://github.com/Pengeace/MGRFE-GaRFE.
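The basic idea of recursive elimination that this abstract builds on can be sketched as a greedy loop that repeatedly drops the feature whose removal hurts a score the least. This is only plain backward elimination under assumed names (`backward_eliminate`, the `relevance` table, and `toy_score` are hypothetical); it does not reproduce MGRFE's multilayer structure or its embedded integer-coded genetic algorithm, and a real score would come from a cross-validated classifier.

```python
def backward_eliminate(features, score, target_size):
    """Greedily drop one feature per round until target_size features remain.

    `score` maps a list of feature names to a number; higher is better.
    """
    selected = list(features)
    while len(selected) > target_size:
        # Build every subset obtained by removing exactly one feature,
        # then keep the subset that scores best.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates, key=score)
    return selected


# Hypothetical per-gene relevance values standing in for classifier accuracy.
relevance = {"g1": 0.9, "g2": 0.1, "g3": 0.5, "g4": 0.05}


def toy_score(subset):
    """Toy stand-in score: total relevance of the selected genes."""
    return sum(relevance[g] for g in subset)


print(backward_eliminate(relevance, toy_score, target_size=2))
```

Each round evaluates one candidate subset per remaining feature, so the cost grows quickly with dimensionality, which is one reason heuristic searches are attractive for gene expression data.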
|
36
|
Baliarsingh SK, Muhammad K, Bakshi S. SARA: A memetic algorithm for high-dimensional biomedical data. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/18/2022]
|
37
|
Mahendran N, Durai Raj Vincent PM, Srinivasan K, Chang CY. Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions. Front Genet 2020; 11:603808. [PMID: 33362861 PMCID: PMC7758324 DOI: 10.3389/fgene.2020.603808] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/08/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022] Open
Abstract
Gene expression is the process that determines the physical characteristics of living beings by generating the necessary proteins. It takes place in two steps, transcription and translation: information flows from DNA to RNA with the help of enzymes, and the end products are proteins and other biochemical molecules. Many technologies can capture gene expression from DNA or RNA; one of them is the DNA microarray. Besides being expensive, the main issue with DNA microarrays is that they generate high-dimensional data with a very small sample size, so a learning model trained on such data tends to overfit. This problem should be addressed by substantially reducing the dimensionality of the data. In recent years, machine learning has gained popularity in genomic studies, and many machine learning-based gene selection approaches have been proposed to improve the precision of dimensionality reduction. This paper extensively reviews recent work on machine learning-based gene selection and analyzes its performance. The study categorizes feature selection algorithms under supervised, unsupervised, and semi-supervised learning. Recent work on reducing the features used for diagnosing tumors is discussed in detail, and the performance of several of the discussed methods is analyzed. The study also lists and briefly discusses open issues in handling high-dimensional, small-sample-size data.
Affiliation(s)
- Nivedhitha Mahendran: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Chuan-Yu Chang: Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
|
38
|
A Genetic Programming Strategy to Induce Logical Rules for Clinical Data Analysis. Processes (Basel) 2020. [DOI: 10.3390/pr8121565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
This paper proposes a machine learning approach that uses genetic programming to build classifiers through logical rule induction. In this context, we define and test a set of mutation operators across different clinical datasets to improve the performance of the proposal on each dataset. The use of genetic programming for rule induction has produced interesting results in machine learning problems, and it represents a flexible and powerful evolutionary technique for the automatic generation of classifiers. Since logical rules disclose knowledge from the analyzed data, we use that knowledge to interpret the results and to filter the most important features from clinical data as a process of knowledge discovery. The ultimate goal of this proposal is to provide experts in the data domain with prior knowledge (as a guide) about the structure of the data and the rules found for each class, especially to track dichotomies and inequality. The results obtained on the datasets involved are very promising for classification tasks when compared with other methods.
|
39
|
Cancer molecular subtype classification from hypervolume-based discrete evolutionary optimization. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04846-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
40
|
Novel competitive-cooperative learning models (cclms) based on higher order information sets. Appl Intell 2020. [DOI: 10.1007/s10489-020-01881-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
|
41
|
MotieGhader H, Masoudi-Sobhanzadeh Y, Ashtiani SH, Masoudi-Nejad A. mRNA and microRNA selection for breast cancer molecular subtype stratification using meta-heuristic based algorithms. Genomics 2020; 112:3207-3217. [DOI: 10.1016/j.ygeno.2020.06.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/02/2020] [Revised: 05/13/2020] [Accepted: 06/02/2020] [Indexed: 02/06/2023]
|
42
|
Gene selection of non-small cell lung cancer data for adjuvant chemotherapy decision using cell separation algorithm. Appl Intell 2020. [DOI: 10.1007/s10489-020-01740-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
|
43
|
Uzma, Al-Obeidat F, Tubaishat A, Shah B, Halim Z. Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05101-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
|
44
|
Baliarsingh SK, Vipsita S. Chaotic emperor penguin optimised extreme learning machine for microarray cancer classification. IET Syst Biol 2020; 14:85-95. [PMID: 32196467 DOI: 10.1049/iet-syb.2019.0028] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/19/2022] Open
Abstract
Microarray technology plays a significant role in cancer classification, where a large number of genes and samples are simultaneously analysed. For the efficient analysis of the microarray data, there is a great demand for the development of intelligent techniques. In this article, the authors propose a novel hybrid technique employing Fisher criterion, ReliefF, and extreme learning machine (ELM) based on the principle of chaotic emperor penguin optimisation algorithm (CEPO). EPO is a recently developed metaheuristic method. In the proposed method, initially, Fisher score and ReliefF are independently used as filters for relevant gene selection. Further, a novel population-based metaheuristic, namely, CEPO was proposed to pre-train the ELM by selecting the optimal input weights and hidden biases. The authors have successfully conducted experiments on seven well-known data sets. To evaluate the effectiveness, the proposed method is compared with original EPO, genetic algorithm, and particle swarm optimisation-based ELM along with other state-of-the-art techniques. The experimental results show that the proposed framework achieves better accuracy as compared to the state-of-the-art schemes. The efficacy of the proposed method is demonstrated in terms of accuracy, sensitivity, specificity, and F-measure.
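The Fisher criterion used as a filter in this abstract can be illustrated with one common form of the Fisher score: between-class scatter of a gene's expression divided by its within-class scatter. The sketch below assumes that form (class-size-weighted, with population variances); the paper may use a different variant, and the function name `fisher_score` and the toy values are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Between-class scatter over within-class scatter for one gene.

    Larger scores mean the class means are far apart relative to the
    spread inside each class.
    """
    overall_mean = fmean(values)
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    between = sum(len(g) * (fmean(g) - overall_mean) ** 2 for g in by_class.values())
    within = sum(len(g) * pvariance(g) for g in by_class.values())
    # A gene that is constant inside every class has zero within-class scatter.
    return between / within if within > 0 else float("inf")


labels = [0, 0, 1, 1]
separated = [1.0, 2.0, 5.0, 6.0]    # class means far apart, small spread
overlapping = [1.0, 5.0, 2.0, 6.0]  # class means close, large spread

print(fisher_score(separated, labels), fisher_score(overlapping, labels))
```

Genes would then be sorted by score and only the highest-scoring ones kept before any wrapper or metaheuristic stage.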
Affiliation(s)
- Santos Kumar Baliarsingh: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology, Bhubaneswar, India
- Swati Vipsita: DST-FIST Bioinformatics Lab, Department of Computer Science and Engineering, International Institute of Information Technology, Bhubaneswar, India
|
45
|
Al-Betar MA, Alomari OA, Abu-Romman SM. A TRIZ-inspired bat algorithm for gene selection in cancer classification. Genomics 2020; 112:114-126. [DOI: 10.1016/j.ygeno.2019.09.015] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/17/2019] [Revised: 09/05/2019] [Accepted: 09/17/2019] [Indexed: 10/25/2022]
|
46
|
A memetic algorithm using emperor penguin and social engineering optimization for medical data classification. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105773] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022]
|
47
|
Sharma A, Rani R. C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods. Computer Methods and Programs in Biomedicine 2019; 178:219-235. [PMID: 31416551 DOI: 10.1016/j.cmpb.2019.06.029] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 02/10/2019] [Revised: 06/24/2019] [Accepted: 06/27/2019] [Indexed: 05/21/2023]
Abstract
Background and Objective: Over the last two decades, DNA microarray technology has emerged as a powerful tool for early cancer detection and prevention. It helps to provide a detailed overview of the complex disease microenvironment. Moreover, the online availability of thousands of gene expression assays has made microarray data classification an active research area. A common goal is to find a minimal subset of genes while maximizing classification accuracy. Methods: In pursuit of a similar objective, we propose a framework (C-HMOSHSSA) for gene selection using the multi-objective spotted hyena optimizer (MOSHO) and the salp swarm algorithm (SSA). Real-life optimization problems with more than one objective usually face the challenge of maintaining both convergence and diversity. SSA maintains diversity but suffers from the overhead of maintaining the necessary information. MOSHO, on the other hand, requires little computational effort and is therefore used to maintain that information. The proposed algorithm is thus a hybrid that uses features of both SSA and MOSHO to support its exploration and exploitation capability. Results: Four different classifiers are trained on seven high-dimensional datasets using a subset of features (genes) obtained by applying the proposed hybrid gene selection algorithm. The results show that the proposed technique significantly outperforms existing state-of-the-art techniques. Conclusion: It is also shown that new sets of informative and biologically relevant genes are successfully identified by the proposed technique. The proposed approach can also be applied to other problem domains that involve feature selection.
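The two competing goals named in this abstract, fewer genes and higher accuracy, are usually compared through Pareto dominance. The following sketch shows that comparison only; it is not the C-HMOSHSSA algorithm, and the function names and the candidate (gene count, accuracy) pairs are made up for illustration.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives
    and strictly better on one.

    Each solution is a (num_genes, accuracy) pair: fewer genes is better,
    higher accuracy is better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate gene subsets: (number of genes, accuracy).
candidates = [(10, 0.90), (5, 0.90), (20, 0.95), (8, 0.85), (5, 0.80)]
print(pareto_front(candidates))
```

A multi-objective optimizer returns a front like this rather than a single answer, leaving the trade-off between subset size and accuracy to the user.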
Affiliation(s)
- Aman Sharma: Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
- Rinkle Rani: Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, Punjab, India
|
48
|
A new optimal gene selection approach for cancer classification using enhanced Jaya-based forest optimization algorithm. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04355-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
|
49
|
Jansi Rani M, Devaraj D. Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification. J Med Syst 2019; 43:235. [PMID: 31209677 DOI: 10.1007/s10916-019-1372-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 03/22/2019] [Accepted: 06/05/2019] [Indexed: 01/20/2023]
Abstract
Cancer is a deadly disease which requires a very complex and costly treatment. Microarray data classification plays an important role in cancer treatment. An efficient gene selection technique to select the more promising genes is necessary for cancer classification. Here, we propose a Two-stage MI-GA Gene Selection algorithm for selecting informative genes in cancer data classification. In the first stage, Mutual Information based gene selection is applied which selects only the genes that have high information related to the cancer. The genes which have high mutual information value are given as input to the second stage. The Genetic Algorithm based gene selection is applied in the second stage to identify and select the optimal set of genes required for accurate classification. For classification, Support Vector Machine (SVM) is used. The proposed MI-GA gene selection approach is applied to Colon, Lung and Ovarian cancer datasets and the results show that the proposed gene selection approach results in higher classification accuracy compared to the existing methods.
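The second, Genetic Algorithm stage described here can be sketched as a search over boolean masks, where each position says whether a pre-filtered gene is kept. This is a minimal generic GA under stated assumptions, not the MI-GA method itself: `genetic_select`, its parameters, and `toy_fitness` are hypothetical, and the toy fitness stands in for the SVM accuracy the abstract refers to.

```python
import random


def genetic_select(n_genes, fitness, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Evolve a boolean gene mask that maximizes `fitness` (needs n_genes >= 2)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    population = [[rng.random() < 0.5 for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_genes)
            child = parent_a[:cut] + parent_b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


INFORMATIVE = {0, 3}  # hypothetical indices of the truly useful genes


def toy_fitness(mask):
    """Reward informative genes, penalize every selected gene."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return 2.0 * hits - 0.5 * sum(mask)


best = genetic_select(8, toy_fitness)
print([i for i, keep in enumerate(best) if keep])
```

Running the filter stage first keeps `n_genes` small, which matters because every fitness evaluation in a real setting means training and validating a classifier.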
Affiliation(s)
- M Jansi Rani: School of Computing, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India
- D Devaraj: School of Electronics & Electrical Technology, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India
|
50
|
Scaria LTT, Christopher T. A Bio-inspired Algorithm based Multi-class Classification Scheme for Microarray Gene Data. J Med Syst 2019; 43:208. [PMID: 31144036 DOI: 10.1007/s10916-019-1353-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/14/2019] [Accepted: 05/20/2019] [Indexed: 11/24/2022]
Abstract
Microarray gene data is widely known for its high dimensionality and volume. Its use is increasing nowadays, owing to advances in medical science, and it helps in diagnosing diseases quite accurately. However, microarray gene data is difficult to process and the results are usually hard to interpret. Taking this challenge into account, this work presents a user-friendly rule-based classification model that is easy to understand and does not require users to have prior knowledge. The classification rules are formed with the cuckoo search optimization algorithm and pruned by associative rule mining. Finally, classification is performed with the pruned rules. The performance of the proposed approach is satisfactory in terms of accuracy, sensitivity, specificity, and time consumption.
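The accuracy, sensitivity, and specificity reported in abstracts like this one follow directly from confusion-matrix counts. The sketch below computes them for a binary problem; the function name `binary_metrics` and the toy predictions are assumptions for the example, and a multi-class scheme such as the one described would need a per-class (one-vs-rest) version.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), and specificity
    (recall on negatives) from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined when a class is absent, so report NaN instead of failing.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = cancerous sample, 0 = normal sample.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced microarray datasets, where accuracy alone can look high even if one class is mostly misclassified.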
Affiliation(s)
- L T Thomas Scaria: Department of Computer Science, St. Pius X College, Kasaragod, Kerala, India
- T Christopher: PG and Research Department of Information Technology, Government Arts College, Coimbatore, India
|