1
Mishra M, Acharjya DP. A hybridized red deer and rough set clinical information retrieval system for hepatitis B diagnosis. Sci Rep 2024; 14:3815. [PMID: 38360918 PMCID: PMC10869783 DOI: 10.1038/s41598-024-53170-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 01/29/2024] [Indexed: 02/17/2024]
Abstract
Healthcare is a major concern given today's booming population. Many approaches for improving health have been adopted, such as early disease identification, treatment, and prevention. Knowledge acquisition is therefore essential at different stages of decision-making. One technique to address this problem is inferring knowledge from the information system, which requires multiple steps to extract useful information. Handling uncertainty throughout data analysis is another challenging task. Computational intelligence is a step forward to this end in feature selection, classification, clustering, and the development of clinical information retrieval systems. According to recent studies, swarm optimization is a useful technique for discovering key features when solving real-world problems. However, it is ineffective at managing uncertainty. Conversely, rough set theory helps a decision system generate decision rules without requiring any additional information. To analyze real-world information systems while managing uncertainty, this research presents a hybrid strategy that combines rough sets with the red deer algorithm. Within the red deer optimization algorithm, the proposed method selects the optimal features using the rough set degree of dependency as the criterion. A rough set is then used to derive the decision rules. The efficiency of the proposed model is compared with that of the decision tree algorithm and the conventional rough set. An empirical study on hepatitis illustrates the viability of the proposed approach compared with the decision tree and the crisp rough set. The proposed hybrid of rough set and red deer algorithm achieves an accuracy of 91.7%, whereas the decision tree and rough set methods achieve 82.9% and 88.9%, respectively. This suggests that the proposed approach is viable.
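The rough set degree of dependency that drives the feature selection above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a small, already discretized decision table stored as a list of Python dicts, and the attribute names and values are hypothetical. For an attribute subset B and decision attribute D, the degree of dependency is γ_B(D) = |POS_B(D)| / |U|, where the positive region POS_B(D) holds every object whose B-equivalence class maps to a single decision value.

```python
from collections import defaultdict

def dependency_degree(table, attrs, decision):
    """Return gamma_B(D) = |POS_B(D)| / |U| for the attribute subset `attrs`."""
    if not table:
        return 0.0
    # Group objects into equivalence classes by their values on `attrs`.
    classes = defaultdict(list)
    for row in table:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    # A class lies in the positive region if all its objects share one decision.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(table)

# Hypothetical toy decision table (not data from the study).
table = [
    {"fatigue": "yes", "ascites": "no",  "class": "live"},
    {"fatigue": "yes", "ascites": "yes", "class": "die"},
    {"fatigue": "no",  "ascites": "no",  "class": "live"},
    {"fatigue": "yes", "ascites": "no",  "class": "die"},
]
print(dependency_degree(table, ["fatigue"], "class"))             # 0.25
print(dependency_degree(table, ["fatigue", "ascites"], "class"))  # 0.5
```

A metaheuristic such as the red deer algorithm would call a function like this as (part of) its fitness when scoring candidate feature subsets.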
Affiliation(s)
- Madhusmita Mishra
- Vellore Institute of Technology, School of Computer Science and Engineering, Vellore, 632014, India
- D P Acharjya
- Vellore Institute of Technology, School of Computer Science and Engineering, Vellore, 632014, India.
2
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, existing methods suffer from several limitations, such as overfitting to a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously proposed optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features with a significance level of p < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean fivefold cross-validation accuracy, precision, recall, specificity, and F-measure. CONCLUSION Based on the obtained results, a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers design practical diagnostic health care systems and offer effective treatment plans.
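The voting step can be illustrated with plain hard (majority) voting over the labels predicted by several base classifiers. This is a generic sketch, assuming the base classifiers have already been trained and have produced one label per sample; the classifier outputs are hypothetical, the tie-breaking rule is an arbitrary choice, and the paper's own voting framework may weight or combine votes differently.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-vote per sample; predictions[i][j] is classifier i's label for sample j.

    Ties go to the tied label that appears first in classifier order.
    """
    combined = []
    for votes in zip(*predictions):   # all classifiers' votes for one sample
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined

preds = [
    ["sick", "healthy", "sick"],      # hypothetical classifier A
    ["sick", "sick", "healthy"],      # hypothetical classifier B
    ["healthy", "sick", "healthy"],   # hypothetical classifier C
]
print(majority_vote(preds))  # ['sick', 'sick', 'healthy']
```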
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
3
Binary Approaches of Quantum-Based Avian Navigation Optimizer to Select Effective Features from High-Dimensional Medical Data. Mathematics 2022. [DOI: 10.3390/math10152770] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/16/2023]
Abstract
Many metaheuristic approaches have been developed to select effective features from different medical datasets within a feasible time. However, most of them do not scale well to large medical datasets, where they fail to maximize classification accuracy while simultaneously minimizing the number of selected features. This paper therefore develops an efficient binary version of the quantum-based avian navigation optimizer algorithm (QANA), named BQANA, which exploits the scalability of the QANA to select the optimal feature subset from high-dimensional medical datasets using two different approaches. In the first approach, several binary versions of the QANA are developed using S-shaped, V-shaped, U-shaped, Z-shaped, and quadratic transfer functions to map the continuous solutions of the canonical QANA to binary ones. In the second approach, the QANA is mapped to binary space by converting each variable to 0 or 1 using a threshold. To evaluate the proposed algorithm, all binary versions of the QANA are first assessed on medical datasets with varied feature sizes, including Pima, HeartEW, Lymphography, SPECT Heart, PenglungEW, Parkinson, Colon, SRBCT, Leukemia, and Prostate tumor. The results show that the BQANA developed by the second approach is superior to the other binary versions of the QANA in finding the optimal feature subset from the medical datasets. The BQANA was then compared with nine well-known binary metaheuristic algorithms, and the results were statistically assessed using the Friedman test. The experimental and statistical results demonstrate that the proposed BQANA has merit for feature selection from medical datasets.
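The two binarization routes can be sketched as follows, assuming the widely used forms S(x) = 1 / (1 + e^(-x)) for an S-shaped transfer function and V(x) = |tanh(x)| for a V-shaped one, and a fixed threshold of 0.5 for the second approach. These particular functions, the bit-flip rule for the V-shaped case, and the threshold value are illustrative assumptions; BQANA's exact definitions may differ.

```python
import math
import random

def s_shaped(x):
    """Sigmoid transfer: probability of setting the bit to 1 (overflow-safe)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def v_shaped(x):
    """|tanh| transfer: probability of flipping the current bit."""
    return abs(math.tanh(x))

def binarize_s(position, rng):
    # Each bit is 1 with probability S(x_d), independently per dimension.
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position, current_bits, rng):
    # Each bit of the current solution is flipped with probability V(x_d).
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]

def binarize_threshold(position, threshold=0.5):
    # Second approach: a variable becomes 1 if it exceeds the threshold.
    return [1 if x > threshold else 0 for x in position]

rng = random.Random(42)
pos = [-2.0, 0.3, 0.7, 3.1]   # hypothetical continuous QANA solution
print(binarize_threshold(pos))             # [0, 0, 1, 1]
print(binarize_s(pos, rng))                # stochastic, seeded
print(binarize_v(pos, [0, 1, 0, 1], rng))  # stochastic, seeded
```

The threshold route is deterministic, while the transfer-function routes keep some randomness in the bit assignment, which is one reason the two families can behave differently during search.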
4
Zou L, Zhou S, Li X. An Efficient Improved Greedy Harris Hawks Optimizer and Its Application to Feature Selection. Entropy 2022; 24:e24081065. [PMID: 36010729 PMCID: PMC9407072 DOI: 10.3390/e24081065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 07/21/2022] [Accepted: 07/30/2022] [Indexed: 01/27/2023]
Abstract
To overcome the lack of flexibility of Harris Hawks Optimization (HHO) in switching between exploration and exploitation, and the low efficiency of its exploitation phase, an efficient improved greedy Harris Hawks Optimizer (IGHHO) is proposed and applied to the feature selection (FS) problem. IGHHO uses a new transformation strategy that enables flexible switching between exploration and exploitation, allowing it to escape local optima. We replace the original HHO exploitation process with improved differential perturbation and a greedy strategy to improve its global search capability. We compared it with seven algorithms on unimodal, multimodal, hybrid, and composition CEC2017 benchmark functions, and IGHHO outperformed them on optimization problems with different function characteristics. We propose new objective functions for the data imbalance problem in FS and apply IGHHO to them. IGHHO outperformed the comparison algorithms in terms of classification accuracy and feature subset length. The results show that IGHHO is applicable not only to global optimization of functions with different characteristics but also to practical optimization problems.
5
Trajectory Control of an Active and Passive Hybrid Hydraulic Ankle Prosthesis Using an Improved PSO-PID Controller. J Intell Robot Syst 2022. [DOI: 10.1007/s10846-022-01670-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
6
Absar N, Das EK, Shoma SN, Khandaker MU, Miraz MH, Faruque MRI, Tamam N, Sulieman A, Pathan RK. The Efficacy of Machine-Learning-Supported Smart System for Heart Disease Prediction. Healthcare (Basel) 2022; 10:healthcare10061137. [PMID: 35742188 PMCID: PMC9222326 DOI: 10.3390/healthcare10061137] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/13/2022] [Revised: 06/13/2022] [Accepted: 06/14/2022] [Indexed: 11/26/2022]
Abstract
Disease is a condition that negatively affects human health. Cardiopathy (heart disease) is one of the common deadly diseases and, compared with other diseases, is more often attributed to unhealthy human habits. With the help of machine learning (ML) algorithms, heart disease can be detected quickly and at low cost. This study adopted four machine learning models, namely random forest (RF), decision tree (DT), AdaBoost (AB), and K-nearest neighbor (KNN), to detect heart disease. A generalized algorithm was constructed to analyze the strength of the relevant factors that contribute to heart disease prediction. The models were evaluated using the Cleveland, Hungary, Switzerland, and Long Beach (CHSLB) datasets, all of which were collected from Kaggle. On the CHSLB dataset, the RF, DT, AB, and KNN models achieved accuracies of 99.03%, 96.10%, 100%, and 100%, respectively. On the single (Cleveland) dataset, only two models, RF and KNN, showed good accuracy, at 93.437% and 97.83%, respectively. Finally, the study used Streamlit, an internet-based cloud hosting platform, to develop a computer-aided smart system for disease prediction. The proposed tool, together with the ML algorithms, is expected to play a key role in diagnosing heart disease conveniently. Above all, the study contributes to computing strength scores of significant predictors in heart disease prognosis.
Affiliation(s)
- Nurul Absar
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Emon Kumar Das
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Shamsun Nahar Shoma
- Department of Computer Science and Engineering, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Mayeen Uddin Khandaker
- Centre for Applied Physics and Radiation Technologies, School of Engineering and Technology, Sunway University, Petaling Jaya 47500, Selangor, Malaysia
- Department of General Educational Development, Faculty of Science and Information Technology, Daffodil International University, DIU Rd, Dhaka 1341, Bangladesh
- Mahadi Hasan Miraz
- Department of Business Analytics, Sunway University, Petaling Jaya 47500, Selangor, Malaysia
- M. R. I. Faruque
- Space Science Center, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia
- Nissren Tamam
- Department of Physics, College of Science, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Abdelmoneim Sulieman
- Department of Radiology and Medical Imaging, Prince Sattam Bin Abdulaziz University, Alkharj 11942, Saudi Arabia
- Refat Khan Pathan
- Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Petaling Jaya 47500, Selangor, Malaysia
7
Rashno A, Shafipour M, Fadaei S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
8
Fan Z, Chiong R, Hu Z, Keivanian F, Chiong F. Body fat prediction through feature extraction based on anthropometric and laboratory measurements. PLoS One 2022; 17:e0263333. [PMID: 35192644 PMCID: PMC8863283 DOI: 10.1371/journal.pone.0263333] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Accepted: 01/17/2022] [Indexed: 01/15/2023]
Abstract
Obesity, associated with excess body fat, is a critical public health problem that can cause serious diseases. Although a range of techniques for body fat estimation have been developed to assess obesity, these typically involve high-cost tests requiring special equipment. Thus, accurately predicting body fat percentage from easily accessed body measurements is important for assessing obesity and its related diseases. By considering the characteristics of different features (e.g., body measurements), this study investigates the effectiveness of feature extraction for body fat prediction. It evaluates the performance of three feature extraction approaches across four well-known prediction models. Experimental results based on two real-world body fat datasets show that the prediction models perform better when feature extraction is incorporated, in terms of the mean absolute error, standard deviation, root mean square error, and robustness. These results confirm that feature extraction is an effective pre-processing step for predicting body fat. In addition, statistical analysis confirms that feature extraction significantly improves the performance of prediction methods. Moreover, increasing the number of extracted features yields further, albeit slight, improvements to the prediction models. The findings of this study provide a baseline for future research in related areas.
Affiliation(s)
- Zongwen Fan
- School of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, Australia
- College of Computer Science and Technology, Huaqiao University, Xiamen, China
- Raymond Chiong
- School of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, Australia
- Zhongyi Hu
- School of Information Management, Wuhan University, Wuhan, China
- Farshid Keivanian
- School of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, Australia
9
Rough Set Based Classification and Feature Selection Using Improved Harmony Search for Peptide Analysis and Prediction of Anti-HIV-1 Activities. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12042020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
AIDS, which is caused by HIV-1, the most widespread type of HIV, attacks the human immune system, and despite tremendous efforts to find effective treatment strategies, the continuing spread of AIDS and the resulting infections has not yet declined. Consequently, innovative therapeutic methods are in high demand. Some available peptide-based therapies are claimed to treat several deadly diseases, such as AIDS and cancer. Because experimental research is constrained by analysis time and cost, computational methods offer an effective way to overcome these issues. In the computational approach, peptides with anti-HIV-1 activity are predicted by a classification method, and the learning process of the classifier is improved by using significant features. Rough set-based algorithms can deal with the gaps and imperfections present in real-time data. In this work, a framework that combines feature selection using Rough Set Improved Harmony Search Quick Reduct and Rough Set Improved Harmony Search Relative Reduct with Rough Set Classification is implemented to classify anti-HIV-1 peptides. The primary objective of the proposed methodology is to predict peptides with anti-HIV-1 activity using the effective feature selection and classification algorithms incorporated in the framework. The results of the proposed algorithms are compared with existing rough set feature selection algorithms and benchmark classifiers, and the reliability of the algorithms implemented in the framework is measured by validity measures such as Precision, Recall, F-measure, Kulczynski Index, and Fowlkes–Mallows Index. The final results show that the proposed framework analyzed and classified the peptides with a high predictive accuracy of 96%.
In this study, we have investigated the ability of a rough set-based framework with sequence-based numeric features to classify anti-HIV-1 peptides, and the experimental results show that the proposed framework finds the most satisfactory solutions: it converges rapidly in the problem space and finds the best reduct, which improves the prediction accuracy on the given dataset.
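The Quick Reduct algorithms named above build on the classical greedy QuickReduct procedure, which repeatedly adds the attribute that most increases the rough set degree of dependency until it equals that of the full condition set. The sketch below shows only that plain greedy procedure, not the Improved Harmony Search variant proposed in the paper, and it runs on a hypothetical, already discretized table of peptide descriptors. The early exit when no single attribute helps is an added safeguard, since plain QuickReduct would otherwise loop forever in that case.

```python
from collections import defaultdict

def dependency(table, attrs, decision):
    """Rough set degree of dependency gamma_attrs(decision)."""
    if not table:
        return 0.0
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    pos = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return pos / len(table)

def quick_reduct(table, conditions, decision):
    """Greedy forward selection until gamma matches the full attribute set."""
    target = dependency(table, conditions, decision)
    reduct, current = [], dependency(table, [], decision)
    while current < target:
        best_attr, best_gamma = None, current
        for a in conditions:
            if a in reduct:
                continue
            g = dependency(table, reduct + [a], decision)
            if g > best_gamma:          # keep the largest improvement
                best_attr, best_gamma = a, g
        if best_attr is None:           # safeguard: no attribute improves gamma
            break
        reduct.append(best_attr)
        current = best_gamma
    return reduct

table = [  # hypothetical discretized peptide descriptors
    {"charge": "pos", "hydrophobicity": "high", "length": "short", "active": "yes"},
    {"charge": "pos", "hydrophobicity": "low",  "length": "short", "active": "no"},
    {"charge": "neg", "hydrophobicity": "high", "length": "long",  "active": "no"},
    {"charge": "neg", "hydrophobicity": "low",  "length": "long",  "active": "no"},
    {"charge": "pos", "hydrophobicity": "high", "length": "long",  "active": "yes"},
    {"charge": "pos", "hydrophobicity": "low",  "length": "long",  "active": "no"},
]
print(quick_reduct(table, ["charge", "hydrophobicity", "length"], "active"))
# ['hydrophobicity', 'charge']
```

A harmony-search variant would replace the greedy inner loop with a population-based search over candidate subsets while keeping a dependency-based objective of this kind.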
10
A Review of the Modification Strategies of the Nature Inspired Algorithms for Feature Selection Problem. Mathematics 2022. [DOI: 10.3390/math10030464] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 02/06/2023]
Abstract
This survey is an effort to provide a research repository and a useful reference for researchers planning to develop new Nature-inspired Algorithms tailored to solve Feature Selection problems (NIAs-FS). We identified and performed a thorough literature review in three main research streams: the feature selection problem, optimization algorithms (particularly meta-heuristic algorithms), and modifications applied to NIAs to tackle the FS problem. We provide a detailed overview of 156 different articles about NIA modifications for tackling FS. We support our discussions with analytical views, visualized statistics, applied examples, and open-source software systems, and we discuss open issues related to FS and NIAs. Finally, the survey summarizes the main foundations of NIAs-FS, with approximately 34 different operators investigated. The most popular operator is chaotic maps. Hybridization is the most widely used modification technique. There are three types of hybridization: Integrating NIA with another NIA, integrating NIA with a classifier, and integrating NIA with a classifier. The most widely used hybridization is the one that integrates a classifier with the NIA. Microarray and medical applications are the dominant applications in which most of the NIA-FS are modified and used. Despite the popularity of NIAs-FS, many areas still need further investigation.
11
Abdulwahab HM, Ajitha S, Saif MAN. Feature selection techniques in the context of big data: taxonomy and analysis. Appl Intell 2022. [DOI: 10.1007/s10489-021-03118-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
12
Ogundokun RO, Misra S, Sadiku PO, Adeniyi JK. Assessment of Machine Learning Classifiers for Heart Diseases Discovery. Inform Syst 2022. [DOI: 10.1007/978-3-030-95947-0_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
13
Faizan M, Alsolami F, Khan RA. Hybrid Binary Butterfly Optimization Algorithm and Simulated Annealing for Feature Selection Problem. International Journal of Applied Metaheuristic Computing 2022. [DOI: 10.4018/ijamc.2022010104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Feature selection eliminates irrelevant features to reduce computational overhead. Metaheuristic algorithms have become popular for feature selection due to their effectiveness and flexibility. Hybridizing two or more such metaheuristics has become popular for solving optimization problems. In this paper, we propose a hybrid wrapper feature selection technique based on the binary butterfly optimization algorithm (bBOA) and Simulated Annealing (SA). SA is combined with bBOA in a pipeline such that the best solution obtained by bBOA is passed to SA for further improvement. SA improves the best solution obtained so far by searching its neighborhood, thereby enhancing the exploitation ability of bBOA. The proposed method is tested on twenty datasets from the UCI repository, and the results are compared with five popular feature selection algorithms. The results confirm the effectiveness of the hybrid approach in improving classification accuracy and selecting the optimal feature subset.
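The bBOA-to-SA pipeline can be illustrated by a generic simulated annealing refinement of a binary feature mask using single-bit-flip neighbours. This is a sketch under stated assumptions: the starting mask stands in for bBOA's best solution, and the fitness function (a reward for three hypothetical relevant features minus a size penalty), the temperature schedule, and the step count are illustrative choices rather than the paper's settings. In a real wrapper method, the fitness would instead be computed from a classifier's accuracy on the selected features.

```python
import math
import random

def simulated_annealing(mask, fitness, rng, t0=1.0, cooling=0.95, steps=200):
    """Refine a binary feature mask with single-bit-flip simulated annealing.

    `fitness` is maximised. A worse neighbour is accepted with probability
    exp(delta / T), so the search can leave local optima while T is high.
    """
    current = list(mask)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    t = t0
    for _ in range(steps):
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]   # add or drop one feature
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = neighbour, neighbour_fit
            if current_fit > best_fit:
                best, best_fit = list(current), current_fit
        t *= cooling                      # geometric cooling schedule
    return best, best_fit

# Hypothetical fitness: features 0, 2 and 5 are "relevant"; each selected
# feature costs 0.1 so that smaller subsets are preferred.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)

start = [1, 1, 0, 0, 0, 1, 0, 1]   # stands in for bBOA's best mask
best, best_fit = simulated_annealing(start, toy_fitness, random.Random(0))
print(best, best_fit)
```

Because the best solution is only ever replaced by a better one, the SA stage cannot return anything worse than the mask it received, which matches the "further improvement" role described in the abstract.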
Affiliation(s)
- Mohd Faizan
- Babasaheb Bhimrao Ambedkar University, India
14
Azadifar S, Ahmadi A. A graph-based gene selection method for medical diagnosis problems using a many-objective PSO algorithm. BMC Med Inform Decis Mak 2021; 21:333. [PMID: 34838034 PMCID: PMC8627636 DOI: 10.1186/s12911-021-01696-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Accepted: 11/16/2021] [Indexed: 11/16/2022]
Abstract
Background Gene expression data play an important role in bioinformatics applications. Although such data may have a large number of features, they typically contain only a few samples. This can negatively impact the performance of data mining and machine learning algorithms. One of the most effective approaches to alleviate this problem is gene selection. The aim of gene selection is to reduce the dimensions (features) of gene expression data by eliminating irrelevant and redundant genes. Methods This paper presents a hybrid gene selection method based on graph theory and a many-objective particle swarm optimization (PSO) algorithm. To this end, a filter method is first used to reduce the initial gene space. Then, the gene space is represented as a graph, and a graph clustering method is applied to group the genes into several clusters. Next, the many-objective PSO algorithm is used to search for an optimal subset of genes according to several criteria, including classification error, node centrality, specificity, edge centrality, and the number of selected genes. A repair operator is proposed to cover the whole gene space and ensure that at least one gene is selected from each cluster, which increases the diversity of the selected genes. Results To evaluate the performance of the proposed method, extensive experiments are conducted on seven datasets using two evaluation measures. In addition, three classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), are used to compare the effectiveness of the proposed gene selection method with other state-of-the-art methods. The results of these experiments demonstrate that the proposed method not only achieves more accurate classification but also selects fewer genes than other methods.
Conclusion This study shows that the proposed multi-objective PSO algorithm simultaneously removes irrelevant and redundant features using several different criteria. Also, the use of the clustering algorithm and the repair operator has improved the performance of the proposed method by covering the whole space of the problem.
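A repair operator of the kind described in this abstract can be sketched as a pass that checks every cluster and, if none of its genes is selected, adds one. This sketch assumes a precomputed cluster label and a relevance score for each gene (both hypothetical here) and adds the highest-scoring gene of each uncovered cluster; the paper's actual selection rule may differ.

```python
def repair(selected, clusters, scores):
    """Ensure every cluster contributes at least one selected gene.

    selected: set of gene indices chosen by the optimizer
    clusters: clusters[g] is the cluster id of gene g
    scores:   scores[g] is a relevance score for gene g (higher is better)
    """
    repaired = set(selected)
    members = {}
    for gene, c in enumerate(clusters):
        members.setdefault(c, []).append(gene)
    for genes in members.values():
        if not repaired.intersection(genes):
            # Uncovered cluster: add its most relevant gene.
            repaired.add(max(genes, key=lambda g: scores[g]))
    return repaired

clusters = [0, 0, 1, 1, 2, 2]            # hypothetical cluster labels
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3]  # hypothetical relevance scores
print(sorted(repair({0}, clusters, scores)))  # [0, 3, 5]
```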
Affiliation(s)
- Saeid Azadifar
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.
- Ali Ahmadi
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
15
Yang L, Qin K, Sang B, Xu W. Dynamic fuzzy neighborhood rough set approach for interval-valued information systems with fuzzy decision. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107679] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
16
Amato F, Coppolino L, Cozzolino G, Mazzeo G, Moscato F, Nardone R. Enhancing random forest classification with NLP in DAMEH: A system for DAta Management in eHealth Domain. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.08.091] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022]
17
Sathiyabhama B, Kumar SU, Jayanthi J, Sathiya T, Ilavarasi AK, Yuvarajan V, Gopikrishna K. A novel feature selection framework based on grey wolf optimizer for mammogram image analysis. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06099-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
18
A Simultaneous Moth Flame Optimizer Feature Selection Approach Based on Levy Flight and Selection Operators for Medical Diagnosis. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-05478-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023]
19
Bania RK, Halder A. R-HEFS: Rough set based heterogeneous ensemble feature selection method for medical data classification. Artif Intell Med 2021; 114:102049. [PMID: 33875164 DOI: 10.1016/j.artmed.2021.102049] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/04/2020] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 11/28/2022]
Abstract
Feature selection is a reliable dimensionality reduction technique for selecting a subset of relevant and non-redundant features from large datasets. Ensemble feature selection (EFS) is a recent technique that aims to accumulate diversity in the subset of selected features. It improves the performance of learning algorithms and yields more stable and robust results. In this paper, a novel rough set theory (RST) based heterogeneous EFS method (R-HEFS) is proposed for selecting less redundant and highly relevant features during the aggregation of diverse feature subsets by applying feature-class and feature-feature rough dependency and feature-significance measures. In R-HEFS, five state-of-the-art RST-based filter methods are used as base feature selectors. Experiments are carried out on 10 benchmark medical datasets collected from the UCI repository. The k-nearest neighbor (kNN) imputation method and RST-based discretization techniques are applied to impute missing values and discretize continuous features, respectively. The effectiveness of the proposed R-HEFS method is evaluated and analyzed using four benchmark classifiers, viz., Naïve Bayes (NB), random forest (RF), support vector machine (SVM), and AdaBoost. The proposed R-HEFS method proves effective at removing non-relevant and redundant features during the aggregation of base feature selectors, and it helps increase classification accuracy. R-HEFS achieved better average classification accuracy on 7 of the 10 medical datasets. Overall, the results strongly suggest that the proposed R-HEFS method can reduce the dimension of large medical datasets and may help physicians or medical experts diagnose (classify) different diseases with lower computational complexity.
Affiliation(s)
- Rubul Kumar Bania
- Department of Computer Application, North-Eastern Hill University, Tura Campus, Tura 794002, Meghalaya, India.
- Anindya Halder
- Department of Computer Application, North-Eastern Hill University, Tura Campus, Tura 794002, Meghalaya, India.
20
Ray P, Reddy SS, Banerjee T. Various dimension reduction techniques for high dimensional data analysis: a review. Artif Intell Rev 2021. [DOI: 10.1007/s10462-020-09928-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/28/2022]
21
Singh D, Singh B. Investigating the impact of data normalization on classification performance. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105524] [Citation(s) in RCA: 211] [Impact Index Per Article: 52.8] [Indexed: 11/17/2022]
22
An improved runner-root algorithm for solving feature selection problems based on rough sets and neighborhood rough sets. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105517] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/17/2022]
23
Elaziz MA, Moemen YS, Hassanien AE, Xiong S. Toxicity risks evaluation of unknown FDA biotransformed drugs based on a multi-objective feature selection approach. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105509] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
24
Abu Khurmaa R, Aljarah I, Sharieh A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05483-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/21/2022]
25
Zhao J, Liang JM, Dong ZN, Tang DY, Liu Z. NEC: A nested equivalence class-based dependency calculation approach for fast feature selection using rough set theory. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.092] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]

26. Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 2020; 112:4370-4384. [PMID: 32717320 DOI: 10.1016/j.ygeno.2020.07.027] [Received: 03/06/2020] [Revised: 06/22/2020] [Accepted: 07/14/2020]
Abstract
In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale medical datasets. On the other hand, medical applications with high-dimensional datasets that require high speed and accuracy are rapidly increasing. Feature selection is one dimensionality reduction approach that can increase the accuracy of disease diagnosis and reduce its computational complexity. In this paper, a novel PSO-based multi-objective feature selection method is proposed. The proposed method consists of three main phases. In the first phase, the original features are represented as a graph. In the second phase, the centrality of every node in the graph is calculated, and in the third phase, an improved PSO-based search process is used for the final feature selection. The results on five medical datasets indicate that the proposed method improves on previous related methods in terms of efficiency and effectiveness.
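The abstract does not say how the feature graph is built or which centrality measure is used. As a purely illustrative sketch, assume (hypothetically) that features are nodes, that an edge joins two features whose absolute Pearson correlation exceeds a threshold, and that a feature's centrality is its weighted degree. The name `feature_centrality` and the `threshold` default are inventions for this example, not part of the cited method.

```python
import numpy as np


def feature_centrality(X, threshold=0.3):
    """Weighted-degree centrality of each feature in a correlation graph.

    Hypothetical construction: nodes are the columns of X, and an edge joins
    two features whose absolute Pearson correlation exceeds `threshold`,
    weighted by that correlation.
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    np.fill_diagonal(corr, 0.0)                   # no self-loops
    adj = np.where(corr > threshold, corr, 0.0)   # keep only strong edges
    return adj.sum(axis=1)                        # weighted degree per feature


# Columns 0 and 1 are identical, column 2 is only weakly related to both.
scores = feature_centrality([[1, 1, 0], [2, 2, 1], [3, 3, 0], [4, 4, 1]])
```

Weighted degree is only one option; betweenness or eigenvector centrality would fit the same description, and how the cited method feeds these scores into its PSO search is not described in the abstract.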

27. Zhou Y, Kang J, Zhang X. A Cooperative Coevolutionary Approach to Discretization-Based Feature Selection for High-Dimensional Data. Entropy (Basel) 2020; 22:613. [PMID: 33286385 PMCID: PMC7517144 DOI: 10.3390/e22060613] [Received: 04/28/2020] [Revised: 05/27/2020] [Accepted: 05/28/2020]
Abstract
Recent discretization-based feature selection methods show great advantages by introducing the entropy-based cut-points for features to integrate discretization and feature selection into one stage for high-dimensional data. However, current methods usually consider the individual features independently, ignoring the interaction between features with cut-points and those without cut-points, which results in information loss. In this paper, we propose a cooperative coevolutionary algorithm based on the genetic algorithm (GA) and particle swarm optimization (PSO), which searches for the feature subsets with and without entropy-based cut-points simultaneously. For the features with cut-points, a ranking mechanism is used to control the probability of mutation and crossover in GA. In addition, a binary-coded PSO is applied to update the indices of the selected features without cut-points. Experimental results on 10 real datasets verify the effectiveness of our algorithm in classification accuracy compared with several state-of-the-art competitors.
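What an "entropy-based cut-point" means for a single feature can be sketched as the binary split that minimises the weighted class entropy of the two resulting intervals. This is a simplification of Fayyad and Irani's recursive discretisation (its MDL stopping rule is omitted), it is not the cooperative coevolutionary algorithm of the cited paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Single binary split of one feature that minimises weighted class entropy.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (cut, weighted_entropy); cut is None if the feature is constant.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_cut, best_e = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        e = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if e < best_e:
            best_cut, best_e = (lo + hi) / 2, e
    return best_cut, best_e


cut, e = best_cut_point([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A feature for which no cut lowers the entropy enough would, in the full method, be treated as having no cut-point, which is the distinction the abstract draws between features with and without cut-points.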
Affiliation(s)
- Yu Zhou: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Junhao Kang: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Xiao Zhang: College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China; Hubei Provincial Engineering Research Center for Intelligent Management of Manufacturing Enterprises, Wuhan 430074, China

28. Enhancing BCI-Based Emotion Recognition Using an Improved Particle Swarm Optimization for Feature Selection. Sensors (Basel) 2020; 20:3028. [PMID: 32471047 PMCID: PMC7309000 DOI: 10.3390/s20113028] [Received: 04/21/2020] [Revised: 05/23/2020] [Accepted: 05/25/2020]
Abstract
Electroencephalogram (EEG) signals have been widely used in emotion recognition. However, current EEG-based emotion recognition has low classification accuracy, and its real-time application is limited. To address these issues, we propose an improved feature selection algorithm to recognize subjects' emotional states from EEG signals, and use it to design an online emotion recognition brain-computer interface (BCI) system. Specifically, features of different dimensions were first extracted from the time domain, frequency domain, and time-frequency domain. Then, a modified particle swarm optimization (PSO) method with a multi-stage linearly-decreasing inertia weight (MLDW) was proposed for feature selection. The MLDW algorithm makes it easy to refine how the inertia weight decreases. Finally, the emotion types were classified by a support vector machine classifier. We extracted different features from the EEG data in the DEAP data set, collected from 32 subjects, to perform two offline experiments. The average accuracy of four-class emotion recognition reached 76.67%. Compared with the latest benchmark, the proposed MLDW-PSO feature selection improves the accuracy of EEG-based emotion recognition. To further validate the MLDW-PSO feature selection method, we developed an online two-class emotion recognition system evoked by Chinese videos, which achieved an average accuracy of 89.5% for 10 healthy subjects. These results demonstrate the effectiveness of the method.
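The abstract names a multi-stage linearly-decreasing inertia weight but gives no formula. A minimal sketch, assuming the weight is piecewise linear over the run with hypothetical breakpoints (the `knots` values below are placeholders, not the MLDW schedule), plugged into the standard global-best PSO velocity update:

```python
import numpy as np


def multistage_inertia(t, t_max, knots=((0.0, 0.9), (0.5, 0.6), (1.0, 0.4))):
    """Piecewise-linear, decreasing inertia weight.

    `knots` are (fraction_of_run, weight) pairs; between knots the weight is
    interpolated linearly. These values are illustrative placeholders.
    """
    xs, ws = zip(*knots)
    return float(np.interp(t / t_max, xs, ws))


def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """Standard global-best PSO velocity and position update."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v


# Inertia starts high (exploration) and falls in two linear stages.
schedule = [multistage_inertia(t, 100) for t in range(0, 101, 25)]
```

Splitting the decrease into stages lets the search stay exploratory for longer before settling, which is one plausible reading of "refining the process of decreasing the inertia weight". For feature selection, the continuous positions would still need mapping to feature masks (for example by thresholding), which is not shown.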

29. Improved Dominance Soft Set Based Decision Rules with Pruning for Leukemia Image Classification. Electronics 2020. [DOI: 10.3390/electronics9050794]
Abstract
Acute lymphoblastic leukemia is a well-known type of pediatric cancer that affects the blood and bone marrow. If left untreated, it becomes fatal as it spreads into the circulatory system and other vital organs. Worldwide, leukemia affects both children and adults. Early diagnosis of leukemia is essential for patient recovery, particularly in children. Computational tools for medical image analysis have therefore become a focus of research in medical image processing. The particle swarm optimization (PSO) algorithm is employed to segment the nucleus in the leukemia image, and texture, shape, and color features are extracted from the nucleus. In this article, an improved dominance soft set-based decision rules with pruning (IDSSDRP) algorithm is proposed to predict the blast and non-blast cells of leukemia. The approach proceeds in three phases: (i) improved dominance soft set-based attribute reduction using the AND operation in multi-soft set theory, (ii) generation of decision rules using the dominance soft set, and (iii) rule pruning. The efficiency of the proposed system is compared with other benchmark classification algorithms using classification metrics together with receiver operating characteristic (ROC) curve analysis. The results demonstrate that the derived rules efficiently classify cancer and non-cancer cells.

30. A Review of Multimodal Medical Image Fusion Techniques. Comput Math Methods Med 2020; 2020:8279342. [PMID: 32377226 PMCID: PMC7195632 DOI: 10.1155/2020/8279342] [Received: 12/09/2019] [Revised: 02/26/2020] [Accepted: 04/03/2020]
Abstract
Medical image fusion is the process of combining multiple images from multiple imaging modalities into a single fused image that carries more information, increasing the clinical applicability of medical images. In this paper, we give an overview of multimodal medical image fusion methods, emphasizing the most recent advances in the domain with respect to (1) current fusion methods, including those based on deep learning, (2) the imaging modalities used in medical image fusion, and (3) performance analysis of medical image fusion on the main datasets. The paper concludes that current multimodal medical image fusion research has produced significant results and is a growing field, although many challenges remain.

31. Ambika M, Raghuraman G, SaiRamesh L, Ayyasamy A. Intelligence-based decision support system for diagnosing the incidence of hypertensive type. J Intell Fuzzy Syst 2020. [DOI: 10.3233/jifs-190143]
Affiliation(s)
- M. Ambika: Department of Computer Science and Engineering, SSN College of Engineering, Kalavakkam, Chennai, Tamil Nadu, India
- G. Raghuraman: Department of Computer Science and Engineering, SSN College of Engineering, Kalavakkam, Chennai, Tamil Nadu, India
- L. SaiRamesh: Department of Information Science and Technology, CEG, Anna University Chennai, Tamil Nadu, India
- A. Ayyasamy: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Tamil Nadu, India

32. Bania RK, Halder A. R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data. Comput Methods Programs Biomed 2020; 184:105122. [PMID: 31622857 DOI: 10.1016/j.cmpb.2019.105122] [Received: 05/09/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019]
Abstract
BACKGROUND AND OBJECTIVE Retrieving meaningful information from high-dimensional datasets is an important and challenging task. Medical datasets typically suffer from issues such as the curse of dimensionality, uncertainty, missing values, and irrelevant and redundant attributes. Any machine learning technique applied to such data without preprocessing generally takes considerable computational time and may degrade model performance. METHODS In this article, R-Ensembler, a parameter-free greedy ensemble attribute selection method, is proposed. Based on rough set theory, it uses attribute-class, attribute-significance and attribute-attribute relevance measures to select, from a pool of different attribute subsets, the attributes that are most relevant, significant and non-redundant for predicting the presence or absence of different diseases in medical datasets. The main role of the proposed ensembler is to combine multiple attribute subsets produced by different rough set filters into an optimal subset for the subsequent classification task. A novel n-set intersection method is also proposed to reduce bias during attribute selection. Before R-Ensembler selects the minimal attribute set, the dataset is preprocessed with k-nearest neighbour (kNN) imputation to treat missing values. RESULTS Experiments are carried out on seven benchmark medical datasets from the University of California, Irvine (UCI) repository. The proposed ensemble method is compared with five state-of-the-art attribute selection algorithms, with results measured using three benchmark classifiers, viz. Naïve Bayes, decision trees and random forest.
Experimental results clearly justify the superiority of R-Ensembler over the other attribute selection algorithms. Paired t-tests on the average accuracies of classifiers trained on the reduced datasets produced by R-Ensembler and by the competing attribute selection methods confirm that the improvement achieved by R-Ensembler is statistically significant. CONCLUSION The proposed ensemble method proved very effective at selecting highly relevant, highly significant and minimally redundant attributes from a pool of attribute subsets.
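Two building blocks this abstract relies on can be sketched under simplifying assumptions: the classical rough set dependency degree gamma_B(D) = |POS_B(D)| / |U| for discrete attributes (the same "degree of dependence" measure used in the red deer and rough set hybrid), and a plain kNN imputation that fills a missing value with the mean of that column over the k nearest rows. Both function names are hypothetical, the distance measure is an assumption, and R-Ensembler's ensemble and n-set intersection steps are not reproduced.

```python
import math
from collections import defaultdict


def dependency_degree(rows, attrs, decision):
    """gamma_B(D) = |POS_B(D)| / |U| for discrete attributes.

    Objects are grouped into B-equivalence classes by their values on `attrs`;
    a class lies in the positive region when all its members share one decision.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows)


def knn_impute(rows, k=3):
    """Replace None entries by the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns observed in both rows (an
    assumption; the kNN settings of R-Ensembler are not given in the abstract).
    """
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, value in enumerate(r):
            if value is not None:
                continue
            candidates = []
            for m, s in enumerate(rows):
                if m == i or s[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(r, s) if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((dist, s[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else None
    return filled


table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "yes"},
]
gamma_a = dependency_degree(table, ["a"], "d")
gamma_ab = dependency_degree(table, ["a", "b"], "d")
```

Adding attribute `b` to `a` raises the dependency degree in this toy table, which is the kind of signal a rough set filter uses to judge an attribute's significance.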
Affiliation(s)
- Rubul Kumar Bania: Dept. of Computer Application, North-Eastern Hill University Tura Campus, Tura, Meghalaya 794002, India
- Anindya Halder: Dept. of Computer Application, North-Eastern Hill University Tura Campus, Tura, Meghalaya 794002, India

33. Elaziz MA, Ewees AA, Ibrahim RA, Lu S. Opposition-based moth-flame optimization improved by differential evolution for feature selection. Math Comput Simul 2020; 168:48-75. [DOI: 10.1016/j.matcom.2019.06.017]

34. Leukemia Image Segmentation Using a Hybrid Histogram-Based Soft Covering Rough K-Means Clustering Algorithm. Electronics 2020. [DOI: 10.3390/electronics9010188]
Abstract
Segmenting an image of a nucleus is one of the most essential tasks in a leukemia diagnostic system. Accurate and rapid segmentation methods help physicians identify diseases and provide treatment at the appropriate time. Hybrid clustering algorithms have recently come into wide use for image segmentation in medical image processing. In this article, a novel hybrid histogram-based soft covering rough k-means clustering (HSCRKM) algorithm for leukemia nucleus image segmentation is discussed. This algorithm combines the strengths of a soft covering rough set and rough k-means clustering. A histogram method is used to identify the number of clusters, avoiding random initialization. Gray level co-occurrence matrix (GLCM), color, and shape-based features were extracted from the segmented nucleus image, and machine learning prediction algorithms were applied to classify cancerous and non-cancerous cells. The proposed strategy is compared with an existing clustering algorithm, and its efficiency is evaluated using prediction metrics. The experimental results show that HSCRKM segments the nucleus efficiently, and that logistic regression and neural networks perform better than the other prediction algorithms.
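The rough k-means half of HSCRKM can be illustrated with the classical Lingras-West scheme on one-dimensional grey levels: a value goes to the lower approximation of its nearest cluster unless another centre is almost as close, in which case it joins the boundary (upper approximation) of both. This sketch assumes the initial centres are supplied directly (the histogram-based choice of cluster count and the soft covering rough set component are not reproduced), and the relative threshold `eps` and lower-approximation weight `w_lower` are illustrative defaults, not values from the paper.

```python
import numpy as np


def rough_kmeans_1d(x, centers, w_lower=0.7, eps=0.2, n_iter=20):
    """Lingras-West style rough k-means on 1-D values (e.g. grey levels).

    Returns the final centres plus the lower and boundary memberships from
    the last assignment pass.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float).copy()
    k = len(c)
    for _ in range(n_iter):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for xi in x:
            d = np.abs(c - xi)
            i = int(np.argmin(d))
            # Other centres nearly as close as the nearest make xi ambiguous.
            close = [j for j in range(k) if j != i and d[j] <= (1 + eps) * d[i]]
            if close:
                for j in [i] + close:
                    boundary[j].append(xi)
            else:
                lower[i].append(xi)
        new = c.copy()
        for j in range(k):
            if lower[j] and boundary[j]:
                new[j] = w_lower * np.mean(lower[j]) + (1 - w_lower) * np.mean(boundary[j])
            elif lower[j]:
                new[j] = np.mean(lower[j])
            elif boundary[j]:
                new[j] = np.mean(boundary[j])
        converged = np.allclose(new, c)
        c = new
        if converged:
            break
    return c, lower, boundary


centres, lower, boundary = rough_kmeans_1d([10, 11, 12, 50, 90, 91, 92], [0, 100])
```

Here the grey level 50 sits roughly midway between the two clusters, so it ends up in both boundary regions instead of being forced into one cluster; this explicit treatment of ambiguous pixels is what the rough set component adds to plain k-means.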

35. Feature Selection of Grey Wolf Optimizer Based on Quantum Computing and Uncertain Symmetry Rough Set. Symmetry (Basel) 2019. [DOI: 10.3390/sym11121470]
Abstract
Considering the crucial influence of feature selection on data classification accuracy, a grey wolf optimizer based on quantum computing and uncertain symmetry rough set (QCGWORS) was proposed. QCGWORS applies three theories in parallel to feature selection, each contributing its own advantages to the optimization. Quantum computing balances global and local search when exploring feature sets, the grey wolf optimizer can effectively explore all possible feature subsets, and uncertain symmetry rough set theory can accurately evaluate the correlation of potential feature subsets. QCGWORS can minimize the number of features while maximizing classification performance. In the experiments, k-nearest neighbors (KNN) and random forest (RF) classifiers guided the learning process of the proposed algorithm, and 13 datasets were used for comparison. Compared with other feature selection methods, QCGWORS improved classification accuracy on 12 datasets, with an improvement of up to 20.91%. In attribute reduction, every dataset benefited from a reduction to the minimum number of features.
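The grey wolf component builds on the standard grey wolf optimizer, in which every wolf moves toward the three best solutions found so far (alpha, beta, delta). A minimal sketch of that standard continuous update, plus a sigmoid transfer function as one common way to turn positions into feature masks, follows. The quantum-computing encoding and the uncertain symmetry rough set evaluation of QCGWORS are not modelled, and the helper names are hypothetical.

```python
import numpy as np


def gwo_update(wolves, alpha, beta, delta, a, rng):
    """One position update of the standard (continuous) grey wolf optimizer.

    wolves: (n, d) positions; alpha, beta, delta: (d,) best three solutions.
    `a` is normally decreased linearly from 2 to 0 over the run.
    """
    wolves = np.asarray(wolves, dtype=float)
    leaders = [np.asarray(p, dtype=float) for p in (alpha, beta, delta)]
    new = np.empty_like(wolves)
    for n, x in enumerate(wolves):
        guided = []
        for leader in leaders:
            A = 2 * a * rng.random(x.shape) - a   # |A| > 1 favours exploration
            C = 2 * rng.random(x.shape)
            D = np.abs(C * leader - x)
            guided.append(leader - A * D)
        new[n] = np.mean(guided, axis=0)          # average of the three moves
    return new


def to_feature_mask(position, rng):
    """Sigmoid transfer: select feature f with probability sigmoid(position[f])."""
    return rng.random(position.shape) < 1.0 / (1.0 + np.exp(-position))


rng = np.random.default_rng(1)
pack = gwo_update(np.zeros((3, 2)), [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], a=1.0, rng=rng)
masks = [to_feature_mask(p, rng) for p in pack]
```

As `a` shrinks toward 0, the random coefficient A shrinks too, so the pack converges on the average of the three leaders; in a feature-selection wrapper, the fitness used to rank alpha, beta and delta would come from the rough set evaluation and the classifier.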

36. Xie X, Qin X, Zhou Q, Zhou Y, Zhang T, Janicki R, Zhao W. A novel test-cost-sensitive attribute reduction approach using the binary bat algorithm. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.104938]

37. Yadav N, Chatterjee N. Rough sets based span and its application to extractive text summarization. J Intell Fuzzy Syst 2019. [DOI: 10.3233/jifs-190402]
Affiliation(s)
- Nidhika Yadav: Department of Mathematics, Indian Institute of Technology, Delhi, India

38. Tawhid MA, Ibrahim AM. Feature selection based on rough set approach, wrapper approach, and binary whale optimization algorithm. Int J Mach Learn Cybern 2019. [DOI: 10.1007/s13042-019-00996-5]

39. Kar S, Das Sharma K, Maitra M. Adaptive weighted aggregation in Group Improvised Harmony Search for lung nodule classification. J Exp Theor Artif Intell 2019. [DOI: 10.1080/0952813x.2019.1647561]
Affiliation(s)
- Subhajit Kar: Department of Electrical Engineering, Future Institute of Engineering and Management, Kolkata, India
- Madhubanti Maitra: Department of Electrical Engineering, Jadavpur University, Kolkata, India

40. Liu H, Hu QV, He L. Term-Based Personalization for Feature Selection in Clinical Handover Form Auto-Filling. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1219-1230. [PMID: 30296238 DOI: 10.1109/tcbb.2018.2874237]
Abstract
Feature learning and selection have been widely applied in many research areas because of their good performance and low complexity. Traditional methods usually apply the same feature set to every term, so performance can suffer when wrong features introduce noisy information for a given term. In this paper, we propose a term-based personalization approach to finding the best features for each term. First, the features are given as input, so that the work focuses on selection strategies. Second, the importance of each feature subset to a given term is evaluated by a term-feature probabilistic relevance model. Because evaluating all possible feature subsets is computationally intensive, we present a feature searching method that generates candidate feature subsets for each term. Finally, we obtain a personalized feature set for each term as a subset of all features. Experiments on the NICTA Synthetic Nursing Handover dataset show that the approach is promising and effective.

41. Ye M, Wang W, Yao C, Fan R, Wang P. Gene Selection Method for Microarray Data Classification Using Particle Swarm Optimization and Neighborhood Rough Set. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190204150918]
Abstract
Background: Mining knowledge from microarray data is one of the popular research topics in biomedical informatics. Gene selection is a significant research trend in biomedical data mining, since the accuracy of tumor identification relies heavily on the genes that are biologically relevant to the problem at hand.
Objective: Various computational intelligence methods have been presented for selecting a small subset of informative genes from numerous genes for tumor identification. However, because of the high data dimensionality, small sample sizes, and inherent noise, many computational methods struggle to select a small gene subset.
Methods: We propose a novel algorithm, PSONRS_KNN, for gene selection based on the particle swarm optimization (PSO) algorithm together with the neighborhood rough set (NRS) reduction model and the k-nearest neighbor (KNN) classifier.
Results: First, the top-ranked candidate genes are obtained by the GainRatioAttributeEval preselection algorithm in WEKA. Then, the smallest meaningful set of genes is selected by combining PSO with NRS and the KNN classifier.
Conclusion: Experimental results on five microarray gene expression datasets demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of classification accuracy and the number of selected genes.
Affiliation(s)
- Mingquan Ye: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Weiwei Wang: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Chuanwen Yao: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Rong Fan: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Peipei Wang: School of Medical Information, Wannan Medical College, Wuhu 241002, China

42. Priyanga S, Gauthama Raman M, Jagtap SS, Aswin N, Kirthivasan K, Shankar Sriram V. An Improved Rough Set Theory based Feature Selection Approach for Intrusion Detection in SCADA Systems. J Intell Fuzzy Syst 2019. [DOI: 10.3233/jifs-169960]
Affiliation(s)
- S. Priyanga: Centre for Information Super Highway (CISH), School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
- M.R. Gauthama Raman: iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design (SUTD), Singapore City, Singapore
- Sujeet S. Jagtap: Centre for Information Super Highway (CISH), School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
- N. Aswin: Centre for Information Super Highway (CISH), School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
- Kannan Kirthivasan: Discrete Mathematics Research Laboratory (DMRL), Department of Mathematics, SASTRA Deemed to be University, Thanjavur, Tamil Nadu, India
- V.S. Shankar Sriram: Centre for Information Super Highway (CISH), School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India

43.

44. Gauthama Raman MR, Nivethitha S, Kannan K, Shankar Sriram VS. A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets. Soft Comput 2019. [DOI: 10.1007/s00500-019-03818-6]

45. An improved rough set approach for optimal trust measure parameter selection in cloud environments. Soft Comput 2019. [DOI: 10.1007/s00500-018-03753-y]

46. Arunkumar C, Ramakrishnan S. Prediction of cancer using customised fuzzy rough machine learning approaches. Healthc Technol Lett 2018; 6:13-18. [PMID: 30881694 PMCID: PMC6407447 DOI: 10.1049/htl.2018.5055] [Received: 07/19/2018] [Accepted: 10/09/2018]
Abstract
This Letter proposes a customised approach for attribute selection applied to the fuzzy rough quick reduct algorithm. The imbalanced data are balanced using the synthetic minority oversampling technique (SMOTE), and the high dimensionality of the cancer data is reduced using a correlation-based filter. The balanced, dimensionality-reduced gene subset is then used to compute the final minimal reduct with a customised fuzzy triangular norm operator in the fuzzy rough quick reduct algorithm. The customised operator is combined with a Lukasiewicz fuzzy implicator to compute the fuzzy approximation, and it selects the fewest informative feature genes from the dimensionality-reduced datasets. Using the customised Lukasiewicz triangular norm operator, leave-one-out cross-validation accuracies of 94.85%, 76.54%, 98.11%, and 99.13% are obtained on the leukemia, central nervous system, lung, and ovarian datasets, respectively. The conventional fuzzy rough quick reduct and the proposed method are compared using classification accuracy, precision, recall, F-measure, scatter plots, receiver operating characteristic area, the McNemar test, the chi-squared test, Matthews correlation coefficient and false discovery rate, and these measures show that the proposed approach performs better than methods available in the literature.
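The fuzzy rough quick reduct evaluates a candidate attribute subset by its fuzzy rough dependency degree. A minimal sketch with the standard Lukasiewicz t-norm T(a, b) = max(0, a + b - 1) and implicator I(a, b) = min(1, 1 - a + b), and an assumed per-attribute similarity 1 - |a(x) - a(y)| / range(a). The abstract does not define the customised t-norm, so the standard operator stands in for it here; the function names are hypothetical.

```python
import numpy as np


def luk_tnorm(a, b):
    """Lukasiewicz t-norm T(a, b) = max(0, a + b - 1)."""
    return np.maximum(0.0, a + b - 1.0)


def luk_implicator(a, b):
    """Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    return np.minimum(1.0, 1.0 - a + b)


def fuzzy_similarity(X, features):
    """Fuzzy similarity matrix, combining per-attribute similarities with T."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    R = np.ones((n, n))
    for f in features:
        col = X[:, f]
        span = col.max() - col.min() or 1.0   # avoid dividing by zero
        Rf = 1.0 - np.abs(col[:, None] - col[None, :]) / span
        R = luk_tnorm(R, Rf)
    return R


def fuzzy_rough_dependency(X, y, features):
    """Mean fuzzy positive-region membership (dependency degree)."""
    y = np.asarray(y)
    R = fuzzy_similarity(X, features)
    pos = np.zeros(len(y))
    for c in np.unique(y):
        member = (y == c).astype(float)
        # Lower approximation: inf over y of I(R(x, y), mu_class(y)).
        lower = luk_implicator(R, member[None, :]).min(axis=1)
        pos = np.maximum(pos, lower)          # sup over decision classes
    return pos.mean()
```

The quick reduct then greedily adds the attribute that raises this dependency degree most, stopping when it matches the dependency of the full attribute set; swapping in a different t-norm changes only `luk_tnorm`.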
Affiliation(s)
- Chinnaswamy Arunkumar: Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, 641112, Amrita Vishwa Vidyapeetham, India
- Srinivasan Ramakrishnan: Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, 642003, India

47. A parallel rough set based dependency calculation method for efficient feature selection. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.10.006]

48. Nagireddy V, Parwekar P, Mishra TK. Velocity adaptation based PSO for localization in wireless sensor networks. Evol Intell 2018. [DOI: 10.1007/s12065-018-0170-4]

49. A feature selection technique based on rough set and improvised PSO algorithm (PSORS-FS) for permission based detection of Android malwares. Int J Mach Learn Cybern 2018. [DOI: 10.1007/s13042-018-0838-1]

50. Arunkumar C, Ramakrishnan S. Attribute selection using fuzzy roughset based customized similarity measure for lung cancer microarray gene expression data. Future Comput Inform J 2018. [DOI: 10.1016/j.fcij.2018.02.002]