1
|
Wang J, Zhang Z, Wang Y. Utilizing Feature Selection Techniques for AI-Driven Tumor Subtype Classification: Enhancing Precision in Cancer Diagnostics. Biomolecules 2025; 15:81. [PMID: 39858475 PMCID: PMC11763904 DOI: 10.3390/biom15010081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Revised: 01/02/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Cancer's heterogeneity presents significant challenges in accurate diagnosis and effective treatment, including the complexity of identifying tumor subtypes and their diverse biological behaviors. This review examines how feature selection techniques address these challenges by improving the interpretability and performance of machine learning (ML) models in high-dimensional datasets. Feature selection methods, such as filter, wrapper, and embedded techniques, play a critical role in enhancing the precision of cancer diagnostics by identifying relevant biomarkers. The integration of multi-omics data and ML algorithms facilitates a more comprehensive understanding of tumor heterogeneity, advancing both diagnostics and personalized therapies. However, challenges such as ensuring data quality, mitigating overfitting, and addressing scalability remain critical limitations of these methods. Artificial intelligence (AI)-powered feature selection offers promising solutions to these issues by automating and refining the feature extraction process. This review highlights the transformative potential of these approaches while emphasizing future directions, including the incorporation of deep learning (DL) models and integrative multi-omics strategies for more robust and reproducible findings.
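As a rough illustration of the filter family named in this abstract, the sketch below scores each feature independently of any classifier and keeps the top-ranked ones. The scoring rule (absolute Pearson correlation with the label), the function names, and the toy data are assumptions chosen for illustration; the review does not prescribe this particular score.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_rank(X, y, k):
    """Return the indices of the k features with the largest |correlation| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]

# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
X = [[0.1, 5.0, 1.0], [0.2, 5.0, 0.0], [0.9, 5.0, 1.0], [1.0, 5.0, 1.0]]
y = [0, 0, 1, 1]
selected = filter_rank(X, y, k=2)
```

A wrapper method would instead score whole subsets by training a classifier on them, and an embedded method would read feature relevance out of the fitted model itself.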
Affiliation(s)
- Jihan Wang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Zhengxiang Zhang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
|
2
|
Varzaneh ZA, Hosseini S. An improved equilibrium optimization algorithm for feature selection problem in network intrusion detection. Sci Rep 2024; 14:18696. [PMID: 39134565 PMCID: PMC11319621 DOI: 10.1038/s41598-024-67488-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/11/2024] [Indexed: 08/15/2024] Open
Abstract
In this paper, an enhanced version of equilibrium optimization (EO), named Levy-opposition-equilibrium optimization (LOEO), is proposed to select effective features in network intrusion detection systems (IDSs). The algorithm applies the opposition-based learning (OBL) approach to improve the diversity of the population, and it uses the Levy flight method to escape local optima. A binary rendition of the algorithm, called BLOEO, is then applied to feature selection in IDSs. One of the main challenges in IDSs is the high-dimensional feature space, with many irrelevant or redundant features. The BLOEO algorithm is designed to intelligently select the most informative subset of features. The empirical findings on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets demonstrate the effectiveness of the BLOEO algorithm. The algorithm effectively reduces the number of data features while maintaining a high intrusion detection accuracy of over 95%. Specifically, on the UNSW-NB15 dataset, BLOEO selected only 10.8 features on average, achieving an accuracy of 97.6% and a precision of 100%.
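The opposition-based learning step mentioned in this abstract can be sketched as follows. The mirror formula `lower + upper - x` is the standard definition of an opposite point; how LOEO combines it with the rest of the search is not given in the abstract, so the initialization routine, the objective `sphere`, and all parameter names here are hypothetical.

```python
import random

def opposite(position, lower, upper):
    """Opposition-based learning: mirror each coordinate inside [lower, upper]."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]

def obl_initialize(pop_size, lower, upper, fitness, rng):
    """Sample a population, add every member's opposite, keep the best pop_size.

    Assumes minimization; this is one common way to use OBL, not necessarily
    the scheme used in the cited paper.
    """
    population = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
                  for _ in range(pop_size)]
    candidates = population + [opposite(p, lower, upper) for p in population]
    return sorted(candidates, key=fitness)[:pop_size]

def sphere(p):
    """Hypothetical objective used only for this illustration."""
    return sum(x * x for x in p)

rng = random.Random(0)
lower, upper = [-1.0, -1.0], [3.0, 3.0]
pop = obl_initialize(5, lower, upper, sphere, rng)
```

Evaluating both a point and its mirror image doubles the number of candidates examined per sample, which is why the technique is usually described as a diversity aid.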
Affiliation(s)
- Zahra Asghari Varzaneh
  - Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
- Soodeh Hosseini
  - Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
|
3
|
Lee J, Yoon Y, Kim J, Kim YH. Metaheuristic-Based Feature Selection Methods for Diagnosing Sarcopenia with Machine Learning Algorithms. Biomimetics (Basel) 2024; 9:179. [PMID: 38534863 DOI: 10.3390/biomimetics9030179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/01/2024] [Accepted: 03/13/2024] [Indexed: 03/28/2024] Open
Abstract
This study explores the efficacy of metaheuristic-based feature selection in improving machine learning performance for diagnosing sarcopenia. Extraction and utilization of features significantly impacting diagnosis efficacy emerge as a critical facet when applying machine learning for sarcopenia diagnosis. Using data from the 8th Korean Longitudinal Study on Aging (KLoSA), this study examines harmony search (HS) and the genetic algorithm (GA) for feature selection. Evaluation of the resulting feature set involves a decision tree, a random forest, a support vector machine, and naïve bayes algorithms. As a result, the HS-derived feature set trained with a support vector machine yielded an accuracy of 0.785 and a weighted F1 score of 0.782, which outperformed traditional methods. These findings underscore the competitive edge of metaheuristic-based selection, demonstrating its potential in advancing sarcopenia diagnosis. This study advocates for further exploration of metaheuristic-based feature selection's pivotal role in future sarcopenia research.
Affiliation(s)
- Jaehyeong Lee
  - Department of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea
- Yourim Yoon
  - Department of Computer Engineering, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea
- Jiyoun Kim
  - Department of Exercise Rehabilitation, Gachon University, 191 Hambakmoe-ro, Yeonsu-gu, Incheon 21936, Republic of Korea
- Yong-Hyuk Kim
  - School of Software, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea
|
4
|
Liu G, Guo Z, Liu W, Jiang F, Fu E. A feature selection method based on the Golden Jackal-Grey Wolf Hybrid Optimization Algorithm. PLoS One 2024; 19:e0295579. [PMID: 38165924 PMCID: PMC10760777 DOI: 10.1371/journal.pone.0295579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 11/20/2023] [Indexed: 01/04/2024] Open
Abstract
This paper proposes a feature selection method based on a hybrid optimization algorithm that combines the Golden Jackal Optimization (GJO) and Grey Wolf Optimizer (GWO). The primary objective of this method is to create an effective data dimensionality reduction technique for eliminating redundant, irrelevant, and noisy features within high-dimensional datasets. Drawing inspiration from the Chinese idiom "Chai Lang Hu Bao," hybrid algorithm mechanisms, and cooperative behaviors observed in natural animal populations, we amalgamate the GWO algorithm, the Lagrange interpolation method, and the GJO algorithm to propose the multi-strategy fusion GJO-GWO algorithm. In Case 1, the GJO-GWO algorithm addressed eight complex benchmark functions. In Case 2, GJO-GWO was utilized to tackle ten feature selection problems. Experimental results consistently demonstrate that under identical experimental conditions, whether solving complex benchmark functions or addressing feature selection problems, GJO-GWO exhibits smaller means, lower standard deviations, higher classification accuracy, and reduced execution times. These findings affirm the superior optimization performance, classification accuracy, and stability of the GJO-GWO algorithm.
Affiliation(s)
- Guangwei Liu
  - College of Mining, Liaoning Technical University, Fuxin, Liaoning, China
- Zhiqing Guo
  - College of Mining, Liaoning Technical University, Fuxin, Liaoning, China
- Wei Liu
  - College of Science, Liaoning Technical University, Fuxin, Liaoning, China
- Feng Jiang
  - College of Science, Liaoning Technical University, Fuxin, Liaoning, China
- Ensan Fu
  - College of Mining, Liaoning Technical University, Fuxin, Liaoning, China
|
5
|
Betshrine Rachel R, Khanna Nehemiah H, Singh VK, Manoharan RMV. Diagnosis of Covid-19 from CT slices using Whale Optimization Algorithm, Support Vector Machine and Multi-Layer Perceptron. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2024; 32:253-269. [PMID: 38189732 DOI: 10.3233/xst-230196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
BACKGROUND Coronavirus disease 2019 is a serious and highly contagious disease caused by infection with a newly discovered virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). OBJECTIVE A Computer Aided Diagnosis (CAD) system that assists physicians in diagnosing Covid-19 from chest Computed Tomography (CT) slices is modelled and evaluated. METHODS The lung tissues are segmented using Otsu's thresholding method. The Covid-19 lesions are annotated as Regions of Interest (ROIs), from which texture and shape features are extracted. The extracted features are stored as feature vectors and split into train and test sets in an 80:20 ratio. To choose the optimal features, the Whale Optimization Algorithm (WOA) is employed together with the accuracy of a Support Vector Machine (SVM) classifier. A Multi-Layer Perceptron (MLP) classifier is then trained to perform classification with the selected features. RESULTS Comparative experiments against eight existing benchmark Machine Learning classifiers on a real-time dataset demonstrate that the proposed system, with 88.94% accuracy, outperforms the benchmark classifiers. Statistical analyses, namely the Friedman test, the Mann-Whitney U test and Kendall's Rank Correlation Coefficient test, indicate that the proposed method has a significant impact on the novel dataset considered. CONCLUSION The MLP classifier achieved 80.40% accuracy without feature selection and 88.94% with feature selection using WOA.
Affiliation(s)
- R Betshrine Rachel
  - Ramanujan Computing Centre, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India
- H Khanna Nehemiah
  - Ramanujan Computing Centre, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India
- Vaibhav Kumar Singh
  - Alumna, Department of Information Science and Technology, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India
- Rebecca Mercy Victoria Manoharan
  - Alumna, Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India
|
6
|
Alweshah M, Aldabbas Y, Abu-Salih B, Oqeil S, Hasan HS, Alkhalaileh S, Kassaymeh S. Hybrid black widow optimization with iterated greedy algorithm for gene selection problems. Heliyon 2023; 9:e20133. [PMID: 37809602 PMCID: PMC10559925 DOI: 10.1016/j.heliyon.2023.e20133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/03/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023] Open
Abstract
Gene Selection (GS) is a strategy targeted at reducing redundancy, limited expressiveness, and low informativeness in gene expression datasets obtained by DNA Microarray technology. These datasets contain a plethora of diverse and high-dimensional samples and genes, with a significant discrepancy in the number of samples and genes present. The complexities of GS are especially noticeable in the context of microarray expression data analysis, owing to the inherent data imbalance. The main goal of this study is to offer a simplified and computationally effective approach to dealing with the conundrum of attribute selection in microarray gene expression data. We use the Black Widow Optimization algorithm (BWO) in the context of GS to achieve this, using two methodologies: the unaltered BWO variant and the hybridized BWO variant combined with the Iterated Greedy algorithm (BWO-IG). By improving the local search capabilities of BWO, this hybridization attempts to promote more efficient gene selection. For empirical validation, a series of tests was carried out using nine benchmark datasets obtained from the gene expression data repository. The results of these tests conclusively show that the BWO-IG technique performs better than the traditional BWO algorithm. Notably, the hybridized BWO-IG technique excels in the efficiency of local searches, making it easier to identify relevant genes and producing findings with higher levels of reliability in terms of accuracy and the degree of gene pruning. Additionally, a comparison analysis is done against five modern wrapper Feature Selection (FS) methodologies, namely BIMFOHHO, BMFO, BHHO, BCS, and BBA, in order to put the suggested BWO-IG method's effectiveness into context. This comparison highlights BWO-IG's obvious superiority in reducing the number of selected genes while also obtaining remarkably high classification accuracy. The key findings were an average classification accuracy of 94.426, average fitness values of 0.061, and an average number of selected genes of 2933.767.
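This abstract reports accuracy, fitness, and number of selected genes side by side. Wrapper methods often fold the first and last of these into one score to minimize, and a minimal sketch of that idea follows. The weighted-sum formula, the weight `alpha`, and the leave-one-out 1-nearest-neighbour evaluator are assumptions made for illustration; the abstract does not state the fitness function or classifier actually used.

```python
def loo_1nn_error(X, y, mask):
    """Leave-one-out error of a 1-nearest-neighbour classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # no feature selected: treat as the worst case
    errors = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        errors += best_label != y[i]
    return errors / len(X)

def wrapper_fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted sum of error rate and fraction of features kept."""
    ratio = sum(mask) / len(mask)
    return alpha * loo_1nn_error(X, y, mask) + (1 - alpha) * ratio

# Hypothetical toy data: feature 0 separates the classes, feature 1 misleads.
X = [[0.0, 7.0], [0.1, 1.0], [1.0, 6.0], [1.1, 2.0]]
y = [0, 0, 1, 1]
f_good = wrapper_fitness(X, y, [1, 0])
f_bad = wrapper_fitness(X, y, [0, 1])
f_all = wrapper_fitness(X, y, [1, 1])
```

With a weight close to 1, the error term dominates and the subset-size term mainly breaks ties between subsets of similar accuracy.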
Affiliation(s)
- Mohammed Alweshah
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Yasmeen Aldabbas
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Bilal Abu-Salih
  - Department of Computer Science, King Abdullah II School of Information Technology, The University of Jordan, Amman, Jordan
- Saleh Oqeil
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Hazem S. Hasan
  - Department of Plant Production and Protection, Faculty of Agricultural Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Saleh Alkhalaileh
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Sofian Kassaymeh
  - Software Engineering Department, Faculty of Information Technology, Aqaba University of Technology, Aqaba, Jordan
|
7
|
A Survey on Feature Selection Techniques Based on Filtering Methods for Cyber Attack Detection. INFORMATION 2023. [DOI: 10.3390/info14030191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023] Open
Abstract
Cyber attack detection technology plays a vital role today, since cyber attacks have been causing great harm and loss to organizations and individuals. Feature selection is a necessary step for many cyber-attack detection systems, because it can reduce training costs, improve detection performance, and make the detection system lightweight. Many techniques related to feature selection for cyber attack detection have been proposed, and each technique has advantages and disadvantages. Determining which technology should be selected is a challenging problem for many researchers and system developers, and although there have been several survey papers on feature selection techniques in the field of cyber security, most of them try to be all-encompassing and are too general, making it difficult for readers to grasp the concrete and comprehensive image of the methods. In this paper, we survey the filter-based feature selection technique in detail and comprehensively for the first time. The filter-based technique is one popular kind of feature selection technique and is widely used in both research and application. In addition to general descriptions of this kind of method, we also explain in detail search algorithms and relevance measures, which are two necessary technical elements commonly used in the filter-based technique.
|
8
|
Karlupia N, Abrol P. Wrapper-based optimized feature selection using nature-inspired algorithms. Neural Comput Appl 2023. [DOI: 10.1007/s00521-023-08383-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
|
9
|
An in-depth and contrasting survey of meta-heuristic approaches with classical feature selection techniques specific to cervical cancer. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-022-01825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
|
10
|
Lai J, Chen H, Li T, Yang X. Adaptive graph learning for semi-supervised feature selection with redundancy minimization. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
11
|
Saeed A, Zaffar M, Abbas MA, Quraishi KS, Shahrose A, Irfan M, Huneif MA, Abdulwahab A, Alduraibi SK, Alshehri F, Alduraibi AK, Almushayti Z. A Turf-Based Feature Selection Technique for Predicting Factors Affecting Human Health during Pandemic. Life (Basel) 2022; 12:life12091367. [PMID: 36143404 PMCID: PMC9502730 DOI: 10.3390/life12091367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/22/2022] [Accepted: 08/24/2022] [Indexed: 11/30/2022] Open
Abstract
COVID-19 is a highly contagious disease whose worldwide spread has affected many fields. Using Artificial Intelligence (AI) and, in particular, feature selection approaches, this study investigates the factors that affected the health of students during the COVID-19 lockdown. A filter feature selection technique is used to statistically analyze all the features in the dataset and to improve accuracy. ReliefF (TuRF) filter feature selection is tuned and applied to identify the factors affecting students’ health in a benchmark dataset of students studying during COVID-19. Random Forest (RF), Gradient Boosted Decision Trees (GBDT), Support Vector Machine (SVM), and a 2-layer Neural Network (NN) help in identifying the most critical indicators for rapid intervention. The results indicate that students who maintained their weight and kept themselves busy with health activities during the pandemic remained healthy and studied from home in a positive manner. The results also suggest that the 2-layer NN showed better accuracy (90%) in predicting the factors affecting students’ health during the COVID-19 lockdown.
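To make the Relief family mentioned in this abstract more concrete, here is a sketch of the basic two-class Relief weighting, which is simpler than the ReliefF and TuRF variants the study uses. ReliefF extends this idea to several nearest neighbours and more than two classes, and TuRF repeatedly discards the lowest-scoring features; neither extension is implemented here. The function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief for two classes.

    A feature gains weight when it differs on the nearest sample of the other
    class (miss) and loses weight when it differs on the nearest sample of the
    same class (hit). Assumes every class has at least two samples.
    """
    n, m = len(X), len(X[0])
    # Per-feature value ranges, used to normalize differences to [0, 1].
    spans = []
    for j in range(m):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(m))

    w = [0.0] * m
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(m):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.2], [0.1, 0.9], [1.0, 0.1], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Features with the largest positive weights would be kept, and those with weights near or below zero would be candidates for removal.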
Affiliation(s)
- Alqahtani Saeed
  - Department of Surgery, Faculty of Medicine, Najran University, Najran 61441, Saudi Arabia
- Maryam Zaffar
  - Faculty of Computer Sciences, IBADAT International University, Islamabad 44000, Pakistan
  - Correspondence:
- Mohammed Ali Abbas
  - Faculty of Computer Sciences, IBADAT International University, Islamabad 44000, Pakistan
- Khurrum Shehzad Quraishi
  - Department of Chemical Engineering, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 44000, Pakistan
- Abdullah Shahrose
  - Department of Computer Science, HITEC University, Taxila 47080, Pakistan
- Muhammad Irfan
  - Electrical Engineering Department, College of Engineering, Najran University Saudi Arabia, Najran 61441, Saudi Arabia
- Mohammed Ayed Huneif
  - Department of Pediatrics, College of Medicine, Najran University, Najran 61441, Saudi Arabia
- Alqahtani Abdulwahab
  - Department of Pediatrics, College of Medicine, Najran University, Najran 61441, Saudi Arabia
- Fahad Alshehri
  - Department of Radiology, College of Medicine, Qassim University, Buraidah 52571, Saudi Arabia
- Alaa Khalid Alduraibi
  - Department of Radiology, College of Medicine, Qassim University, Buraidah 52571, Saudi Arabia
- Ziyad Almushayti
  - Department of Radiology, College of Medicine, Qassim University, Buraidah 52571, Saudi Arabia
|
12
|
Feature Selection Using Artificial Gorilla Troop Optimization for Biomedical Data: A Case Analysis with COVID-19 Data. MATHEMATICS 2022. [DOI: 10.3390/math10152742] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/16/2022]
Abstract
Feature selection (FS) is commonly thought of as a pre-processing strategy for determining the best subset of characteristics from a given collection of features. Here, a novel discrete artificial gorilla troop optimization (DAGTO) technique is introduced for the first time to handle FS tasks in the healthcare sector. Depending on the number and type of objective functions, four variants of the proposed method are implemented in this article, namely: (1) single-objective (SO-DAGTO), (2) bi-objective (wrapper) (MO-DAGTO1), (3) bi-objective (filter wrapper hybrid) (MO-DAGTO2), and (4) tri-objective (filter wrapper hybrid) (MO-DAGTO3) for identifying relevant features in diagnosing a particular disease. We provide a gorilla initialization strategy based on label mutual information (MI), with the aim of increasing population variety and accelerating convergence. To verify the performance of the presented methods, ten medical datasets of variable dimensions are considered. A comparison is also made between the best of the four suggested approaches (MO-DAGTO2) and four established multi-objective FS strategies, and it is statistically shown to be the superior one. Finally, a case study with COVID-19 samples is performed to extract the critical factors related to it and to demonstrate how this method is fruitful in real-world applications.
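The label mutual information behind the initialization strategy in this abstract can be computed for discrete features as in the sketch below. Only the MI score itself is shown; how DAGTO turns these scores into an initial population is not described in the abstract, so that step is left out. The function name and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information, in bits, between two discrete sequences of equal length."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: one feature copies the label, the other is independent of it.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
noise = [0, 1, 0, 1]
mi_informative = mutual_information(informative, labels)
mi_noise = mutual_information(noise, labels)
```

One plausible use, stated here only as an assumption, is to make high-MI features more likely to be switched on in the initial candidate solutions.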
|
13
|
Dokeroglu T, Deniz A, Kiziloz HE. A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.083] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/05/2023]
|
14
|
Valcárcel LV, San José-Enériz E, Cendoya X, Rubio Á, Agirre X, Prósper F, Planes FJ. BOSO: A novel feature selection algorithm for linear regression with high-dimensional data. PLoS Comput Biol 2022; 18:e1010180. [PMID: 35639775 PMCID: PMC9187084 DOI: 10.1371/journal.pcbi.1010180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2021] [Revised: 06/10/2022] [Accepted: 05/07/2022] [Indexed: 11/18/2022] Open
Abstract
With the frenetic growth of high-dimensional datasets in different biomedical domains, there is an urgent need to develop predictive methods able to deal with this complexity. Feature selection is a relevant strategy in machine learning to address this challenge. We introduce a novel feature selection algorithm for linear regression called BOSO (Bilevel Optimization Selector Operator). We conducted a benchmark of BOSO with key algorithms in the literature, finding a superior accuracy for feature selection in high-dimensional datasets. Proof-of-concept of BOSO for predicting drug sensitivity in cancer is presented. A detailed analysis is carried out for methotrexate, a well-studied drug targeting cancer metabolism.

We present BOSO (Bilevel Optimization Selector Operator), a novel method to conduct feature selection in linear regression models. In machine learning, feature selection consists of identifying the subset of input variables (features) that are correctly associated with the response variable to be predicted. Adequate feature selection is particularly relevant for high-dimensional datasets, commonly encountered in biomedical research questions that rely on -omics data, e.g. predictive models of drug sensitivity, resistance or toxicity, construction of gene regulatory networks, biomarker selection or association studies. The need for feature selection is emphasized in many of these complex problems, since the number of features is greater than the number of samples, which makes it harder to obtain accurate and general predictive models. In this context, we show that the models derived by BOSO offer a better combination of accuracy and simplicity than competing approaches in the literature. The relevance of BOSO is illustrated in the prediction of drug sensitivity of cancer cell lines, using RNA-seq data and drug screenings from the GDSC (Genomics of Drug Sensitivity in Cancer) database. BOSO obtains linear regression models with a similar level of accuracy but involving a substantially lower number of features, which simplifies the interpretation and validation of predictive models.
Affiliation(s)
- Luis V. Valcárcel
  - Universidad de Navarra, Tecnun Escuela de Ingeniería, San Sebastián, Spain
  - Universidad de Navarra, CIMA Centro de Investigación de Medicina Aplicada, Pamplona, Spain
- Edurne San José-Enériz
  - Universidad de Navarra, CIMA Centro de Investigación de Medicina Aplicada, Pamplona, Spain
  - CIBERONC Centro de Investigación Biomédica en Red de Cáncer, Pamplona, Spain
- Xabier Cendoya
  - Universidad de Navarra, Tecnun Escuela de Ingeniería, San Sebastián, Spain
- Ángel Rubio
  - Universidad de Navarra, Tecnun Escuela de Ingeniería, San Sebastián, Spain
  - Universidad de Navarra, Centro de Ingeniería Biomédica, Pamplona, Spain
  - Universidad de Navarra, DATAI Instituto de Ciencia de los Datos e Inteligencia Artificial, Pamplona, Spain
- Xabier Agirre
  - Universidad de Navarra, CIMA Centro de Investigación de Medicina Aplicada, Pamplona, Spain
  - CIBERONC Centro de Investigación Biomédica en Red de Cáncer, Pamplona, Spain
- Felipe Prósper
  - Universidad de Navarra, CIMA Centro de Investigación de Medicina Aplicada, Pamplona, Spain
  - CIBERONC Centro de Investigación Biomédica en Red de Cáncer, Pamplona, Spain
  - IdiSNA Instituto de Investigación Sanitaria de Navarra, Pamplona, Spain
  - Clínica Universidad de Navarra, Pamplona, Spain
- Francisco J. Planes
  - Universidad de Navarra, Tecnun Escuela de Ingeniería, San Sebastián, Spain
  - Universidad de Navarra, Centro de Ingeniería Biomédica, Pamplona, Spain
  - Universidad de Navarra, DATAI Instituto de Ciencia de los Datos e Inteligencia Artificial, Pamplona, Spain
  - * E-mail:
|
15
|
Abstract
Feature Selection (FS) is an important preprocessing step that is involved in machine learning and data mining tasks for preparing data (especially high-dimensional data) by eliminating irrelevant and redundant features, thus reducing the potential curse of dimensionality of a given large dataset. Consequently, FS is arguably a combinatorial NP-hard problem in which the computational time increases exponentially with an increase in problem complexity. To tackle such a problem type, meta-heuristic techniques have been adopted by an increasing number of scholars. Herein, a novel meta-heuristic algorithm, called Sparrow Search Algorithm (SSA), is presented. The SSA still performs poorly on exploratory behavior and exploration-exploitation trade-off because it does not duly stimulate the search within feasible regions, and the exploitation process suffers noticeable stagnation. Therefore, we improve SSA by adopting: i) a strategy for Random Re-positioning of Roaming Agents (3RA); and ii) a novel Local Search Algorithm (LSA), which are algorithmically incorporated into the original SSA structure. For the FS problem, SSA is improved and cloned as a binary variant, namely, the improved Binary SSA (iBSSA), which would strive to select the optimal or near-optimal features from a given dataset while keeping the classification accuracy maximized. For binary conversion, the iBSSA was primarily validated against nine common S-shaped and V-shaped Transfer Functions (TFs), thus producing nine iBSSA variants. To verify the robustness of these variants, three well-known classification techniques, including k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Random Forest (RF) were adopted as fitness evaluators with the proposed iBSSA approach and many other competing algorithms, on 18 multifaceted, multi-scale benchmark datasets from the University of California Irvine (UCI) data repository. Then, the overall best-performing iBSSA variant for each of the three classifiers was compared with binary variants of 12 different well-known meta-heuristic algorithms, including the original SSA (BSSA), Artificial Bee Colony (BABC), Particle Swarm Optimization (BPSO), Bat Algorithm (BBA), Grey Wolf Optimization (BGWO), Whale Optimization Algorithm (BWOA), Grasshopper Optimization Algorithm (BGOA), SailFish Optimizer (BSFO), Harris Hawks Optimization (BHHO), Bird Swarm Algorithm (BBSA), Atom Search Optimization (BASO), and Henry Gas Solubility Optimization (BHGSO). Based on a Wilcoxon’s non-parametric statistical test ($\alpha = 0.05$), the superiority of iBSSA with the three classifiers was very evident against counterparts across the vast majority of the selected datasets, achieving a feature size reduction of up to 92% along with up to 100% classification accuracy on some of those datasets.
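The binary conversion step in this abstract relies on transfer functions that map a continuous position to a bit. The sketch below shows one representative of each family: the sigmoid as an S-shaped function and |tanh| as a V-shaped one, with their usual update rules. Which nine functions the paper actually tested is not stated in the abstract, so these two, along with all function names, are illustrative assumptions.

```python
import math
import random

def s_shaped(x):
    """Sigmoid transfer function: maps a real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """|tanh(x)| transfer function: maps a real value to a probability in [0, 1)."""
    return abs(math.tanh(x))

def binarize_s(position, rng):
    """S-shaped rule: each bit becomes 1 with probability S(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position, current_bits, rng):
    """V-shaped rule: each current bit is flipped with probability V(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]

rng = random.Random(42)
bits = binarize_s([-50.0, 0.3, 50.0], rng)      # extreme values give near-certain bits
unchanged = binarize_v([0.0, 0.0], [1, 0], rng)  # V(0) = 0, so no bit can flip
```

The practical difference is that an S-shaped rule sets each bit directly from the position, while a V-shaped rule depends on the previous bit and only changes it when the position has a large magnitude.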
|
16
|
Quincozes SE, Mosse D, Passos D, Albuquerque C, Ochi LS, dos Santos VF. On the Performance of GRASP-Based Feature Selection for CPS Intrusion Detection. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3088763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
|
17
|
Isaac A, Nehemiah HK, Dunston SD, Elgin Christo V, Kannan A. Feature selection using competitive coevolution of bio-inspired algorithms for the diagnosis of pulmonary emphysema. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
|
18
|
Abed-alguni BH, Paul D. Island-based Cuckoo Search with elite opposition-based learning and multiple mutation methods for solving optimization problems. Soft comput 2022. [DOI: 10.1007/s00500-021-06665-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
|
19
|
Uzma, Halim Z. An ensemble filter-based heuristic approach for cancerous gene expression classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
20
|
Bayzidi H, Talatahari S, Saraee M, Lamarche CP. Social Network Search for Solving Engineering Optimization Problems. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:8548639. [PMID: 34630556 PMCID: PMC8497131 DOI: 10.1155/2021/8548639] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/12/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 11/18/2022]
Abstract
In this paper, a new metaheuristic optimization algorithm, called social network search (SNS), is employed for solving mixed continuous/discrete engineering optimization problems. The SNS algorithm mimics the social network user's efforts to gain more popularity by modeling the decision moods in expressing their opinions. Four decision moods, including imitation, conversation, disputation, and innovation, are real-world behaviors of users in social networks. These moods are used as optimization operators that model how users are affected and motivated to share their new views. The SNS algorithm was verified with 14 benchmark engineering optimization problems and one real application in the field of remote sensing. The performance of the proposed method is compared with various algorithms to show its effectiveness over other well-known optimizers in terms of computational cost and accuracy. In most cases, the optimal solutions achieved by the SNS are better than the best solution obtained by the existing methods.
Affiliation(s)
- Hadi Bayzidi
- Department of Civil Engineering, University of Tabriz, Tabriz, Iran
| | - Siamak Talatahari
- Department of Civil Engineering, University of Tabriz, Tabriz, Iran
- Engineering Faculty, Near East University, North Cyprus, Mersin 10, Turkey
| | - Meysam Saraee
- Department of Civil Engineering, University of Tabriz, Tabriz, Iran
| | | |
|
21
|
A robust multiobjective Harris’ Hawks Optimization algorithm for the binary classification problem. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107219] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/24/2022]
|
22
|
Lappas PZ, Yannacopoulos AN. A machine learning approach combining expert knowledge with genetic algorithms in feature selection for credit risk assessment. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107391] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
|
23
|
Elminaam DSA, Nabil A, Ibraheem SA, Houssein EH. An Efficient Marine Predators Algorithm for Feature Selection. IEEE ACCESS 2021; 9:60136-60153. [DOI: 10.1109/access.2021.3073261] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 09/01/2023]
|
24
|
Evolutionary computing for clinical dataset classification using a novel feature selection algorithm. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2020. [DOI: 10.1016/j.jksuci.2020.12.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]
|
25
|
Solving feature selection problems by combining mutation and crossover operations with the monarch butterfly optimization algorithm. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01981-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/22/2022]
|
26
|
|
27
|
Tawhid MA, Ibrahim AM. Feature selection based on rough set approach, wrapper approach, and binary whale optimization algorithm. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-00996-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 01/02/2023]
|
28
|
Feature selection for intrusion detection using new multi-objective estimation of distribution algorithms. APPL INTELL 2019. [DOI: 10.1007/s10489-019-01503-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
|
29
|
Ahmad SR, Bakar AA, Yaakub MR. Ant colony optimization for text feature selection in sentiment analysis. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-173740] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/15/2022]
Affiliation(s)
- Siti Rohaidah Ahmad
- Department of Computer Science, Faculty of Defence Science and Technology, Universiti Pertahanan Nasional Malaysia, Kuala Lumpur 57000, Malaysia
| | - Azuraliza Abu Bakar
- Data Mining and Optimization Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi Selangor 46000, Malaysia
| | - Mohd Ridzwan Yaakub
- Data Mining and Optimization Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi Selangor 46000, Malaysia
| |
|
30
|
Ahmad SR, Bakar AA, Yaakub MR. A review of feature selection techniques in sentiment analysis. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-173763] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Affiliation(s)
- Siti Rohaidah Ahmad
- Department of Computer Science, Faculty of Defence Science and Technology, Universiti Pertahanan Nasional Malaysia, Kuala Lumpur, Malaysia
| | - Azuraliza Abu Bakar
- Data Mining and Optimization Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
| | - Mohd Ridzwan Yaakub
- Data Mining and Optimization Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
| |
|
31
|
A Fuzzy Classifier with Feature Selection Based on the Gravitational Search Algorithm. Symmetry (Basel) 2018. [DOI: 10.3390/sym10110609] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/16/2022] Open
Abstract
This paper concerns several important topics of the Symmetry journal, namely, pattern recognition, computer-aided design, diversity and similarity. We also take advantage of the symmetric and asymmetric structure of a transfer function, which is responsible for mapping a continuous search space to a binary search space. A new method for the design of a fuzzy-rule-based classifier using a metaheuristic called the Gravitational Search Algorithm (GSA) is discussed. The paper identifies three basic stages of classifier construction: feature selection, creation of a fuzzy rule base, and optimization of the antecedent parameters of the rules. At the first stage, several feature subsets are obtained by using the wrapper scheme on the basis of the binary GSA. Creating fuzzy rules is a serious challenge in designing a fuzzy-rule-based classifier in the presence of high-dimensional data. The classifier structure is formed by the rule base generation algorithm using minimum and maximum feature values. The optimal fuzzy-rule-based parameters are extracted from the training data using the continuous GSA. The classifier performance is tested on real-world KEEL (Knowledge Extraction based on Evolutionary Learning) datasets. The results demonstrate that highly accurate classifiers can be constructed with relatively few fuzzy rules and features.
|
32
|
Medical data mining in sentiment analysis based on optimized swarm search feature selection. AUSTRALASIAN PHYSICAL & ENGINEERING SCIENCES IN MEDICINE 2018; 41:1087-1100. [PMID: 30206813 DOI: 10.1007/s13246-018-0674-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/28/2018] [Accepted: 08/09/2018] [Indexed: 10/28/2022]
Abstract
In this paper, we propose a novel technique termed optimized swarm search-based feature selection (OS-FS), a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. Sentiment prediction is becoming an increasingly crucial machine learning technique for gaining insights from unstructured medical texts. In fact, due to its robustness and accuracy, it has recently gained popularity in the medical industries. Medical text mining is well known as a fundamental data analytic for sentiment prediction. A popular preprocessing step in text mining transforms medical text strings into word vectors, forming a high-dimensional sparse matrix. However, such a sparse matrix poses problems for the induction of an accurate sentiment prediction model. The swarm search in our proposed OS-FS can be optimized by a new feature evaluation technique called clustering-by-coefficient-of-variation. This type of feature selection, which finds a subset of features among all the original features of the sparse matrix, is a commonly used dimensionality reduction technique and can improve the accuracy of the prediction model. We apply this method to a case scenario of 279 medical articles related to 'meaningful use functionalities on health care quality, safety, and efficiency', taken from a systematic review of previous medical IT literature. For this medical text mining task, four sentiment classes (positive, mixed-positive, neutral, and negative) are recognized from the document contents. Our experimental results demonstrate the superiority of OS-FS over traditional feature selection methods in the literature.
|
33
|
Sato T, Takano Y, Nakahara T. Investigating consumers’ store-choice behavior via hierarchical variable selection. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0327-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
|
34
|
Chebouba L, Boughaci D, Guziolowski C. Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients' Classification. J Med Syst 2018; 42:129. [PMID: 29869179 DOI: 10.1007/s10916-018-0972-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/10/2017] [Accepted: 05/18/2018] [Indexed: 01/02/2023]
Abstract
The use of data from high-throughput technologies in drug target problems has become widespread in recent decades. This study proposes a meta-heuristic framework using stochastic local search (SLS) combined with random forest (RF), where the aim is to identify the most important genes and proteins leading to the best classification of Acute Myeloid Leukemia (AML) patients. First, we use a stochastic local search meta-heuristic as a feature selection technique to select the most significant proteins to be used in the classification step. Then we apply RF to classify new patients into their corresponding classes. For evaluation, we train the RF classifier on the training data to obtain a model and then apply this model to the test data to predict the appropriate class. We use the balanced accuracy (BAC) and the area under the receiver operating characteristic curve (AUROC) as metrics to measure the performance of our model. The proposed method is evaluated on the dataset from the DREAM 9 challenge. It is compared with a pure random forest (without feature selection) and with the two best-ranked results of the DREAM 9 challenge. We used three types of data: only clinical data, only proteomics data, and finally clinical and proteomics data combined. The numerical results show that the highest scores are obtained when using clinical data alone, and the lowest when using proteomics data alone. Further, our method achieves promising results compared to the methods presented in the DREAM challenge.
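The two metrics named in this abstract can be made concrete with a short sketch. It assumes the standard definitions (balanced accuracy as the mean of per-class recall, AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative) and uses made-up labels and scores; the function names are hypothetical and nothing here comes from the DREAM 9 data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes this is the mean of
    sensitivity and specificity."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive (label 1) and negative
    (label 0) scores; tied pairs count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up imbalanced example: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.05, 0.15, 0.25, 0.7]

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_acc, balanced_accuracy(y_true, y_pred), auroc(y_true, scores))
```

On imbalanced data such as this toy example, plain accuracy (0.8) is higher than balanced accuracy (16/21, about 0.762), which is one common reason for reporting BAC alongside AUROC.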
Affiliation(s)
- Lokmane Chebouba
- Department of Computer Science, LRIA Laboratory, Electrical Engineering and Computer Science Faculty, University of Science and Technology Houari Boumediene (USTHB), El-Alia BP 32, Bab-Ezzouar, Algiers, Algeria.
| | - Dalila Boughaci
- Department of Computer Science, LRIA Laboratory, Electrical Engineering and Computer Science Faculty, University of Science and Technology Houari Boumediene (USTHB), El-Alia BP 32, Bab-Ezzouar, Algiers, Algeria
| | | |
|
35
|
Sheykhizadeh S, Naseri A. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2018; 194:202-210. [PMID: 29353216 DOI: 10.1016/j.saa.2018.01.028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/09/2017] [Revised: 01/08/2018] [Accepted: 01/11/2018] [Indexed: 06/07/2023]
Abstract
Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim to choose, from a large pool of available predictors, a set of variables relevant to estimating analyte concentrations or to achieving better classification results. Many variable selection techniques have been introduced, among which those based on swarm intelligence optimization have attracted growing attention over the last few decades, since they are mainly inspired by nature. In this work, a simple new variable selection algorithm is proposed based on the invasive weed optimization (IWO) concept. IWO is a bio-inspired metaheuristic that mimics the ecological behavior of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be highly adaptive and robust to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported, using different experimental datasets including FTIR and NIR data, to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization-linear discriminant analysis (IWO-LDA) and invasive weed optimization-partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.
Affiliation(s)
- Saheleh Sheykhizadeh
- Department of Analytical Chemistry, Faculty of Chemistry, University of Tabriz, Tabriz, Iran
| | - Abdolhossein Naseri
- Department of Analytical Chemistry, Faculty of Chemistry, University of Tabriz, Tabriz, Iran.
| |
|
36
|
Dong H, Li T, Ding R, Sun J. A novel hybrid genetic algorithm with granular information for feature selection and optimization. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.12.048] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Indexed: 10/18/2022]
|
37
|
Yazdani S, Shanbehzadeh J, Hadavandi E. MBCGP-FE: A modified balanced cartesian genetic programming feature extractor. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
|
38
|
RIFS: a randomly restarted incremental feature selection algorithm. Sci Rep 2017; 7:13013. [PMID: 29026108 PMCID: PMC5638869 DOI: 10.1038/s41598-017-13259-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/08/2017] [Accepted: 09/21/2017] [Indexed: 11/24/2022] Open
Abstract
The advent of the big data era has imposed both running time and learning efficiency challenges on machine learning researchers. Biomedical OMIC research is one of these big data areas and has changed biomedical research drastically. But the high cost of data production and the difficulty of participant recruitment introduce the “large p small n” paradigm into biomedical research. Feature selection is usually employed to reduce the large number of biomedical features, so that a stable data-independent classification or regression model may be achieved. This study randomly changes the first element of the widely used incremental feature selection (IFS) strategy and selects the best feature subset, which may be ranked low by statistical association evaluation algorithms, e.g. the t-test. The hypothesis is that two low-ranked features may be orchestrated to achieve good classification performance. The proposed Randomly re-started Incremental Feature Selection (RIFS) algorithm demonstrates both higher classification accuracy and a smaller feature number than the existing algorithms. RIFS also outperforms the existing methylomic diagnosis model for prostate malignancy, with higher accuracy and fewer transcriptomic features.
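The restart idea in this abstract can be sketched as follows. This is a loose illustration rather than the published RIFS algorithm: the early-stopping rule (`patience`), the function names, and the toy scoring function are all assumptions, and `score` stands in for whatever classifier-based evaluation a real study would use.

```python
import random


def incremental_from(start, ranked, score, patience=2):
    """Grow a subset from ranked[start], adding the next-ranked feature
    at each step. Stop after `patience` consecutive additions fail to
    improve the best score seen so far (an assumed stopping rule)."""
    subset, best_subset, best = [], [], float("-inf")
    misses = 0
    for feat in ranked[start:]:
        subset.append(feat)
        s = score(subset)
        if s > best:
            best, best_subset, misses = s, list(subset), 0
        else:
            misses += 1
            if misses >= patience:
                break
    return best_subset, best


def randomly_restarted_ifs(ranked, score, restarts=10, seed=0):
    """Run the incremental pass from rank 0 and from random start ranks,
    returning the best (subset, score) pair found."""
    rng = random.Random(seed)
    starts = [0] + [rng.randrange(len(ranked)) for _ in range(restarts)]
    results = (incremental_from(s, ranked, score) for s in starts)
    return max(results, key=lambda r: r[1])


def toy_score(subset):
    """Hypothetical score: g4 and g5 are weak alone but strong together;
    each selected feature costs a small penalty."""
    chosen = set(subset)
    if {"g4", "g5"} <= chosen:
        base = 1.0
    elif "g1" in chosen:
        base = 0.6
    else:
        base = 0.5
    return base - 0.01 * len(subset)


ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # best-ranked first
print(incremental_from(0, ranked_genes, toy_score))
print(incremental_from(3, ranked_genes, toy_score))
print(randomly_restarted_ifs(ranked_genes, toy_score))
```

In the toy setup the pass that starts at the top-ranked feature stops at `["g1"]`, while a pass that starts at rank 3 reaches the higher-scoring pair `["g4", "g5"]`, which mirrors the abstract's hypothesis that low-ranked features can work well together.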
|
39
|
A novel Hybrid Genetic Local Search Algorithm for feature selection and weighting with an application in strategic decision making in innovation management. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.04.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
|
40
|
Pereira LAM, Papa JP, Coelho ALV, Lima CAM, Pereira DR, de Albuquerque VHC. Automatic identification of epileptic EEG signals through binary magnetic optimization algorithms. Neural Comput Appl 2017. [DOI: 10.1007/s00521-017-3124-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
|
41
|
Deniz A, Kiziloz HE, Dokeroglu T, Cosar A. Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.033] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 11/26/2022]
|
42
|
Qureshi MNI, Oh J, Min B, Jo HJ, Lee B. Multi-modal, Multi-measure, and Multi-class Discrimination of ADHD with Hierarchical Feature Extraction and Extreme Learning Machine Using Structural and Functional Brain MRI. Front Hum Neurosci 2017; 11:157. [PMID: 28420972 PMCID: PMC5378777 DOI: 10.3389/fnhum.2017.00157] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 12/19/2016] [Accepted: 03/16/2017] [Indexed: 12/18/2022] Open
Abstract
Structural and functional MRI unveil many hidden properties of the human brain. We performed this multi-class classification study on selected subjects from the publicly available attention deficit hyperactivity disorder ADHD-200 dataset of patients and healthy children. The dataset has three groups, namely, ADHD inattentive, ADHD combined, and typically developing. We calculated the global averaged functional connectivity maps across the whole cortex to extract anatomical atlas parcellation-based features from the resting-state fMRI (rs-fMRI) data and cortical parcellation-based features from the structural MRI (sMRI) data. In addition, the preprocessed image volumes from each of these modalities were separately subjected to an ANOVA analysis using all the voxels. This study utilized the average measure from the most significant regions acquired from ANOVA as features for classification, in addition to the multi-modal and multi-measure features of structural and functional MRI data. We extracted the most discriminative features with a hierarchical sparse feature elimination and selection algorithm. These features include cortical thickness, image intensity, volume, cortical thickness standard deviation, surface area, and ANOVA-based features. An extreme learning machine performed both the binary and multi-class classifications in comparison with support vector machines. This article reports the prediction accuracy of both unimodal and multi-modal features on test data. We achieved 76.190% (p < 0.0001) classification accuracy in multi-class settings as well as 92.857% (p < 0.0001) classification accuracy in binary settings. In addition, we found ANOVA-based significant regions of the brain that also play a vital role in the classification of ADHD. Thus, from a clinical perspective, this multi-modal group analysis approach with multi-measure features may improve the accuracy of the ADHD differential diagnosis.
Affiliation(s)
- Muhammad Naveed Iqbal Qureshi
- Department of Biomedical Science and Engineering, Institute of Integrated Technology, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Jooyoung Oh
- Department of Biomedical Science and Engineering, Institute of Integrated Technology, Gwangju Institute of Science and Technology, Gwangju, South Korea
| | - Beomjun Min
- Department of Neuropsychiatry, Seoul National University Hospital, Seoul, South Korea
| | - Hang Joon Jo
- Department of Neurologic Surgery, Mayo Clinic, Rochester, MN, USA
| | - Boreom Lee
- Department of Biomedical Science and Engineering, Institute of Integrated Technology, Gwangju Institute of Science and Technology, Gwangju, South Korea
| |
|
43
|
Tiwari S, Singh B, Kaur M. An approach for feature selection using local searching and global optimization techniques. Neural Comput Appl 2017. [DOI: 10.1007/s00521-017-2959-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
|
44
|
Kesharaju M, Nagarajah R. Particle Swarm Optimization approach to defect detection in armour ceramics. ULTRASONICS 2017; 75:124-131. [PMID: 27951501 DOI: 10.1016/j.ultras.2016.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/10/2015] [Revised: 07/10/2016] [Accepted: 07/18/2016] [Indexed: 06/06/2023]
Abstract
In this research, various extracted features were used in the development of an automated ultrasonic sensor-based inspection system that enables defect classification in each ceramic component prior to despatch to the field. Classification is an important task, and the large number of irrelevant and redundant features commonly introduced into a dataset reduces the classifier's performance. Feature selection aims to reduce the dimensionality of the dataset while improving the performance of a classification system. In the context of a multi-criteria optimization problem (i.e. minimizing the classification error rate and reducing the number of features) such as the one discussed in this research, the literature suggests that evolutionary algorithms offer good results. Moreover, it is noted that Particle Swarm Optimization (PSO) has not been explored, especially in the classification of high-frequency ultrasonic signals. Hence, a binary coded Particle Swarm Optimization (BPSO) technique is investigated for feature subset selection and for optimizing the classification error rate. In the proposed method, the population data is used as input to an Artificial Neural Network (ANN) based classification system to obtain the error rate, with the ANN serving as the evaluator of the PSO fitness function.
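The wrapper scheme outlined in this abstract can be sketched as a binary PSO whose fitness combines an error rate with the fraction of selected features. Everything below is an assumption made for illustration: the weighted-sum fitness with `alpha`, the inertia and acceleration constants, the sigmoid transfer function, and the `toy_error` stand-in, which replaces the ANN evaluator used in the paper.

```python
import math
import random


def binary_pso(n_features, error_rate, n_particles=10, n_iter=30,
               alpha=0.9, seed=0):
    """Minimal binary PSO for feature subset selection.

    `error_rate(mask)` is a caller-supplied evaluator returning an error
    in [0, 1] for a 0/1 feature mask. Returns (best_mask, best_fitness).
    """
    rng = random.Random(seed)

    def fitness(mask):
        if not any(mask):  # an empty subset cannot be evaluated
            return float("inf")
        return alpha * error_rate(mask) + (1 - alpha) * sum(mask) / n_features

    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity -> probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_error(mask):
    """Hypothetical evaluator: only features 0 and 3 reduce the error."""
    return 0.5 - 0.2 * mask[0] - 0.2 * mask[3]


best_mask, best_fit = binary_pso(8, toy_error, seed=1)
print(best_mask, best_fit)
```

A weighted sum is only one way to handle the two criteria mentioned in the abstract; a Pareto-based multi-objective variant would keep error rate and subset size as separate objectives instead.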
Affiliation(s)
- Manasa Kesharaju
- Swinburne University of Technology, Faculty of Engineering & Industrial Sciences, Melbourne, Victoria 3122, Australia; Defence Materials Technology Centre (DMTC LTD), Melbourne, Victoria 3122, Australia.
| | - Romesh Nagarajah
- Swinburne University of Technology, Faculty of Engineering & Industrial Sciences, Melbourne, Victoria 3122, Australia; Defence Materials Technology Centre (DMTC LTD), Melbourne, Victoria 3122, Australia
| |
|
45
|
Amraoui H, Mhamdi F, Elloumi M. Survey of Metaheuristics and Statistical Methods for Multifactorial Diseases Analyses. AIMS MEDICAL SCIENCE 2017. [DOI: 10.3934/medsci.2017.3.291] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
|
46
|
Feature Subset Selection for Cancer Classification Using Weight Local Modularity. Sci Rep 2016; 6:34759. [PMID: 27703256 PMCID: PMC5050509 DOI: 10.1038/srep34759] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/27/2016] [Accepted: 09/19/2016] [Indexed: 11/27/2022] Open
Abstract
Microarrays have recently become an important tool for profiling the global gene expression patterns of tissues. Gene selection is a popular technique for cancer classification that aims to identify, from thousands of genes that may contribute to the occurrence of cancers, a small number of informative genes that yield high predictive accuracy. This technique has been extensively studied in recent years. This study develops a novel feature selection (FS) method for gene subset selection, called WLMGS, that utilizes the Weight Local Modularity (WLM) in a complex network. In the proposed method, the discriminative power of a gene subset is evaluated using the weight local modularity of a weighted sample graph built on the gene subset, in which the intra-class distance is small and the inter-class distance is large. A higher local modularity of the gene subset corresponds to greater discriminative power of the gene subset. With a forward search strategy, a more informative gene subset can be selected as a group for the classification process. Computational experiments show that the proposed algorithm can select a small subset of predictive genes as a group while preserving classification accuracy.
|
47
|
Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study. PLoS One 2016; 11:e0160697. [PMID: 27500640 PMCID: PMC4976974 DOI: 10.1371/journal.pone.0160697] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 03/02/2016] [Accepted: 07/22/2016] [Indexed: 12/28/2022] Open
Abstract
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex.
|
48
|
Zeng D, Peng J, Fong S, Qiu Y, Wong R, Mon YJ. WITHDRAWN: Sentiment prediction by text mining medical documents using optimized swarm search-based feature selection. Comput Med Imaging Graph 2016:S0895-6111(16)30074-X. [PMID: 27693005 DOI: 10.1016/j.compmedimag.2016.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2016] [Revised: 07/14/2016] [Accepted: 07/28/2016] [Indexed: 10/21/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Daohui Zeng
- First Affiliated Hospital of Guangzhou University of TCM, Guangzhou, China.
| | | | - Simon Fong
- Department of Computer and Information Science, University of Macau, Taipa, Macau.
| | - Yining Qiu
- Department of Computer and Information Science, University of Macau, Taipa, Macau.
| | - Raymond Wong
- School of Computer Science and Engineering, University of New South Wales, Sydney, Australia.
| | - Yi-Jen Mon
- Department of Information Engineering, Taoyuan Innovation Institute of Technology, Taoyuan City, Taiwan.
| |
|
49
|
Aziz MAE, Hassanien AE. Modified cuckoo search algorithm with rough sets for feature selection. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2473-7] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Indexed: 11/28/2022]
|
50
|
Ma X, Chou CA, Sayama H, Chaovalitwongse WA. Brain response pattern identification of fMRI data using a particle swarm optimization-based approach. Brain Inform 2016; 3:181-192. [PMID: 27747594 PMCID: PMC4999570 DOI: 10.1007/s40708-016-0049-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/31/2015] [Accepted: 03/15/2016] [Indexed: 12/02/2022] Open
Abstract
Many neuroscience studies have been devoted to understanding the brain's neural responses that correlate with cognition using functional magnetic resonance imaging (fMRI). In contrast to univariate analysis for identifying response patterns, recent literature has shown that multi-voxel pattern analysis (MVPA) of fMRI data using machine learning techniques is a relatively effective approach. MVPA can be considered a multi-objective pattern classification problem that aims to optimize response patterns, in which informative voxels interacting with each other are selected, achieving high classification accuracy associated with cognitive stimulus conditions. To solve the problem, we propose a feature interaction detection framework, integrating hierarchical heterogeneous particle swarm optimization and support vector machines, for voxel selection in MVPA. In the proposed approach, we first select the most informative voxels and then identify a response pattern based on the connectivity of the selected voxels. The effectiveness of the proposed approach was examined on Haxby’s dataset of object-level representations. The computational results demonstrated that the extracted response patterns achieve higher classification accuracy than state-of-the-art feature selection algorithms, such as forward selection and backward selection.
Affiliation(s)
- Xinpei Ma
- Department of Systems Science & Industrial Engineering, Binghamton University, the State University of New York, Binghamton, USA
| | - Chun-An Chou
- Department of Systems Science & Industrial Engineering, Binghamton University, the State University of New York, Binghamton, USA
| | - Hiroki Sayama
- Department of Systems Science & Industrial Engineering, Binghamton University, the State University of New York, Binghamton, USA
| | - Wanpracha Art Chaovalitwongse
- Departments of Industrial & Systems Engineering, Department of Radiology, Integrated Brain Imaging Center, University of Washington, Seattle, USA
- Department of Radiology, Integrated Brain Imaging Center, University of Washington, Seattle, USA
| |
|