1
Ruan J, Wang M, Liu D, Chen M, Gao X. Multi-Label Feature Selection with Feature-Label Subgraph Association and Graph Representation Learning. Entropy (Basel, Switzerland) 2024; 26:992. [PMID: 39593936 PMCID: PMC11592953 DOI: 10.3390/e26110992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2024] [Revised: 11/06/2024] [Accepted: 11/16/2024] [Indexed: 11/28/2024]
Abstract
In multi-label data, a sample is associated with multiple labels at the same time. The computational complexity arises from the high-dimensional feature space and from the interdependence and unbalanced distribution of labels, which makes feature selection challenging. To address this, a multi-label feature selection method based on feature-label subgraph association with graph representation learning (SAGRL) is proposed to represent the complex correlations among features and labels, especially the relationships between features and labels. Specifically, features and labels are mapped to nodes in a graph structure, and connections between nodes are established to form feature sets and label sets, respectively, which increases intra-class correlation and decreases inter-class correlation. Feature-label subgraphs are then constructed from the feature and label sets to provide abundant feature combinations. The relationships between subgraphs are adjusted by graph representation learning, the crucial features in different label sets are selected, and the optimal feature subset is obtained by ranking. Experimental studies on 11 datasets with six evaluation metrics show that the proposed method outperforms several state-of-the-art multi-label feature selection methods.
Affiliation(s)
- Jinghou Ruan: School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Mingwei Wang: School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Deqing Liu: School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Maolin Chen: School of Smart City, Chongqing Jiaotong University, Chongqing 400074, China
- Xianjun Gao: School of Geosciences, Yangtze University, Wuhan 430100, China
2
Peng C, Dai C, Xue X. A Many-Objective Evolutionary Algorithm Based on Dual Selection Strategy. Entropy (Basel, Switzerland) 2023; 25:1015. [PMID: 37509962 PMCID: PMC10378021 DOI: 10.3390/e25071015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/30/2023]
Abstract
In high-dimensional objective spaces, most multi-objective optimization algorithms have difficulty solving many-objective optimization problems because they cannot balance convergence and diversity. As the number of objectives increases, non-dominated solutions become difficult to distinguish, and assessing diversity in the high-dimensional objective space becomes harder. To reduce selection pressure and improve diversity, this article proposes a many-objective evolutionary algorithm based on a dual selection strategy (MaOEA/DS). First, a new distance function is designed as an effective distance metric. Then, based on this distance function, a point crowding-degree (PC) strategy is proposed to further enhance the algorithm's ability to distinguish superior solutions in the population. Finally, a dual selection strategy is proposed. In the first selection, the individuals with the best convergence are selected from the top few individuals with good diversity in the population, focusing on population convergence. In the second selection, the PC strategy is used to further select individuals with larger crowding distance values, emphasizing population diversity. To evaluate its performance extensively, the proposed algorithm is compared with several state-of-the-art algorithms. The experimental results show that MaOEA/DS outperforms the comparison algorithms in overall performance, indicating the effectiveness of the proposed algorithm.
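The claim that non-dominated solutions become hard to distinguish as objectives are added can be illustrated with a small experiment. The sketch below is not MaOEA/DS and does not use the paper's distance function; it only counts, for uniformly random points (an assumption made for illustration), what fraction of a population is Pareto non-dominated under minimization. The function names are hypothetical.

```python
import random


def dominates(a, b):
    """Return True if point a Pareto-dominates point b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_fraction(points):
    """Fraction of points that no other point in the list dominates."""
    count = sum(
        1
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    )
    return count / len(points)


def random_population(size, n_objectives, rng):
    """Uniformly random objective vectors; a stand-in for a real population."""
    return [tuple(rng.random() for _ in range(n_objectives)) for _ in range(size)]


rng = random.Random(0)
fractions = {
    m: nondominated_fraction(random_population(100, m, rng)) for m in (2, 5, 10)
}
for m, frac in fractions.items():
    print(f"{m} objectives: {frac:.2f} of the population is non-dominated")
```

With only a few objectives, dominance alone removes most of the population; with many objectives almost every point survives, so a secondary criterion such as a crowding measure has to do the ranking.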
Affiliation(s)
- Cheng Peng: School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Cai Dai: School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Xingsi Xue: Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350118, China
3
Wang R, Wang H, Shi L, Han C, He Q, Che Y, Luo L. A novel framework of MOPSO-GDM in recognition of Alzheimer's EEG-based functional network. Front Aging Neurosci 2023; 15:1160534. [PMID: 37455939 PMCID: PMC10339813 DOI: 10.3389/fnagi.2023.1160534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/13/2023] [Indexed: 07/18/2023]
Abstract
Background: Most patients with Alzheimer's disease (AD) have an insidious onset and frequently atypical clinical symptoms, which are often regarded as a normal consequence of aging, making AD difficult to diagnose medically. However, accurate diagnosis is critical to prevent degeneration and provide early treatment for AD patients. Objective: This study aims to establish a novel EEG-based classification framework with deep learning methods for AD recognition. Methods: First, considering the network interactions in different frequency bands (δ, θ, α, β, and γ), multiplex networks are reconstructed by the phase synchronization index (PSI) method, and fourteen topology features are then extracted, forming a high-dimensional feature vector. However, not all features in the combination provide effective information for recognition, and combining features by manual selection is time-consuming and laborious. Thus, a feature selection optimization algorithm called MOPSO-GDM is proposed by combining the multi-objective particle swarm optimization (MOPSO) algorithm with the Gaussian differential mutation (GDM) algorithm. In addition to the classification error rates of support vector machine, naive Bayes, and discriminant analysis classifiers, the algorithm also considers a distance measure as an optimization objective. Results: The proposed method achieves a classification error rate of 0.0531 (5.31%) with a feature vector of size 8 under a ten-fold cross-validation strategy. Conclusion: These findings show that the framework can adaptively combine the best brain network features to explore network synchronization and functional interactions and to characterize brain functional abnormalities, which can improve the efficiency of disease recognition. While improving the classification accuracy of application algorithms, we aim to expand our understanding of the brain function of patients with neurological disorders through the analysis of brain networks.
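The error rate quoted above is obtained under ten-fold cross-validation. A minimal sketch of that evaluation protocol follows, assuming a caller-supplied `evaluate_fold` function (hypothetical) that trains on one index list and returns the number of misclassified samples in the other. It does not reproduce the paper's classifiers or features.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_error(n_samples, evaluate_fold, k=10, seed=0):
    """Total misclassified test samples across all folds, divided by n_samples."""
    errors = sum(
        evaluate_fold(train, test)
        for train, test in k_fold_indices(n_samples, k, seed)
    )
    return errors / n_samples


# Dummy evaluator for illustration only: pretends every sample whose index is
# a multiple of 20 is misclassified, regardless of the training set.
def dummy_evaluate(train, test):
    return sum(1 for i in test if i % 20 == 0)


error_rate = cross_validated_error(100, dummy_evaluate)
print(f"cross-validated error rate: {error_rate:.4f}")
```

Because each sample appears in exactly one test fold, the pooled error rate is an average over the whole dataset rather than over a single held-out split.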
Affiliation(s)
- Ruofan Wang: School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Haodong Wang: School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Lianshuan Shi: School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Chunxiao Han: Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China
- Qiguang He: School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Yanqiu Che: Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China
- Li Luo: College of Agronomy, Sichuan Agricultural University, Chengdu, China
4
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem that aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can make the search blind, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the lack of information in late evolution. Moreover, an improved elite selection mechanism with a Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with nine well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
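Treating gene selection as a two-objective problem means that candidate subsets are compared by Pareto dominance rather than by a single score. The sketch below shows only that comparison, assuming each candidate is summarized as an `(error_rate, n_genes)` pair with both values minimized. The candidate values are made up for illustration, and the improved MPA itself is not implemented here.

```python
def pareto_front(candidates):
    """Keep the (error_rate, n_genes) pairs that no other candidate dominates."""

    def dominates(a, b):
        # a is no worse than b on both objectives and strictly better on one.
        return (a[0] <= b[0] and a[1] <= b[1]) and (a[0] < b[0] or a[1] < b[1])

    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidate subsets: (classification error rate, number of genes).
candidates = [(0.10, 50), (0.08, 120), (0.10, 80), (0.15, 20), (0.08, 200)]
front = pareto_front(candidates)
print(front)
```

Here `(0.10, 80)` drops out because `(0.10, 50)` matches its error with fewer genes, and `(0.08, 200)` drops out because `(0.08, 120)` does the same. The three remaining pairs are trade-offs that no single ordering resolves.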
Affiliation(s)
- Qiyong Fu: School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li: School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li: School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
5
Li Y, Zhang Y, Hu W. Adaptive multi-objective particle swarm optimization based on virtual Pareto front. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
6
Chamlal H, Ouaderhman T, Aaboub F. A graph based preordonnances theoretic supervised feature selection in high dimensional data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
7
Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, Yu J, Li C, Yu F, Ren Z. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. International Journal of Environmental Research and Public Health 2022; 19:15027. [PMID: 36429751 PMCID: PMC9690067 DOI: 10.3390/ijerph192215027] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/06/2022] [Revised: 11/04/2022] [Accepted: 11/10/2022] [Indexed: 06/01/2023]
Abstract
The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are good diabetes prediction tools. The purpose of this study was to compare the efficacy of five different machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999-2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was used. For predicting diabetes, five machine-learning models (CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five machine-learning models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes patients. In terms of model performance, CATBoost ranks higher than RF, LR, XGBoost, and SVM. The best-performing machine-learning model among the five is CATBoost, which achieves an accuracy of 82.1% and an AUC of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
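The evaluation metrics named in this abstract all follow from the four cells of a binary confusion matrix. The sketch below computes them from counts of true positives, false positives, false negatives, and true negatives. The counts in the usage example are invented for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. there is at least one actual
    positive, one actual negative, and one predicted positive.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Illustrative counts only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can hide poor sensitivity when one class dominates, which is one reason studies of this kind report the full set.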
Affiliation(s)
- Yifan Qin: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jinlong Wu: College of Physical Education, Southwest University, Chongqing 400715, China
- Wen Xiao: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Kun Wang: Physical Education College, Yanching Institute of Technology, Langfang 065201, China
- Anbing Huang: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Bowen Liu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jingxuan Yu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Chuhao Li: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Fengyu Yu: College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Zhanbing Ren: College of Physical Education, Shenzhen University, Shenzhen 518000, China
8
Li Y, Feng X, Yu H. A constrained multiobjective evolutionary algorithm with the two-archive weak cooperation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
9
Li T, Zhan ZH, Xu JC, Yang Q, Ma YY. A binary individual search strategy-based bi-objective evolutionary algorithm for high-dimensional feature selection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.183] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/05/2022]
10
MICQ-IPSO: An effective two-stage hybrid feature selection algorithm for high-dimensional data. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.048] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
11
Adaptive Multistrategy Ensemble Particle Swarm Optimization with Signal-to-Noise Ratio Distance Metric. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
12
Han F, Wang T, Ling Q. An improved feature selection method based on angle-guided multi-objective PSO and feature-label mutual information. Appl Intell 2022. [DOI: 10.1007/s10489-022-03465-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
13
Rashno A, Shafipour M, Fadaei S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
14
Liu W, Wang J. Recursive elimination current algorithms and a distributed computing scheme to accelerate wrapper feature selection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
15
Alrefai N, Ibrahim O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07147-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
16
Wang Y, Wong KC, Li X. Exploring high-throughput biomolecular data with multiobjective robust continuous clustering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
17
Hashemi A, Bagher Dowlatshahi M, Nezamabadi-pour H. An efficient Pareto-based feature selection algorithm for multi-label classification. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.09.052] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/17/2023]
18
Jiang Z, Zhang Y, Wang J. A multi-surrogate-assisted dual-layer ensemble feature selection algorithm. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
19
Xu Z, Shen D, Kou Y, Nie T. A hybrid feature selection algorithm combining ReliefF and Particle swarm optimization for high-dimensional medical data. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-202948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Because medical data are high-dimensional and their features are strongly correlated, classification accuracy is often lower than expected. Feature selection is a common approach to this problem: it selects effective features and thereby reduces the dimensionality of high-dimensional data. However, traditional feature selection algorithms set thresholds blindly, and their search algorithms are liable to fall into a local optimal solution. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and particle swarm optimization. The algorithm is divided into three parts. First, ReliefF is used to calculate the feature weights, and the features are ranked by weight. Then the ranked features are grouped according to density equalization, so that the density of features in each group is the same. Finally, the particle swarm optimization algorithm is used to search the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest has the highest classification accuracy on the selected features. More importantly, it uses the fewest features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which indicates that the hybrid algorithm has practical application value.
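The first two steps of the pipeline described above (rank features by weight, then split the ranking into groups) can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the weights are given as input rather than computed by ReliefF, and "density equalization" is interpreted here simply as giving each group the same number of features, which the abstract does not confirm. The function name `rank_and_group` is hypothetical.

```python
def rank_and_group(weights, n_groups):
    """Rank feature indices by descending weight and split them into groups.

    Assumption: groups are made as equal in size as possible; when the number
    of features is not divisible by n_groups, earlier groups get one extra.
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


# Made-up feature weights standing in for ReliefF output.
weights = [0.12, 0.80, 0.05, 0.43, 0.31, 0.67]
groups = rank_and_group(weights, 3)
print(groups)
```

A search procedure such as particle swarm optimization could then operate over the groups instead of individual features, which shrinks the search space. How the paper encodes that search and defines its fitness function is not stated in the abstract.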
Affiliation(s)
- Zhaozhao Xu: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Derong Shen: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Yue Kou: School of Computer Science and Engineering, Northeastern University, Shenyang, China
- Tiezheng Nie: School of Computer Science and Engineering, Northeastern University, Shenyang, China
20
Feature subset selection via an improved discretization-based particle swarm optimization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106794] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/02/2023]