1
Peng L, Cai Z, Heidari AA, Zhang L, Chen H. Hierarchical Harris hawks optimizer for feature selection. J Adv Res 2023; 53:261-278. [PMID: 36690206 PMCID: PMC10658428 DOI: 10.1016/j.jare.2023.01.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 10/12/2022] [Accepted: 01/14/2023] [Indexed: 01/21/2023]
Abstract
INTRODUCTION: The main feature selection methods include filter, wrapper-based, and embedded methods. Because of its characteristics, the wrapper method must include a swarm intelligence algorithm, and its performance in feature selection is closely related to the algorithm's quality. Therefore, it is essential to choose and design a suitable algorithm to improve the performance of the feature selection method based on the wrapper. Harris hawks optimization (HHO) is a superb optimization approach that has just been introduced. It has a high convergence rate and a powerful global search capability, but it has an unsatisfactory optimization effect on high-dimensional or complex problems. Therefore, we introduced a hierarchy to improve HHO's ability to deal with complex problems and feature selection.
OBJECTIVES: To make the algorithm obtain good accuracy with fewer features and run faster in feature selection, we improved HHO and named it EHHO. On 30 UCI datasets, the improved HHO (EHHO) can achieve very high classification accuracy with less running time and fewer features.
METHODS: We first conducted extensive experiments on 23 classical benchmark functions and compared EHHO with many state-of-the-art metaheuristic algorithms. Then we transformed EHHO into binary EHHO (bEHHO) through the conversion function and verified the algorithm's ability in feature extraction on 30 UCI datasets.
RESULTS: Experiments on 23 benchmark functions show that EHHO has better convergence speed and minimum convergence than other peers. At the same time, compared with HHO, EHHO can significantly improve the weakness of HHO in dealing with complex functions. Moreover, on 30 datasets in the UCI repository, the performance of bEHHO is better than other comparative optimization algorithms.
CONCLUSION: Compared with the original bHHO, bEHHO can achieve excellent classification accuracy with fewer features and is also better than bHHO in running time.
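The abstract above mentions converting a continuous optimizer into a binary one "through the conversion function" for wrapper-based feature selection. The sketch below illustrates one common way this is done, an S-shaped (sigmoid) transfer function followed by a weighted error/size fitness. It is an assumption for illustration only: the abstract does not say which transfer function or fitness the authors used, and the names `sigmoid_transfer`, `binarize`, `wrapper_fitness`, and the weight `alpha=0.99` are hypothetical.

```python
import math
import random


def sigmoid_transfer(x: float) -> float:
    """S-shaped transfer function mapping a real value into (0, 1).

    Written in a numerically safe form so large negative inputs do not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Each dimension is selected with probability sigmoid_transfer(value).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def wrapper_fitness(error_rate, mask, alpha=0.99):
    """Hypothetical wrapper fitness (lower is better).

    Combines the classifier's error rate with the fraction of features kept;
    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    kept_fraction = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * kept_fraction


rng = random.Random(0)
mask = binarize([2.5, -3.0, 0.1, -0.4], rng)
fitness = wrapper_fitness(0.1, [1, 0, 0, 0])
```

In a full wrapper, `error_rate` would come from training a classifier on the columns where `mask` is 1; that step is omitted here.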
Affiliation(s)
- Lemin Peng
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China.
- Zhennao Cai
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China.
- Ali Asghar Heidari
  - School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran.
- Lejun Zhang
  - Cyberspace Institute Advanced Technology, Guangzhou University, Guangzhou 510006, China.
  - College of Information Engineering, Yangzhou University, Yangzhou 225127, China.
  - Research and Development Center for E-Learning, Ministry of Education, Beijing 100039, China.
- Huiling Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China.
2
Liu J, Yang S, Zhang H, Sun Z, Du J. Online Multi-Label Streaming Feature Selection Based on Label Group Correlation and Feature Interaction. Entropy (Basel) 2023; 25:1071. [PMID: 37510018 PMCID: PMC10377943 DOI: 10.3390/e25071071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 07/10/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Multi-label streaming feature selection has received widespread attention in recent years because the dynamic acquisition of features is more in line with the needs of practical application scenarios. Most previous methods either assume that the labels are independent of each other, or, although label correlation is explored, the relationship between related labels and features is difficult to understand or specify. In real applications, both situations may occur where the labels are correlated and the features may belong specifically to some labels. Moreover, these methods treat features individually without considering the interaction between features. Based on this, we present a novel online streaming feature selection method based on label group correlation and feature interaction (OSLGC). In our design, we first divide labels into multiple groups with the help of graph theory. Then, we integrate label weight and mutual information to accurately quantify the relationships between features under different label groups. Subsequently, a novel feature selection framework using sliding windows is designed, including online feature relevance analysis and online feature interaction analysis. Experiments on ten datasets show that the proposed method outperforms some mature MFS algorithms in terms of predictive performance, statistical analysis, stability analysis, and ablation experiments.
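The abstract describes combining label weights with mutual information to score features against a group of labels. A minimal sketch of that idea for discrete data follows. It is not the OSLGC method itself: the weighted-average form in `group_relevance` and both function names are assumptions made for illustration, and the graph-based label grouping and sliding-window steps are not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def group_relevance(feature, label_group, weights):
    """Hypothetical relevance of one feature to a label group.

    Computed as the weighted mean of the feature's mutual information with
    each label in the group; the weighting scheme is an assumption.
    """
    total = sum(weights)
    return sum(
        w * mutual_information(feature, label)
        for label, w in zip(label_group, weights)
    ) / total


feature = [0, 0, 1, 1]
label_group = [[0, 0, 1, 1], [0, 1, 0, 1]]
score = group_relevance(feature, label_group, weights=[3, 1])
```

Here the feature fully determines the first label (1 bit) and is independent of the second (0 bits), so the weighted score reflects only the first label's contribution.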
Affiliation(s)
- Jinghua Liu
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Songwei Yang
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Hongbo Zhang
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Zhenzhen Sun
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
- Jixiang Du
  - Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  - Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China
  - Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China
3
Senthilkumar D, Reshmy A, Paulraj S. Dimensionality reduction strategy for Multi-Target Regression paradigm. IFS 2023. [DOI: 10.3233/jifs-220412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Multi-Target Regression (MTR) is used to study the relationship between the same set of input variables and multiple continuous target variables simultaneously. A dataset with many input and output variables is the prime issue to address in MTR, because it makes building a prediction model computationally complex. Also, dimensionality reduction for multiple target variables is a challenging and essential task that aims to reduce the size of the dataset, optimize the time complexity of analysis, and remove redundant and irrelevant variables. This paper proposes an efficient feature selection strategy, Multi-Target Feature Subset Selection (MTFSS), for MTR that constructs a unique subset of features by considering multiple targets. Two feature evaluators, correlation and ReliefF, support the MTR dataset without discretization. Furthermore, two new score functions, a weighted mean aggregation strategy and a threshold function, are introduced to identify the significant features. To evaluate the effectiveness of the proposed MTFSS, experiments were carried out on a benchmark dataset. The experimental results demonstrate that the proposed MTFSS can select fewer features and outperform the results obtained on the original dataset. Also, the correlation-based feature evaluator performs better than ReliefF.
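The abstract names a correlation-based evaluator, a weighted mean aggregation across targets, and a threshold function. A rough sketch of how those three pieces could fit together is given below. The exact score functions of MTFSS are not stated in the abstract, so the use of absolute Pearson correlation, the per-target weights, the fixed threshold, and all function names here are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def aggregate_scores(features, targets, weights):
    """Weighted mean of |correlation| between each feature and every target."""
    total = sum(weights)
    return [
        sum(w * abs(pearson(f, t)) for t, w in zip(targets, weights)) / total
        for f in features
    ]


def select_features(scores, threshold):
    """Keep the indices of features whose aggregated score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: two features (columns stored as lists) and two targets.
features = [[1, 2, 3, 4], [1, -1, 1, -1]]
targets = [[2, 4, 6, 8], [4, 3, 2, 1]]
scores = aggregate_scores(features, targets, weights=[1, 1])
selected = select_features(scores, threshold=0.5)
```

Because the aggregation yields a single score per feature, the result is one shared feature subset for all targets rather than a separate subset per target.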
Affiliation(s)
- D. Senthilkumar
  - Department of Computer Science and Engineering, University College of Engineering, Anna University, Tiruchirappalli, Tamil Nadu, India
- A.K. Reshmy
  - Department of Computational Intelligence, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur Campus, Chengalpattu, Tamil Nadu, India
- S. Paulraj
  - Department of Mathematics, College of Engineering Guindy Campus, Anna University, Chennai, Tamil Nadu, India
4
5
Abstract
Multi-label text classification refers to assigning a text to multiple categories simultaneously, which corresponds to a text associated with multiple topics in the real world. The feature space generated by text data has the characteristics of high dimensionality and sparsity. Feature selection is an efficient technology that removes useless and redundant features, reduces the dimension of the feature space, and avoids the curse of dimensionality. A feature selection method for multi-label text based on feature importance is proposed in this paper. Firstly, multi-label texts are transformed into single-label texts using the label assignment method. Secondly, the importance of each feature is calculated using the method based on Category Contribution (CC). Finally, features with higher importance are selected to construct the feature space. In the proposed method, the feature importance is calculated from the perspective of the category, which ensures the selected features have strong category discrimination ability. Specifically, the contributions of the features to each category are calculated from two aspects, inter-category and intra-category, and the importance of the features is then obtained by combining them. The proposed method is tested on six public data sets, and the experimental results demonstrate the effectiveness of the proposed method.
6
Li L, Luo Q, Xiao W, Li J, Zhou S, Li Y, Zheng X, Yang H. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features. J Bioinform Comput Biol 2016; 15:1650025. [PMID: 27411307 DOI: 10.1142/s0219720016500256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to the protein transportation, organelle localization, and functions, therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is in urgent need. In this study, a support vector machine (SVM)-based method was proposed through integrating PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.
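The abstract reports accuracy and the Matthews Correlation Coefficient (MCC) as evaluation metrics. For reference, the sketch below computes both from binary confusion-matrix counts using the standard definitions. The counts in the example are made up for illustration and are not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for binary classification.

    Returns 0.0 when any marginal count is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts, chosen only to exercise the functions.
acc = accuracy(tp=50, tn=40, fp=5, fn=5)
mcc_perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
mcc_chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which is one reason it is often reported alongside accuracy for site-prediction tasks.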
Affiliation(s)
- Liqi Li
  - Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Qifa Luo
  - Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Weidong Xiao
  - Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Jinhui Li
  - Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Shiwen Zhou
  - National Drug Clinical Trial Institution, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Yongsheng Li
  - Institute of Cancer, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Hua Yang
  - Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
7
Zhu F, Wang X, Zhu D, Liu Y. A Supervised Requirement-oriented Patent Classification Scheme Based on the Combination of Metadata and Citation Information. INT J COMPUT INT SYS 2015. [DOI: 10.1080/18756891.2015.1023588] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
8
Abstract
Background: As a high-dimensional problem, analysis of microarray data sets is a challenging task, where many weakly relevant or redundant features affect the overall performance of classifiers. Methods: Previous works used redundant feature detection methods to select a discriminative, compact gene set, but these only considered the relationship among features, not the redundancy of classification ability among features. This study proposes a novel algorithm named RESI (Redundant fEature Selection depending on Instance), which considers label information in the measure of feature subset redundancy. Results: Experimental results on benchmark data sets show that RESI performs better than previous state-of-the-art redundant feature selection algorithms such as mRMR. Conclusions: We propose an effective supervised redundant feature detection method for tumor classification.