1. Hu Z, Wang J, Zhang K, Pedrycz W, Pal NR. Bi-Level Spectral Feature Selection. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6597-6611. PMID: 38896511. DOI: 10.1109/tnnls.2024.3408208.
Abstract
Unsupervised feature selection (UFS) aims to learn an indicator matrix, relying on characteristics of the high-dimensional data, to identify the features to be selected. However, traditional unsupervised methods operate only at the feature level, i.e., they directly select useful features by feature ranking. Such methods pay no attention to interaction with other tasks such as classification, which severely degrades their feature selection performance. In this article, we propose a UFS method that also takes the classification level into account and selects features that perform well in both clustering and classification. To achieve this, we design a bi-level spectral feature selection (BLSFS) method that combines the classification level and the feature level. More concretely, at the classification level, we first apply spectral clustering to generate pseudolabels and then train a linear classifier to obtain the optimal regression matrix. At the feature level, we select useful features by maintaining the intrinsic structure of the data in the embedding space with the regression matrix learned at the classification level, which in turn guides classifier training. A balancing parameter seamlessly bridges the classification and feature levels into a unified framework. A series of experiments on 12 benchmark datasets demonstrates the superiority of BLSFS in both clustering and classification performance.
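The two-level loop described above can be sketched, in a much-simplified form, as spectral pseudolabel generation followed by pseudolabel regression. The numpy sketch below is only an illustration of that pipeline, not the authors' joint BLSFS optimization: `spectral_pseudolabels` and `rank_features` are hypothetical names, and plain ridge regression with a farthest-first k-means stands in for the paper's objective.

```python
import numpy as np

def spectral_pseudolabels(X, k, gamma=1.0, iters=20):
    """Spectral embedding of an RBF affinity graph, then a tiny k-means
    (farthest-first init) to produce cluster pseudolabels."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-gamma * d2)                               # RBF affinity
    D = S.sum(1)
    L = (S / np.sqrt(D)[:, None]) / np.sqrt(D)[None, :]   # normalized affinity
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, -k:]                                      # top-k eigenvectors
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12
    C = [E[0]]                                            # farthest-first centers
    for _ in range(k - 1):
        dist = np.min([((E - c) ** 2).sum(1) for c in C], axis=0)
        C.append(E[dist.argmax()])
    C = np.array(C)
    for _ in range(iters):
        labels = ((E[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = E[labels == j].mean(0)
    return labels

def rank_features(X, labels, lam=1e-2):
    """Ridge-regress one-hot pseudolabels on standardized features; score
    each feature by the l2 norm of its row of the regression matrix W."""
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-12)   # standardize so weights are comparable
    Y = np.eye(int(labels.max()) + 1)[labels]
    d = Xs.shape[1]
    W = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ Y)
    return np.linalg.norm(W, axis=1)
```

On data where one feature carries the cluster structure, the row norms of W concentrate on that feature, which is the intuition behind regression-guided selection.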
2. Shang R, Zhong J, Zhang W, Xu S, Li Y. Multilabel Feature Selection via Shared Latent Sublabel Structure and Simultaneous Orthogonal Basis Clustering. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:5288-5303. PMID: 38656846. DOI: 10.1109/tnnls.2024.3382911.
Abstract
Multilabel feature selection addresses the dimensionality problem of high-dimensional multilabel data by selecting an optimal subset of features. Noisy and incomplete labels in raw multilabel data hinder the acquisition of label-guided information. In existing approaches, mapping the label space to a low-dimensional latent space by semantic decomposition to mitigate label noise is considered an effective strategy. However, the decomposed latent label space contains redundant label information, which misleads the capture of potential label relevance. To eliminate the effect of redundant information on the extraction of latent label correlations, we propose SLOFS, a novel method for multilabel feature selection via a shared latent sublabel structure and simultaneous orthogonal basis clustering. First, a latent orthogonal basis structure shared (LOBSS) term is engineered to guide the construction of a redundancy-free latent sublabel space via a separated latent clustering-center structure. The LOBSS term simultaneously retains the latent sublabel information and the latent clustering-center structure, and the structure and relevance information of nonredundant latent sublabels are fully explored. The introduction of graph regularization ensures structural consistency between the data space and the latent sublabels, thus aiding the feature selection process. SLOFS employs a dynamic sublabel graph to obtain a high-quality sublabel space and uses regularization to constrain label correlations on dynamic sublabel projections. Finally, an effective optimization scheme with provable convergence is proposed to solve SLOFS. Experimental studies on 18 datasets demonstrate that the presented method consistently outperforms previous feature selection methods.
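The semantic decomposition with an orthonormal basis that SLOFS builds on can be illustrated with a truncated SVD of the label matrix: Y ≈ V Bᵀ with BᵀB = I. This is only a stand-in for the learned LOBSS term (it has no clustering-center separation, graph regularization, or redundancy removal), and `latent_sublabels` is a hypothetical helper name.

```python
import numpy as np

def latent_sublabels(Y, c):
    """Decompose a label matrix Y (n x l) as Y ~= V @ B.T, where B (l x c)
    is an orthonormal label basis and rows of V are latent sublabels."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    B = Vt[:c].T                 # orthonormal basis: B.T @ B = I
    V = Y @ B                    # c-dimensional latent sublabel representation
    return V, B
```

When the label matrix is exactly rank c, the decomposition reconstructs it; otherwise V Bᵀ is the best rank-c approximation, which is the sense in which the latent space "compresses" the labels.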
3. Liao H, Chen H, Yin T, Yuan Z, Horng SJ, Li T. A general adaptive unsupervised feature selection with auto-weighting. Neural Networks 2025; 181:106840. PMID: 39515083. DOI: 10.1016/j.neunet.2024.106840.
Abstract
Feature selection (FS) is essential in machine learning and data mining, as it makes handling high-dimensional data more efficient and reliable. Unsupervised feature selection (UFS) has received increasing attention because of the extra resources required to obtain labels for real-world data. Most existing embedded UFS methods utilize a sparse projection matrix for FS. However, this may introduce additional regularization terms, the sparsity of the projection matrix is difficult to control well, and such methods may seriously destroy the original feature structure in the embedding space. A feasible alternative is to avoid projecting the original data into a low-dimensional embedding space and instead identify, directly among the raw features, those that best drive the data toward a distinct cluster structure. Inspired by this, this paper proposes A General Adaptive Unsupervised Feature Selection with Auto-weighting (GAWFS), which utilizes two techniques, non-negative matrix factorization and adaptive graph learning, to simulate the process of dividing data into clusters, and identifies the features that are most discriminative in the clustering process by a feature weighting matrix Θ. Since the weighting matrix is sparse, it also acts as a feature selector or filter. Finally, experiments comparing GAWFS with several state-of-the-art UFS methods on synthetic and real-world datasets demonstrate the superiority of GAWFS.
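The NMF clustering core that GAWFS builds on can be sketched with the standard Lee-Seung multiplicative updates. In this sketch the feature weight is a post-hoc score (how much the cluster prototypes disagree on each feature) rather than the jointly learned sparse matrix Θ with adaptive graph learning; `nmf_feature_scores` is a hypothetical name.

```python
import numpy as np

def nmf_feature_scores(X, k=2, iters=300, seed=0):
    """Factor a nonnegative data matrix X (n x d) as X ~= U @ V with
    multiplicative updates, then score each feature by how much the k
    cluster prototypes (rows of V) disagree on it."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.random((n, k)) + 0.1
    V = rng.random((k, d)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        U *= (X @ V.T) / (U @ V @ V.T + eps)     # standard Lee-Seung updates
        V *= (U.T @ X) / (U.T @ U @ V + eps)
    scores = V.max(axis=0) - V.min(axis=0)       # prototype spread per feature
    return scores, U, V
```

A feature on which the learned prototypes agree contributes nothing to separating clusters, so it receives a near-zero score; this is the intuition behind letting a (sparse) weighting act as the filter.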
Affiliation(s)
- Huming Liao
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Hongmei Chen
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Tengyu Yin
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Zhong Yuan
- College of Computer Science, Sichuan University, Chengdu 610065, China.
- Shi-Jinn Horng
- Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan; Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 404327, Taiwan.
- Tianrui Li
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
4. Chen B, Guan J, Li Z. Unsupervised Feature Selection via Graph Regularized Nonnegative CP Decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:2582-2594. PMID: 35298373. DOI: 10.1109/tpami.2022.3160205.
Abstract
Unsupervised feature selection has attracted remarkable attention recently. With the development of data acquisition technology, multi-dimensional tensor data has appeared in numerous real-world applications. However, most existing unsupervised feature selection methods are non-tensor-based, which requires vectorizing tensor data as a preprocessing step. This seemingly ordinary operation leads to an unnecessary loss of multi-dimensional structural information and ultimately restricts the quality of the selected features. To overcome this limitation, we propose a novel unsupervised feature selection model: nonnegative tensor CP (CANDECOMP/PARAFAC) decomposition based unsupervised feature selection, CPUFS for short. Specifically, we devise a new tensor-oriented linear classifier and feature selection matrix for CPUFS. In addition, CPUFS simultaneously conducts graph-regularized nonnegative CP decomposition and newly designed tensor-oriented pseudo-label regression and feature selection, to fully preserve the multi-dimensional data structure. To solve the CPUFS model, we propose an efficient iterative optimization algorithm with theoretically guaranteed convergence, whose computational complexity scales linearly in the number of features. A variant of CPUFS that incorporates nonnegativity into the linear classifier, namely CPUFSnn, is also proposed and studied. Experimental results on ten real-world benchmark datasets demonstrate the effectiveness of both CPUFS and CPUFSnn over state-of-the-art methods.
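A plain rank-r CP decomposition via alternating least squares shows the tensor machinery that CPUFS is built on. The sketch omits nonnegativity, graph regularization, and the pseudo-label regression, so it is a baseline illustration rather than the CPUFS algorithm; `cp_als` and `feature_scores` are hypothetical names, and scoring a feature mode by factor row norms is a common heuristic, not the paper's selection matrix.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x r) and B (J x r) -> (I*J x r)."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    """Mode-n unfolding: rows index the chosen mode, remaining modes in C order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, r, iters=200, seed=0):
    """Rank-r CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    F = [rng.random((s, r)) for s in T.shape]
    for _ in range(iters):
        for m in range(3):
            a, b = [i for i in range(3) if i != m]
            K = khatri_rao(F[a], F[b])
            G = (F[a].T @ F[a]) * (F[b].T @ F[b])   # Gram matrix via Hadamard product
            F[m] = unfold(T, m) @ K @ np.linalg.pinv(G)
    return F

def feature_scores(factor):
    """Row norms of a feature-mode factor matrix rank that mode's features."""
    return np.linalg.norm(factor, axis=1)
```

Because each mode keeps its own factor matrix, the multi-dimensional structure survives the decomposition, which is exactly what vectorization destroys.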