1
Moslemi A, Naeini FB. Subspace learning using low-rank latent representation learning and perturbation theorem: Unsupervised gene selection. Comput Biol Med 2025; 185:109567. [PMID: 39675215 DOI: 10.1016/j.compbiomed.2024.109567] [Received: 02/28/2024] [Revised: 08/12/2024] [Accepted: 12/10/2024] [Indexed: 12/17/2024]
Abstract
In recent years, gene expression data analysis has gained growing significance in machine learning and computational biology. Microarray gene datasets typically contain more features than samples, which yields an ill-posed, underdetermined system of equations. Redundant features in such high-dimensional data degrade the performance of learning algorithms and increase their computational time. Feature extraction and feature selection can both address this challenge, but feature selection offers greater interpretability and has therefore received more attention. In this study, we propose an unsupervised feature selection method based on pseudo-label latent representation learning and perturbation theory. In the first step, pseudo labels are extracted and constructed using latent representation learning. In the second step, a least-squares problem is solved for both the original data matrix and a perturbed data matrix, and features are then clustered with k-means based on the similarity between the original and perturbed data matrices. In the last step, the features in each subcluster are ranked by the information gain criterion. To demonstrate the efficacy of the proposed approach, numerical experiments were conducted on six benchmark microarray datasets and two benchmark RNA-sequencing datasets. The results indicate that the proposed technique outperforms eight state-of-the-art unsupervised feature selection methods in both clustering accuracy and normalized mutual information.
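The last step ranks the features inside each k-means subcluster by information gain. The sketch below illustrates only that ranking step, assuming that the expression values have already been discretized (the abstract does not say how) and that the pseudo labels are given as integers. The names `information_gain` and `rank_cluster` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; F) = H(Y) - sum_v P(F = v) * H(Y | F = v) for a discretized feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_cluster(features, labels):
    """Rank the features of one subcluster (dict: index -> column) by descending IG."""
    gains = {j: information_gain(col, labels) for j, col in features.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

For example, with pseudo labels `[0, 0, 1, 1]`, a feature discretized as `["lo", "lo", "hi", "hi"]` separates the labels perfectly (gain of 1 bit), while `["lo", "hi", "lo", "hi"]` carries no information (gain 0), so the first is ranked ahead of the second.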
Affiliation(s)
- Amir Moslemi
- Department of Physics, Toronto Metropolitan University, Ontario, Canada; School of Software Design & Data Science, Seneca Polytechnic, Toronto, ON, M4N 3M5, Canada; Physical Sciences, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada.
- Fariborz Baghaei Naeini
- Faculty of Engineering, Computing and the Environment, Kingston University, Penrhyn Road Campus, Kingston Upon Thames, London, KT1 2EE, UK
2
Wei Y, Ma J, Ma Z, Huang Y. Subspace Learning for Dual High-Order Graph Learning Based on Boolean Weight. Entropy (Basel) 2025; 27:107. [PMID: 40003104 PMCID: PMC11854825 DOI: 10.3390/e27020107] [Received: 11/14/2024] [Revised: 01/18/2025] [Accepted: 01/21/2025] [Indexed: 02/27/2025]
Abstract
Subspace learning has achieved promising performance as a key technique for unsupervised feature selection. Its strength lies in identifying a representative subspace, encompassing a group of features, that effectively approximates the space of the original features. Nonetheless, most existing unsupervised feature selection methods based on subspace learning face two primary limitations. (1) Many methods focus predominantly on the relationships between samples in the data space and ignore the correlations between features in the feature space, which makes them unreliable for exploiting the intrinsic spatial structure. (2) Graph-based methods typically consider only first-order neighborhood structures and neglect the high-order neighborhood structures inherent in the original data, so they fail to accurately preserve the local geometric characteristics of the data. To fill this research gap through dual high-order graph learning, we propose a framework called subspace learning for dual high-order graph learning based on Boolean weight (DHBWSL). First, a subspace-learning framework for unsupervised feature selection is proposed and extended with dual-graph regularization to fully exploit geometric structure information in both the data space and the feature space. Second, the dual high-order graph is designed by embedding Boolean weights to learn more extensive node information from the original space, so that an appropriate high-order adjacency matrix can be selected adaptively and flexibly. Experimental results on 12 public datasets demonstrate that DHBWSL outperforms nine recent state-of-the-art algorithms.
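The abstract describes selecting high-order adjacency matrices through Boolean weights but gives no formulas. The sketch below only illustrates the general idea under stated assumptions: it treats the k-th order neighborhood as the Boolean k-th power of a first-order adjacency matrix, and it uses hypothetical 0/1 switches (`order_weights`) to choose which orders are merged by element-wise OR. It is not the DHBWSL construction, and `high_order_adjacency` is an illustrative name.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k has a[i][k] and b[k][j]."""
    return [[any(a[i][k] and b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def high_order_adjacency(adj, order_weights):
    """OR together the Boolean powers A^k whose (hypothetical) weight is 1.

    order_weights[k - 1] in {0, 1} switches the k-th order neighborhood on or off.
    """
    n = len(adj)
    power = [row[:] for row in adj]              # current power, starting at A^1
    combined = [[False] * n for _ in range(n)]
    for use in order_weights:
        if use:
            combined = [[combined[i][j] or power[i][j] for j in range(n)] for i in range(n)]
        power = bool_matmul(power, adj)          # advance to the next order
    return combined
```

On a path graph 0-1-2-3, weights `[1, 1]` connect node 0 to node 2 (a second-order neighbor) but still not to node 3, whereas weights `[1, 0]` reproduce the first-order graph unchanged.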
Affiliation(s)
- Yilong Wei
- School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, China
- Jinlin Ma
- School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
- Ziping Ma
- School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, China
- Yulei Huang
- School of Mathematics and Statistics, Ningxia University, Yinchuan 750021, China
3
Liao H, Chen H, Yin T, Yuan Z, Horng SJ, Li T. A general adaptive unsupervised feature selection with auto-weighting. Neural Netw 2025; 181:106840. [PMID: 39515083 DOI: 10.1016/j.neunet.2024.106840] [Received: 04/10/2024] [Revised: 10/11/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024]
Abstract
Feature selection (FS) is essential in machine learning and data mining because it makes handling high-dimensional data more efficient and reliable. Unsupervised feature selection (UFS) has attracted increasing attention because obtaining labels for real-world data requires extra resources. Most existing embedded UFS methods use a sparse projection matrix for FS. However, this can introduce additional regularization terms, and the sparsity of the projection matrix is difficult to control. Moreover, such methods may seriously damage the original feature structure in the embedding space. A feasible alternative is to avoid projecting the original data into a low-dimensional embedding space and instead to identify, directly from the raw features, those that best help the data exhibit a distinct cluster structure. Inspired by this idea, this paper proposes a model called A General Adaptive Unsupervised Feature Selection with Auto-weighting (GAWFS), which combines two techniques, non-negative matrix factorization and adaptive graph learning, to simulate the process of dividing data into clusters, and identifies the features that are most discriminative during clustering through a feature weighting matrix Θ. Because the weighting matrix is sparse, it also acts as a feature selector, or filter. Finally, experiments comparing GAWFS with several state-of-the-art UFS methods on synthetic and real-world datasets demonstrate the superiority of GAWFS.
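GAWFS uses non-negative matrix factorization to simulate dividing data into clusters. The sketch below shows only plain NMF with the standard Lee-Seung multiplicative updates for the squared Frobenius loss; it omits the adaptive graph term and the weighting matrix Θ, so it is not the GAWFS objective. It assumes the rows of X are samples, so each row of W can be read as soft cluster memberships.

```python
import random


def matmul(a, b):
    """Dense product of list-of-lists matrices a (p x q) and b (q x s)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_loss(x, w, h):
    """Squared Frobenius error ||X - WH||_F^2."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, rank, iters=200, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for X ~ W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)        # H <- H * (W^T X) / (W^T W H)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))        # W <- W * (X H^T) / (W H H^T)
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h
```

The multiplicative form keeps every entry non-negative without any projection step, which is why it is a common baseline for NMF-based clustering.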
Affiliation(s)
- Huming Liao
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Hongmei Chen
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Tengyu Yin
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
- Zhong Yuan
- College of Computer Science, Sichuan University, Chengdu 610065, China.
- Shi-Jinn Horng
- Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan; Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 404327, Taiwan.
- Tianrui Li
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu, 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China.
4
Liu R, Fang R, Zeng T, Fei H, Qi Q, Zuo P, Xu L, Liu W. A Novel Adaptive Sand Cat Swarm Optimization Algorithm for Feature Selection and Global Optimization. Biomimetics (Basel) 2024; 9:701. [PMID: 39590273 PMCID: PMC11591711 DOI: 10.3390/biomimetics9110701] [Received: 10/09/2024] [Revised: 11/12/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024]
Abstract
Feature selection (FS) is a critical stage in machine learning and data mining; its objective is to eliminate irrelevant features while maintaining model accuracy. However, in datasets with many features, choosing the optimal feature subset is a significant challenge. This study presents an enhanced Sand Cat Swarm Optimization algorithm (MSCSO) that improves the feature selection process by strengthening the algorithm's global search capability and convergence rate through multiple strategies. Specifically, it uses logistic chaotic mapping and lens imaging reverse learning for population initialization to enhance population diversity; balances global exploration and local exploitation through nonlinear parameter processing; and introduces a Weibull flight strategy and a triangular parade strategy to optimize individual position updates. In addition, a Gaussian-Cauchy mutation strategy improves the algorithm's ability to escape local optima. The experimental results show that MSCSO performs well on 65.2% of the test functions in the CEC2005 benchmark; on 15 UCI datasets, MSCSO achieved the best average fitness on 93.3% of the datasets, selected the fewest features on 86.7%, and attained the best average accuracy on all (100%) of them, significantly outperforming the other compared algorithms.
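MSCSO initializes its population with logistic chaotic mapping. The sketch below shows that initialization step alone (lens imaging reverse learning, the Weibull flight, triangular parade, and Gaussian-Cauchy mutation are all omitted), plus a threshold rule for turning a position into a feature mask. The fitness function is an assumption: it is the weighted error-plus-feature-ratio form common in wrapper feature selection, and the abstract does not confirm that MSCSO uses it or the value `alpha = 0.99`.

```python
import random


def logistic_chaotic_population(pop_size, dim, lb, ub, mu=4.0, seed=0):
    """Initialize pop_size positions in [lb, ub]^dim from logistic-map sequences."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        # Random start in (0, 1); with mu = 4 the map z <- mu*z*(1-z) is chaotic on [0, 1].
        z = rng.uniform(0.01, 0.99)
        position = []
        for _ in range(dim):
            z = min(1.0, max(0.0, mu * z * (1.0 - z)))  # guard against floating-point drift
            position.append(lb + z * (ub - lb))         # map the chaotic value into the bounds
        population.append(position)
    return population


def to_mask(position, threshold=0.5):
    """Feature j is selected when position[j] exceeds the threshold."""
    return [p > threshold for p in position]


def fitness(error_rate, mask, alpha=0.99):
    """Assumed wrapper-FS fitness: alpha * error + (1 - alpha) * fraction of features kept."""
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)
```

Compared with uniform sampling, the chaotic sequence is deterministic given its start value yet spreads points irregularly over the search space, which is the diversity argument the abstract makes.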
Affiliation(s)
- Ruru Liu
- College of Information Science and Technology, Shihezi University, Shihezi 832000, China
- Rencheng Fang
- College of Information Science and Technology, Shihezi University, Shihezi 832000, China
- Tao Zeng
- College of Information Science and Technology, Shihezi University, Shihezi 832000, China
- Hongmei Fei
- College of Information Science and Technology, Shihezi University, Shihezi 832000, China
- Quan Qi
- College of Information Science and Technology, Shihezi University, Shihezi 832000, China
- Pengxiang Zuo
- College of Medicine, Shihezi University, Shihezi 832000, China
- Liping Xu
- College of Science, Shihezi University, Shihezi 832000, China
- Wei Liu
- College of Medicine, Shihezi University, Shihezi 832000, China
5
Yang X, Che H, Leung MF, Wen S. Self-paced regularized adaptive multi-view unsupervised feature selection. Neural Netw 2024; 175:106295. [PMID: 38614023 DOI: 10.1016/j.neunet.2024.106295] [Received: 12/18/2023] [Revised: 03/14/2024] [Accepted: 04/05/2024] [Indexed: 04/15/2024]
Abstract
Multi-view unsupervised feature selection (MUFS) is an efficient approach for the dimensionality reduction of heterogeneous data. However, most existing MUFS approaches assign all samples the same weight, so the diversity of the samples is not exploited effectively. Additionally, because of the various regularization terms involved, the resulting MUFS problems are often non-convex, which makes their optimal solutions difficult to find. To address these issues, a novel MUFS method named Self-paced Regularized Adaptive Multi-view Unsupervised Feature Selection (SPAMUFS) is proposed. Specifically, the proposed approach first trains the MUFS model on simple samples and gradually incorporates complex samples by means of a self-paced regularizer. l2,p-norm (0
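The abstract (truncated in this listing) does not state which self-paced regularizer SPAMUFS uses. As a stand-in, the sketch below shows the classic hard (binary) self-paced regularizer: minimizing sum_i v_i * loss_i - lam * sum_i v_i over v_i in {0, 1} selects exactly the samples whose loss is below the age parameter `lam`, and growing `lam` admits harder samples. The per-sample losses are held fixed here for illustration; in actual training they would be recomputed after each model update.

```python
def self_paced_weights(losses, lam):
    """Closed-form minimizer of sum_i v_i * loss_i - lam * sum_i v_i over v_i in {0, 1}."""
    return [1 if loss < lam else 0 for loss in losses]


def curriculum(losses, lam0, growth, steps):
    """Sample weights chosen at each step as the age parameter lam grows geometrically."""
    lam = lam0
    schedule = []
    for _ in range(steps):
        schedule.append(self_paced_weights(losses, lam))
        lam *= growth  # a larger lam admits harder (higher-loss) samples
    return schedule
```

With losses `[0.2, 1.5, 0.7, 3.0]`, `lam0 = 1.0`, and `growth = 2.0`, the easy samples 0 and 2 enter first, sample 1 joins when `lam` reaches 2, and the hardest sample joins when `lam` reaches 4.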
Affiliation(s)
- Xuanhao Yang
- College of Electronic and Information Engineering, Southwest University, Chongqing, 400715, China.
- Hangjun Che
- College of Electronic and Information Engineering, Southwest University, Chongqing, 400715, China; Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Chongqing, 400715, China.
- Man-Fai Leung
- School of Computing and Information Science, Faculty of Science and Engineering, Anglia Ruskin University, Cambridge, UK.
- Shiping Wen
- Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW 2007, Australia.
6
Zhang H, Lin J, Zhou L, Shen J, Sheng W. Facial age recognition based on deep manifold learning. Math Biosci Eng 2024; 21:4485-4500. [PMID: 38549337 DOI: 10.3934/mbe.2024198] [Indexed: 04/02/2024]
Abstract
Facial age recognition has been widely used in real-world applications. Most current facial age recognition methods use deep learning to extract facial features for identifying age. However, because facial features are high-dimensional, deep learning methods may extract many redundant features, which hinders facial age recognition. To improve facial age recognition, this paper proposes deep manifold learning (DML), a combination of deep learning and manifold learning. In DML, deep learning extracts high-dimensional facial features, and manifold learning selects age-related features from them for facial age recognition. Finally, we validated DML on the Multivariate Observations of Reactions and Physical Health (MORPH) and Face and Gesture Recognition Network (FG-NET) datasets. The results show a mean absolute error (MAE) of 1.60 on MORPH and 2.48 on FG-NET. Moreover, compared with state-of-the-art facial age recognition methods, DML substantially improves accuracy.
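The abstract does not specify which manifold-learning criterion DML uses to select features. As a clearly swapped-in illustration of manifold-based feature selection, the sketch below computes the Laplacian score, which favors features that vary little between neighboring samples relative to their overall spread (lower is better). For simplicity it uses a dense heat-kernel graph instead of a k-nearest-neighbor graph, and the kernel width `t` is an arbitrary assumption.

```python
import math


def laplacian_scores(samples, t=10.0):
    """Laplacian score per feature; lower means the feature better preserves local structure.

    Graph weights: S_ij = exp(-||x_i - x_j||^2 / t) for i != j, and S_ii = 0.
    """
    n, d = len(samples), len(samples[0])
    s = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])) / t)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in s]                  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [x[r] for x in samples]
        mean = sum(fi * di for fi, di in zip(f, deg)) / total   # degree-weighted mean
        centred = [fi - mean for fi in f]
        # f~^T L f~ = 1/2 * sum_ij S_ij (f_i - f_j)^2 (centering leaves it unchanged)
        num = 0.5 * sum(s[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
        den = sum(deg[i] * centred[i] ** 2 for i in range(n))   # f~^T D f~
        scores.append(num / den if den > 0 else float("inf"))
    return scores
```

On two well-separated groups of samples, a feature that encodes the grouping scores far lower than a feature that fluctuates within each group, so it would be selected first.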
Affiliation(s)
- Huiying Zhang
- Pujiang Institute, Nanjing Tech University, Nanjing 211200, China
- Jiayan Lin
- Pujiang Institute, Nanjing Tech University, Nanjing 211200, China
- Lan Zhou
- Pujiang Institute, Nanjing Tech University, Nanjing 211200, China
- Jiahui Shen
- Pujiang Institute, Nanjing Tech University, Nanjing 211200, China
- Wenshun Sheng
- Pujiang Institute, Nanjing Tech University, Nanjing 211200, China