1. Dai Z, Hu L, Sun H. Robust generalized PCA for enhancing discriminability and recoverability. Neural Netw 2025;181:106814. PMID: 39447431. DOI: 10.1016/j.neunet.2024.106814.
Abstract
The dependency of low-dimensional embeddings on the principal component space seriously limits the effectiveness of existing robust principal component analysis (PCA) algorithms. Simply projecting the original sample coordinates onto orthogonal principal component directions may not effectively handle various noise-corrupted scenarios, impairing both discriminability and recoverability. Our method addresses this issue through a generalized PCA (GPCA), which optimizes a regression bias rather than the sample mean, yielding more adaptable properties. We further propose a robust GPCA model whose loss and regularization are based on the ℓ2,μ and ℓ2,ν norms, respectively. This approach not only mitigates sensitivity to outliers but also adds flexibility in feature extraction and selection. Additionally, we introduce a truncated and reweighted loss strategy, where truncation eliminates severely deviated outliers and reweighting prioritizes the remaining samples. These innovations collectively improve the GPCA model's performance. To solve the proposed model, we develop a non-greedy iterative algorithm with a theoretical convergence guarantee. Experimental results demonstrate that the proposed GPCA model outperforms previous robust PCA models in both recoverability and discriminability.
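Several models in this list (entries 1-3, 13, 21) build on ℓ2,p-style matrix norms. A minimal numpy illustration of the quantity being regularized (my own sketch, not code from any of the cited papers):

```python
import numpy as np

def l2p_norm(W, p):
    """Row-wise l2,p "norm": (sum_i ||W_i,:||_2^p)^(1/p).
    For p = 1 this is the familiar l2,1 norm; for p < 1 it is
    non-convex and promotes stronger row sparsity."""
    row_norms = np.linalg.norm(W, axis=1)   # ||W_i,:||_2 for each row
    return (row_norms ** p).sum() ** (1.0 / p)

W = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # row norm 0 (a "pruned" feature)
              [1.0, 0.0]])   # row norm 1
print(l2p_norm(W, 1.0))      # l2,1 norm: 5 + 0 + 1 = 6.0
```

Because the penalty acts on whole rows, driving a row's norm to zero discards the corresponding feature for every output dimension at once, which is why this family of norms appears so often in feature selection.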
Affiliation(s)
- Zhenlei Dai
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Liangchen Hu
- School of Computer and Information, Anhui Normal University, Wuhu 241002, China
- Huaijiang Sun
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
2. Wang J, Xie F, Nie F, Li X. Generalized and Robust Least Squares Regression. IEEE Trans Neural Netw Learn Syst 2024;35:7006-7020. PMID: 36264726. DOI: 10.1109/tnnls.2022.3213594.
Abstract
As a simple yet effective method, least squares regression (LSR) is widely applied for data regression and classification. Combined with sparse representation, LSR can also be extended to feature selection (FS), where l1 regularization is often applied in embedded FS algorithms. However, because the loss function takes the form of a squared error, LSR and its variants are sensitive to noise, which significantly degrades classification and FS performance. To cope with this problem, we propose a generalized and robust LSR (GRLSR) for classification and FS, built from an arbitrary concave loss function and an l2,p-norm regularization term. An iterative algorithm efficiently handles the resulting nonconvex minimization problem by assigning each data point an additional weight that suppresses the effect of noise. The weights are assigned automatically according to each sample's error: when the error is large, the corresponding weight is small. This mechanism is what allows GRLSR to reduce the impact of noise and outliers. Four specific methods, corresponding to different formulations of the concave loss function, are proposed to clarify the essence of the framework. Comprehensive experiments on corrupted datasets demonstrate the advantage of the proposed method.
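The error-dependent weighting described in this abstract is easy to sketch. The following toy illustration is my own (it uses a Welsch-type weight as one concrete choice; GRLSR itself admits arbitrary concave losses and l2,p regularization) and shows how down-weighting large-error samples gives robustness to an outlier:

```python
import numpy as np

def robust_lsr(X, y, sigma=1.0, n_iter=20):
    """Iteratively reweighted least squares with a Welsch-type
    weight: one concrete instance of the error-dependent weighting
    described in the abstract (large error -> small weight).
    An illustrative sketch, not the authors' GRLSR code."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        e = X @ w - y                         # per-sample residuals
        s = np.exp(-(e ** 2) / sigma)         # weights shrink with error
        Xs = X * s[:, None]                   # weighted design matrix
        w = np.linalg.solve(X.T @ Xs + 1e-8 * np.eye(d), Xs.T @ y)
    return w

# clean line y = 2x with one gross outlier
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 50.0])      # last point corrupted
print(robust_lsr(X, y))                       # close to [2.0]
```

Ordinary least squares on the same data would be dragged far from slope 2 by the corrupted point; the reweighting effectively removes it.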
3. Wang R, Bian J, Nie F, Li X. Nonlinear Feature Selection Neural Network via Structured Sparse Regularization. IEEE Trans Neural Netw Learn Syst 2023;34:9493-9505. PMID: 36395136. DOI: 10.1109/tnnls.2022.3209716.
Abstract
Feature selection is an important and effective data preprocessing method that removes noisy and redundant features while retaining the relevant and discriminative ones in high-dimensional data. In real-world applications, the relationship between data samples and their labels is usually nonlinear. However, most existing feature selection models learn a linear transformation matrix, which cannot capture such nonlinear structure and degrades the performance of downstream tasks. To address this issue, we propose a novel nonlinear feature selection method that selects the most relevant and discriminative features in a high-dimensional dataset. Specifically, our method learns the nonlinear structure of the data with a neural network trained under a cross-entropy loss, and applies a structured sparsity norm such as the l2,p-norm to the weight matrix connecting the input layer and the first hidden layer, so that each input feature receives a learned weight. Conducting nonlinear learning with structured sparsity regularization thus yields a structurally sparse weight matrix, which is optimized by gradient descent. Experimental results on several synthetic and real-world datasets show the effectiveness and superiority of the proposed nonlinear feature selection model.
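The structured-sparsity mechanism above can be illustrated with the standard row soft-thresholding (proximal) operator of the l2,1 norm, the p = 1 case of the l2,p family, applied to a first-layer weight matrix; this is a sketch of the general tool, not the authors' code:

```python
import numpy as np

def row_soft_threshold(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrink each row's l2
    norm by t, zeroing rows whose norm falls below t. Rows of the
    first-layer weight matrix correspond to input features, so a
    zeroed row means a discarded feature."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale

def feature_ranking(W1):
    """Rank input features by the l2 norm of their first-layer rows."""
    return np.argsort(-np.linalg.norm(W1, axis=1))

W = np.array([[3.0, 4.0],   # strong feature (row norm 5)
              [0.3, 0.4],   # weak feature (row norm 0.5)
              [0.0, 2.0]])  # medium feature (row norm 2)
W_sparse = row_soft_threshold(W, 1.0)
print(feature_ranking(W_sparse))  # strongest feature first: [0 2 1]
print(W_sparse[1])                # weak row zeroed out: [0. 0.]
```

In the paper's setting this shrinkage effect comes from adding the l2,p penalty to the cross-entropy objective during gradient training rather than from an explicit proximal step.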
4. Lai Z, Chen X, Zhang J, Kong H, Wen J. Maximal Margin Support Vector Machine for Feature Representation and Classification. IEEE Trans Cybern 2023;53:6700-6713. PMID: 37018685. DOI: 10.1109/tcyb.2022.3232800.
Abstract
High-dimensional small-sample-size data, which may lead to singularity in computation, are becoming increasingly common in pattern recognition. It remains an open problem how to extract the low-dimensional features best suited to the support vector machine (SVM) while simultaneously avoiding singularity, so as to enhance the SVM's performance. To address these problems, this article designs a novel framework that integrates discriminative feature extraction and sparse feature selection into the support vector framework, making full use of the classifier's characteristics to find the optimal/maximal classification margin. The low-dimensional features extracted from high-dimensional data are thereby more suitable for the SVM. A novel algorithm, the maximal margin SVM (MSVM), is proposed to achieve this goal. MSVM adopts an alternating iterative learning strategy to learn the optimal discriminative sparse subspace and the corresponding support vectors. The mechanism and essence of MSVM are revealed, and its computational complexity and convergence are analyzed and validated. Experimental results on well-known databases (including breastmnist, pneumoniamnist, colon-cancer, etc.) show the great potential of MSVM against classical discriminant analysis methods and SVM-related methods; code is available at https://www.scholat.com/laizhihui.
5. Xia S, Zheng S, Wang G, Gao X, Wang B. Granular Ball Sampling for Noisy Label Classification or Imbalanced Classification. IEEE Trans Neural Netw Learn Syst 2023;34:2144-2155. PMID: 34460405. DOI: 10.1109/tnnls.2021.3105984.
Abstract
This article presents a general sampling method, called granular-ball sampling (GBS), for classification problems, introducing the idea of granular computing. GBS uses adaptively generated hyperballs to cover the data space, and the points on the hyperballs constitute the sampled data. GBS is the first sampling method that not only reduces the data size but also improves the data quality in noisy-label classification. Because GBS can describe the class boundary exactly, it obtains almost the same classification accuracy as on the original datasets and clearly higher accuracy than random sampling. For the data-reduction classification task, GBS is therefore a general method, not restricted to any specific classifier or dataset. Moreover, GBS can be used effectively as an undersampling method for imbalanced classification. Its time complexity is close to O(N), so it can accelerate most classifiers. These advantages make GBS powerful for improving classifier performance. All code has been released in the open-source GBS library at http://www.cquptshuyinxia.com/GBS.html.
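A drastically simplified sketch of the idea, under stated assumptions: the paper generates "balls" adaptively with k-means, while this toy version splits each impure region at the median of its highest-variance feature and replaces every pure region by its centroid with the majority label. It is only meant to show the reduce-size-and-clean-labels effect, not to reproduce GBS:

```python
import numpy as np

def gb_sample(X, y, min_size=2):
    """Simplified granular-ball-style sampling: recursively split
    impure regions, then keep one representative (centroid +
    majority label) per pure or minimal region."""
    regions = [(X, y)]
    centers, labels = [], []
    while regions:
        Xi, yi = regions.pop()
        if len(np.unique(yi)) == 1 or len(yi) <= min_size:
            centers.append(Xi.mean(axis=0))
            labels.append(np.bincount(yi).argmax())   # majority label
            continue
        j = Xi.var(axis=0).argmax()                   # split dimension
        thr = np.median(Xi[:, j])
        mask = Xi[:, j] <= thr
        if mask.all() or (~mask).all():               # degenerate split
            centers.append(Xi.mean(axis=0))
            labels.append(np.bincount(yi).argmax())
            continue
        regions.append((Xi[mask], yi[mask]))
        regions.append((Xi[~mask], yi[~mask]))
    return np.array(centers), np.array(labels)

# two well-separated clusters collapse to two representatives
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
y = np.array([0] * 10 + [1] * 10)
centers, labels = gb_sample(X, y)
print(len(centers))   # 2: one representative per pure cluster
```

The majority vote inside each region is what discards isolated noisy labels, mirroring (in spirit) how GBS improves data quality while shrinking the dataset.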
6. Wei P, Zhang X. Feature extraction of linear separability using robust autoencoder with distance metric. J Intell Fuzzy Syst 2023. DOI: 10.3233/jifs-223017.
Abstract
This paper proposes a robust autoencoder with a Wasserstein distance metric to extract linearly separable features from the input data. To minimize the difference between the reconstructed feature space and the original feature space, the Wasserstein distance realizes a homeomorphic transformation of the original feature space, i.e., the so-called reconstruction of the feature space. The autoencoder then extracts linearly separable features in the reconstructed feature space. Experiments on real datasets show that the proposed method achieves extraction accuracies of 0.9777 and 0.7112 on low-dimensional and high-dimensional datasets, respectively, outperforming its competitors. The results also confirm that features extracted by distance metric-based methods are more linearly separable than those extracted by feature metric-based methods and deep network architectures. More importantly, features obtained by evaluating the distance similarity of the data are more linearly separable than those obtained by evaluating feature importance. We further demonstrate that the data distribution in a feature space reconstructed by a homeomorphic transformation can be closer to the original data distribution.
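For intuition, the Wasserstein metric itself is simple to compute in one dimension: for two equal-size empirical samples the optimal transport plan matches sorted values, so W1 is the mean absolute difference of the order statistics. A minimal sketch (illustrating only the metric, not the paper's autoencoder loss):

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D empirical samples:
    with equal weights, optimal transport matches the sorted
    samples, so W1 is the mean absolute difference of the
    order statistics."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return np.abs(a - b).mean()

print(wasserstein_1d([0, 1, 2], [1, 2, 3]))  # each point shifts by 1 -> 1.0
```

Unlike a pointwise reconstruction error, this distance compares distributions, which is the property the paper exploits when matching the reconstructed feature space to the original one.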
Affiliation(s)
- Pingping Wei
- School of Intelligent Science and Engineering, Yunnan Technology and Business University, Yunnan, China
- Xin Zhang
- School of Intelligent Science and Engineering, Yunnan Technology and Business University, Yunnan, China
7. Zhu J, Chen J, Xu B, Yang H, Nie F. Fast Orthogonal Locality-preserving Projections for Unsupervised Feature Selection. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.02.021.
8. Zhong G, Pun CM. Local Learning-based Multi-task Clustering. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109798.
9. Fan M, Zhang X, Hu J, Gu N, Tao D. Adaptive Data Structure Regularized Multiclass Discriminative Feature Selection. IEEE Trans Neural Netw Learn Syst 2022;33:5859-5872. PMID: 33882003. DOI: 10.1109/tnnls.2021.3071603.
Abstract
Feature selection (FS), which aims to identify the most informative subset of input features, is an important approach to dimensionality reduction. In this article, a novel FS framework is proposed for both unsupervised and semisupervised scenarios. To make efficient use of the data distribution when evaluating features, the framework combines data structure learning (also referred to as data distribution modeling) and FS in a unified formulation, such that data structure learning improves the results of FS and vice versa. Moreover, two types of data structures, the soft and hard data structures, are learned and used in the proposed framework. The soft data structure refers to the pairwise weights among data samples, and the hard data structure refers to the estimated labels obtained from clustering or semisupervised classification. Both are naturally formulated as regularization terms. In the optimization process, the soft and hard data structures are learned from data represented by the selected features, and the most informative features are then reselected by referring to these data structures. In this way, the framework exploits the interaction between data structure learning and FS to select the most discriminative and informative features. Following the proposed framework, a new semisupervised FS (SSFS) method is derived and studied in depth. Experiments on real-world datasets demonstrate the effectiveness of the proposed method.
10. Zheng J, Qu H, Li Z, Li L, Tang X, Guo F. A novel autoencoder approach to feature extraction with linear separability for high-dimensional data. PeerJ Comput Sci 2022;8:e1061. PMID: 37547057. PMCID: PMC10403198. DOI: 10.7717/peerj-cs.1061.
Abstract
Feature extraction relies on sufficient information in the input data; however, data distributed over a high-dimensional space are often too sparse to provide it. High dimensionality also complicates the search for features scattered across subspaces, making feature extraction from high-dimensional data a difficult task. To address this issue, this article proposes a novel autoencoder method using a Mahalanobis distance metric of rescaling transformation. The key idea is that, by implementing the Mahalanobis distance metric of the rescaling transformation, the difference between the reconstructed distribution and the original distribution can be reduced, improving the autoencoder's ability to extract features. Results show that the proposed approach outperforms state-of-the-art methods in both the accuracy of feature extraction and the linear separability of the extracted features. We argue that distance metric-based methods are more suitable than feature selection-based methods for extracting linearly separable features from high-dimensional data: in a high-dimensional space, evaluating feature similarity is easier than evaluating feature importance, so distance metric methods gain an advantage, although evaluating feature importance is more computationally efficient than evaluating feature similarity.
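The rescaling idea rests on the Mahalanobis distance, which weighs each direction by the data's spread. A few-line sketch of the metric itself (this illustrates only the distance, not the authors' autoencoder):

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from a distribution with mean mu
    and covariance cov: sqrt((x - mu)^T cov^{-1} (x - mu)).
    Directions with larger variance contribute less, which is the
    rescaling effect the abstract refers to."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])   # first axis has twice the spread
print(mahalanobis([2.0, 0.0], [0.0, 0.0], cov))  # 2/sqrt(4) = 1.0
print(mahalanobis([0.0, 2.0], [0.0, 0.0], cov))  # 2/sqrt(1) = 2.0
```

The same Euclidean offset of 2 thus counts half as much along the high-variance axis, so comparing reconstructed and original data in this metric is less distorted by unevenly scaled features.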
Affiliation(s)
- Jian Zheng
- College of Computer Science and Technology, Chongqing University of Post and Telecommunications, Chongqing, China
- Hongchun Qu
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Zhaoni Li
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Lin Li
- College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Xiaoming Tang
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Fei Guo
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
11. Zhang Q, Cheng Y, Zhao F, Wang G, Xia S. Optimal Scale Combination Selection Integrating Three-Way Decision With Hasse Diagram. IEEE Trans Neural Netw Learn Syst 2022;33:3675-3689. PMID: 33635795. DOI: 10.1109/tnnls.2021.3054063.
Abstract
The multi-scale decision system (MDS) is an effective tool for describing hierarchical data in machine learning. Optimal scale combination (OSC) selection and attribute reduction are two key issues related to knowledge discovery in MDSs. However, searching for all OSCs may result in a combinatorial explosion, and existing approaches typically incur excessive time consumption. In this study, searching for all OSCs is treated as an optimization problem with the scale space as the search space. Accordingly, a sequential three-way decision model of the scale space is established to reduce the search space by integrating three-way decision with the Hasse diagram. First, a novel scale combination is proposed to perform scale selection and attribute reduction simultaneously, and an extended stepwise optimal scale selection (ESOSS) method is introduced to quickly find a single local OSC on a subset of the scale space. Second, based on the obtained local OSCs, a sequential three-way decision model of the scale space is established to divide the search space into three pairwise disjoint regions: the positive, negative, and boundary regions. The boundary region is regarded as a new search space, and it can be proved that a local OSC on the boundary region is also a global OSC. Therefore, all OSCs of a given MDS can be obtained by searching for local OSCs on the boundary regions step by step. Finally, according to the properties of the Hasse diagram, a formula for calculating the maximal elements of a given boundary region is provided to alleviate space complexity. Accordingly, an efficient OSC selection algorithm is proposed that improves the efficiency of searching for all OSCs by reducing the search space. The experimental results demonstrate that the proposed method significantly reduces computational time.
12. Yang B, Wu J, Sun A, Gao N, Zhang X. Robust landmark graph-based clustering for high-dimensional data. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.05.011.
13. Nie F, Wang Z, Tian L, Wang R, Li X. Subspace Sparse Discriminative Feature Selection. IEEE Trans Cybern 2022;52:4221-4233. PMID: 33055053. DOI: 10.1109/tcyb.2020.3025205.
Abstract
In this article, we propose a novel feature selection approach that explicitly addresses the long-standing subspace sparsity issue. Leveraging l2,1-norm regularization for feature selection is the major strategy in existing methods, but it confronts a sparsity limitation and parameter-tuning trouble. To circumvent this problem, employing an l2,0-norm constraint to improve the sparsity of the model has recently gained attention; however, optimizing the subspace sparsity constraint has remained unsolved, admitting only approximate solutions without convergence proofs. To address these challenges, we propose a novel subspace sparsity discriminative feature selection (S2DFS) method that leverages a subspace sparsity constraint to avoid parameter tuning. In addition, a trace-ratio objective function ensures the discriminability of the selected features. Most importantly, an efficient iterative optimization algorithm explicitly solves the proposed problem with a closed-form solution and a strict convergence proof. To the best of our knowledge, this article is the first to propose such an optimization algorithm for the subspace sparsity issue, and a general formulation of the algorithm is provided to improve the extensibility and portability of our method. Extensive experiments on several high-dimensional text and image datasets demonstrate that the proposed method outperforms related state-of-the-art methods in pattern classification and image retrieval tasks.
14. Zheng J, Wang Q, Liu C, Wang J, Liu H, Li J. Relation patterns extraction from high-dimensional climate data with complicated multi-variables using deep neural networks. Appl Intell 2022. DOI: 10.1007/s10489-022-03737-4.
15. Dornaika F, Khoder A, Moujahid A, Khoder W. A supervised discriminant data representation: application to pattern classification. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07332-z.
Abstract
The performance of machine learning and pattern recognition algorithms generally depends on data representation. That is why much current effort in machine learning goes into designing preprocessing frameworks and data transformations able to support effective learning. The method proposed in this work is a hybrid linear feature extraction scheme for supervised multi-class classification problems. Inspired by two recent linear discriminant methods, robust sparse linear discriminant analysis (RSLDA) and inter-class sparsity-based discriminative least square regression (ICS_DLSR), we propose a unifying criterion that retains the advantages of both. The resulting transformation relies on sparsity-promoting techniques both to select the features that most accurately represent the data and to preserve the row-sparsity consistency of samples from the same class. The linear transformation and the orthogonal matrix are estimated with an iterative alternating minimization scheme based on steepest-descent gradient updates and different initialization schemes. The proposed framework is generic in that it allows the combination and tuning of other linear discriminant embedding methods. In experiments on several datasets, including faces, objects, and digits, the proposed method outperformed competing methods in most cases.
16. Bhadra T, Mallik S, Hasan N, Zhao Z. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinformatics 2022;23:153. PMID: 35484501. PMCID: PMC9052461. DOI: 10.1186/s12859-022-04678-y.
Abstract
BACKGROUND As many complex omics datasets have been generated during the last two decades, dimensionality reduction has become a challenging issue in mining such data. Omics data typically consist of many features, and accordingly many feature selection algorithms have been developed. The performance of these methods often varies with the specific data, making the discovery and interpretation of results challenging. METHODS AND RESULTS In this study, we performed a comprehensive comparison of five widely used supervised feature selection methods (mRMR, INMIFS, DFS, SVM-RFE-CBR and VWMRmR) on multi-omics datasets. Specifically, we used five representative datasets, gene expression (Exp), exon expression (ExpExon), DNA methylation (hMethyl27), copy number variation (Gistic2), and pathway activity (Paradigm IPLs), from a multi-omics study of acute myeloid leukemia (LAML) in The Cancer Genome Atlas (TCGA). The feature subsets selected by the five algorithms were assessed using three evaluation criteria: (1) classification accuracy (Acc), (2) representation entropy (RE), and (3) redundancy rate (RR). Four classifiers, C4.5, NaiveBayes, KNN, and AdaBoost, were used to measure the classification accuracy (Acc) of each selected feature subset. The VWMRmR algorithm obtained the best Acc for three datasets (ExpExon, hMethyl27 and Paradigm IPLs). It offered the best RR (computed using normalized mutual information) for three datasets (Exp, Gistic2 and Paradigm IPLs) and the best RR (computed using the Pearson correlation coefficient) for two datasets (Gistic2 and Paradigm IPLs). It also obtained the best RE for three datasets (Exp, Gistic2 and Paradigm IPLs). Overall, the VWMRmR algorithm yielded the best performance across all three evaluation criteria for the majority of the datasets. In addition, we identified signature genes using supervised learning on the overlapping top features among the five selection methods: a 7-gene signature (ZMIZ1, ENG, FGFR1, PAWR, KRT17, MPO and LAT2) for Exp, a 9-gene signature for ExpExon, a 7-gene signature for hMethyl27, a single-gene signature (PIK3CG) for Gistic2, and a 3-gene signature for Paradigm IPLs. CONCLUSION We comprehensively compared five well-known feature selection methods on various high-dimensional datasets and identified signature genes for the disease from the specific omics data. This study will help incorporate higher-order dependencies among features.
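Of the three evaluation criteria, the correlation-based redundancy rate is the easiest to reproduce. A sketch under one common definition, the average absolute pairwise Pearson correlation among the selected features (the exact formula used in the paper may differ, and it also reports an NMI-based variant):

```python
import numpy as np

def redundancy_rate(X):
    """Redundancy rate of a selected feature subset: average
    absolute Pearson correlation over all feature pairs.
    Lower values mean the selected features are less redundant."""
    C = np.corrcoef(X, rowvar=False)    # feature-feature correlations
    m = C.shape[0]
    iu = np.triu_indices(m, k=1)        # each unordered pair once
    return np.abs(C[iu]).mean()

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a, rng.normal(size=100)])  # two identical features
print(redundancy_rate(X))   # high: dominated by the duplicated pair
```

Duplicating a feature drives the score up (the duplicated pair contributes a correlation of 1), which is exactly the behavior a redundancy criterion should penalize.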
Affiliation(s)
- Tapas Bhadra
- Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal, 700160, India
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Neaj Hasan
- Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal, 700160, India
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
17. Dornaika F, Khoder A, Khoder W. Data representation via refined discriminant analysis and common class structure. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.12.068.
18. Wang Y, Gao X, Ru X, Sun P, Wang J. A hybrid feature selection algorithm and its application in bioinformatics. PeerJ Comput Sci 2022;8:e933. PMID: 35494789. PMCID: PMC9044222. DOI: 10.7717/peerj-cs.933.
Abstract
Feature selection is an independent technology for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information such as bioinformatics data, there has been an urgent need for more effective and accurate feature selection methods in recent decades. Here we propose the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method to obtain an optimal subset that yields higher classification accuracy. Ten datasets from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method: the MMPSO algorithm outperformed other algorithms in classification accuracy while utilizing the same number of features. We then applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of the seven-gene combination (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in separating tumour from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can effectively extract features from a high-dimensional dataset, providing new clues for identifying biomarkers or therapeutic targets from biological data and new perspectives in tumour research.
Affiliation(s)
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xiaoguang Gao
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xinxin Ru
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Pengzhan Sun
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Jihan Wang
- Institute of Medical Research, Northwestern Polytechnical University, Xi’an, Shaanxi, China
19. Sun Z, Yu Y. Robust multi-class feature selection via l2,0-norm regularization minimization. Intell Data Anal 2022. DOI: 10.3233/ida-205724.
Abstract
Feature selection is an important data preprocessing step in data mining and machine learning that can reduce the number of features without deteriorating the model's performance. Recently, sparse regression has received considerable attention in feature selection due to its good performance. However, because the l2,0-norm regularization term is non-convex, the problem is hard to solve, and most existing methods relax it to the l2,1-norm. Unlike these methods, this paper proposes a novel method that solves the l2,0-norm regularized least squares problem directly via iterative hard thresholding, producing an exactly row-sparse weight matrix so that features can be selected more precisely. Furthermore, two homotopy strategies are derived to reduce the computational cost of the optimization, making it more practical for real-world applications. The proposed method is verified on eight biological datasets; experimental results show that it achieves higher classification accuracy with fewer selected features than its approximate convex counterparts and other state-of-the-art feature selection methods.
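The core update, a gradient step on the least squares loss followed by row-wise hard thresholding, can be sketched as follows. This illustrates the general iterative-hard-thresholding technique with a fixed row budget k (a constrained variant), without the paper's homotopy speedups; it is not the authors' code:

```python
import numpy as np

def iht_row_sparse(X, Y, k, step=None, n_iter=200):
    """Least squares with an exact row-sparsity constraint via
    iterative hard thresholding: gradient step on ||XW - Y||_F^2,
    then keep only the k rows of W with the largest l2 norms.
    Zeroed rows correspond to discarded features."""
    n, d = X.shape
    c = Y.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L with L = ||X||_2^2
    W = np.zeros((d, c))
    for _ in range(n_iter):
        W = W - step * (X.T @ (X @ W - Y))       # gradient step
        norms = np.linalg.norm(W, axis=1)
        keep = np.argsort(-norms)[:k]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        W[~mask] = 0.0                           # hard-threshold the rest
    return W

# toy problem: only features 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
W_true = np.zeros((5, 2))
W_true[0] = [1.0, -1.0]
W_true[2] = [2.0, 0.5]
Y = X @ W_true
W = iht_row_sparse(X, Y, k=2)
print(np.nonzero(np.linalg.norm(W, axis=1))[0])  # rows 0 and 2 survive
```

Unlike an l2,1 relaxation, the thresholding step enforces the row budget exactly at every iteration, which is the "exact row-sparsity solution" property the abstract emphasizes.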
Affiliation(s)
- Zhenzhen Sun
- College of Computer Science and Technology, HuaQiao University, Quanzhou, Fujian, China
- Yuanlong Yu
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian, China
20
Self-paced non-convex regularized analysis-synthesis dictionary learning for unsupervised feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108279]
21
Robust active representation via ℓ2,p-norm constraints. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107639]
22
Shu L, Huang K, Jiang W, Wu W, Liu H. Feature selection using autoencoders with Bayesian methods to high-dimensional data. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-211348]
Abstract
Real-world data is usually high-dimensional and limited in quantity, so using it directly in machine learning tasks easily leads to poor generalization. By learning low-dimensional representations of high-dimensional data, feature selection can retain features useful for machine learning tasks, and these features in turn allow models to be trained effectively. Hence, feature selection from high-dimensional data is a challenge. To address this issue, this paper proposes a novel feature selection method: a hybrid approach consisting of an autoencoder and Bayesian methods. First, Bayesian methods are embedded in the proposed autoencoder as a special hidden layer, which increases the precision of selecting non-redundant features. Then, the other hidden layers of the autoencoder are used for non-redundant feature selection. Finally, the proposed method is compared with mainstream feature selection approaches and outperforms them. We find that combining autoencoders with probabilistic correction methods is more effective for feature selection than stacking architectures or adding constraints to autoencoders. We also demonstrate that stacked autoencoders are more suitable for large-scale feature selection, whereas sparse autoencoders are beneficial when fewer features are to be selected. The proposed method thus provides a theoretical reference for analyzing the optimality of feature selection.
Affiliation(s)
- Lei Shu
- Chongqing Aerospace Polytechnic, Chongqing, China
- Kun Huang
- Urban Vocational College of Sichuan, P.R. China
- Wenhao Jiang
- Chongqing Aerospace Polytechnic, Chongqing, China
- Wenming Wu
- Chongqing Aerospace Polytechnic, Chongqing, China
- Hongling Liu
- Chongqing Aerospace Polytechnic, Chongqing, China
23
Feature selection via minimizing global redundancy for imbalanced data. Appl Intell 2021. [DOI: 10.1007/s10489-021-02855-9]
24
Wang Z, Nie F, Zhang C, Wang R, Li X. Joint nonlinear feature selection and continuous values regression network. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.06.035]
25
Ma J, Wang R, Ji W, Zhao J, Zong M, Gilman A. Robust multi-view continuous subspace clustering. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2018.12.004]
26
Zhi X, Liu J, Wu S, Niu C. A generalized l2,p-norm regression based feature selection algorithm. J Appl Stat 2021; 50:703-723. [PMID: 36819074 PMCID: PMC9930865 DOI: 10.1080/02664763.2021.1975662]
Abstract
Feature selection is an important data dimension reduction method, and it has been widely used in applications involving high-dimensional data such as genetic data analysis and image processing. To achieve robust feature selection, recent works apply the l2,1- or l2,p-norm of a matrix to the loss function and regularization terms in regression, and have achieved encouraging results. However, these existing works rigidly set the matrix norms used in the loss function and the regularization terms to the same l2,1- or l2,p-norm, which limits their applications. In addition, the solution algorithms they present either have high computational complexity and are not suitable for large datasets, or cannot provide satisfying performance due to approximate calculation. To address these problems, we present a generalized l2,p-norm regression based feature selection (l2,p-RFS) method built on a new optimization criterion, which extends existing regression-based criteria to the case where the loss function and the regularization terms use different matrix norms. We cast the new optimization criterion in a regression framework without regularization. In this framework, the criterion can be solved using an iteratively re-weighted least squares (IRLS) procedure in which each least squares subproblem is solved efficiently by the least squares QR decomposition (LSQR) algorithm. We have conducted extensive experiments to evaluate the proposed algorithm on various well-known gene expression and image datasets, and compare it with other related feature selection methods.
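The IRLS idea behind this family of methods can be sketched for the simplest case of a Frobenius loss with an ℓ2,p row-sparsity penalty: each step solves a ridge-like linear system whose diagonal weights come from the current row norms. This is an assumption-laden sketch (the paper's general loss norm and LSQR acceleration are omitted).

```python
import numpy as np

def l2p_irls(X, Y, lam=1.0, p=1.0, n_iter=50, eps=1e-8):
    """IRLS sketch for min_W ||XW - Y||_F^2 + lam * ||W||_{2,p}^p.
    The majorizer of ||w_j||^p at the current iterate yields diagonal
    weights D_jj = (p/2) * ||w_j||^(p-2)."""
    W = np.linalg.lstsq(X, Y, rcond=None)[0]   # plain least-squares start
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1) + eps  # eps avoids 0**(p-2)
        D = np.diag((p / 2.0) * row_norms ** (p - 2))
        W = np.linalg.solve(XtX + lam * D, XtY)      # weighted ridge system
    return W
```

Smaller p drives more rows of W toward zero; features are then ranked by row norm.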
Affiliation(s)
- X. Zhi
- School of Science, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China
- J. Liu
- School of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China
- S. Wu
- School of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China
- C. Niu
- School of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China
27
Bhadra T, Bandyopadhyay S. Supervised feature selection using integration of densest subgraph finding with floating forward–backward search. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.034]
28
Automatic determining optimal parameters in multi-kernel collaborative fuzzy clustering based on dimension constraint. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.062]
29
Zhang X, Fan M, Wang D, Zhou P, Tao D. Top-k Feature Selection Framework Using Robust 0-1 Integer Programming. IEEE Trans Neural Netw Learn Syst 2021; 32:3005-3019. [PMID: 32735538 DOI: 10.1109/tnnls.2020.3009209]
Abstract
Feature selection (FS), which identifies the relevant features in a data set to facilitate subsequent data analysis, is a fundamental problem in machine learning and has been widely studied in recent years. Most FS methods rank the features in order of their scores based on a specific criterion and then select the k top-ranked features, where k is the number of desired features. However, these features are usually not the top-k features and may present a suboptimal choice. To address this issue, we propose a novel FS framework in this article to select the exact top-k features in the unsupervised, semisupervised, and supervised scenarios. The new framework utilizes the l0,2-norm as the matrix sparsity constraint rather than its relaxations, such as the l1,2-norm. Since the l0,2-norm constrained problem is difficult to solve, we transform the discrete l0,2-norm-based constraint into an equivalent 0-1 integer constraint and replace the 0-1 integer constraint with two continuous constraints. The obtained top-k FS framework with two continuous constraints is theoretically equivalent to the l0,2-norm constrained problem and can be optimized by the alternating direction method of multipliers (ADMM). Unsupervised and semisupervised FS methods are developed based on the proposed framework, and extensive experiments on real-world data sets are conducted to demonstrate the effectiveness of the proposed FS framework.
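The gap that motivates this framework, between the k individually top-ranked features and the exact best subset of size k, is easy to reproduce on toy data (illustrative only; the paper's ADMM solver is not shown). With two near-duplicate informative features, per-feature correlation ranking picks a redundant pair, while exhaustive subset search picks the complementary one.

```python
import numpy as np
from itertools import combinations

def r2(X, y):
    """Coefficient of determination of the least-squares fit y ~ X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1.0 - (resid @ resid) / (y @ y)

rng = np.random.default_rng(0)
n = 200
s, t = rng.standard_normal(n), 0.6 * rng.standard_normal(n)
y = s + t
# x0 and x1 are near-duplicates of the strong signal s; x2 carries t.
X = np.column_stack([s, s + 0.01 * rng.standard_normal(n), t])

scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(3)]
ranked = sorted(np.argsort(scores)[-2:])               # top-2 by score: redundant
best = max(combinations(range(3), 2),
           key=lambda S: r2(X[:, list(S)], y))         # exact best 2-subset
```

Here `ranked` selects the two copies of `s`, whereas `best` pairs one copy of `s` with the complementary feature.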
30
Song P, Zheng W, Yu Y, Ou S. Speech Emotion Recognition Based on Robust Discriminative Sparse Regression. IEEE Trans Cogn Dev Syst 2021. [DOI: 10.1109/tcds.2020.2990928]
31
Shi Z, Wen B, Gao Q, Zhang B. Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data. Mol Cell Proteomics 2021; 20:100083. [PMID: 33887487 PMCID: PMC8165452 DOI: 10.1016/j.mcpro.2021.100083]
Abstract
Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.
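At its core, the one-marker-per-cluster step is a k-medoids selection. The sketch below is a plain, unweighted k-medoids on a precomputed distance matrix, shown only to illustrate the mechanism; ProMS additionally weights points by univariate informativeness, and ProMS_mo adds constraints.

```python
import numpy as np

def k_medoids(D, k, n_iter=50, seed=0):
    """Plain k-medoids on a precomputed distance matrix D (n x n).
    Returns medoid indices and a cluster assignment per point."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        assign = np.argmin(D[:, medoids], axis=1)   # nearest medoid per point
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(assign == c)
            if members.size:                        # keep old medoid if empty
                within = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(within)] # most central member
        if np.array_equal(new, medoids):
            break                                   # converged
        medoids = new
    return medoids, assign
```

The returned medoids play the role of the cluster representatives (markers); the remaining cluster members are the natural replacement candidates.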
Key Words
- auroc, area under the receiver operating characteristic curve
- crc, colorectal carcinoma
- fpkm, fragments per kilobase of transcript per million mapped reads
- gbm, gradient boosting machine
- go, gene ontology
- hcc, hepatocellular carcinoma
- ibaq, intensity-based absolute quantification
- knn, k-nearest neighbor
- lasso, least absolute shrinkage and selection operator
- lpcat1, lysophosphatidylcholine acyltransferase 1
- lr, logistic regression
- mrmr, maximum relevance minimum redundancy
- ms, mass spectrometry
- msi, microsatellite instability
- mss, microsatellite stable
- pc, principal component
- pca, principal component analysis
- proms, protein marker selection
- proms_mo, protein marker selection_multiomics
- rf, random forests
- rsem, rna-seq by expectation maximization
- smc4, structural maintenance of chromosome subunit 4
- spca, supervised principal component analysis
- stat1, signal transducer and activator of transcription 1
- svm, support vector machine
- tmt, tandem mass tag
Affiliation(s)
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Qiang Gao
- Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital, Fudan University, and Key Laboratory of Carcinogenesis and Cancer Invasion of Ministry of Education, Shanghai, China
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
32
Xia G, Chen B, Sun H, Liu Q. Nonconvex Low-Rank Kernel Sparse Subspace Learning for Keyframe Extraction and Motion Segmentation. IEEE Trans Neural Netw Learn Syst 2021; 32:1612-1626. [PMID: 32340963 DOI: 10.1109/tnnls.2020.2985817]
Abstract
By exploiting the kernel trick, the sparse subspace model can be extended to a nonlinear version with one or a combination of predefined kernels, but the high-dimensional space induced by predefined kernels is not guaranteed, in theory, to capture the features of nonlinear data. In this article, we propose an unsupervised nonconvex low-rank learning framework that learns a kernel to replace the predefined kernel in the sparse subspace model. The kernel learned via a nonconvex relaxation of the rank can better exploit the low-rank property of nonlinear data, inducing a high-dimensional Hilbert space that more closely approaches the true feature space. Furthermore, we give, and prove, a global closed-form optimal solution of the nonconvex rank minimization. Considering the low-rank and sparseness characteristics of motion capture data in its feature space, we use such data to verify the better representation of nonlinear data with the learned kernel via two tasks: keyframe extraction and motion segmentation. The performance on both tasks demonstrates the advantage of our model over the sparse subspace model with predefined kernels and other related state-of-the-art methods.
33
34
Khoder A, Dornaika F. An enhanced approach to the robust discriminant analysis and class sparsity based embedding. Neural Netw 2021; 136:11-16. [PMID: 33422928 DOI: 10.1016/j.neunet.2020.12.025]
Abstract
In recent years, feature extraction has attracted much attention in the machine learning and pattern recognition fields. This paper extends and improves a scheme for linear feature extraction that can be used in supervised multi-class classification problems. Inspired by recent frameworks for robust sparse LDA and inter-class sparsity, we propose a unifying criterion able to retain the advantages of these two powerful linear discriminant methods. We introduce an iterative alternating minimization scheme to estimate the linear transformation and the orthogonal matrix. The linear transformation is efficiently updated via the steepest-descent gradient technique. The proposed framework is generic in the sense that it allows the combination and tuning of other linear discriminant embedding methods. We used our proposed method to fine-tune the linear solutions delivered by two recent linear methods: RSLDA and RDA_FSIS. Experiments have been conducted on public image datasets of different types, including objects, faces, and digits. The proposed framework compared favorably with several competing methods.
Affiliation(s)
- A Khoder
- University of the Basque Country UPV/EHU, San Sebastian, Spain
- F Dornaika
- Henan University, Kaifeng, China; University of the Basque Country UPV/EHU, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.
35
Multi-view generalized support vector machine via mining the inherent relationship between views with applications to face and fire smoke recognition. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106488]
36
Joint local structure preservation and redundancy minimization for unsupervised feature selection. Appl Intell 2020. [DOI: 10.1007/s10489-020-01800-6]
37
38
Yang Z, Ye Q, Chen Q, Ma X, Fu L, Yang G, Yan H, Liu F. Robust discriminant feature selection via joint L2,1-norm distance minimization and maximization. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106090]
39
Dornaika F, Khoder A. Linear embedding by joint Robust Discriminant Analysis and Inter-class Sparsity. Neural Netw 2020; 127:141-159. [PMID: 32361379 DOI: 10.1016/j.neunet.2020.04.018]
Abstract
Linear Discriminant Analysis (LDA) and its variants are widely used as feature extraction methods for different classification tasks. However, these methods have some limitations that need to be overcome, the main one being that the projection obtained by LDA does not provide good interpretability of the features. In this paper, we propose a novel supervised method for multi-class classification that simultaneously performs feature selection and extraction. The targeted projection transformation focuses on the most discriminant original features and, at the same time, makes sure that the transformed (extracted) features belonging to each class have a common sparsity. Our proposed method is called Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity (RDA_FSIS). The corresponding model integrates two types of sparsity. The first is obtained by imposing the ℓ2,1 constraint on the projection matrix in order to perform feature selection. The second is obtained by imposing the inter-class sparsity constraint, which ensures a common sparsity structure within each class. An orthogonal matrix is also introduced in our model in order to guarantee that the extracted features retain the main variance of the original data and thus improve robustness to noise. The proposed method retrieves the LDA transformation by taking both types of sparsity into account. Various experiments are conducted on several image datasets, including faces, objects, and digits, with the projected features used for multi-class classification. The obtained results show that the proposed method outperforms competing methods by learning a more compact and discriminative transformation.
Affiliation(s)
- F Dornaika
- University of the Basque Country UPV/EHU, San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.
- A Khoder
- University of the Basque Country UPV/EHU, San Sebastian, Spain
40
Zhou P, Chen J, Fan M, Du L, Shen YD, Li X. Unsupervised feature selection for balanced clustering. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105417]
41
Mirzaei A, Pourahmadi V, Soltani M, Sheikhzadeh H. Deep feature selection using a teacher-student network. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.12.017]
42
Lin M, Cui H, Chen W, van Engelen A, de Bruijne M, Azarpazhooh MR, Sohrevardi SM, Spence JD, Chiu B. Longitudinal assessment of carotid plaque texture in three-dimensional ultrasound images based on semi-supervised graph-based dimensionality reduction and feature selection. Comput Biol Med 2020; 116:103586. [DOI: 10.1016/j.compbiomed.2019.103586]
43
Li X, Chen M, Wang Q. Self-Tuned Discrimination-Aware Method for Unsupervised Feature Selection. IEEE Trans Neural Netw Learn Syst 2019; 30:2275-2284. [PMID: 30530372 DOI: 10.1109/tnnls.2018.2881211]
Abstract
Unsupervised feature selection is fundamentally important for processing unlabeled high-dimensional data, and several methods have been proposed on this topic. Most existing embedded unsupervised methods emphasize only the data structure in the input space, which may contain large noise, so they are limited in perceiving the discriminative information implied within the low-dimensional manifold. In addition, these methods always involve several parameters to be tuned, which is time-consuming. In this paper, we present a self-tuned discrimination-aware (STDA) approach for unsupervised feature selection. The main contributions of this paper are threefold: 1) it adopts the advantage of the discriminant analysis technique to select the valuable features; 2) it learns the local data structure adaptively in the discriminative subspace to alleviate the effect of data noise; and 3) it performs feature selection and clustering simultaneously with an efficient optimization strategy, saving the additional effort of tuning parameters. Experimental results on a toy data set and various real-world benchmarks justify the effectiveness of STDA in both feature selection and data clustering, and demonstrate its promising performance against the state of the art.
44
Rangzan K, Kabolizadeh M, Karimi D, Zareie S. Supervised cross-fusion method: a new triplet approach to fuse thermal, radar, and optical satellite data for land use classification. Environ Monit Assess 2019; 191:481. [PMID: 31273539 DOI: 10.1007/s10661-019-7621-y]
Abstract
This study presents a new fusion method, namely the supervised cross-fusion method, to improve the capability of fused thermal, radar, and optical images for classification. The proposed cross-fusion method is a combination of pixel-based and supervised feature-based fusion of thermal, radar, and optical data. The pixel-based fusion was applied to fuse optical data of Sentinel-2 and Landsat 8. According to the correlation coefficient (CR) and signal-to-noise ratio (SNR), among the pixel-based fusion methods used, the wavelet method obtained the best results, with CR values of 0.97 and 0.96 for spectral and spatial information preservation, respectively. The supervised feature-based fusion combines the best output of the pixel-based level, land surface temperature (LST) data, and the Sentinel-1 radar image using a supervised approach: supervised feature selection and learning of the inputs based on the linear discriminant analysis and sparse regularization (LDASR) algorithm. In the present study, non-negative matrix factorization (NMF) was utilized for feature extraction. A comparison of the obtained results with a state-of-the-art fusion method indicated the higher classification accuracy of our proposed method: the rotation forest (RoF) classification results improved by 25% and the support vector machine (SVM) results by 31%. The results showed that the proposed method separates well the four main classes of settlements, barren land, river, and river bank, and even the bridges over the river. Also, the number of pixels left unclassified by SVM is very low compared to other classification methods and can be neglected. The study results showed that LST calculated using thermal data had positive effects on improving the classification results. Compared to supervised cross-fusion without LST data, the proposed method improved SVM and RoF classification by 38% and 7%, respectively.
Affiliation(s)
- Kazem Rangzan
- Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran.
- Mostafa Kabolizadeh
- Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran
- Danya Karimi
- Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran
- Sajad Zareie
- Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran
45
Feature Selection with Conditional Mutual Information Considering Feature Interaction. Symmetry (Basel) 2019. [DOI: 10.3390/sym11070858]
Abstract
Feature interaction is a newly proposed feature relevance relationship, and the unintentional removal of interactive features can result in poor classification performance. However, traditional feature selection algorithms mainly focus on detecting relevant and redundant features, while interactive features are usually ignored. To deal with this problem, feature relevance, feature redundancy, and feature interaction are redefined based on information theory. Then a new feature selection algorithm named CMIFSI (Conditional Mutual Information based Feature Selection considering Interaction) is proposed in this paper, which makes use of conditional mutual information to estimate feature redundancy and interaction, respectively. To verify the effectiveness of our algorithm, empirical experiments are conducted to compare it with several other representative feature selection algorithms. The results on both synthetic and benchmark datasets indicate that our algorithm achieves better results than the other methods in most cases, and highlight the necessity of dealing with feature interaction.
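The conditional mutual information underlying such criteria can be estimated directly from empirical frequencies for discrete features. The sketch below is an illustrative estimator (the paper's exact estimator may differ) and reproduces the classic XOR case: a feature carries no information alone but becomes fully informative once another feature is conditioned on, which is exactly the "interactive feature" situation.

```python
import numpy as np
from collections import Counter

def cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) in nats for discrete
    sequences, estimated from empirical joint frequencies."""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    total = 0.0
    for (a, b, c), nabc in pxyz.items():
        # p(a,b,c) * log( p(a,b,c) p(c) / (p(a,c) p(b,c)) ), in counts
        total += (nabc / n) * np.log((nabc * pz[c]) / (pxz[(a, c)] * pyz[(b, c)]))
    return total
```

With z = x XOR y, the marginal information I(X;Z) is zero while I(X;Z|Y) equals log 2.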
46
Peng H, Liu CL. Discriminative Feature Selection via Employing Smooth and Robust Hinge Loss. IEEE Trans Neural Netw Learn Syst 2019; 30:788-802. [PMID: 30047911 DOI: 10.1109/tnnls.2018.2852297]
Abstract
A wide variety of sparsity-inducing feature selection methods have been developed in recent years. Most of their loss functions are built upon regression, since it is general and easy to optimize, but regression is not well suited for classification. In contrast, the hinge loss (HL) of support vector machines has proved powerful for classification tasks, but a model with the existing multiclass HL and sparsity regularization is difficult to optimize. In view of that, we propose a new loss, called the smooth and robust HL, which combines the merits of regression and HL while overcoming their drawbacks, and apply it to our sparsity-regularized feature selection model. To optimize the model, we present a new variant of the accelerated proximal gradient (APG) algorithm, which boosts the discriminative margins among different classes compared with standard APG algorithms. We further propose an efficient optimization technique to solve the proximal projection problem at each iteration step, a key component of the new APG algorithm. We theoretically prove that the new APG algorithm converges at rate O(1/k^2) in the convex case (k is the iteration counter), which is the optimal convergence rate for smooth problems. Experimental results on nine publicly available data sets demonstrate the effectiveness of our method.
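One standard way to obtain a loss with these properties is a Huber-style smoothing of the hinge: zero for confident correct predictions, quadratic (smooth) near the decision boundary, and linear (robust) for badly violated margins. This sketch illustrates the general idea only and is not the paper's exact smooth and robust HL.

```python
import numpy as np

def smooth_hinge(m, delta=1.0):
    """Huberized hinge on margins m = y * f(x): zero for m >= 1,
    quadratic in a band of width delta below the hinge point, and
    linear for m <= 1 - delta. Continuous with matching slopes at
    both knots, hence differentiable everywhere."""
    m = np.asarray(m, dtype=float)
    quad = (1 - m) ** 2 / (2 * delta)
    lin = (1 - m) - delta / 2
    return np.where(m >= 1, 0.0, np.where(m <= 1 - delta, lin, quad))
```

The linear tail is what gives robustness to outliers: badly misclassified samples contribute a bounded gradient instead of the unbounded one a squared loss would produce.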
47
Xiong H, Cheng W, Bian J, Hu W, Sun Z, Guo Z. DBSDA: Lowering the Bound of Misclassification Rate for Sparse Linear Discriminant Analysis via Model Debiasing. IEEE Trans Neural Netw Learn Syst 2019; 30:707-717. [PMID: 30047901 DOI: 10.1109/tnnls.2018.2846783]
Abstract
Linear discriminant analysis (LDA) is a well-known technique for linear classification, feature extraction, and dimension reduction. To improve the accuracy of LDA under high-dimension, low-sample-size (HDLSS) settings, shrunken estimators such as Graphical Lasso can be used to strike a balance between bias and variance. Although an estimator with induced sparsity obtains a faster convergence rate, the introduced bias may also degrade performance. In this paper, we theoretically analyze how the sparsity and the convergence rate of the precision matrix (inverse covariance matrix) estimator affect classification accuracy by proposing an analytic model of the upper bound on the LDA misclassification rate. Guided by the model, we propose a novel classifier, DBSDA, which improves classification accuracy through debiasing. Theoretical analysis shows that DBSDA possesses a reduced upper bound on the misclassification rate and better asymptotic properties than sparse LDA (SDA). We conduct experiments on both synthetic and real application datasets to confirm the correctness of our theoretical analysis and demonstrate the superiority of DBSDA over LDA, SDA, and other downstream competitors under HDLSS settings.
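The role the precision-matrix estimate plays in LDA can be seen with even the simplest shrinkage. The sketch below uses a hypothetical ridge-type shrunken covariance as a stand-in for the Graphical-Lasso-style estimators discussed above; it illustrates the pipeline only and is not DBSDA itself.

```python
import numpy as np

def shrunken_lda_fit(X0, X1, alpha=0.1):
    """Binary LDA with a shrunken covariance Sigma = (1-a)*S + a*I,
    whose inverse plays the role of the precision-matrix estimator.
    Assumes equal class priors (illustrative choice)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = np.vstack([X0 - mu0, X1 - mu1])
    S = pooled.T @ pooled / (len(pooled) - 2)        # pooled covariance
    Sigma = (1 - alpha) * S + alpha * np.eye(S.shape[0])
    w = np.linalg.solve(Sigma, mu1 - mu0)            # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)                       # midpoint threshold
    return w, b

def shrunken_lda_predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```

Increasing `alpha` trades variance of the covariance estimate for bias toward the identity, the same bias-variance trade-off the paper analyzes for sparsity-inducing estimators.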
48
Sun S, Wan Y, Zeng C. Multi-view Embedding with Adaptive Shared Output and Similarity for unsupervised feature selection. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.11.017]
49
ℓ0-based sparse canonical correlation analysis with application to cross-language document retrieval. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.089]
50
Yuan M, Yang Z, Ji G. Partial maximum correlation information: A new feature selection method for microarray data classification. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.084]