1. Guo Y, Sun Y, Wang Z, Nie F, Wang F. Double-Structured Sparsity Guided Flexible Embedding Learning for Unsupervised Feature Selection. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13354-13367. [PMID: 37167052] [DOI: 10.1109/tnnls.2023.3267184]
Abstract
In this article, we propose a novel unsupervised feature selection model combined with clustering, named double-structured sparsity guided flexible embedding learning (DSFEL). DSFEL includes a module for learning a block-diagonal structurally sparse graph that represents the clustering structure and another module for learning a completely row-sparse projection matrix under an l2,0-norm constraint to select distinctive features. Compared with the commonly used l2,1-norm regularization term, the l2,0-norm constraint avoids the drawbacks of limited sparsity and parameter tuning. Optimizing under the l2,0-norm constraint, a nonconvex and nonsmooth problem, is a formidable challenge, and previous optimization algorithms have only been able to provide approximate solutions. To address this issue, this article proposes an efficient optimization strategy that yields a closed-form solution. Finally, comprehensive experiments on nine real-world datasets demonstrate that the proposed method outperforms existing state-of-the-art unsupervised feature selection methods.
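The ℓ2,0-constrained subproblem mentioned above admits a simple closed form when isolated: projecting a matrix onto the set of matrices with at most k nonzero rows amounts to keeping the k rows with the largest ℓ2 norms. The sketch below illustrates only this row hard-thresholding step, not the full DSFEL optimization.

```python
import numpy as np

def l20_row_projection(A: np.ndarray, k: int) -> np.ndarray:
    """Project A onto {W : at most k nonzero rows}, minimizing ||W - A||_F.

    The closed-form solution keeps the k rows of A with the largest
    l2 norms and zeroes out the rest (illustrative sketch only).
    """
    row_norms = np.linalg.norm(A, axis=1)
    keep = np.argsort(row_norms)[-k:]          # indices of the k largest rows
    W = np.zeros_like(A)
    W[keep] = A[keep]
    return W

# Toy usage: a 5-feature projection matrix, keep 2 features.
rng = np.random.default_rng(0)
W = l20_row_projection(rng.normal(size=(5, 3)), k=2)
print(np.nonzero(np.linalg.norm(W, axis=1))[0])  # selected feature indices
```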
2. Xu W, Huang M, Jiang Z, Qian Y. Graph-Based Unsupervised Feature Selection for Interval-Valued Information System. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12576-12589. [PMID: 37067967] [DOI: 10.1109/tnnls.2023.3263684]
Abstract
Feature selection has become one of the hottest research topics in the era of big data. Meanwhile, as an extension of single-valued data, interval-valued data, with its inherent uncertainty, tends to be more suitable than single-valued data for characterizing inaccurate and ambiguous information in some fields, such as medical test results and qualified product indicators. However, there are relatively few studies on unsupervised attribute reduction for interval-valued information systems (IVISs), and it remains to be studied how to effectively control the dramatic increase in the time cost of feature selection on large sample datasets. For these reasons, we propose a feature selection method for IVISs based on graph theory. The model's complexity can then be greatly reduced by exploiting the properties of the matrix power series to optimize the calculation of the original model. Our approach consists of two steps: first, features are ranked according to the principles of relevance and nonredundancy; second, the top-ranked attributes are selected once the number of features to keep is fixed a priori. In this article, experiments are performed on 14 public datasets against seven comparative algorithms. The results verify that our algorithm is effective and efficient for feature selection in IVISs.
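The two-step pipeline described in the abstract (rank by relevance and nonredundancy, then keep the top-k) can be sketched generically. The scores below are crude stand-ins, plain correlations in place of the paper's graph-based, interval-valued measures, so the sketch only conveys the control flow.

```python
import numpy as np

def rank_then_select(X: np.ndarray, k: int, beta: float = 0.5) -> list[int]:
    """Greedy relevance-minus-redundancy ranking, then keep the top-k.

    Stand-in scores: relevance = mean |corr| with all other features
    (a crude proxy for graph centrality); redundancy = max |corr| with
    already-ranked features. Illustrative only -- the paper defines
    both measures on interval-valued data via graph theory.
    """
    C = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(C, 0.0)
    relevance = C.mean(axis=1)
    ranked: list[int] = []
    candidates = set(range(X.shape[1]))
    while candidates:
        def score(j):
            redundancy = max((C[j, i] for i in ranked), default=0.0)
            return relevance[j] - beta * redundancy
        best = max(candidates, key=score)
        ranked.append(best)
        candidates.remove(best)
    return ranked[:k]
```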
3. Wang Y, Wang W, Pal NR. Supervised Feature Selection via Collaborative Neurodynamic Optimization. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6878-6892. [PMID: 36306292] [DOI: 10.1109/tnnls.2022.3213167]
Abstract
As a crucial part of machine learning and pattern recognition, feature selection aims at selecting a subset of the most informative features from the set of all available features. In this article, supervised feature selection is first formulated as a mixed-integer optimization problem with an objective function of weighted feature redundancy and relevancy, subject to a cardinality constraint on the number of selected features. It is equivalently reformulated as a bound-constrained mixed-integer optimization problem by augmenting the objective function with a penalty function that realizes the cardinality constraint. With additional bilinear and linear equality constraints realizing the integrality constraints, it is further reformulated as a bound-constrained biconvex optimization problem with two more penalty terms. Two collaborative neurodynamic optimization (CNO) approaches are proposed for solving the formulated and reformulated feature selection problems. One of the proposed CNO approaches uses a population of discrete-time recurrent neural networks (RNNs), and the other uses a pair of continuous-time projection networks operating concurrently on two timescales. Experimental results on 13 benchmark datasets substantiate the superiority of the CNO approaches over several mainstream methods in terms of average classification accuracy with three commonly used classifiers.
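As a rough illustration of the starting formulation (not the CNO solvers themselves), the objective can be written over a binary selection vector with the cardinality constraint realized as a quadratic penalty; the redundancy matrix R and relevance vector r below are assumed given.

```python
import numpy as np

def fs_objective(x, R, r, k, alpha=0.5, rho=10.0):
    """Penalized feature-selection objective over binary x in {0,1}^n.

    alpha * x'Rx      : weighted pairwise redundancy of selected features
    -(1-alpha) * r'x  : (negated) relevance of selected features
    rho * (...)**2    : quadratic penalty enforcing sum(x) == k
    Illustrative stand-in for the paper's mixed-integer formulation.
    """
    x = np.asarray(x, dtype=float)
    return (alpha * x @ R @ x
            - (1 - alpha) * r @ x
            + rho * (x.sum() - k) ** 2)
```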
4. Chen H, Nie F, Wang R, Li X. Unsupervised Feature Selection With Flexible Optimal Graph. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2014-2027. [PMID: 35839204] [DOI: 10.1109/tnnls.2022.3186171]
Abstract
In unsupervised feature selection methods based on spectral analysis, constructing a similarity matrix is a very important step. In existing methods, the linear low-dimensional projection used to construct the similarity matrix is too rigid, making it very challenging to obtain a reliable similarity matrix. To this end, we propose a method to construct a flexible optimal graph. Building on it, we propose an unsupervised feature selection method named unsupervised feature selection with flexible optimal graph and l2,1-norm regularization (FOG-R). Unlike other methods that use a linear projection to approximate the low-dimensional manifold of the original data when constructing the similarity matrix, FOG-R learns a flexible optimal graph and, by combining flexible optimal graph learning and feature selection in a unified framework, obtains an adaptive similarity matrix. In addition, an iterative algorithm with a strict convergence proof is proposed to solve FOG-R. Because l2,1-norm regularization introduces an additional regularization parameter, which causes parameter-tuning trouble, we propose another unsupervised feature selection method: unsupervised feature selection with a flexible optimal graph and l2,0-norm constraint (FOG-C), which avoids tuning additional parameters and obtains a sparser projection matrix. Most critically, we propose an effective iterative algorithm that solves FOG-C globally, with a strict convergence proof. Comparative experiments conducted on 12 public datasets show that FOG-R and FOG-C perform better than nine other state-of-the-art unsupervised feature selection algorithms.
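For contrast with the ℓ2,0-constrained variant, the ℓ2,1 regularizer used by FOG-R has a well-known proximal operator, row-wise soft thresholding, which is what drives rows of the projection matrix to zero. A minimal sketch of just that operator (FOG-R itself solves a joint graph-learning problem):

```python
import numpy as np

def l21_prox(A: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||W||_{2,1}: shrink each row of A.

    Each row a_i is scaled by max(0, 1 - lam / ||a_i||_2); rows whose
    norm falls below lam are zeroed, which induces row sparsity.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * A
```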
5. Wang R, Bian J, Nie F, Li X. Nonlinear Feature Selection Neural Network via Structured Sparse Regularization. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9493-9505. [PMID: 36395136] [DOI: 10.1109/tnnls.2022.3209716]
Abstract
Feature selection is an important and effective data preprocessing method that can remove noisy and redundant features while retaining the relevant and discriminative features in high-dimensional data. In real-world applications, the relationships between data samples and their labels are usually nonlinear. However, most existing feature selection models focus on learning a linear transformation matrix, which cannot capture such nonlinear structure in practice and degrades the performance of downstream tasks. To address this issue, we propose a novel nonlinear feature selection method that selects the most relevant and discriminative features in high-dimensional datasets. Specifically, our method learns the nonlinear structure of high-dimensional data with a neural network trained under a cross-entropy loss, and uses a structured sparsity norm, such as the l2,p-norm, to regularize the weight matrix connecting the input layer and the first hidden layer, thereby learning the weight of each feature. A structurally sparse weight matrix is thus obtained by nonlinear learning with structured sparsity regularization. We then use gradient descent to obtain the optimal solution of the proposed model. Experimental results on several synthetic and real-world datasets show the effectiveness and superiority of the proposed nonlinear feature selection model.
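A minimal sketch of the regularization scheme the abstract describes, assuming a PyTorch implementation: an ℓ2,p penalty over the columns of the first layer's weight matrix (one column per input feature) is added to the cross-entropy loss, and features are ranked by the resulting column norms. The architecture and hyperparameters here are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class SparseFSNet(nn.Module):
    """Two-layer MLP; first-layer weight columns are grouped per input feature."""

    def __init__(self, d_in: int, d_hidden: int, n_classes: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))

def l2p_penalty(net: SparseFSNet, p: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    # fc1.weight has shape (d_hidden, d_in): column j collects every weight
    # leaving input feature j, so the group norm is taken over dim=0.
    col_norms = torch.sqrt((net.fc1.weight ** 2).sum(dim=0) + eps)
    return (col_norms ** p).sum()

# Training step (sketch):
#   loss = nn.CrossEntropyLoss()(net(x), y) + lam * l2p_penalty(net, p=0.5)
# After training, rank features by the column norms of fc1.weight.
```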
6. A joint multiobjective optimization of feature selection and classifier design for high-dimensional data classification. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.069]
7. Wang Y, Wang J. Neurodynamics-driven holistic approaches to semi-supervised feature selection. Neural Netw 2022; 157:377-386. [DOI: 10.1016/j.neunet.2022.10.029]
8. Zheng J, Qu H, Li Z, Li L, Tang X, Guo F. A novel autoencoder approach to feature extraction with linear separability for high-dimensional data. PeerJ Comput Sci 2022; 8:e1061. [PMID: 37547057] [PMCID: PMC10403198] [DOI: 10.7717/peerj-cs.1061]
Abstract
Feature extraction often relies on sufficient information in the input data; however, the distribution of data in a high-dimensional space is too sparse to provide such information. Furthermore, high dimensionality makes it hard to search for features scattered across subspaces. Feature extraction from high-dimensional data is therefore a difficult task. To address this issue, this article proposes a novel autoencoder method using a Mahalanobis distance metric of rescaling transformation. The key idea is that, by implementing the Mahalanobis distance metric of rescaling transformation, the difference between the reconstructed distribution and the original distribution can be reduced, improving the autoencoder's ability to extract features. Results show that the proposed approach outperforms state-of-the-art methods in terms of both the accuracy of feature extraction and the linear separability of the extracted features. We argue that distance metric-based methods are more suitable than feature selection-based methods for extracting linearly separable features from high-dimensional data: in a high-dimensional space, evaluating feature similarity is easier than evaluating feature importance, so distance metric methods, which evaluate feature similarity, gain an advantage over feature selection methods, which assess feature importance, although evaluating feature importance is more computationally efficient.
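The loss the abstract centers on can be written compactly: replace the autoencoder's usual squared-error reconstruction term with a Mahalanobis distance under a rescaling covariance. A numpy sketch of the loss alone, assuming the covariance is estimated from training data (the paper's full training procedure is not reproduced):

```python
import numpy as np

def mahalanobis_recon_loss(X: np.ndarray, X_hat: np.ndarray,
                           cov: np.ndarray, ridge: float = 1e-6) -> float:
    """Mean Mahalanobis reconstruction error: (x_i - xh_i)' S^{-1} (x_i - xh_i).

    `cov` is the data covariance acting as the rescaling metric; a small
    ridge keeps the inverse well conditioned. Sketch of the loss only,
    not the paper's full autoencoder.
    """
    S_inv = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    D = X - X_hat
    return float(np.einsum('ij,jk,ik->', D, S_inv, D) / len(X))
```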
Affiliation(s)
- Jian Zheng: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Hongchun Qu: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Zhaoni Li: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Lin Li: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
- Xiaoming Tang: College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
- Fei Guo: College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
9. Gong X, Yu L, Wang J, Zhang K, Bai X, Pal NR. Unsupervised feature selection via adaptive autoencoder with redundancy control. Neural Netw 2022; 150:87-101. [DOI: 10.1016/j.neunet.2022.03.004]
10. Multi-classification for high-dimensional data using probabilistic neural networks. Journal of Radiation Research and Applied Sciences 2022. [DOI: 10.1016/j.jrras.2022.05.010]
11. Shu L, Huang K, Jiang W, Wu W, Liu H. Feature selection using autoencoders with Bayesian methods to high-dimensional data. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-211348]
Abstract
Using real-world data directly in machine learning tasks easily leads to poor generalization, since such data are usually high-dimensional and limited in quantity. By learning low-dimensional representations of high-dimensional data, feature selection can retain features that are useful for machine learning tasks, and these features can then be used to train models effectively. Feature selection from high-dimensional data is therefore a challenge. To address this issue, this paper proposes a novel feature selection method: a hybrid approach consisting of an autoencoder and Bayesian methods. First, the Bayesian methods are embedded in the autoencoder as a special hidden layer in order to increase precision when selecting non-redundant features. Then, the other hidden layers of the autoencoder are used for non-redundant feature selection. Finally, the proposed method is compared with mainstream feature selection approaches and outperforms them. We find that combining autoencoders with probabilistic correction methods is more effective for feature selection than stacking architectures or adding constraints to autoencoders. We also show that stacked autoencoders are more suitable for large-scale feature selection, whereas sparse autoencoders are beneficial when selecting a smaller number of features. The proposed method provides a theoretical reference for analyzing the optimality of feature selection.
Affiliation(s)
- Lei Shu: Chongqing Aerospace Polytechnic, Chongqing, China
- Kun Huang: Urban Vocational College of Sichuan, P.R. China
- Wenhao Jiang: Chongqing Aerospace Polytechnic, Chongqing, China
- Wenming Wu: Chongqing Aerospace Polytechnic, Chongqing, China
- Hongling Liu: Chongqing Aerospace Polytechnic, Chongqing, China
12. Zhang S, Dang X, Nguyen D, Wilkins D, Chen Y. Estimating Feature-Label Dependence Using Gini Distance Statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:1947-1963. [PMID: 31869782] [DOI: 10.1109/tpami.2019.2960358]
Abstract
Identifying statistical dependence between the features and the label is a fundamental problem in supervised learning. This paper presents a framework for estimating dependence between numerical features and a categorical label using the generalized Gini distance, an energy distance in reproducing kernel Hilbert spaces (RKHS). Two Gini distance-based dependence measures are explored: Gini distance covariance and Gini distance correlation. Unlike Pearson covariance and correlation, which do not characterize independence, these Gini distance-based measures characterize both dependence and independence of random variables. The test statistics are simple to calculate and do not require probability density estimation. Uniform convergence bounds and asymptotic bounds are derived for the test statistics, and comparisons with distance covariance statistics are provided. It is shown that Gini distance statistics converge faster than distance covariance statistics in the uniform convergence bounds, and hence yield tighter upper bounds on both Type I and Type II errors. Moreover, the probability that the Gini distance covariance statistic underperforms the distance covariance statistic in Type II error decreases to 0 exponentially as the sample size increases. Extensive experimental results demonstrate the performance of the proposed method.
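A plain-Euclidean sketch of the two statistics, assuming the estimators gCov(X, Y) = Δ − Σk pk Δk and gCor = gCov/Δ, where Δ is the mean pairwise distance over all samples and Δk the mean within class k; the paper generalizes these to energy distances in an RKHS.

```python
import numpy as np
from scipy.spatial.distance import pdist

def gini_distance_stats(X: np.ndarray, y: np.ndarray):
    """Estimate Gini distance covariance and correlation of (X, y).

    Delta   = mean pairwise Euclidean distance over all samples
    Delta_k = mean pairwise distance within class k, weighted by p_k
    gCov = Delta - sum_k p_k * Delta_k;  gCor = gCov / Delta.
    Plain Euclidean version; the paper works with RKHS energy distances.
    """
    delta = pdist(X).mean()
    within = 0.0
    for k in np.unique(y):
        Xk = X[y == k]
        p_k = len(Xk) / len(X)
        if len(Xk) > 1:
            within += p_k * pdist(Xk).mean()
    gcov = delta - within
    return gcov, gcov / delta
```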
13. Wang Y, Wang J, Che H. Two-timescale neurodynamic approaches to supervised feature selection based on alternative problem formulations. Neural Netw 2021; 142:180-191. [PMID: 34020085] [DOI: 10.1016/j.neunet.2021.04.038]
Abstract
Feature selection is a crucial step in data processing and machine learning. While many greedy and sequential feature selection approaches are available, a holistic neurodynamic approach to supervised feature selection was recently developed via fractional programming, minimizing feature redundancy and maximizing relevance simultaneously. Since the gradient of the fractional objective function is also fractional, alternative problem formulations are desirable to obviate this complexity. In this paper, the fractional programming formulation is equivalently recast as bilevel and bilinear programming problems without using any fractional function. Two two-timescale projection neural networks are adapted for solving the reformulated problems. Experimental results on six benchmark datasets demonstrate the global convergence and high classification performance of the proposed neurodynamic approaches in comparison with six mainstream feature selection approaches.
Affiliation(s)
- Yadi Wang: Henan Key Laboratory of Big Data Analysis and Processing, and Institute of Data and Knowledge Engineering, School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
- Jun Wang: Department of Computer Science and School of Data Science, City University of Hong Kong, Kowloon, Hong Kong; Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong, China
- Hangjun Che: College of Electronic and Information Engineering, and Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, Southwest University, Chongqing 400715, China
14. Wang J, Zhang H, Wang J, Pu Y, Pal NR. Feature Selection Using a Neural Network With Group Lasso Regularization and Controlled Redundancy. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1110-1123. [PMID: 32396104] [DOI: 10.1109/tnnls.2020.2980383]
Abstract
We propose a neural network-based feature selection (FS) scheme that can control the level of redundancy in the selected features by integrating two penalties into a single objective function. The Group Lasso penalty aims to produce sparsity in features in a grouped manner. The redundancy-control penalty, defined based on a measure of dependence among features, is utilized to control the level of redundancy among the selected features. Both penalty terms involve the L2,1-norm of the weight matrix between the input and hidden layers. These penalty terms are nonsmooth at the origin, so a simple but efficient smoothing technique is employed to overcome this issue. The monotonicity and convergence of the proposed algorithm are specified and proved under suitable assumptions. Extensive experiments are then conducted on both artificial and real datasets. Empirical results explicitly demonstrate the ability of the proposed FS scheme and its effectiveness in controlling redundancy, and the empirical simulations are consistent with the theoretical results.
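The smoothing idea is standard, even if the paper's exact smoother may differ: replace the nonsmooth group norm ‖w‖2 with √(‖w‖2² + ε²), which is differentiable everywhere and approaches the true norm as ε → 0. A numpy sketch with one weight group per input feature:

```python
import numpy as np

def smoothed_group_lasso(W: np.ndarray, eps: float = 1e-4) -> float:
    """Smoothed Group Lasso penalty: sum_j sqrt(||w_j||^2 + eps^2).

    W is the input-to-hidden weight matrix with one column per input
    feature; eps removes the nondifferentiability at the origin.
    """
    return float(np.sqrt((W ** 2).sum(axis=0) + eps ** 2).sum())

def smoothed_group_lasso_grad(W: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Gradient of the smoothed penalty: w_j / sqrt(||w_j||^2 + eps^2)."""
    denom = np.sqrt((W ** 2).sum(axis=0, keepdims=True) + eps ** 2)
    return W / denom
```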
15. Wang Y, Li X, Wang J. A neurodynamic optimization approach to supervised feature selection via fractional programming. Neural Netw 2021; 136:194-206. [PMID: 33497995] [DOI: 10.1016/j.neunet.2021.01.004]
Abstract
Feature selection is an important issue in machine learning and data mining. Most existing feature selection methods are greedy in nature and are thus prone to sub-optimality. Although some global feature selection methods based on unsupervised redundancy minimization can potentiate clustering performance improvements, their efficacy for classification may be limited. In this paper, a neurodynamics-based holistic feature selection approach is proposed via feature redundancy minimization and relevance maximization. An information-theoretic similarity coefficient matrix is defined based on multi-information and entropy to measure feature redundancy with respect to class labels. Supervised feature selection is formulated as a fractional programming problem based on the similarity coefficients. A neurodynamic approach based on two one-layer recurrent neural networks is developed for solving the formulated problem. Experimental results on eight benchmark datasets demonstrate the global convergence of the neural networks and the superiority of the proposed neurodynamic approach to several existing feature selection methods in terms of classification accuracy, precision, recall, and F-measure.
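As a loose illustration of a fractional feature selection objective of this kind (the paper's information-theoretic similarity-coefficient construction is not reproduced here), a redundancy-to-relevance ratio over a selection vector looks as follows:

```python
import numpy as np

def fractional_objective(x: np.ndarray, R: np.ndarray, r: np.ndarray) -> float:
    """Redundancy-to-relevance ratio (x'Rx) / (r'x) for a selection vector x.

    Minimizing the ratio trades off pairwise feature redundancy
    (numerator) against relevance to the labels (denominator); the
    paper solves such a program holistically with recurrent neural
    networks rather than by direct evaluation. Illustrative stand-in.
    """
    x = np.asarray(x, dtype=float)
    return float(x @ R @ x) / float(r @ x)
```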
Affiliation(s)
- Yadi Wang: Henan Key Laboratory of Big Data Analysis and Processing, and Institute of Data and Knowledge Engineering, School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China; School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China
- Xiaoping Li: School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, 211189, China
- Jun Wang: Department of Computer Science and School of Data Science, City University of Hong Kong, Kowloon, Hong Kong
16. Xie X, Zhang H, Wang J, Chang Q, Wang J, Pal NR. Learning Optimized Structure of Neural Networks by Hidden Node Pruning With L1 Regularization. IEEE Transactions on Cybernetics 2020; 50:1333-1346. [PMID: 31765323] [DOI: 10.1109/tcyb.2019.2950105]
Abstract
We propose three different methods to determine the optimal number of hidden nodes in a multilayer perceptron network based on L1 regularization. The first two methods use a set of multiplier functions and multipliers, respectively, for the hidden-layer nodes and implement the L1 regularization on those, while the third method, equipped with the same multipliers, uses a smoothing approximation of the L1 regularization. Each method begins with a given number of hidden nodes; the network is then trained to obtain an optimal architecture, discarding redundant hidden nodes via the multiplier functions or multipliers. A simple and generic method, namely the matrix-based convergence proving method (MCPM), is introduced to prove the weak and strong convergence of the presented smoothing algorithms. The performance of the three pruning methods has been tested on 11 different classification datasets. The results demonstrate efficient pruning and competitive generalization by the proposed methods, and the experimental results also validate the theoretical ones.
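A minimal sketch of the multiplier idea, assuming a PyTorch model: each hidden node's output is scaled by a trainable multiplier, an L1 penalty pushes multipliers toward zero, and nodes with near-zero multipliers are pruned. This illustrates the mechanism, not the paper's algorithms or convergence machinery.

```python
import torch
import torch.nn as nn

class PrunableMLP(nn.Module):
    """MLP whose hidden nodes carry trainable multipliers c_j."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.c = nn.Parameter(torch.ones(d_hidden))  # per-node multipliers
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.sigmoid(self.fc1(x)) * self.c   # gate each hidden node
        return self.fc2(h)

    def l1_multiplier_penalty(self) -> torch.Tensor:
        return self.c.abs().sum()

# Training (sketch): loss = task_loss + lam * net.l1_multiplier_penalty()
# Pruning: drop hidden node j once |c_j| falls below a small threshold.
```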
17. Conjugate gradient-based Takagi-Sugeno fuzzy neural network parameter identification and its convergence analysis. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.035]
18. Fully complex conjugate gradient-based neural networks using Wirtinger calculus framework: Deterministic convergence and its application. Neural Netw 2019; 115:50-64. [DOI: 10.1016/j.neunet.2019.02.011]
19. Gao S, Zhou M, Wang Y, Cheng J, Yachi H, Wang J. Dendritic Neuron Model With Effective Learning Algorithms for Classification, Approximation, and Prediction. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:601-614. [PMID: 30004892] [DOI: 10.1109/tnnls.2018.2846646]
Abstract
Artificial neural networks (ANNs), which mimic the information processing mechanisms and procedures of neurons in the human brain, have achieved great success in many fields, e.g., classification, prediction, and control. However, traditional ANNs suffer from many problems: they are hard to interpret, slow and difficult to train, and difficult to scale up. These problems motivate us to develop a new dendritic neuron model (DNM) that considers the nonlinearity of synapses, both for a better understanding of biological neuronal systems and as a more useful method for solving practical problems. To improve its problem-solving performance, six learning algorithms, including biogeography-based optimization, particle swarm optimization, genetic algorithms, ant colony optimization, evolutionary strategy, and population-based incremental learning, are used for the first time to train it. The best combination of its user-defined parameters is systematically investigated using Taguchi's experimental design method. Experiments on 14 different problems involving classification, approximation, and prediction are conducted using a multilayer perceptron and the proposed DNM. The results suggest that the proposed learning algorithms are effective and promising for training the DNM, making it more powerful for classification, approximation, and prediction problems.
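A sketch of a DNM forward pass as commonly described in this line of work: a sigmoidal synaptic layer, multiplicative dendritic branches, a summing membrane layer, and a sigmoidal soma. The parameter conventions (k, ks, theta_s) are illustrative assumptions.

```python
import numpy as np

def dnm_forward(x, W, Theta, k=5.0, ks=5.0, theta_s=0.5):
    """One forward pass of a dendritic neuron model (illustrative sketch).

    x:      (d,)   input features
    W:      (m, d) synaptic weights, one row per dendritic branch
    Theta:  (m, d) synaptic thresholds
    Synapse:  Y[j, i] = sigmoid(k * (W[j, i] * x[i] - Theta[j, i]))
    Dendrite: Z[j]    = prod_i Y[j, i]     (multiplicative interaction)
    Membrane: V       = sum_j Z[j]
    Soma:     O       = sigmoid(ks * (V - theta_s))
    """
    Y = 1.0 / (1.0 + np.exp(-k * (W * x - Theta)))
    Z = Y.prod(axis=1)
    V = Z.sum()
    return 1.0 / (1.0 + np.exp(-ks * (V - theta_s)))
```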
20. Du X, Nie F, Wang W, Yang Y, Zhou X. Exploiting Combination Effect for Unsupervised Feature Selection by l2,0 Norm. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:201-214. [PMID: 29994229] [DOI: 10.1109/tnnls.2018.2837100]
Abstract
In learning applications, exploring the cluster structures of high-dimensional data is an important task that requires projecting or visualizing the cluster structures in a low-dimensional space. The challenges are: 1) how to perform the projection or visualization with little information loss and 2) how to preserve the interpretability of the original data. Recent methods address both challenges simultaneously through unsupervised feature selection: they learn cluster indicators based on the k-nearest-neighbor similarity graph and then select the features highly correlated with these indicators. In this direction, many techniques, such as local discriminative analysis, nonnegative spectral analysis, and nonnegative matrix factorization, have been successfully introduced to make the selection more accurate. In this paper, we enhance unsupervised feature selection from another perspective, namely, making the selection exploit the combination effect of the features. Given the expected number of features, previous works operate on the whole feature set and then select the features with high coefficients one by one as the output. Our proposed method instead starts from a group of features and updates the selection whenever a better group appears. Compared to previous methods, the proposed method exploits the combination effect of the features via the l2,0 norm, improving selection accuracy where the cluster structures are strongly related to a group of features. We conduct experiments on six open-access datasets from different domains. The experimental results show that our proposed method is more accurate than recent methods that do not specifically consider the combination effect of the features.
21. Armanfard N, Reilly JP, Komeili M. Logistic Localized Modeling of the Sample Space for Feature Selection and Classification. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1396-1413. [PMID: 28333643] [DOI: 10.1109/tnnls.2017.2676101]
Abstract
Conventional feature selection algorithms assign a single common feature set to all regions of the sample space. In contrast, this paper proposes a novel algorithm for localized feature selection for which each region of the sample space is characterized by its individual distinct feature subset that may vary in size and membership. This approach can therefore select an optimal feature subset that adapts to local variations of the sample space, and hence offer the potential for improved performance. Feature subsets are computed by choosing an optimal coordinate space so that, within a localized region, within-class distances and between-class distances are, respectively, minimized and maximized. Distances are measured using a logistic function metric within the corresponding region. This enables the optimization process to focus on a localized region within the sample space. A local classification approach is utilized for measuring the similarity of a new input data point to each class. The proposed logistic localized feature selection (lLFS) algorithm is invariant to the underlying probability distribution of the data; hence, it is appropriate when the data are distributed on a nonlinear or disjoint manifold. lLFS is efficiently formulated as a joint convex/increasing quasi-convex optimization problem with a unique global optimum point. The method is most applicable when the number of available training samples is small. The performance of the proposed localized method is successfully demonstrated on a large variety of data sets. We demonstrate that the number of features selected by the lLFS method saturates at the number of available discriminative features. In addition, we have shown that the Vapnik-Chervonenkis dimension of the localized classifier is finite. Both these factors suggest that the lLFS method is insensitive to the overfitting issue, relative to other methods.
22. Sun J, Zhou A, Keates S, Liao S. Simultaneous Bayesian Clustering and Feature Selection Through Student's t Mixture Model. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1187-1199. [PMID: 28362615] [DOI: 10.1109/tnnls.2016.2619061]
Abstract
In this paper, we propose a generative model for feature selection in the unsupervised learning context. The model assumes that data are independently and identically sampled from a finite mixture of Student's t-distributions, which can reduce sensitivity to outliers. Latent random variables representing the salience of features are included in the model to indicate feature relevance. As a result, the model is expected to simultaneously realize clustering, feature selection, and outlier detection. Inference is carried out by a tree-structured variational Bayes algorithm, and a full Bayesian treatment is adopted to realize automatic model selection. Controlled experimental studies show that the developed model is capable of accurately modeling datasets with outliers. Furthermore, the developed algorithm compares favorably against existing unsupervised probabilistic model-based Bayesian feature selection algorithms on artificial and real datasets. Moreover, its application to real leukemia gene expression data indicates that it can successfully identify discriminating genes.
23. Yu K, Wu X, Ding W, Mu Y, Wang H. Markov Blanket Feature Selection Using Representative Sets. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2775-2788. [PMID: 28113384] [DOI: 10.1109/tnnls.2016.2602365]
Abstract
Using Markov blankets in a Bayesian network for feature selection has received much attention in recent years. The Markov blanket of a class attribute in a Bayesian network is a unique yet minimal feature subset for optimal feature selection if the probability distribution of a dataset can be faithfully represented by this Bayesian network. However, if a dataset violates the faithfulness condition, Markov blankets of a class attribute may not be unique. To tackle this issue, we propose the new concept of representative sets and design the selection via group alpha-investing (SGAI) algorithm to perform Markov blanket feature selection with representative sets for classification. Using a comprehensive set of real data, our empirical studies demonstrate that SGAI outperforms state-of-the-art Markov blanket feature selectors and other well-established feature selection methods.
Affiliation(s)
- Kui Yu: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, Australia
- Xindong Wu: School of Computing and Informatics, University of Louisiana, Lafayette, LA, USA
- Wei Ding: Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA
- Yang Mu: Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA
- Hao Wang: Department of Computer Science, Hefei University of Technology, Hefei, China
25. Sun K, Huang SH, Wong DSH, Jang SS. Design and Application of a Variable Selection Method for Multilayer Perceptron Neural Network With LASSO. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1386-1396. [PMID: 28113826] [DOI: 10.1109/tnnls.2016.2542866]
Abstract
In this paper, a novel variable selection method for neural networks that can be applied to describe nonlinear industrial processes is developed. The proposed method is an iterative two-step approach. First, a multilayer perceptron is constructed. Second, the least absolute shrinkage and selection operator (LASSO) is introduced to select the input variables that are truly essential to the model, with the shrinkage parameter determined using cross-validation; variables whose input weights are zero are then eliminated from the dataset. The algorithm is repeated until there is no further improvement in model accuracy. Simulation examples as well as an industrial application in a crude distillation unit are used to validate the proposed algorithm. The results show that the proposed approach constructs a more compact model with higher prediction accuracy than other existing methods.
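A loose sketch of the iterative two-step loop using scikit-learn stand-ins: LassoCV picks the shrinkage parameter by cross-validation and zero-coefficient variables are dropped, then an MLP is refit on the survivors. The original method applies the shrinkage to the MLP's input weights directly, which this sketch does not.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.neural_network import MLPRegressor

def iterative_lasso_mlp(X: np.ndarray, y: np.ndarray, max_iter: int = 10):
    """Alternate LASSO-based variable elimination and MLP fitting.

    LassoCV chooses the shrinkage by cross-validation; variables with
    zero coefficients are eliminated, and the loop stops once no
    variable is dropped. Illustrative stand-in, not the paper's method.
    """
    active = np.arange(X.shape[1])
    for _ in range(max_iter):
        lasso = LassoCV(cv=5).fit(X[:, active], y)
        keep = np.abs(lasso.coef_) > 1e-10
        if keep.all():
            break
        active = active[keep]
    mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
    mlp.fit(X[:, active], y)
    return active, mlp
```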
26.
Affiliation(s)
- JinXing Che: School of Mathematics and Statistics, Xidian University, Xi'an, People's Republic of China
- YouLong Yang: School of Mathematics and Statistics, Xidian University, Xi'an, People's Republic of China
27. Li Z, Tang J. Unsupervised Feature Selection via Nonnegative Spectral Analysis and Redundancy Control. IEEE Transactions on Image Processing 2015; 24:5343-5355. [PMID: 26394422] [DOI: 10.1109/tip.2015.2479560]
Abstract
In many image processing and pattern recognition problems, the visual content of images is described by high-dimensional features, which are often redundant and noisy. To this end, we propose a novel unsupervised feature selection scheme, namely, nonnegative spectral analysis with constrained redundancy, which jointly leverages nonnegative spectral clustering and redundancy analysis. The proposed method can directly identify a discriminative subset of the most useful, redundancy-constrained features. Nonnegative spectral analysis is developed to learn more accurate cluster labels of the input images, during which feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables the selection of the most discriminative features. Row-wise sparse models with a general ℓ2,p-norm (0 < p ≤ 1) are leveraged to make the proposed model suitable for feature selection and robust to noise. Besides, the redundancy between features is explicitly exploited to control the redundancy of the selected subset. The proposed problem is formulated as an optimization problem with a well-defined objective function and solved by a simple yet efficient iterative algorithm. Finally, we conduct extensive experiments on nine diverse image benchmarks, including face data, handwritten digit data, and object image data. The proposed method achieves encouraging experimental results in comparison with several representative algorithms, which demonstrates its effectiveness for unsupervised feature selection.