1. Self-paced and Bayes-decision-rule linear KNN prediction. Int J Mach Learn Cyb 2022. DOI: 10.1007/s13042-022-01593-9

2. Bian Z, Vong CM, Wong PK, Wang S. Fuzzy KNN Method With Adaptive Nearest Neighbors. IEEE Transactions on Cybernetics 2022; 52:5380-5393. PMID: 33232252. DOI: 10.1109/tcyb.2020.3031610
Abstract: Due to its strong performance in handling uncertain and ambiguous data, the fuzzy k-nearest-neighbor method (FKNN) has achieved substantial success in a wide variety of applications. However, its classification performance deteriorates heavily when the number k of nearest neighbors is fixed unsuitably for a given testing sample. This study examines whether a single fixed k value is adequate for every testing sample and devises a novel FKNN-based classification method, the fuzzy KNN method with adaptive nearest neighbors (A-FKNN), which learns a distinct optimal k value for each testing sample. In the training stage, after applying a sparse representation method to all training samples for reconstruction, A-FKNN learns the optimal k value for each training sample and builds a decision tree (the A-FKNN tree) from all training samples relabeled with their learned optimal k values instead of the original class labels, so that each leaf node stores an optimal k value. In the testing stage, A-FKNN identifies the optimal k value for each testing sample by searching the A-FKNN tree and then runs FKNN with that value. Moreover, a fast version, FA-FKNN, builds an FA-FKNN decision tree that stores the optimal k value together with only a subset of training samples in each leaf node. Experimental results on 32 UCI datasets demonstrate that both A-FKNN and FA-FKNN outperform the compared methods in classification accuracy, and that FA-FKNN has a shorter running time.
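
A minimal, hypothetical sketch of the adaptive-k idea described above, using plain majority-vote KNN in place of the fuzzy memberships and omitting the sparse-representation reconstruction step. The helper names, the candidate k grid, and the tree depth are illustrative assumptions, not the authors' code; integer class labels 0..C-1 are assumed.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def learn_optimal_k(X, y, candidate_ks=(1, 3, 5, 7, 9, 11)):
    """Per training sample, pick the smallest candidate k whose leave-one-out
    neighborhood votes for the correct class (fall back to the largest k)."""
    nn = NearestNeighbors(n_neighbors=max(candidate_ks) + 1).fit(X)
    _, idx = nn.kneighbors(X)                     # idx[:, 0] is the sample itself
    best_k = np.full(len(X), candidate_ks[-1])
    for i in range(len(X)):
        neigh = idx[i, 1:]                        # exclude the sample itself
        for k in candidate_ks:
            votes = np.bincount(y[neigh[:k]], minlength=y.max() + 1)
            if votes.argmax() == y[i]:
                best_k[i] = k
                break
    return best_k

def afknn_predict(X_tr, y_tr, X_te, candidate_ks=(1, 3, 5, 7, 9, 11)):
    """Fit a tree on the learned k labels (the "A-FKNN tree" role), then classify
    each test sample with a KNN run at its predicted k."""
    k_tree = DecisionTreeClassifier(max_depth=6).fit(X_tr, learn_optimal_k(X_tr, y_tr, candidate_ks))
    k_te = k_tree.predict(X_te)
    y_pred = np.empty(len(X_te), dtype=y_tr.dtype)
    for k in np.unique(k_te):                     # one KNN fit per distinct predicted k
        knn = KNeighborsClassifier(n_neighbors=int(k)).fit(X_tr, y_tr)
        y_pred[k_te == k] = knn.predict(X_te[k_te == k])
    return y_pred
```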

3. Li X, Wang Y, Ruiz R. A Survey on Sparse Learning Models for Feature Selection. IEEE Transactions on Cybernetics 2022; 52:1642-1660. PMID: 32386172. DOI: 10.1109/tcyb.2020.2982445
Abstract: Feature selection is important in both machine learning and pattern recognition. Successfully selecting informative features can significantly increase learning accuracy and improve result comprehensibility. Various methods have been proposed to identify informative features in high-dimensional data by removing redundant and irrelevant features, thereby improving classification accuracy. In this article, we systematically survey existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyze the differences and connections among the various models. Promising research directions and open topics on sparse learning models are also discussed.
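
For orientation, the two families the survey distinguishes are driven by different proximal operators. A minimal sketch of those building blocks follows; the example vector and group indices are illustrative, not taken from the survey.

```python
import numpy as np

def prox_l1(v, lam):
    """Individual sparsity: soft-thresholding, the proximal operator of lam * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_group_l2(v, groups, lam):
    """Group sparsity: block soft-thresholding, the proximal operator of
    lam * sum_g ||v_g||_2 for non-overlapping groups (lists of indices)."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * v[g]
    return out

# Individual sparsity zeroes single coefficients; group sparsity zeroes whole groups.
v = np.array([0.2, -1.5, 0.3, 2.0, -0.1, 0.05])
print(prox_l1(v, 0.5))
print(prox_group_l2(v, groups=[[0, 1, 2], [3, 4, 5]], lam=0.5))
```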

4. Chen Y, Cai Z, Shi L, Li W. A fuzzy granular sparse learning model for identifying antigenic variants of influenza viruses. Appl Soft Comput 2021. DOI: 10.1016/j.asoc.2021.107573

5. Zheng L, Chao F, Parthaláin NM, Zhang D, Shen Q. Feature grouping and selection: A graph-based approach. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2020.09.022

6. Gu B, Geng X, Li X, Zheng G. Efficient inexact proximal gradient algorithms for structured sparsity-inducing norm. Neural Netw 2019; 118:352-362. PMID: 31376633. DOI: 10.1016/j.neunet.2019.06.015
Abstract: Structured-sparsity regularization is popular in sparse learning because of its flexibility in encoding feature structures. This paper considers a generalized version of structured-sparsity regularization (in particular the l1/l∞ norm) with arbitrary group overlap. Because of the group overlap, the associated proximal operator is time-consuming to solve. Although Mairal et al. proposed a network-flow algorithm for this proximal operator, it remains time-consuming, especially in the high-dimensional setting. To address this challenge, we develop a more efficient solution for l1/l∞ group lasso with arbitrary group overlap using an inexact proximal gradient method. In each iteration, our algorithm only requires an inexact solution to the proximal sub-problem, which can be computed efficiently. On the theoretical side, the proposed algorithm enjoys the same global convergence rate as exact proximal methods. Experiments demonstrate that our algorithm is much more efficient than the network-flow algorithm while retaining similar generalization performance.
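
The paper's contribution is the inexact inner solver that handles arbitrary group overlap. The hedged sketch below only illustrates the outer proximal-gradient loop for the simpler non-overlapping l1/l∞ case, where each group's proximal step is exact via the Moreau decomposition (an l1-ball projection); the group layout, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius (Duchi et al. style)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam * ||v||_inf via Moreau decomposition: v minus its l1-ball projection."""
    return v - project_l1_ball(v, lam)

def group_linf_lasso(X, y, groups, lam=0.1, n_iter=500):
    """Proximal gradient (ISTA) for 0.5 * ||y - Xw||^2 + lam * sum_g ||w_g||_inf,
    with non-overlapping groups given as lists of column indices."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L for the least-squares loss
    for _ in range(n_iter):
        v = w - step * X.T @ (X @ w - y)
        w = v.copy()                              # ungrouped coordinates stay unpenalized
        for g in groups:
            w[g] = prox_linf(v[g], step * lam)
    return w
```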
Affiliations:
- Bin Gu: Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing, PR China; School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, PR China.
- Xiang Geng: School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, PR China.
- Xiang Li: Computer Science Department, University of Western Ontario, London, ON N6A 3K7, Canada.
- Guansheng Zheng: School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing, PR China.

7. Zhou Q, Zhao Q. Flexible Clustered Multi-Task Learning by Learning Representative Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2016; 38:266-278. PMID: 26761733. DOI: 10.1109/tpami.2015.2452911
Abstract: Multi-task learning (MTL) methods have shown promising performance by learning multiple related tasks simultaneously, exploiting information shared across those tasks. Among the various MTL methods, clustered multi-task learning (CMTL) assumes that all tasks can be clustered into groups and attempts to learn the underlying cluster structure from the training data. In this paper, we present a new approach to CMTL, called flexible clustered multi-task learning (FCMTL), in which the cluster structure is learned by identifying representative tasks. The new approach allows an arbitrary task to be described by multiple representative tasks, effectively soft-assigning a task to multiple clusters with different weights. Unlike existing counterparts, the proposed approach is more flexible in that (a) it does not require clusters to be disjoint, (b) tasks within one particular cluster do not have to share information to the same extent, and (c) the number of clusters is automatically inferred from the data. Computationally, the proposed approach is formulated as a row-sparsity pursuit problem. We validate FCMTL on both synthetic and real-world data sets, and the empirical results demonstrate that it outperforms many existing MTL methods.
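
One way to read the "row-sparsity pursuit" computation is as a self-expressive model over the stacked task parameter vectors, where an L2,1 penalty selects a few rows as representative tasks. The sketch below is such an interpretation under assumed notation (task weight matrix W of shape d x T, coefficient matrix C of shape T x T), not the paper's exact formulation; lambda and the iteration count are illustrative.

```python
import numpy as np

def representative_tasks(W, lam=0.5, n_iter=500):
    """Solve min_C 0.5 * ||W - W @ C||_F^2 + lam * sum_t ||C[t, :]||_2
    by proximal gradient; rows of C with nonzero norm index representative tasks."""
    T = W.shape[1]
    C = np.zeros((T, T))
    step = 1.0 / np.linalg.norm(W.T @ W, 2)        # 1/L for the quadratic term
    for _ in range(n_iter):
        G = W.T @ (W @ C - W)                      # gradient of 0.5*||W - WC||_F^2
        V = C - step * G
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        scale = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0)
        C = scale * V                              # row-wise (L2,1) shrinkage
    reps = np.nonzero(np.linalg.norm(C, axis=1) > 1e-8)[0]
    return C, reps
```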

8.

9. Bogdan M, van den Berg E, Sabatti C, Su W, Candès EJ. SLOPE: Adaptive variable selection via convex optimization. Ann Appl Stat 2015; 9:1103-1140. PMID: 26709357. DOI: 10.1214/15-aoas842
Abstract: We introduce a new estimator of the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to

minimize over b: (1/2)||y − Xb||² + λ1|b|(1) + λ2|b|(2) + … + λp|b|(p),

where λ1 ≥ λ2 ≥ … ≥ λp ≥ 0 and |b|(1) ≥ |b|(2) ≥ … ≥ |b|(p) are the decreasing absolute values of the entries of b. This is a convex program, and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ1 procedures such as the Lasso. Here, the regularizer is a sorted ℓ1 norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini-Hochberg (BH) procedure [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300], which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λi} is given by the BH critical values λBH(i) = z(1 − i·q/(2p)), where q ∈ (0, 1) and z(α) is the αth quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λBH provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments on both simulated and real data.
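
A hedged numerical sketch of SLOPE: the BH-style λ sequence from the abstract, the sorted-ℓ1 proximal operator computed via isotonic regression (equivalent to the stack-based prox described in the paper), and a plain proximal-gradient (ISTA) loop. The step size, iteration count, and the use of scikit-learn's isotonic_regression are illustrative choices, not the authors' solver.

```python
import numpy as np
from scipy.stats import norm
from sklearn.isotonic import isotonic_regression

def bh_lambdas(p, q=0.1):
    """BH critical values: lambda_i = z(1 - i*q/(2p)), i = 1..p, non-increasing."""
    return norm.ppf(1.0 - np.arange(1, p + 1) * q / (2.0 * p))

def prox_sorted_l1(v, lam):
    """argmin_x 0.5*||x - v||^2 + sum_i lam_i * |x|_(i), with lam non-increasing."""
    sign, a = np.sign(v), np.abs(v)
    order = np.argsort(a)[::-1]                    # sort |v| in decreasing order
    w = a[order] - lam
    x = np.clip(isotonic_regression(w, increasing=False), 0.0, None)
    out = np.empty_like(v)
    out[order] = x                                 # undo the sort, restore signs
    return sign * out

def slope(X, y, q=0.1, n_iter=500):
    n, p = X.shape
    lam = bh_lambdas(p, q)
    step = 1.0 / np.linalg.norm(X, 2) ** 2         # 1/L for the least-squares loss
    b = np.zeros(p)
    for _ in range(n_iter):
        b = prox_sorted_l1(b - step * X.T @ (X @ b - y), step * lam)
    return b
```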
Affiliations:
- Małgorzata Bogdan: Department of Mathematics, Wrocław University of Technology, 50-370 Wrocław, Poland.
- Ewout van den Berg: Human Language Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York 10598, USA.
- Chiara Sabatti: Department of Health Research and Policy, Division of Biostatistics, Stanford University, HRP Redwood Building, Stanford, California 94305, USA.
- Weijie Su: Department of Statistics, Stanford University, 390 Serra Mall, Sequoia Hall, Stanford, California 94305, USA.
- Emmanuel J. Candès: Department of Statistics, Stanford University, 390 Serra Mall, Sequoia Hall, Stanford, California 94305, USA.

10.

11.

12. Ng MK. Dictionary learning-based subspace structure identification in spectral clustering. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:1188-1199. PMID: 24808560. DOI: 10.1109/tnnls.2013.2253123
Abstract: In this paper, we study a dictionary learning (DL) approach to identifying the representation of low-dimensional subspaces in high-dimensional and nonnegative data. Such a representation can be used to provide an affinity matrix among different subspaces for data clustering. The main contribution of this paper is to consider nonnegativity and sparsity constraints together in DL, so that the data can be represented effectively by nonnegative, sparse coding coefficients and nonnegative dictionary bases. In the algorithm, we employ the proximal point technique for the resulting DL and sparsity optimization problem. We use the coding coefficients to perform spectral clustering (SC) for data partitioning. Extensive experiments on real-world high-dimensional and nonnegative data sets, including text, microarray, and image data, demonstrate that the proposed method can discover their subspace structures. Experimental results also show that our algorithm is computationally efficient and effective for obtaining high SC performance and interpreting the clustering results compared with the other tested methods.
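
A hedged sketch of the overall recipe (nonnegative, sparse coding followed by spectral clustering on a coefficient-based affinity), built from scikit-learn components rather than the paper's proximal-point algorithm. The Z @ Z.T affinity and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.cluster import SpectralClustering

def nonneg_sparse_spectral_clustering(X, n_atoms=20, n_clusters=5, alpha=1.0):
    """X: nonnegative data matrix of shape (n_samples, n_features)."""
    dl = DictionaryLearning(n_components=n_atoms, alpha=alpha,
                            fit_algorithm="cd", transform_algorithm="lasso_cd",
                            positive_code=True, positive_dict=True,
                            random_state=0)
    Z = dl.fit_transform(X)                        # nonnegative, sparse codes (n_samples x n_atoms)
    A = Z @ Z.T                                    # symmetric, nonnegative sample affinity
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=0).fit_predict(A)
    return labels
```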

13. Kaban A. Fractional norm regularization: learning with very few relevant features. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:953-963. PMID: 24808476. DOI: 10.1109/tnnls.2013.2247417
Abstract: Learning in the presence of a large number of irrelevant features is an important problem in high-dimensional tasks. Previous studies have shown that L1-norm regularization can be effective in such cases while L2-norm regularization is not. Furthermore, work in compressed sensing suggests that regularization by nonconvex (e.g., fractional) semi-norms may outperform L1-regularization. However, for classification it is largely unclear when this may or may not be the case. In addition, the nonconvex problem is harder to solve than the convex L1 problem. In this paper, we provide a more in-depth analysis to elucidate the potential advantages and pitfalls of nonconvex regularization in the context of logistic regression where the regularization term employs the family of Lq semi-norms. First, using results from the phenomenon of concentration of norms and distances in high dimensions, we gain intuition about how sparse estimation behaves when the dimensionality is very high. Second, using the probably approximately correct (PAC)-Bayes methodology, we give a data-dependent bound on the generalization error of Lq-regularized logistic regression, which is applicable to any algorithm that implements this model and may be used to predict its generalization behavior from the training set alone. Third, we demonstrate the usefulness of our approach by experiments and applications, where the PAC-Bayes bound is used to guide the choice of semi-norm in the regularization term. The results support the conclusion that the optimal choice of regularization depends on the relative fraction of relevant versus irrelevant features, and a fractional norm with a small exponent is most suitable when the fraction of relevant features is very small.
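
Fitting the Lq-regularized logistic model itself requires a nonconvex solver; the paper focuses on the analysis and the PAC-Bayes bound rather than prescribing one. Below is a hedged sketch of one simple option (plain proximal gradient with the scalar Lq prox solved numerically), assuming labels y in {0, 1}; the step size, iteration count, and use of scipy's minimize_scalar are illustrative choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

def prox_lq_scalar(v, lam, q):
    """argmin_x 0.5*(x - v)^2 + lam*|x|^q for q < 1, solved numerically; compare with x = 0."""
    if v == 0.0:
        return 0.0
    res = minimize_scalar(lambda x: 0.5 * (x - v) ** 2 + lam * abs(x) ** q,
                          bounds=(min(0.0, v), max(0.0, v)), method="bounded")
    return res.x if res.fun < 0.5 * v ** 2 else 0.0

def lq_logistic(X, y, lam=0.1, q=0.5, n_iter=300):
    """Non-accelerated proximal gradient for Lq-regularized logistic regression."""
    n, d = X.shape
    step = 4.0 / (np.linalg.norm(X, 2) ** 2)       # 1/L for the logistic loss
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (expit(X @ w) - y)            # gradient of the logistic loss
        v = w - step * grad
        w = np.array([prox_lq_scalar(vi, step * lam, q) for vi in v])
    return w
```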

14. He R, Zheng WS, Hu BG, Kong XW. Two-stage nonnegative sparse representation for large-scale face recognition. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:35-46. PMID: 24808205. DOI: 10.1109/tnnls.2012.2226471
Abstract: This paper proposes a novel nonnegative sparse representation approach, called two-stage sparse representation (TSR), for robust face recognition on a large-scale database. Based on a divide-and-conquer strategy, TSR decomposes robust face recognition into an outlier-detection stage and a recognition stage. In the first stage, we propose a general multisubspace framework to learn a robust metric in which noise and outliers in image pixels are detected. Potential loss functions, including L1, L2,1, and correntropy, are studied. In the second stage, based on the learned metric and collaborative representation, we propose an efficient nonnegative sparse representation algorithm to find an approximate solution of the sparse representation. According to the L1-ball theory in sparse representation, this approximate solution is unique and can be optimized efficiently. A filtering strategy is then developed to avoid computing the sparse representation over the whole large-scale dataset. Moreover, theoretical analysis gives the necessary condition for the nonnegative least squares technique to find a sparse solution. Extensive experiments on several public databases demonstrate that the proposed TSR approach, in general, achieves better classification accuracy than state-of-the-art sparse representation methods. More importantly, a significant reduction in computational cost is achieved in comparison with the sparse representation classifier, which makes TSR more suitable for robust face recognition on a large-scale dataset.
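
A heavily simplified, single-stage sketch of the recognition idea: represent each test face nonnegatively over the training gallery with nonnegative least squares (scipy's nnls) and assign the class whose own training columns best reconstruct it. The outlier-detection stage, the learned robust metric, and the filtering step from the paper are all omitted here; the column normalization and function names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_classify(X_train, y_train, x_test):
    """X_train: (n_samples, n_features) gallery; x_test: one probe of shape (n_features,)."""
    D = X_train.T / (np.linalg.norm(X_train, axis=1) + 1e-12)   # l2-normalized gallery columns
    coef, _ = nnls(D, x_test)                                   # nonnegative representation
    residuals = {}
    for c in np.unique(y_train):                                # class-wise reconstruction error
        mask = (y_train == c)
        residuals[c] = np.linalg.norm(x_test - D[:, mask] @ coef[mask])
    return min(residuals, key=residuals.get)
```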

15. Xiang S, Nie F, Meng G, Pan C, Zhang C. Discriminative least squares regression for multiclass classification and feature selection. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:1738-1754. PMID: 24808069. DOI: 10.1109/tnnls.2012.2212721
Abstract: This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions, so that the distances between classes are enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form in which there is no need to train multiple two-class machines independently of each other. With its compact form, the model can be naturally extended for feature selection; this goal is achieved in terms of the L2,1 matrix norm, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
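
A hedged numpy sketch of the alternating optimization implied above, for the ε-dragging model min over (W, b, M ≥ 0) of ||XW + 1bᵀ − (Y + B∘M)||_F² + λ||W||_F²: with M fixed, (W, b) is a ridge-style least-squares solve against the relaxed targets; with (W, b) fixed, M has the closed-form update max(B∘E, 0). Regularizing the bias, the value of λ, and the iteration count are illustrative simplifications, not the authors' exact solver.

```python
import numpy as np

def dlsr_fit(X, y, lam=1.0, n_iter=30):
    n, d = X.shape
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)    # one-hot targets
    B = np.where(Y == 1, 1.0, -1.0)                        # dragging directions
    M = np.zeros_like(Y)                                   # nonnegative dragging amounts
    Xb = np.hstack([X, np.ones((n, 1))])                   # absorb the bias b
    for _ in range(n_iter):
        T = Y + B * M                                      # relaxed (dragged) targets
        Wb = np.linalg.solve(Xb.T @ Xb + lam * np.eye(d + 1), Xb.T @ T)
        E = Xb @ Wb - Y
        M = np.maximum(B * E, 0.0)                         # epsilon-dragging update
    return Wb, classes

def dlsr_predict(Wb, classes, X):
    scores = np.hstack([X, np.ones((len(X), 1))]) @ Wb
    return classes[np.argmax(scores, axis=1)]
```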

16. Zhong LW, Kwok JT. Efficient sparse modeling with automatic feature grouping. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:1436-1447. PMID: 24807927. DOI: 10.1109/tnnls.2012.2200262
Abstract: For high-dimensional data, it is often desirable to group similar features together during the learning process. This can reduce the estimation variance and improve the stability of feature selection, leading to better generalization. Moreover, it can also help in understanding and interpreting the data. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is a recent sparse-modeling approach that uses an l1 regularizer and a pairwise l∞ regularizer on the feature coefficients to encourage such feature grouping. Computationally, however, its optimization procedure is very expensive. In this paper, we propose an efficient solver based on the accelerated gradient method. We show that its key proximal step can be solved by a highly efficient, simple iterative group-merging algorithm. Given d input features, this reduces the empirical time complexity from O(d^2)-O(d^5) for the existing solvers to just O(d). Experimental results on a number of toy and real-world datasets demonstrate that OSCAR is a competitive sparse-modeling approach, but with the added ability of automatic feature grouping.

17. Ye J, Liu J. Sparse Methods for Biomedical Data. SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining 2012; 14:4-15. PMID: 24076585. PMCID: PMC3783968. DOI: 10.1145/2408736.2408739
Abstract: Following recent technological revolutions, the investigation of massive biomedical data with growing scale, diversity, and complexity has taken center stage in modern data analysis. Although complex, the underlying representations of many biomedical data are often sparse. For example, for a certain disease such as leukemia, even though humans have tens of thousands of genes, only a few genes are relevant to the disease; a gene network is sparse since a regulatory pathway involves only a small number of genes; and many biomedical signals are sparse or compressible in the sense that they have concise representations when expressed in a proper basis. Therefore, finding sparse representations is fundamentally important for scientific discovery. Sparse methods based on the l1 norm have attracted a great amount of research effort in the past decade due to their sparsity-inducing property, convenient convexity, and strong theoretical guarantees. They have achieved great success in various applications such as biomarker selection, biological network construction, and magnetic resonance imaging. In this paper, we review state-of-the-art sparse methods and their applications to biomedical data.
Affiliations:
- Jieping Ye: Arizona State University, Tempe, AZ 85287.
- Jun Liu: Siemens Corporate Research, Princeton, NJ 08540.

18. Yang S, Yuan L, Lai YC, Shen X, Wonka P, Ye J. Feature Grouping and Selection Over an Undirected Graph. KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining 2012:922-930. PMID: 24014201. PMCID: PMC3763852. DOI: 10.1145/2339530.2339675
Abstract: High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structural information on the features, has been considered promising for improving regression/classification performance. The graph-guided fused lasso (GFlasso) was recently proposed to facilitate feature selection and the exploitation of graph structure when features exhibit certain graph structures. However, the GFlasso formulation relies on pairwise sample correlations to perform feature grouping, which can introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise l∞ norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves on the first by using a non-convex function to reduce the estimation bias. The third extends the second with a truncated l1 regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.
Affiliations:
- Sen Yang: Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA.
- Lei Yuan: Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA; Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA.
- Ying-Cheng Lai: Electrical Engineering, Arizona State University, Tempe, AZ 85287, USA.
- Xiaotong Shen: School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.
- Peter Wonka: Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA.
- Jieping Ye: Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA; Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA.