1. Zhang X, Zheng J, Wang D, Tang G, Zhou Z, Lin Z. Structured Sparsity Optimization With Non-Convex Surrogates of ℓ2,0-Norm: A Unified Algorithmic Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6386-6402. [PMID: 36219668] [DOI: 10.1109/tpami.2022.3213716]
Abstract
In this article, we present a general optimization framework that leverages structured sparsity to achieve superior recovery results. The traditional approach to structured sparse objectives based on the ℓ2,0-norm is to use the ℓ2,1-norm as a convex surrogate. However, such an approximation often yields a large performance gap. To tackle this issue, we first provide a framework that allows for a wide range of surrogate functions (including non-convex surrogates), which exhibit better performance in harnessing structured sparsity. Moreover, we develop a fixed point algorithm that solves a key underlying non-convex structured sparse recovery problem to global optimality with a guaranteed super-linear convergence rate. Building on this, we consider three specific applications, i.e., outlier pursuit, supervised feature selection, and structured dictionary learning, which can benefit from the proposed structured sparsity optimization framework. In each application, we explain in detail how the optimization problem can be formulated and then relaxed under a generic surrogate function. We conduct extensive experiments on both synthetic and real-world data and demonstrate the effectiveness and efficiency of the proposed framework.
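The structured-sparsity penalty discussed above has a simple proximal step in the exact ℓ2,0 case: rows whose energy falls below a threshold are zeroed wholesale. The NumPy sketch below illustrates that row-hard-thresholding operator only; it is not the authors' fixed point algorithm, which handles general non-convex surrogates.

```python
import numpy as np

def prox_l20(X, lam):
    """Proximal operator of lam * ||X||_{2,0} (the number of nonzero rows).

    Rows whose l2 norm is at most sqrt(2*lam) are zeroed; the rest pass through.
    """
    X = np.asarray(X, dtype=float)
    row_norms = np.linalg.norm(X, axis=1)
    out = X.copy()
    out[row_norms <= np.sqrt(2 * lam)] = 0.0
    return out

X = np.array([[3.0, 4.0],    # row norm 5   -> kept
              [0.1, 0.2]])   # row norm 0.22 -> zeroed
print(prox_l20(X, lam=0.5))  # first row kept, second row set to zero
```

Row-wise (rather than entry-wise) thresholding is what makes the sparsity "structured": a feature is either selected for all tasks or discarded entirely.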
2. Liu R, Gao J, Zhang J, Meng D, Lin Z. Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:10045-10067. [PMID: 34871167] [DOI: 10.1109/tpami.2021.3132674]
Abstract
Bi-Level Optimization (BLO) originated in economic game theory and was later introduced into the optimization community. BLO can handle problems with a hierarchical structure involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision, despite different motivations and mechanisms, many complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning, and deep reinforcement learning, all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions, and their convergence and complexity properties. Last but not least, we discuss the potential of our unified BLO framework for designing new algorithms and point out some promising directions for future research. A list of important papers discussed in this survey, corresponding codes, and additional resources on BLO are publicly available at: https://github.com/vis-opt-group/BLO.
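As a toy illustration of the gradient-based BLO machinery the survey covers, the sketch below computes a hypergradient by differentiating through an unrolled inner gradient-descent loop on a one-dimensional quadratic. The problem, step size, and iteration count are arbitrary choices for illustration and are not taken from the survey.

```python
import numpy as np

def unrolled_hypergrad(lam, steps=50, lr=0.2):
    """Hypergradient of the outer loss F(lam) = (x*(lam) - 1)^2, where the
    inner problem is x*(lam) = argmin_x (x - lam)^2.  The inner solve is
    unrolled for `steps` gradient-descent iterations and differentiated in
    forward mode by propagating dx/dlam through each update."""
    x, dx = 0.0, 0.0                      # inner iterate and its derivative
    for _ in range(steps):
        grad = 2.0 * (x - lam)            # inner gradient
        x = x - lr * grad                 # inner gradient-descent step
        dx = dx - lr * (2.0 * dx - 2.0)   # differentiate the update rule
    return 2.0 * (x - 1.0) * dx           # chain rule through the outer loss

# At lam = 3 the inner solution is x* = 3, so dF/dlam = 2*(3 - 1) = 4.
print(round(unrolled_hypergrad(3.0), 4))  # → 4.0
```

The same forward- or reverse-mode unrolling is what automatic-differentiation frameworks perform when the inner loop is written as ordinary differentiable code.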
3. Zhan S, Sun W, Du C, Zhong W. Diversity-promoting multi-view graph learning for semi-supervised classification. International Journal of Machine Learning and Cybernetics 2021. [DOI: 10.1007/s13042-021-01370-0]
4. Hong D, Hu J, Yao J, Chanussot J, Zhu XX. Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model. ISPRS Journal of Photogrammetry and Remote Sensing 2021; 178:68-80. [PMID: 34433999] [PMCID: PMC8336649] [DOI: 10.1016/j.isprsjprs.2021.05.011]
Abstract
As remote sensing (RS) data obtained from different sensors become widely and openly available, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gaps between modalities in terms of imaging sensors, resolutions, and content, embedding their complementary information into a consistent, compact, accurate, and discriminative representation remains, to a great extent, challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling more effective information blending across modalities, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly-proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data), are released and used for land cover classification. Extensive experiments conducted on the three datasets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously-proposed state-of-the-art baselines. Furthermore, the baseline codes and datasets used in this paper are made freely available at https://github.com/danfenghong/ISPRS_S2FL.
Affiliation(s)
- Danfeng Hong
  - Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany
- Jingliang Hu
  - Data Science in Earth Observation, Technical University of Munich, 80333 Munich, Germany
- Jing Yao
  - Aerospace Information Research Institute, Chinese Academy of Sciences, 100094 Beijing, China
- Jocelyn Chanussot
  - Aerospace Information Research Institute, Chinese Academy of Sciences, 100094 Beijing, China
  - Univ. Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Xiao Xiang Zhu
  - Remote Sensing Technology Institute, German Aerospace Center, 82234 Wessling, Germany
  - Data Science in Earth Observation, Technical University of Munich, 80333 Munich, Germany
5. Zhou P, Lu C, Feng J, Lin Z, Yan S. Tensor Low-Rank Representation for Data Recovery and Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:1718-1732. [PMID: 31751228] [DOI: 10.1109/tpami.2019.2954874]
Abstract
Multi-way or tensor data analysis has attracted increasing attention recently, with many important applications in practice. This article develops a tensor low-rank representation (TLRR) method, which is the first approach that can exactly recover the clean data of intrinsic low-rank structure and accurately cluster them as well, with provable performance guarantees. In particular, for tensor data with arbitrary sparse corruptions, TLRR can exactly recover the clean data under mild conditions; meanwhile, TLRR can exactly identify the tensor subspaces from which the data originate and hence cluster them accurately. The TLRR objective function can be optimized via efficient convex programming with convergence guarantees. Besides, we provide two simple yet effective dictionary construction methods, simple TLRR (S-TLRR) and robust TLRR (R-TLRR), to handle slightly and severely corrupted data, respectively. Experimental results on two computer vision tasks, image/video recovery and face clustering, clearly demonstrate the superior performance, efficiency, and robustness of our method over state-of-the-art approaches, including the popular LRR and SSC methods.
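TLRR builds on the t-product algebra that also underlies the t-SVD. The sketch below computes the t-product via frontal-slice multiplication in the Fourier domain; it is offered only as background for the tensor operations the abstract refers to, not as the authors' implementation.

```python
import numpy as np

def t_product(A, B):
    """Tensor t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3):
    FFT along the third mode, per-slice matrix products, inverse FFT."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]   # slice-wise multiply
    return np.real(np.fft.ifft(Cf, axis=2))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
B = rng.standard_normal((3, 2, 5))
C = t_product(A, B)
print(C.shape)  # (4, 2, 5)
```

The identity tensor under this product has an identity matrix as its first frontal slice and zeros elsewhere, which makes unit tests of the operation straightforward.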
6. Zhou P, Yuan XT, Yan S, Feng J. Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:459-472. [PMID: 31398110] [DOI: 10.1109/tpami.2019.2933841]
Abstract
First-order non-convex Riemannian optimization algorithms have gained recent popularity in structured machine learning problems, including principal component analysis and low-rank matrix completion. This paper presents an efficient Riemannian Stochastic Path Integrated Differential EstimatoR (R-SPIDER) algorithm to solve finite-sum and online Riemannian non-convex minimization problems. At the core of R-SPIDER is a recursive semi-stochastic gradient estimator that can accurately estimate the Riemannian gradient under not only exponential mapping and parallel transport, but also general retraction and vector transport operations. Compared with prior Riemannian algorithms, such a recursive gradient estimation mechanism endows R-SPIDER with a lower first-order oracle complexity. Specifically, for finite-sum problems with n components, R-SPIDER is proved to converge to an ϵ-approximate stationary point within [Formula: see text] stochastic gradient evaluations, beating the best-known complexity [Formula: see text]; for online optimization, R-SPIDER is shown to converge with [Formula: see text] complexity, which is, to the best of our knowledge, the first non-asymptotic result for online Riemannian optimization. For the special case of gradient-dominated functions, we further develop a variant of R-SPIDER with an improved linear rate of convergence. Extensive experimental results demonstrate the advantage of the proposed algorithms over state-of-the-art Riemannian non-convex optimization methods.
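To make the Riemannian setting concrete, here is a toy full-gradient analogue, not R-SPIDER itself (which replaces the gradient with a recursive semi-stochastic estimator): gradient ascent on the unit sphere for a leading eigenvector, using tangent-space projection and a normalization retraction. All parameters are illustrative choices.

```python
import numpy as np

def sphere_pca(C, iters=500, lr=0.1, seed=0):
    """Riemannian gradient ascent on the unit sphere for the top eigenvector
    of a symmetric matrix C, maximizing 0.5 * x^T C x subject to ||x|| = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(C.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        g = C @ x                 # Euclidean gradient of 0.5 * x^T C x
        rg = g - (x @ g) * x      # project onto the tangent space at x
        x = x + lr * rg           # ascent step in the tangent space
        x /= np.linalg.norm(x)    # retraction back onto the sphere
    return x

x = sphere_pca(np.diag([5.0, 2.0, 1.0]))
print(np.round(np.abs(x), 3))  # → [1. 0. 0.]
```

The projection-then-retraction pattern is the scaffolding that stochastic Riemannian methods share; they differ mainly in how the gradient `g` is estimated.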
7. Wei X, Shen H, Kleinsteuber M. Trace Quotient with Sparsity Priors for Learning Low Dimensional Image Representations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:3119-3135. [PMID: 31180888] [DOI: 10.1109/tpami.2019.2921031]
Abstract
This work studies the problem of learning appropriate low dimensional image representations. We propose a generic algorithmic framework, which leverages two classic representation learning paradigms, i.e., sparse representation and the trace quotient criterion, to disentangle underlying factors of variation in high dimensional images. Specifically, we aim to learn simple representations of low dimensional, discriminant factors by applying the trace quotient criterion to well-engineered sparse representations. We construct a unified cost function, coined the SPARse LOW dimensional representation (SparLow) function, for jointly learning both a sparsifying dictionary and a dimensionality reduction transformation. The SparLow function is widely applicable for developing various algorithms in three classic machine learning scenarios, namely, unsupervised, supervised, and semi-supervised learning. In order to develop efficient joint learning algorithms for maximizing the SparLow function, we deploy a framework of sparse coding with appropriate convex priors to ensure that the sparse representations are locally differentiable. Moreover, we develop an efficient geometric conjugate gradient algorithm to maximize the SparLow function on its underlying Riemannian manifold. Performance of the proposed SparLow framework is investigated on several image processing tasks, such as 3D data visualization, face/digit recognition, and object/scene categorization.
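The trace quotient criterion is usually contrasted with its ratio-trace relaxation, which has a closed-form solution via generalized eigenvectors. The sketch below shows that classical linear baseline only; SparLow itself optimizes the harder quotient jointly with a dictionary on a Riemannian manifold.

```python
import numpy as np

def ratio_trace(A, B, d):
    """Ratio-trace relaxation of the trace-quotient criterion: the columns
    of W span the top-d generalized eigenvectors of the pencil (A, B),
    computed by whitening with a Cholesky factor of B."""
    L = np.linalg.cholesky(B)          # B = L L^T, B symmetric positive definite
    Linv = np.linalg.inv(L)
    M = Linv @ A @ Linv.T              # symmetric whitened problem
    vals, vecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return Linv.T @ vecs[:, -d:]       # map back: top-d generalized eigvecs

W = ratio_trace(np.diag([1.0, 2.0, 5.0]), np.eye(3), 1)
print(np.round(np.abs(W).ravel(), 3))  # direction of the dominant factor
```

With `B = I` this reduces to ordinary PCA directions; discriminant-analysis variants substitute within- and between-class scatter matrices for `A` and `B`.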
8. Hong D, Yokoya N, Chanussot J, Xu J, Zhu XX. Learning to propagate labels on graphs: An iterative multitask regression framework for semi-supervised hyperspectral dimensionality reduction. ISPRS Journal of Photogrammetry and Remote Sensing 2019; 158:35-49. [PMID: 31853165] [PMCID: PMC6894308] [DOI: 10.1016/j.isprsjprs.2019.09.008]
Abstract
Hyperspectral dimensionality reduction (HDR), an important preprocessing step prior to high-level data analysis, has been garnering growing attention in the remote sensing community. Although a variety of methods, both unsupervised and supervised, have been proposed for this task, the discriminative ability of the learned feature representations remains limited due to the lack of a powerful tool that effectively exploits labeled and unlabeled data in the HDR process. To address this need, we propose a semi-supervised HDR approach called iterative multitask regression (IMR). IMR aims at learning a low-dimensional subspace by jointly considering the labeled and unlabeled data, and bridges the learned subspace with two regression tasks: labels and pseudo-labels initialized by a given classifier. More significantly, IMR dynamically propagates the labels on a learnable graph and progressively refines pseudo-labels, yielding a well-conditioned feedback system. Experiments conducted on three widely-used hyperspectral image datasets demonstrate that the dimension-reduced features learned by the proposed IMR framework are superior, in terms of classification or recognition accuracy, to those of related state-of-the-art HDR approaches.
Affiliation(s)
- Danfeng Hong
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
  - Signal Processing in Earth Observation (SiPEO), Technical University of Munich (TUM), Munich, Germany
- Naoto Yokoya
  - Geoinformatics Unit, RIKEN Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, Japan
- Jian Xu
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
- Xiao Xiang Zhu
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
  - Signal Processing in Earth Observation (SiPEO), Technical University of Munich (TUM), Munich, Germany
9. Hong D, Yokoya N, Ge N, Chanussot J, Zhu XX. Learnable manifold alignment (LeMA): A semi-supervised cross-modality learning framework for land cover and land use classification. ISPRS Journal of Photogrammetry and Remote Sensing 2019; 147:193-205. [PMID: 30774220] [PMCID: PMC6360532] [DOI: 10.1016/j.isprsjprs.2018.10.006]
Abstract
In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community: can a limited amount of highly-discriminative (e.g., hyperspectral) training data improve the performance of a classification task using a large amount of poorly-discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data are far more expensive to collect at scale than multispectral data. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods.
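Graph-based label propagation, the building block LeMA applies on its learned graph, has a standard closed form on a fixed graph. The NumPy sketch below uses a hand-built graph (unlike LeMA's jointly learned one) and the classical normalized-propagation formula; graph, labels, and `alpha` are illustrative choices.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9):
    """Closed-form label propagation F = (I - alpha*S)^{-1} Y on a fixed
    similarity graph W, with S the symmetrically normalized adjacency.
    Y holds one-hot rows for labeled nodes and zero rows for unlabeled ones."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt                     # normalized graph
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)  # diffuse the labels
    return F.argmax(axis=1)

# Two triangle clusters joined by one bridge edge; one labeled node per cluster.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
Y = np.zeros((6, 2)); Y[0, 0] = 1; Y[5, 1] = 1
print(label_propagation(W, Y))  # → [0 0 0 1 1 1]
```

LeMA's twist is that `W` itself becomes a variable of the optimization instead of a fixed Gaussian-kernel graph.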
Affiliation(s)
- Danfeng Hong
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
  - Signal Processing in Earth Observation (SiPEO), Technical University of Munich (TUM), Munich, Germany
- Naoto Yokoya
  - Geoinformatics Unit, RIKEN Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, Japan
- Nan Ge
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
- Jocelyn Chanussot
  - Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Xiao Xiang Zhu
  - Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), Wessling, Germany
  - Signal Processing in Earth Observation (SiPEO), Technical University of Munich (TUM), Munich, Germany
10. Song J, Xie X, Shi G, Dong W. Exploiting class-wise coding coefficients: Learning a discriminative dictionary for pattern classification. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.09.022]
11. Hong D, Yokoya N, Chanussot J, Zhu XX. An Augmented Linear Mixing Model to Address Spectral Variability for Hyperspectral Unmixing. IEEE Transactions on Image Processing 2018; 28:1923-1938. [PMID: 30418901] [DOI: 10.1109/tip.2018.2878958]
Abstract
Hyperspectral imagery collected from airborne or satellite sources inevitably suffers from spectral variability, making it difficult for spectral unmixing to accurately estimate abundance maps. The classical unmixing model, the linear mixing model (LMM), generally fails to handle this issue effectively. To this end, we propose a novel spectral mixture model, called the augmented linear mixing model (ALMM), to address spectral variability by applying a data-driven learning strategy to the inverse problem of hyperspectral unmixing. The proposed approach models the main spectral variability (i.e., scaling factors), generated by variations in illumination or topography, separately by means of the endmember dictionary. It then models other spectral variabilities caused by environmental conditions (e.g., local temperature and humidity, atmospheric effects) and instrumental configurations (e.g., sensor noise), as well as material nonlinear mixing effects, by introducing a spectral variability dictionary. To make the data-driven learning strategy effective, we also impose a reasonable prior on the spectral variability dictionary, whose atoms are assumed to have low coherence with the spectral signatures of the endmembers, leading to a well-known low-coherence dictionary learning problem. A dictionary learning technique is thus embedded in the spectral unmixing framework so that the algorithm can learn the spectral variability dictionary and estimate the abundance maps simultaneously. Extensive experiments on synthetic and real datasets demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
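The classical LMM inversion that ALMM augments can be sketched as per-pixel nonnegative least squares. The projected-gradient solver below is a generic illustration of that baseline and omits ALMM's scaling factors and spectral-variability dictionary entirely.

```python
import numpy as np

def unmix_nnls(Y, E, iters=500, lr=None):
    """Per-pixel abundance estimation for the classical LMM  y ≈ E a, a >= 0,
    via projected gradient descent on 0.5 * ||Y - E A||_F^2.
    Y: (bands x pixels) spectra, E: (bands x endmembers) signature matrix."""
    _, R = E.shape
    if lr is None:
        lr = 1.0 / np.linalg.norm(E.T @ E, 2)   # step = 1 / Lipschitz constant
    A = np.full((R, Y.shape[1]), 1.0 / R)       # uniform initial abundances
    for _ in range(iters):
        A = A - lr * (E.T @ (E @ A - Y))        # gradient step
        A = np.clip(A, 0.0, None)               # project onto a >= 0
    return A

# Toy example: two endmembers, one mixed pixel with abundances (0.3, 0.7).
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
y = (E @ np.array([0.3, 0.7])).reshape(-1, 1)
print(np.round(unmix_nnls(y, E), 3).ravel())  # → [0.3 0.7]
```

ALMM extends this least-squares core with per-pixel scaling and a learned variability dictionary, solved jointly rather than pixel by pixel.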
12. Zhang X, Sun J, Ma S, Lin Z, Zhang J, Wang S, Gao W. Globally Variance-Constrained Sparse Representation and Its Application in Image Set Coding. IEEE Transactions on Image Processing 2018; 27:3753-3765. [PMID: 29698207] [DOI: 10.1109/tip.2018.2823546]
Abstract
Sparse representation provides an efficient way to approximately recover a signal as a linear combination of a few bases from a learnt dictionary, and various successful applications have been built on it. In the scenario of data compression, however, its efficiency and popularity are hindered, because encoding sparsely distributed coefficients may consume more bits to represent the indices of the nonzero coefficients. Introducing an accurate rate constraint into sparse coding and dictionary learning therefore becomes meaningful, which has not been fully exploited in the context of sparse representation. By the Shannon entropy inequality, the variance of Gaussian-distributed data bounds its entropy, indicating that the actual bitrate can be well estimated from the variance. Hence, a globally variance-constrained sparse representation (GVCSR) model is proposed in this paper, where a variance-constrained rate term is introduced into the optimization. Specifically, we employ the alternating direction method of multipliers (ADMM) to solve the non-convex optimization problems for sparse coding and dictionary learning, both of which achieve state-of-the-art rate-distortion performance for image representation. Furthermore, we investigate the potential of applying the GVCSR algorithm to practical image set compression, where the optimized dictionary is trained to efficiently represent images captured in similar scenarios by implicitly exploiting inter-image correlations. Experimental results demonstrate superior rate-distortion performance against state-of-the-art methods.
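For context, the basic ℓ1 sparse-coding subproblem that rate-constrained models like GVCSR extend can be solved with ISTA. The sketch below is this generic baseline, not the paper's variance-constrained ADMM solver; dictionary size, sparsity level, and `lam` are illustrative choices.

```python
import numpy as np

def ista(y, D, lam=0.1, iters=200):
    """Sparse coding of y over dictionary D by ISTA: a gradient step on
    0.5 * ||y - D x||^2 followed by soft-thresholding (the prox of lam*||x||_1)."""
    L = np.linalg.norm(D.T @ D, 2)          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        x = x - (D.T @ (D @ x - y)) / L                          # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)    # shrinkage
    return x

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary atoms
x_true = np.zeros(50); x_true[[3, 17]] = [1.0, -0.5]
x_hat = ista(D @ x_true, D, lam=0.05)
print(np.flatnonzero(np.abs(x_hat) > 0.1))  # indices of recovered atoms
```

A rate-aware model would add a term tying the coefficient statistics to an entropy estimate, which is what motivates GVCSR's variance constraint.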
13. Zhou P, Lu C, Lin Z, Zhang C. Tensor Factorization for Low-Rank Tensor Completion. IEEE Transactions on Image Processing 2018; 27:1152-1163. [PMID: 29028199] [DOI: 10.1109/tip.2017.2762595]
Abstract
Recently, a tensor nuclear norm (TNN) based method was proposed to solve the tensor completion problem, achieving state-of-the-art performance on image and video inpainting tasks. However, it requires computing the tensor singular value decomposition (t-SVD), which is computationally expensive and thus cannot efficiently handle naturally large-scale tensor data. Motivated by TNN, we propose a novel low-rank tensor factorization method for efficiently solving the 3-way tensor completion problem. Our method preserves the low-rank structure of a tensor by factorizing it into the product of two tensors of smaller sizes. In the optimization process, our method only needs to update two smaller tensors, which can be done more efficiently than computing the t-SVD. Furthermore, we prove that the proposed alternating minimization algorithm converges to a Karush-Kuhn-Tucker point. Experimental results on synthetic data recovery and on image and video inpainting tasks clearly demonstrate the superior performance and efficiency of our method over state-of-the-art approaches, including the TNN and matricization methods.
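The factorize-and-alternate idea is easiest to see in the matrix case. The sketch below recovers a low-rank matrix from observed entries by alternating least squares over the two factors; the paper performs the analogous updates for 3-way tensors under the t-product, so this is only a simplified analogue.

```python
import numpy as np

def als_complete(M, mask, r=2, iters=100, reg=1e-6):
    """Matrix analogue of factorization-based completion: fit M ≈ U V^T on
    the observed entries (mask == True) by alternating least squares,
    solving a small ridge-regularized system per row of U and of V."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):                       # update each row of U
            idx = mask[i]; A = V[idx]
            U[i] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ M[i, idx])
        for j in range(n):                       # update each row of V
            idx = mask[:, j]; A = U[idx]
            V[j] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ M[idx, j])
    return U @ V.T

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))  # rank-2 truth
mask = rng.random(X.shape) < 0.7                               # ~70% observed
X_hat = als_complete(X * mask, mask, r=2)
print("relative error:", round(float(np.linalg.norm(X_hat - X) / np.linalg.norm(X)), 4))
```

As in the paper, only the two small factors are ever updated; no SVD of the full array is needed.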
14. Li J, Chang H, Yang J, Luo W, Fu Y. Visual Representation and Classification by Learning Group Sparse Deep Stacking Network. IEEE Transactions on Image Processing 2018; 27:464-476. [PMID: 29989968] [DOI: 10.1109/tip.2017.2765833]
Abstract
Deep stacking networks (DSNs) have been successfully applied to classification tasks. Their architecture builds upon blocks of simplified neural network modules (SNNMs). The hidden units in an SNNM module are assumed to be independent. However, this assumption prevents the SNNM from learning the local dependencies between hidden units that would better capture the information in the input data for the classification task. In addition, in real-world classification applications the hidden representations of input data in each class can be expected to split into a group. Therefore, we propose two kinds of group sparse SNNM modules by mixing the ℓ1-norm and the ℓ2-norm. The first module learns the local dependencies among hidden units by dividing them into non-overlapping groups. The second module splits the representations of samples in different classes into separate groups to cluster the samples in each class. A group sparse DSN (GS-DSN) is constructed by stacking the group sparse SNNM modules. Experimental results further verify that our GS-DSN model outperforms relevant classification methods. In particular, GS-DSN achieves state-of-the-art performance (99.1%) on the 15-Scene dataset.
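Group sparsity of this kind is typically induced by an ℓ2,1-style penalty whose proximal operator shrinks or zeroes whole groups at once. Below is a minimal sketch of that group soft-thresholding step, offered as an illustration of the mechanism only, not the paper's training code.

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||x_g||_2:
    each group is scaled toward zero by lam / ||x_g||, and groups whose
    norm is below lam are zeroed entirely."""
    out = np.zeros_like(x, dtype=float)
    for g in groups:
        ng = np.linalg.norm(x[g])
        if ng > lam:
            out[g] = (1.0 - lam / ng) * x[g]   # shrink the whole group
    return out

x = np.array([3.0, 4.0, 0.1, -0.1])
groups = [[0, 1], [2, 3]]
print(group_soft_threshold(x, groups, lam=1.0))  # first group shrunk, second zeroed
```

Applied to hidden activations, this is what couples units within a group: they survive or vanish together, which is the structural prior the GS-DSN modules exploit.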