1
|
Yuan L, Wang T, Zhang X, Tay FEH, Jie Z, Tian Y, Liu W, Feng J. Learnable Central Similarity Quantization for Efficient Image and Video Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18717-18730. [PMID: 38090871 DOI: 10.1109/tnnls.2023.3321148] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/05/2025]
Abstract
Data-dependent hashing methods aim to learn hash functions from the pairwise or triplet relationships among the data, which often lead to low efficiency and low collision rate by only capturing the local distribution of the data. To solve the limitation, we propose central similarity, in which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers. As a new global similarity metric, central similarity can improve the efficiency and retrieval accuracy of hash learning. By introducing a new concept, hash centers, we principally formulate the computation of the proposed central similarity metric, in which the hash centers refer to a set of points scattered in the Hamming space with a sufficient mutual distance between each other. To construct well-separated hash centers, we provide two efficient methods: 1) leveraging the Hadamard matrix and Bernoulli distributions to generate data-independent hash centers and 2) learning data-dependent hash centers from data representations. Based on the proposed similarity metric and hash centers, we propose central similarity quantization (CSQ) that optimizes the central similarity between data points with respect to their hash centers instead of optimizing the local similarity to generate a high-quality deep hash function. We also further improve the CSQ with data-dependent hash centers, dubbed as CSQ with learnable center (CSQLC). The proposed CSQ and CSQLC are generic and applicable to image and video hashing scenarios. We conduct extensive experiments on large-scale image and video retrieval tasks, and the proposed CSQ yields noticeably boosted retrieval performance, i.e., 3%-20% in mean average precision (mAP) over the previous state-of-the-art methods, which also demonstrates that our methods can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs.
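The data-independent construction mentioned in this abstract can be illustrated with a small sketch. It assumes a Sylvester-type Hadamard matrix and simply takes its first rows as centers; the function names (`sylvester_hadamard`, `hash_centers`, `hamming`) are hypothetical, and the Bernoulli sampling and learnable-center variants described by the authors are not covered here.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix with entries +1/-1 (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hash_centers(num_classes, bits):
    """Use the first num_classes Hadamard rows as centers in {0, 1}^bits (assumed scheme)."""
    if num_classes > bits:
        raise ValueError("this sketch supports at most `bits` centers")
    h = sylvester_hadamard(bits)
    return [[1 if x > 0 else 0 for x in row] for row in h[:num_classes]]


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


centers = hash_centers(num_classes=10, bits=64)
distances = {
    hamming(centers[i], centers[j])
    for i in range(len(centers))
    for j in range(i + 1, len(centers))
}
print(distances)  # {32}
```

Because distinct rows of a Hadamard matrix are orthogonal, any two centers differ in exactly half of the bits, which is one way to obtain the "sufficient mutual distance" the abstract asks for.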
|
2
|
Nie F, Zhang C, Wang Z, Wang R, Li X. Local Embedding Learning via Landmark-Based Dynamic Connections. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9481-9492. [PMID: 36107894 DOI: 10.1109/tnnls.2022.3203014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Linear discriminant analysis (LDA) is one of the most effective and popular methods for reducing the dimensionality of data under a Gaussian assumption. However, LDA cannot handle non-Gaussian data because the center point cannot adequately represent the distribution of the data. Some existing methods based on graph embedding explore local structures via pairwise relationships among data points to address the non-Gaussian issue. Because the number of pairwise relationships is massive, the computational complexity is high and the locally optimal solution is hard to find. To address these issues, we propose a novel and efficient local embedding learning method via landmark-based dynamic connections (LDC), in which we leverage several landmarks to represent different subclusters in the same class and establish connections between each point and landmark. Furthermore, to explore the pairwise relationships with landmarks more precisely, the relationship between each point and its corresponding neighbor landmarks is found in the optimal subspace rather than the original space, which avoids the negative influence of noise. We also propose an efficient iterative algorithm to solve the proposed ratio minimization problem. Extensive experiments on several real-world datasets demonstrate the advantages of the proposed method.
|
3
|
Yousefian A, Shayegh F, Maleki Z. Detection of autism spectrum disorder using graph representation learning algorithms and deep neural network, based on fMRI signals. Front Syst Neurosci 2023; 16:904770. [PMID: 36817947 PMCID: PMC9932324 DOI: 10.3389/fnsys.2022.904770] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/25/2022] [Accepted: 12/28/2022] [Indexed: 02/05/2023]
Abstract
Introduction: Can we apply graph representation learning algorithms to identify autism spectrum disorder (ASD) patients within a large brain imaging dataset? ASD is mainly identified by brain functional connectivity patterns. Attempts to unveil the common neural patterns that emerge in ASD are the essence of ASD classification. We claim that graph representation learning methods can appropriately extract the connectivity patterns of the brain, in such a way that the method generalizes across recording conditions and phenotypical information of subjects. These methods can capture the whole structure of the brain, including both local and global properties. Methods: The investigation uses the worldwide multi-site brain imaging database known as ABIDE I and II (Autism Brain Imaging Data Exchange). Among different graph representation techniques, we used AWE, Node2vec, Struct2vec, multi node2vec, and Graph2Img. The best approach was Graph2Img, in which, after extracting the feature vectors representative of the brain nodes, the PCA algorithm is applied to the matrix of feature vectors. The classifier adapted to the features embedded in graphs is a LeNet deep neural network. Results and discussion: Although we could not outperform the previous accuracy of 10-fold cross-validation in the identification of ASD versus control patients in this dataset, we obtained better results for leave-one-site-out cross-validation (our accuracy: 80%). We conclude that graph embedding methods can make the connectivity matrix more suitable as input to a deep network.
Affiliation(s)
Farzaneh Shayegh, Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
|
4
|
Wan M, Chen X, Zhao C, Zhan T, Yang G. A new weakly supervised discrete discriminant hashing for robust data representation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
|
5
|
Qin J, Fei L, Zhang Z, Wen J, Xu Y, Zhang D. Joint Specifics and Consistency Hash Learning for Large-Scale Cross-Modal Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:5343-5358. [PMID: 35925845 DOI: 10.1109/tip.2022.3195059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/15/2023]
Abstract
With the dramatic increase in the amount of multimedia data, cross-modal similarity retrieval has become one of the most popular yet challenging problems. Hashing offers a promising solution for large-scale cross-modal data searching by embedding high-dimensional data into a low-dimensional, similarity-preserving Hamming space. However, most existing cross-modal hashing methods seek a semantic representation shared by multiple modalities, which cannot fully preserve and fuse the discriminative modal-specific features and heterogeneous similarity for cross-modal similarity searching. In this paper, we propose a joint specifics and consistency hash learning method for cross-modal retrieval. Specifically, we introduce an asymmetric learning framework to fully exploit the label information for discriminative hash code learning, where 1) each individual modality can be better converted into a meaningful subspace with specific information, 2) multiple subspaces are semantically connected to capture consistent information, and 3) the integration complexity of different subspaces is overcome so that the learned collaborative binary codes can merge the specifics with consistency. Then, we introduce an alternating iterative optimization to tackle the specifics and consistency hash learning problem, making it scalable for large-scale cross-modal retrieval. Extensive experiments on five widely used benchmark databases clearly demonstrate the effectiveness and efficiency of our proposed method on both one-cross-one and one-cross-two retrieval tasks.
|
6
|
Shi Y, Nie X, Liu X, Zou L, Yin Y. Supervised Adaptive Similarity Matrix Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:2755-2766. [PMID: 35320101 DOI: 10.1109/tip.2022.3158092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Compact hash codes can facilitate large-scale multimedia retrieval, significantly reducing storage and computation. Most hashing methods learn hash functions based on the data similarity matrix, which is predefined by supervised labels or a distance metric type. However, this predefined similarity matrix cannot accurately reflect the real similarity relationship among images, which results in poor retrieval performance of hashing methods, especially in multi-label datasets and zero-shot datasets that are highly dependent on similarity relationships. Toward this end, this study proposes a new supervised hashing method called supervised adaptive similarity matrix hashing (SASH) via feature-label space consistency. SASH not only learns the similarity matrix adaptively, but also extracts the label correlations by maintaining consistency between the feature and the label space. This correlation information is then used to optimize the similarity matrix. The experiments on three large normal benchmark datasets (including two multi-label datasets) and three large zero-shot benchmark datasets show that SASH has an excellent performance compared with several state-of-the-art techniques.
|
7
|
Feng H, Wang N, Tang J. Deep Weibull hashing with maximum mean discrepancy quantization for image retrieval. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
|
8
|
Tian X, Ng WWY, Wang H. Concept Preserving Hashing for Semantic Image Retrieval With Concept Drift. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:5184-5197. [PMID: 31841431 DOI: 10.1109/tcyb.2019.2955130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Current hashing-based image retrieval methods mostly assume that the database of images is static. However, this assumption is not true in cases where the databases are constantly updated (e.g., on the Internet) and there exists the problem of concept drift. The online (also known as incremental) hashing methods have been proposed recently for image retrieval where the database is not static. However, they have not considered the concept drift problem. Moreover, they update hash functions dynamically by generating new hash codes for all accumulated data over time which is clearly uneconomical. In order to solve these two problems, concept preserving hashing (CPH) is proposed. In contrast to the existing methods, CPH preserves the original concept, that is, the set of hash codes representing a concept is preserved over time, by learning a new set of hash functions to yield the same set of hash codes for images (old and new) of a concept. The objective function of CPH learning consists of three components: 1) isomorphic similarity; 2) hash codes partition balancing; and 3) heterogeneous similarity fitness. The experimental results on 11 concept drift scenarios show that CPH yields better retrieval precisions than the existing methods and does not need to update hash codes of previously stored images.
|
11
|
Yu J, Wu XJ, Zhang D. Unsupervised Multi-modal Hashing for Cross-Modal Retrieval. Cognit Comput 2021. [DOI: 10.1007/s12559-021-09847-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
13
|
Lu X, Chen Y, Li X. Siamese Dilated Inception Hashing With Intra-Group Correlation Enhancement for Image Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3032-3046. [PMID: 31514159 DOI: 10.1109/tnnls.2019.2935118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
For large-scale image retrieval, hashing has been extensively explored in approximate nearest neighbor search methods due to its low storage and high computational efficiency. With the development of deep learning, deep hashing methods have made great progress in image retrieval. Most existing deep hashing methods cannot fully consider the intra-group correlation of hash codes, which leads to the correlation decrease problem of similar hash codes and ultimately affects the retrieval results. In this article, we propose an end-to-end siamese dilated inception hashing (SDIH) method that takes full advantage of multi-scale contextual information and category-level semantics to enhance the intra-group correlation of hash codes for hash codes learning. First, a novel siamese inception dilated network architecture is presented to generate hash codes with the intra-group correlation enhancement by exploiting multi-scale contextual information and category-level semantics simultaneously. Second, we propose a new regularized term, which can force the continuous values to approximate discrete values in hash codes learning and eventually reduces the discrepancy between the Hamming distance and the Euclidean distance. Finally, experimental results in five public data sets demonstrate that SDIH can outperform other state-of-the-art hashing algorithms.
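The regularization idea in this abstract, pushing continuous network outputs toward discrete values, can be sketched as follows. The abstract does not give the exact term, so this uses a common form as an assumption: the mean squared gap between each output's magnitude and 1. The names `quantization_penalty` and `binarize` are hypothetical.

```python
def quantization_penalty(outputs):
    """Mean of (|u| - 1)^2 over all continuous code entries.

    The value is 0 when every entry is exactly +1 or -1, and it grows as
    entries drift toward 0, so minimizing it narrows the gap between the
    continuous outputs and their binarized codes. This is an assumed form,
    not necessarily the regularizer used in the cited paper.
    """
    total = sum((abs(u) - 1.0) ** 2 for code in outputs for u in code)
    count = sum(len(code) for code in outputs)
    return total / count


def binarize(code):
    """Sign-threshold a continuous code into {-1, +1}."""
    return [1 if u >= 0 else -1 for u in code]


print(quantization_penalty([[1.0, -1.0]]))  # 0.0
print(quantization_penalty([[0.5, -0.5]]))  # 0.25
print(binarize([0.2, -0.7]))  # [1, -1]
```

In training, a term like this would typically be added to the similarity loss with a small weight, so that the continuous codes stay close to the binary codes used at retrieval time.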
|
15
|
Lan X, Ye M, Zhang S, Zhou H, Yuen PC. Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2018.10.002] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 11/16/2022]
|
16
|
Ng WWY, Tian X, Pedrycz W, Wang X, Yeung DS. Incremental Hash-Bit Learning for Semantic Image Retrieval in Nonstationary Environments. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:3844-3858. [PMID: 29994699 DOI: 10.1109/tcyb.2018.2846760] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Images are uploaded to the Internet over time which makes concept drifting and distribution change in semantic classes unavoidable. Current hashing methods being trained using a given static database may not be suitable for nonstationary semantic image retrieval problems. Moreover, directly retraining a whole hash table to update knowledge coming from new arriving image data may not be efficient. Therefore, this paper proposes a new incremental hash-bit learning method. At the arrival of new data, hash bits are selected from both existing and newly trained hash bits by an iterative maximization of a 3-component objective function. This objective function is also used to weight selected hash bits to re-rank retrieved images for better semantic image retrieval results. The three components evaluate a hash bit in three different angles: 1) information preservation; 2) partition balancing; and 3) bit angular difference. The proposed method combines knowledge retained from previously trained hash bits and new semantic knowledge learned from the new data by training new hash bits. In comparison to table-based incremental hashing, the proposed method automatically adjusts the number of bits from old data and new data according to the concept drifting in the given data via the maximization of the objective function. Experimental results show that the proposed method outperforms existing stationary hashing methods, table-based incremental hashing, and online hashing methods in 15 different simulated nonstationary data environments.
|
17
|
Li J, Ng WWY, Tian X, Kwong S, Wang H. Weighted multi-deep ranking supervised hashing for efficient image retrieval. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-01026-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
|
18
|
Triplet Deep Hashing with Joint Supervised Loss Based on Deep Neural Networks. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2019; 2019:8490364. [PMID: 31687007 PMCID: PMC6811991 DOI: 10.1155/2019/8490364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2019] [Revised: 07/08/2019] [Accepted: 07/14/2019] [Indexed: 11/18/2022]
Abstract
In recent years, with the explosion of multimedia data from search engines, social media, and e-commerce platforms, there is an urgent need for fast retrieval methods for massive data. Hashing is widely used in large-scale and high-dimensional data search because of its low storage cost and fast query speed. Thanks to the great success of deep learning in many fields, deep learning has been introduced into hashing retrieval, where a deep neural network learns image features and hash codes simultaneously. Compared with traditional hashing methods, it performs better. However, existing deep hashing methods have some limitations; for example, most methods consider only one kind of supervised loss, which leads to insufficient utilization of supervised information. To address this issue, we propose a triplet deep hashing method with joint supervised loss based on the convolutional neural network (JLTDH). JLTDH combines triplet likelihood loss and linear classification loss; moreover, the triplet supervised label is adopted, which contains richer supervised information than pointwise and pairwise labels. At the same time, to overcome the cubic increase in the number of triplets and make triplet training more effective, we adopt a novel triplet selection method. The whole process is divided into two stages. In the first stage, the triplets generated by the triplet selection method are taken as the input of the CNN, three CNNs with shared weights are used for image feature learning, and the last layer of the network outputs a preliminary hash code. In the second stage, relying on the hash code of the first stage and the joint loss function, the neural network model is further optimized so that the generated hash code has higher query precision. We perform extensive experiments on three public benchmark datasets: CIFAR-10, NUS-WIDE, and MS-COCO. Experimental results demonstrate that the proposed method outperforms the compared methods and is also superior to all previous deep hashing methods based on the triplet label.
|
19
|
Ding G, Guo Y, Chen K, Chu C, Han J, Dai Q. DECODE: Deep Confidence Network for Robust Image Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:3752-3765. [PMID: 30835225 DOI: 10.1109/tip.2019.2902115] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
Recent years have witnessed the success of deep convolutional neural networks for image classification and many related tasks. Existing training strategies, however, assume that a clean dataset is available for model learning. On elaborately constructed benchmark datasets, deep networks have yielded promising performance under this assumption. In real-world applications, it is burdensome and expensive to collect sufficient clean training samples. On the other hand, collecting noisily labeled samples is economical and practical, especially with the rapidly increasing amount of visual data on the web. Unfortunately, the accuracy of current deep models may drop dramatically even with 5%-10% label noise. Therefore, enabling label-noise-resistant classification has become a crucial issue in data-driven deep learning approaches. In this paper, we propose a DEep COnfiDEnce network (DECODE) to address this issue. In particular, based on the distribution of mislabeled data, we adopt a confidence evaluation module that is able to determine the confidence that a sample is mislabeled. With the confidence, we further use a weighting strategy to assign different weights to different samples so that the model pays less attention to low-confidence data, which is more likely to be noise. In this way, the deep model is more robust to label noise. DECODE is designed to be general, such that it can be easily combined with existing studies. We conduct extensive experiments on several datasets, and the results validate that DECODE can improve the accuracy of deep models trained with noisy data.
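The weighting strategy described here can be sketched as a confidence-weighted cross-entropy. This is only an illustration under assumptions: `clean_conf` is a hypothetical per-sample score in [0, 1] expressing how much the label is trusted, and it is used directly as the sample weight; the abstract does not specify the actual confidence module or weighting function.

```python
import math


def weighted_cross_entropy(probs, labels, clean_conf):
    """Cross-entropy averaged with per-sample trust weights.

    probs:      list of per-class probability lists, one per sample
    labels:     list of integer class labels (possibly noisy)
    clean_conf: list of weights in [0, 1]; low values mark samples whose
                labels are suspected to be noise (assumed input)
    """
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]
    total_weight = sum(clean_conf)
    if total_weight == 0:
        return 0.0
    return sum(w * l for w, l in zip(clean_conf, losses)) / total_weight


probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 0]  # the second label disagrees with the model's prediction
loss_trusting_all = weighted_cross_entropy(probs, labels, [1.0, 1.0])
loss_downweighted = weighted_cross_entropy(probs, labels, [1.0, 0.0])
print(loss_downweighted < loss_trusting_all)  # True
```

With the second sample's weight set to zero, the loss reduces to the first sample's term alone, which shows how suspected noise stops contributing to the gradient.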
|
20
|
Jiang QY, Li WJ. Discrete Latent Factor Model for Cross-Modal Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:3490-3501. [PMID: 30735997 DOI: 10.1109/tip.2019.2897944] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 06/09/2023]
Abstract
Due to its storage and retrieval efficiency, cross-modal hashing (CMH) has been widely used for cross-modal similarity search in many multimedia applications. According to the training strategy, existing CMH methods can be mainly divided into two categories: relaxation-based continuous methods and discrete methods. In general, the training of relaxation-based continuous methods is faster than that of discrete methods, but the accuracy of relaxation-based continuous methods is not satisfactory. In contrast, the accuracy of discrete methods is typically better than that of the relaxation-based continuous methods, but the training of discrete methods is very time-consuming. In this paper, we propose a novel CMH method, called Discrete Latent Factor model-based cross-modal Hashing (DLFH), for cross-modal similarity search. DLFH is a discrete method which can directly learn the binary hash codes for CMH. At the same time, the training of DLFH is efficient. Experiments show that DLFH can achieve significantly better accuracy than existing methods, and the training time of DLFH is comparable to that of the relaxation-based continuous methods, which are much faster than the existing discrete methods.
|
21
|
Zhang L, Yao Y, Lu Z, Shao L. Aesthetics-Guided Graph Clustering With Absent Modalities Imputation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:3462-3476. [PMID: 30735995 DOI: 10.1109/tip.2019.2897940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Accurately clustering Internet-scale users into multiple communities according to their aesthetic styles is a useful technique in image modeling and data mining. In this paper, we present a novel partially supervised model which seeks a sparse representation to capture photo aesthetics. It optimally fuses multi-channel features, i.e., human gaze behavior, quality scores, and semantic tags, each of which could be absent. Afterward, by leveraging the KL-divergence to distinguish the aesthetic distributions between photo sets, a large-scale graph is constructed to describe the aesthetic correlations between users. Finally, a dense subgraph mining algorithm which intrinsically supports outliers (i.e., unique users not belonging to any community) is adopted to detect aesthetic communities. Comprehensive experimental results on a million-scale image set collected from Flickr demonstrate the superiority of our method. As a byproduct, the discovered aesthetic communities can substantially enhance photo retargeting and video summarization.
|
22
|
Shen Y, Liu L, Shao L. Unsupervised Binary Representation Learning with Deep Variational Networks. Int J Comput Vis 2019. [DOI: 10.1007/s11263-019-01166-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 11/24/2022]
|
23
|
Zheng J, Cao X, Zhang B, Zhen X, Su X. Deep Ensemble Machine for Video Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:553-565. [PMID: 29994406 DOI: 10.1109/tnnls.2018.2844464] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
Video classification has been extensively researched in computer vision due to its widespread applications. However, it remains an outstanding task because of the great challenges in effective spatio-temporal feature extraction and efficient classification with high-dimensional video representations. To address these challenges, in this paper, we propose an end-to-end learning framework called deep ensemble machine (DEM) for video classification. Specifically, to establish effective spatio-temporal features, we propose using two deep convolutional neural networks (CNNs), i.e., VGG and C3-D, to extract heterogeneous spatial and temporal features for complementary representations. To achieve efficient classification, we propose ensemble learning based on random projections, which transform high-dimensional features into a set of lower-dimensional compact features in subspaces; an ensemble of classifiers is trained on the subspaces and combined with a weighting layer during backpropagation. To further enhance the performance, we introduce rectified linear encoding (RLE), inspired by error-correcting output coding, to encode the initial outputs of classifiers, followed by a softmax layer to produce the final classification results. DEM combines the strengths of deep CNNs and ensemble learning, which establishes a new end-to-end learning architecture for more accurate and efficient video classification. We show the effectiveness of DEM by extensive experiments on four data sets for diverse video classification tasks including action recognition and dynamic scene classification. Results show that DEM achieves high performance on all tasks, with an improvement of up to 13% on the CIFAR10 data set over the baseline model.
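The random-projection step in this abstract can be illustrated with a minimal sketch. It assumes Gaussian projection matrices and a fixed seed; the function names and the dimensions are hypothetical, and the classifiers, weighting layer, and rectified linear encoding that the authors train on top of these subspaces are left out.

```python
import random


def random_projection(dim_in, dim_out, rng):
    """Draw a dim_out x dim_in Gaussian matrix scaled by 1/sqrt(dim_out)."""
    scale = 1.0 / dim_out ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, x):
    """Multiply a projection matrix by a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def ensemble_views(x, dim_out, num_views, seed=0):
    """Map one high-dimensional feature to several low-dimensional subspaces.

    Each view uses an independent random matrix, so a separate classifier
    could be trained on each view and their outputs combined afterwards.
    """
    rng = random.Random(seed)
    return [project(random_projection(len(x), dim_out, rng), x)
            for _ in range(num_views)]


feature = [0.1] * 512  # stand-in for a concatenated CNN feature vector
views = ensemble_views(feature, dim_out=32, num_views=4)
print(len(views), len(views[0]))  # 4 32
```

Each of the four 32-dimensional views is much cheaper to classify than the original 512-dimensional feature, which is the efficiency argument the abstract makes for projecting before ensembling.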
|
25
|
Wang Y, Liang J, Cao D, Sun Z. Local Semantic-aware Deep Hashing with Hamming-isometric Quantization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 28:2665-2679. [PMID: 30582539 DOI: 10.1109/tip.2018.2889269] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/09/2023]
Abstract
Hashing has attracted increasing attention due to its tremendous potential for efficient image retrieval and data storage. Compared with conventional hashing methods that use handcrafted features, emerging deep hashing approaches employ deep neural networks to learn feature representations as well as hash functions, which have proved to be more powerful and robust in real-world applications. Currently, most existing deep hashing methods construct pairwise or triplet-wise constraints to obtain similar binary codes for a similar data pair or relatively similar binary codes within a triplet. However, some critical local structures of the data remain underexploited, so the effectiveness of hash learning is not fully realized. To address this limitation, we propose a novel deep hashing method named local semantic-aware deep hashing with Hamming-isometric quantization (LSDH), where local similarity of the data is intentionally integrated into hash learning. Specifically, in the Hamming space, we exploit the potential semantic relation of the data to robustly preserve their local similarity. In addition to reducing the error introduced by binary quantization, we further develop a Hamming-isometric objective to maximize the consistency of similarity between a pair of binary-like features and the corresponding pair of binary codes, which is shown to enhance the quality of binary codes. Extensive experimental results on several benchmark datasets, including three single-label datasets (i.e., CIFAR-10, CIFAR-20, and SUN397) and one multi-label dataset (NUS-WIDE), demonstrate that the proposed LSDH achieves superior performance over the latest state-of-the-art hashing methods.
|
26
|
Jiang QY, Cui X, Li WJ. Deep Discrete Supervised Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:5996-6009. [PMID: 30106725 DOI: 10.1109/tip.2018.2864894] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 06/08/2023]
Abstract
Hashing has been widely used for large-scale search due to its low storage cost and fast query speed. By using supervised information, supervised hashing can significantly outperform unsupervised hashing. Recently, discrete supervised hashing and feature-learning-based deep hashing have emerged as two representative advances in supervised hashing. On one hand, hashing is essentially a discrete optimization problem. Hence, utilizing supervised information to directly guide the discrete (binary) coding procedure can avoid sub-optimal solutions and improve accuracy. On the other hand, feature-learning-based deep hashing, which integrates deep feature learning and hash-code learning into an end-to-end architecture, can enhance the feedback between feature learning and hash-code learning. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure in hashing. The key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure. However, most deep supervised hashing methods cannot use the supervised information to directly guide both the discrete (binary) coding procedure and the deep feature learning procedure in the same framework. In this paper, we propose a novel deep hashing method, called deep discrete supervised hashing (DDSH). DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure and thus enhance the feedback between these two important procedures. Experiments on four real datasets show that DDSH can outperform other state-of-the-art baselines, including both discrete hashing and deep hashing baselines, for image retrieval.
|
27
|
Shen F, Xu Y, Liu L, Yang Y, Huang Z, Shen HT. Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:3034-3044. [PMID: 29993420 DOI: 10.1109/tpami.2018.2789887] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Indexed: 06/08/2023]
Abstract
Recent vision and learning studies show that learning compact hash codes can facilitate massive data processing with significantly reduced storage and computation. In particular, learning deep hash functions has greatly improved retrieval performance, typically under semantic supervision. In contrast, current unsupervised deep hashing algorithms can hardly achieve satisfactory performance due to either relaxed optimization or the absence of a similarity-sensitive objective. In this work, we propose a simple yet effective unsupervised hashing framework, named Similarity-Adaptive Deep Hashing (SADH), which alternates over three training modules: deep hash model training, similarity graph updating and binary code optimization. The key difference from the widely used two-step hashing method is that the output representations of the learned deep model help update the similarity graph matrix, which is then used to improve the subsequent code optimization. In addition, for producing high-quality binary codes, we devise an effective discrete optimization algorithm which can directly handle the binary constraints with a general hashing loss. Extensive experiments validate the efficacy of SADH, which consistently outperforms state-of-the-art methods by large margins.
|
28
|
Li X, Cui G, Dong Y. Discriminative and Orthogonal Subspace Constraints-Based Nonnegative Matrix Factorization. ACM T INTEL SYST TEC 2018. [DOI: 10.1145/3229051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/28/2022]
Abstract
Nonnegative matrix factorization (NMF) is a widely used feature extraction technique for image clustering and image classification. For the former task, various unsupervised NMF methods based on the structure of the data distribution have been proposed. For the latter task, the label information of the dataset is an important guide. However, most previously proposed supervised NMF methods emphasize imposing discriminant constraints on the coefficient matrix. When dealing with newly arriving samples, the transpose or the pseudoinverse of the basis matrix is used to project these samples to the low-dimensional space. In this way, the influence of the labels on the basis matrix is indirect. Although some methods try to constrain the basis matrix within the NMF framework, they either restrict only within-class samples or impose an improper constraint on the basis matrix. To address these problems, in this article a novel NMF framework named discriminative and orthogonal subspace constraints-based nonnegative matrix factorization (DOSNMF) is proposed. In DOSNMF, the discriminative constraints are imposed on the projected subspace instead of the directly learned representation. In this manner, the discriminative information is directly connected with the projected subspace. At the same time, an orthogonal term is incorporated in DOSNMF to adjust the orthogonality of the learned basis matrix, which can ensure the orthogonality of the learned subspace and improve the sparseness of the basis matrix. This framework can be implemented in two ways. The first way is based on manifold learning theory. In this way, two graphs, i.e., the intrinsic graph and the penalty graph, are constructed to capture the intra-class structure and the inter-class distinctness. With this design, both the manifold structure information and the discriminative information of the dataset are utilized. For convenience, we give this method the name of the framework, i.e., DOSNMF. The second way is based on Fisher's criterion; we name it Fisher's criterion-based DOSNMF (FDOSNMF). The objective functions of DOSNMF and FDOSNMF can be easily optimized using multiplicative update (MU) rules. The new methods are tested on five datasets and compared with several supervised and unsupervised variants of NMF. The experimental results reveal the effectiveness of the proposed methods.
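As an illustration of the multiplicative update (MU) rules mentioned in the abstract, the sketch below applies the standard Lee-Seung updates for plain NMF under the Frobenius loss. It is not the DOSNMF or FDOSNMF objective: the discriminative and orthogonality terms are omitted, and the matrices `V`, `W`, `H` and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def mu_step(v, w, h, eps=1e-9):
    """One round of plain NMF multiplicative updates (H first, then W)."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
         for i in range(len(h))]
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))]
         for i in range(len(w))]
    return w, h

# Toy nonnegative data and a fixed nonnegative initialization (assumed values).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0]]
W = [[0.5, 0.2], [0.3, 0.8], [0.9, 0.4]]
H = [[0.6, 0.1, 0.7], [0.2, 0.9, 0.5]]

err_before = frob_error(V, W, H)
for _ in range(200):
    W, H = mu_step(V, W, H)
err_after = frob_error(V, W, H)
```

Because each update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative without any explicit projection, which is the main appeal of MU rules.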
Affiliation(s)
- Xuelong Li: Chinese Academy of Sciences, Shaanxi, P. R. China
- Guosheng Cui: Chinese Academy of Sciences, Shaanxi, P. R. China
|
29
|
Wu G, Han J, Guo Y, Liu L, Ding G, Ni Q, Shao L. Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 28:1993-2007. [PMID: 30452370 DOI: 10.1109/tip.2018.2882155] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 06/09/2023]
Abstract
This paper proposes a deep hashing framework, namely Unsupervised Deep Video Hashing (UDVH), for large-scale video similarity search with the aim of learning compact yet effective binary codes. UDVH produces the hash codes in a self-taught manner by jointly integrating discriminative video representation with optimal code learning, where an efficient alternating approach is adopted to optimize the objective function. The key differences from most existing video hashing methods are that 1) UDVH is an unsupervised hashing method that generates hash codes by cooperatively utilizing feature clustering and a specifically designed binarization, with the original neighborhood structure preserved in the binary space; and 2) a specific rotation is developed and applied to the video features so that the variance of each dimension is balanced, thus facilitating the subsequent quantization step. Extensive experiments performed on three popular video datasets show that UDVH substantially outperforms state-of-the-art methods on various evaluation metrics, which makes it practical in real-world applications.
|
30
|
Zhao X, Wang N, Zhang Y, Du S, Gao Y, Sun J. Beyond Pairwise Matching: Person Reidentification via High-Order Relevance Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3701-3714. [PMID: 28880193 DOI: 10.1109/tnnls.2017.2736640] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Person reidentification has attracted extensive research efforts in recent years. It is challenging due to the varied visual appearance caused by illumination, view angle, background, and possible occlusions, which makes it difficult to measure the relevance, i.e., similarity, between probe and gallery images. Existing methods mainly focus on pairwise distance metric learning for person reidentification. In practice, pairwise image matching may limit the data for comparison (just the probe and one gallery subject) and lead to suboptimal results. The correlation among gallery data can also be helpful for the person reidentification task. In this paper, we propose to investigate the high-order correlation among the probe and gallery data, rather than pairwise matching, to jointly learn the relevance of gallery data to the probe. Given recent progress on feature representation in person reidentification, it is difficult to select the best feature, and each type of feature can benefit person description from different aspects. Under such circumstances, we propose a multihypergraph joint learning algorithm to learn the relevance in cooperation with multiple features of the imaging data. More specifically, one hypergraph is constructed using one type of feature, and multiple hypergraphs can be generated accordingly. Then, the learning process is conducted on the multihypergraph structure, and the identity of a probe is determined by its relevance to each gallery item. The merit of the proposed scheme is twofold. First, unlike pairwise image matching, the proposed method jointly explores the relationships among different images. Second, multimodal data, i.e., different features, can be formulated in the multihypergraph structure, which can convey more information in the learning process and can be easily extended. We note that the proposed method is a general framework that can incorporate any combination of features, and thus is flexible in practice. Experimental results and comparisons with state-of-the-art methods on three public benchmark data sets demonstrate the superiority of the proposed method.
|
31
|
Koutaki G, Shirai K, Ambai M. Hadamard Coding for Supervised Discrete Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:5378-5392. [PMID: 30010571 DOI: 10.1109/tip.2018.2855427] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches because the compact binary code representation is essential for data storage and well suited to query searches using bit operations. The recently proposed supervised discrete hashing (SDH) method efficiently solves mixed-integer programming problems by alternating optimization and the discrete cyclic coordinate descent (DCC) method. Based on some preliminary experiments, we show that the SDH method can be simplified without performance degradation. We analyze the simplified model and provide a mathematically exact solution thereof; we reveal that the exact binary code is provided by a "Hadamard matrix." Therefore, we named our method Hadamard coded-supervised discrete hashing (HC-SDH). In contrast to SDH, our model does not require an alternating optimization algorithm and does not depend on initial values. HC-SDH is also easier to implement than iterative quantization (ITQ). Experimental results involving a large-scale database show that Hadamard coding outperforms conventional SDH in terms of precision, recall, and computational time. On the large datasets SUN-397 and ImageNet, HC-SDH provides a superior mean average precision (mAP) and top-accuracy compared to the conventional SDH methods with the same code length and to FastHash. HC-SDH trains 170 times faster than conventional SDH, and its testing time, including the encoding time, is seven times shorter than that of FastHash, which encodes using a binary tree.
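The abstract does not state how the Hadamard matrix is built or how its rows are assigned to classes. As a rough illustration of why Hadamard rows make well-separated binary codes, the sketch below assumes the Sylvester construction and a simple one-row-per-class assignment; the function names and the assignment are hypothetical, not taken from the paper.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix with entries +1/-1 (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def hamming(a, b):
    """Number of positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

code_length = 8
codes = sylvester_hadamard(code_length)

# Hypothetical assignment: class k receives row k as its target code.
class_codes = {label: codes[label] for label in range(4)}

# Distinct rows are orthogonal, so any two differ in exactly n/2 bits.
pair_distances = {hamming(codes[i], codes[j])
                  for i in range(code_length) for j in range(i + 1, code_length)}
```

Since every pair of distinct rows is at Hamming distance exactly half the code length, no class code is closer to another than any other pair, which is the separation property that makes such codes attractive as targets.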
|
32
|
Yuen PC, Chellappa R. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:2022-2037. [PMID: 29989985 DOI: 10.1109/tip.2017.2777183] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
|
33
|
Duan LY, Wu Y, Huang Y, Wang Z, Yuan J, Gao W. Minimizing Reconstruction Bias Hashing via Joint Projection Learning and Quantization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:3127-3141. [PMID: 29994150 DOI: 10.1109/tip.2018.2818008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Hashing, a widely studied solution to approximate nearest neighbor (ANN) search, aims to map data points in the high-dimensional Euclidean space to the low-dimensional Hamming space while preserving the similarity between the original points. As directly learning binary codes can be NP-hard due to discrete constraints, a two-stage scheme, namely "projection and quantization", has become a standard paradigm for learning similarity-preserving hash codes. However, most existing hashing methods separate these two stages and thus fail to investigate their complementary effects. In this paper, we systematically study the relationship between "projection and quantization", and propose a novel minimal reconstruction bias hashing (MRH) method to learn compact binary codes, in which projection learning and quantization optimization are jointly performed. By introducing a lower bound analysis, we design an effective ternary search algorithm to solve the corresponding optimization problem. Furthermore, we discuss the proposed MRH approach, including its theoretical proof and computational complexity. Distinct from previous works, MRH can adaptively adjust the projection dimensionality to balance the information loss between projection and quantization. The proposed framework not only provides a unique perspective on traditional hashing methods but also motivates further research, e.g., guiding the design of loss functions in deep networks. Extensive experimental results show that the proposed MRH significantly outperforms a variety of state-of-the-art methods on eight widely used benchmarks.
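The two-stage "projection and quantization" paradigm described in the abstract can be sketched as follows. This is the generic baseline with the two stages kept separate, not the joint MRH optimization; the projection matrix `W` is a fixed, hypothetical example rather than a learned one.

```python
def project(x, W):
    """Stage 1: linear projection z = W x (W given as a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def quantize(z):
    """Stage 2: sign quantization, one bit per projected dimension."""
    return [1 if v >= 0 else 0 for v in z]

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 3-bit projection of 2-D points.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

code_a = quantize(project([2.0, 1.0], W))    # projections: [2.0, 1.0, 1.0]
code_b = quantize(project([1.5, 1.0], W))    # projections: [1.5, 1.0, 0.5]
code_c = quantize(project([-1.0, 2.0], W))   # projections: [-1.0, 2.0, -3.0]

near = hamming(code_a, code_b)
far = hamming(code_a, code_c)
```

Nearby points fall on the same side of each hyperplane and share a code, while the distant point flips two of the three bits; quantization discards the magnitudes, which is the information loss the abstract refers to.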
|
34
|
Liu H, Han J, Hou S, Shao L, Ruan Y. Single image super-resolution using a deep encoder–decoder symmetrical network with iterative back projection. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]
|
35
|
Perina A. Latent Constrained Correlation Filter. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:1038-1048. [PMID: 29990103 DOI: 10.1109/tip.2017.2775060] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
Correlation filters are special classifiers designed for shift-invariant object recognition, which are robust to pattern distortions. The recent literature shows that combining a set of sub-filters, each trained on a single image or a small group of images, obtains the best performance. The idea is equivalent to estimating the variable distribution based on data sampling (bagging), which can be interpreted as finding solutions (variable distribution approximation) directly from the sampled data space. However, this methodology fails to account for the variations that exist in the data. In this paper, we introduce an intermediate step (solution sampling) after the data sampling step to form a subspace, in which an optimal solution can be estimated. More specifically, we propose a new method, named latent constrained correlation filters (LCCF), by mapping the correlation filters to a given latent subspace, and develop a new learning framework in the latent subspace that embeds distribution-related constraints into the original problem. To solve the optimization problem, we introduce a subspace-based alternating direction method of multipliers, which is proven to converge at the saddle point. Our approach is successfully applied to three different tasks, including eye localization, car detection, and object tracking. Extensive experiments demonstrate that LCCF outperforms the state-of-the-art methods.
|
36
|
Guo Y, Ding G, Han J. Robust Quantization for General Similarity Search. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:949-963. [PMID: 29757738 DOI: 10.1109/tip.2017.2766445] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Recent years have witnessed the emergence of vector quantization (VQ) techniques for efficient similarity search. VQ partitions the feature space into a set of codewords and encodes data points as integer indices using the codewords. The distance between data points can then be efficiently approximated by simple memory lookup operations. With this compact quantization, the storage cost and search complexity are significantly reduced, thereby facilitating efficient large-scale similarity search. However, the performance of several celebrated VQ approaches degrades significantly when dealing with noisy data. In addition, VQ can support only a narrow range of applications because the distortion measurement is limited to the ℓ2 norm. To address the shortcomings of the squared Euclidean (ℓ2,2 norm) loss function employed by the VQ approaches, in this paper, we propose a novel robust and general VQ framework, named RGVQ, to enhance both the robustness and the generalization of VQ approaches. Specifically, an ℓp,q-norm loss function is proposed to conduct the ℓp-norm similarity search, rather than the ℓ2-norm search, and the q-th order loss is used to enhance the robustness. Although changing the loss function to the ℓp,q norm makes VQ approaches more robust and generic, it poses the challenge that a non-smooth and non-convex orthogonality-constrained ℓp,q-norm function has to be minimized. To solve this problem, we propose a novel and efficient optimization scheme, specialize it to VQ approaches, and theoretically prove its convergence. Extensive experiments on benchmark data sets demonstrate that the proposed RGVQ improves on the original VQ for several approaches, especially when searching for similar items in noisy data.
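The encode-then-lookup idea behind VQ can be illustrated with a minimal sketch. It uses the conventional squared ℓ2 distortion that the abstract identifies as the baseline, not the ℓp,q-norm loss of RGVQ; the codebook and data values are hypothetical, and in practice the codebook would be learned (for example by k-means).

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(point, codebook):
    """Encode a point as the integer index of its nearest codeword."""
    return min(range(len(codebook)), key=lambda k: squared_l2(point, codebook[k]))

def build_lookup(query, codebook):
    """Precompute the distance from the query to every codeword once."""
    return [squared_l2(query, c) for c in codebook]

# Hypothetical fixed codebook and a tiny database.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
database = [(0.1, -0.2), (0.9, 1.2), (3.8, 4.1)]

# Offline: store only one small integer per database point.
indices = [encode(p, codebook) for p in database]

# Online: one table per query, then a memory lookup per database point.
table = build_lookup((1.0, 0.9), codebook)
approx_distances = [table[i] for i in indices]
best_match = approx_distances.index(min(approx_distances))
```

The query cost is one distance computation per codeword plus one lookup per database point, rather than one full distance computation per database point.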
|
37
|
Cheng G, Zhou P, Han J. Duplex Metric Learning for Image Set Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:281-292. [PMID: 28991740 DOI: 10.1109/tip.2017.2760512] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 06/07/2023]
Abstract
Image set classification has attracted much attention because of its broad applications. Despite the success made so far, the problems of intra-class diversity and inter-class similarity still remain two major challenges. To explore a possible solution to these challenges, this paper proposes a novel approach, termed duplex metric learning (DML), for image set classification. The proposed DML consists of two progressive metric learning stages with different objectives used for feature learning and image classification, respectively. The metric learning regularization is not only used to learn powerful feature representations but also well explored to train an effective classifier. At the first stage, we first train a discriminative stacked autoencoder (DSAE) by layer-wisely imposing a metric learning regularization term on the neurons in the hidden layers and meanwhile minimizing the reconstruction error to obtain new feature mappings in which similar samples are mapped closely to each other and dissimilar samples are mapped farther apart. At the second stage, we discriminatively train a classifier and simultaneously fine-tune the DSAE by optimizing a new objective function, which consists of a classification error term and a metric learning regularization term. Finally, two simple voting strategies are devised for image set classification based on the learnt classifier. In the experiments, we extensively evaluate the proposed framework for the tasks of face recognition, object recognition, and face verification on several commonly-used data sets and state-of-the-art results are achieved in comparison with existing methods.
|
38
|
Ding G, Zhou J, Guo Y, Lin Z, Zhao S, Han J. Large-scale image retrieval with Sparse Embedded Hashing. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.055] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 10/20/2022]
|
39
|
Qian X, Lu D, Wang Y, Zhu L, Tang YY, Wang M. Image Re-Ranking Based on Topic Diversity. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:3734-3747. [PMID: 28463199 DOI: 10.1109/tip.2017.2699623] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Social media sharing websites allow users to annotate images with free tags, which contribute significantly to the development of web image retrieval. Tag-based image search is an important method for finding images shared by users in social networks. However, making the top-ranked results both relevant and diverse is challenging. In this paper, we propose a topic-diverse ranking approach for tag-based image retrieval that promotes topic coverage. First, we construct a tag graph based on the similarity between tags. Then, a community detection method is applied to mine the topic community of each tag. After that, inter-community and intra-community ranking are introduced to obtain the final retrieved results. In the inter-community ranking process, an adaptive random walk model is employed to rank the communities based on the multi-information of each topic community. In addition, we build an inverted index structure for images to accelerate the search process. Experimental results on the Flickr and NUS-Wide data sets show the effectiveness of the proposed approach.
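The abstract mentions an inverted index over images but gives no details of its layout. A minimal sketch of the general idea, assuming a plain tag-to-image-ids mapping (the data and function name are hypothetical), is:

```python
from collections import defaultdict

def build_inverted_index(image_tags):
    """Map each tag to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag].add(image_id)
    return index

# Hypothetical user-annotated images.
image_tags = {
    "img1": ["beach", "sunset"],
    "img2": ["beach", "dog"],
    "img3": ["city"],
}

index = build_inverted_index(image_tags)

# A tag query becomes a dictionary lookup instead of a scan over all images.
beach_images = sorted(index["beach"])
# Multi-tag queries intersect the posting sets.
beach_and_dog = index["beach"] & index["dog"]
```

Ranking, community detection and the random walk described in the abstract would operate on the candidate set returned by such a lookup; none of that is shown here.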
|
40
|
Guo Y, Ding G, Han J, Gao Y. Zero-Shot Learning With Transferred Samples. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:3277-3290. [PMID: 28436875 DOI: 10.1109/tip.2017.2696747] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 06/07/2023]
Abstract
By transferring knowledge from the abundant labeled samples of known source classes, zero-shot learning (ZSL) makes it possible to train recognition models for novel target classes that have no labeled samples. Conventional ZSL approaches usually adopt a two-step recognition strategy, in which the test sample is projected into an intermediary space in the first step, and then the recognition is carried out by considering the similarity between the sample and target classes in the intermediary space. Due to this redundant intermediate transformation, information loss is unavoidable, thus degrading the performance of overall system. Rather than adopting this two-step strategy, in this paper, we propose a novel one-step recognition framework that is able to perform recognition in the original feature space by using directly trained classifiers. To address the lack of labeled samples for training supervised classifiers for the target classes, we propose to transfer samples from source classes with pseudo labels assigned, in which the transferred samples are selected based on their transferability and diversity. Moreover, to account for the unreliability of pseudo labels of transferred samples, we modify the standard support vector machine formulation such that the unreliable positive samples can be recognized and suppressed in the training phase. The entire framework is fairly general with the possibility of further extensions to several common ZSL settings. Extensive experiments on four benchmark data sets demonstrate the superiority of the proposed framework, compared with the state-of-the-art approaches, in various settings.
|