1
|
Yuan L, Wang T, Zhang X, Tay FEH, Jie Z, Tian Y, Liu W, Feng J. Learnable Central Similarity Quantization for Efficient Image and Video Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18717-18730. [PMID: 38090871 DOI: 10.1109/tnnls.2023.3321148] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/05/2025]
Abstract
Data-dependent hashing methods aim to learn hash functions from the pairwise or triplet relationships among the data, which often leads to low efficiency and a low collision rate because only the local distribution of the data is captured. To address this limitation, we propose central similarity, in which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers. As a new global similarity metric, central similarity can improve the efficiency and retrieval accuracy of hash learning. By introducing a new concept, hash centers, we formulate the computation of the proposed central similarity metric in a principled way; the hash centers are a set of points scattered in the Hamming space with sufficient mutual distance between each other. To construct well-separated hash centers, we provide two efficient methods: 1) leveraging the Hadamard matrix and Bernoulli distributions to generate data-independent hash centers and 2) learning data-dependent hash centers from data representations. Based on the proposed similarity metric and hash centers, we propose central similarity quantization (CSQ), which optimizes the central similarity between data points with respect to their hash centers, instead of optimizing the local similarity, to generate a high-quality deep hash function. We further improve CSQ with data-dependent hash centers, a variant dubbed CSQ with learnable center (CSQLC). The proposed CSQ and CSQLC are generic and applicable to image and video hashing scenarios. We conduct extensive experiments on large-scale image and video retrieval tasks, and the proposed CSQ yields noticeably boosted retrieval performance, i.e., a 3%-20% gain in mean average precision (mAP) over the previous state-of-the-art methods, which also demonstrates that our methods can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs.
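The data-independent construction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the Sylvester construction of a Hadamard matrix and takes rows of H and -H as candidate centers; the abstract does not spell out the exact procedure, so the function names (`sylvester_hadamard`, `hash_centers`) and the row-selection rule are hypothetical, and the Bernoulli-sampling part is omitted.

```python
def sylvester_hadamard(k):
    """Return a k x k Hadamard matrix with +1/-1 entries (k must be a power of two)."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    h = [[1]]
    while len(h) < k:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hash_centers(num_centers, bits):
    """Pick candidate centers from the rows of H and -H, mapped from {-1, +1} to {0, 1}.

    Assumption for illustration: centers are simply the first rows in order.
    """
    h = sylvester_hadamard(bits)
    rows = h + [[-x for x in row] for row in h]
    if num_centers > len(rows):
        raise ValueError("at most 2 * bits centers are available from H and -H")
    return [[(x + 1) // 2 for x in row] for row in rows[:num_centers]]


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


centers = hash_centers(10, 16)
min_dist = min(
    hamming(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
)
```

Because distinct rows of a Hadamard matrix are orthogonal, any two of them differ in exactly half of their positions, so the centers drawn this way are separated by a Hamming distance of at least `bits / 2`.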
|
2
|
Liu H, Zhou W, Zhang H, Li G, Zhang S, Li X. Bit Reduction for Locality-Sensitive Hashing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:12470-12481. [PMID: 37037245 DOI: 10.1109/tnnls.2023.3263195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Locality-sensitive hashing (LSH) has gained ever-increasing popularity in similarity search for large-scale data. It achieves competitive search performance when the number of generated hash bits is large, which in turn hinders its wide application. The first purpose of this work is to introduce a novel hash bit reduction schema for hashing techniques to derive shorter binary codes, a problem that has not yet received sufficient attention. To show briefly how the reduction schema works, the second purpose is to present an effective bit reduction method for LSH under the schema. Specifically, after the hash bits are generated by LSH, they are put into a bit pool as candidates. Then mutual information and data labels are exploited to measure the correlation and structural properties between the hash bits, respectively. Eventually, highly correlated and redundant hash bits can be identified and removed without greatly deteriorating performance. The advantages of our reduction method are that it not only reduces the number of hash bits effectively but also boosts the retrieval performance of LSH, making it more appealing and practical in real-world applications. Comprehensive experiments were conducted on three public real-world datasets. The experimental comparisons with representative bit selection methods and state-of-the-art hashing algorithms demonstrate that the proposed method has encouraging and competitive performance.
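The idea of using mutual information to spot redundant hash bits can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes a greedy pass that drops any bit whose empirical mutual information with an already-kept bit reaches a threshold, and it ignores the label-based structural criterion. The names `mutual_information`, `reduce_bits`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two equal-length bit sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def reduce_bits(codes, threshold):
    """Return the indices of bit positions kept after a greedy redundancy filter.

    codes: list of equal-length hash codes, one per data point.
    A bit is kept only if its mutual information with every kept bit is below threshold.
    """
    columns = list(zip(*codes))  # one tuple per bit position
    kept = []
    for j, col in enumerate(columns):
        if all(mutual_information(col, columns[k]) < threshold for k in kept):
            kept.append(j)
    return kept


# Bit 1 duplicates bit 0; bit 2 is independent of both.
codes = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
kept = reduce_bits(codes, threshold=0.5)
```

In this toy bit pool the duplicated bit carries one full bit of mutual information with bit 0 and is dropped, while the independent bit carries none and is kept.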
|
3
|
Chen X, Li Y, Chen C. An Online Hashing Algorithm for Image Retrieval Based on Optical-Sensor Network. SENSORS (BASEL, SWITZERLAND) 2023; 23:2576. [PMID: 36904780 PMCID: PMC10007520 DOI: 10.3390/s23052576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/20/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
Online hashing is an effective storage and online retrieval scheme that meets the rapid growth of data in optical-sensor networks and the real-time processing needs of users in the era of big data. Existing online-hashing algorithms rely excessively on data tags to construct the hash function and ignore the structural features of the data itself, resulting in a serious loss of image-streaming features and reduced retrieval accuracy. In this paper, an online hashing model that fuses global and local dual semantics is proposed. First, to preserve the local features of the streaming data, an anchor hash model based on the idea of manifold learning is constructed. Second, a global similarity matrix, which is used to constrain the hash codes, is built from the balanced similarity between the newly arrived data and previous data, so that the hash codes retain global data features as much as possible. Then, under a unified framework, an online hash model that integrates global and local dual semantics is learned, and an effective discrete binary-optimization solution is proposed. Extensive experiments on three datasets, CIFAR10, MNIST and Places205, show that the proposed algorithm effectively improves the efficiency of image retrieval compared with several existing advanced online-hashing algorithms.
Affiliation(s)
- Xiao Chen
  - Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
- Yanlong Li
  - Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  - Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin 541004, China
- Chen Chen
  - Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
|
4
|
Williams-Lekuona M, Cosma G, Phillips I. A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval. J Imaging 2022; 8:jimaging8120328. [PMID: 36547493 PMCID: PMC9785405 DOI: 10.3390/jimaging8120328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 11/30/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022] Open
Abstract
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community because the computational efficiency of hash-based methods lets them handle large amounts of data. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during the training process. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval, where paired data is limited or not available to train the model. This paper (1) provides an overview of CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
|
5
|
Li Q, Tian X, Ng WW, Pelillo M. Hashing-based affinity matrix for dominant set clustering. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
6
|
Shi W, Gong Y, Chen B, Hei X. Transductive Semisupervised Deep Hashing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:3713-3726. [PMID: 33544678 DOI: 10.1109/tnnls.2021.3054386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Deep hashing methods have shown their superiority to traditional ones. However, they usually require a large amount of labeled training data to achieve high retrieval accuracy. We propose a novel transductive semisupervised deep hashing (TSSDH) method that effectively trains deep convolutional neural network (DCNN) models with both labeled and unlabeled training samples. The TSSDH method consists of the following four main ingredients. First, we extend the traditional transductive learning (TL) principle to make it applicable to DCNN-based deep hashing. Second, we introduce confidence levels for unlabeled samples to reduce adverse effects from uncertain samples. Third, we employ a Gaussian likelihood loss for hash code learning to sufficiently penalize large Hamming distances for similar sample pairs. Fourth, we design the large-margin feature (LMF) regularization so that, in the learned features, the distances of similar sample pairs are minimized and the distances of dissimilar sample pairs are larger than a predefined margin. Comprehensive experiments show that the TSSDH method achieves superior image retrieval accuracy compared to the representative semisupervised deep hashing methods under the same number of labeled training samples.
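The large-margin constraint described in the fourth ingredient can be pictured with a minimal sketch. It assumes the classic contrastive form (squared distance for similar pairs, squared hinge on the margin for dissimilar pairs); the abstract does not give the actual LMF formula, so `margin_loss` and its default `margin` are hypothetical stand-ins rather than the authors' regularizer.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_loss(u, v, similar, margin=2.0):
    """Pairwise penalty under an assumed contrastive form.

    Similar pairs are penalized by their squared distance, which pulls them together.
    Dissimilar pairs are penalized only while their distance is below the margin.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


close_dissimilar = margin_loss([0.0, 0.0], [1.0, 0.0], similar=False)
far_dissimilar = margin_loss([0.0, 0.0], [3.0, 4.0], similar=False)
far_similar = margin_loss([0.0, 0.0], [3.0, 4.0], similar=True)
```

A dissimilar pair that already sits beyond the margin contributes nothing, so the penalty concentrates on pairs that violate the constraint.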
|
7
|
Karambakhsh A, Sheng B, Li P, Li H, Kim J, Jung Y, Chen CLP. SparseVoxNet: 3-D Object Recognition With Sparsely Aggregation of 3-D Dense Blocks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:532-546. [PMID: 35613068 DOI: 10.1109/tnnls.2022.3175775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Automatic recognition of 3-D objects in a 3-D model by convolutional neural network (CNN) methods has been successfully applied to various tasks, e.g., robotics and augmented reality. Three-dimensional object recognition is mainly performed by analyzing the object using multi-view images, depth images, graphs, or volumetric data. In some cases, using volumetric data provides the most promising results. However, existing recognition techniques on volumetric data have many drawbacks, such as losing object details when converting points to voxels and the large size of the input volume data, which leads to substantial 3-D CNNs. Using point clouds could also provide very promising results; however, point-cloud-based methods typically need sparse data entry and time-consuming training stages. Thus, using volumetric data could yield a more efficient and flexible recognizer for our special case in the School of Medicine, Shanghai Jiao Tong University. In this article, we propose a novel solution to 3-D object recognition from volumetric data using a combination of three compact CNN models, a low-cost SparseNet, and a feature representation technique. We achieve an optimized network by estimating extra geometrical information, comprising the surface normal and curvature, in two separate neural networks. These two models provide supplementary information for each voxel, which consequently improves the results. The primary network model takes advantage of all the predicted features and uses these features in a Random Forest (RF) for recognition purposes. Our method outperforms other methods in training speed in our experiments and provides results as accurate as the state of the art.
|
8
|
Yu G, Liu X, Wang J, Domeniconi C, Zhang X. Flexible Cross-Modal Hashing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:304-314. [PMID: 33052870 DOI: 10.1109/tnnls.2020.3027729] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
Hashing has been widely adopted for large-scale data retrieval in many domains due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition, existing methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly paired data, whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the structure of each cluster and, thus, to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss in a unified objective function. An alternative optimization technique is also proposed to coordinate the correspondence and hash functions and reinforce the reciprocal effects of the two objectives. Experiments on public multimodal data sets show that FlexCMH achieves significantly better results than state-of-the-art methods, and it, indeed, offers a high degree of flexibility for practical cross-modal hashing tasks.
|
9
|
|
10
|
Jin Y, Sheng B, Li P, Chen CLP. Broad Colorization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:2330-2343. [PMID: 32614774 DOI: 10.1109/tnnls.2020.3004634] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
The scribble- and example-based colorization methods have fastidious requirements for users, and the training process of deep neural networks for colorization is quite time-consuming. We instead propose an automatic colorization approach with no dependence on user input and no need to endure long training time, which combines local features and global features of the input gray-scale images. Low-, mid-, and high-level features are united as local features representing cues existing in the gray-scale image. The global feature is regarded as a data prior to guide the colorization process. The local broad learning system is trained to obtain the chrominance value of each pixel from the local features, which can be expressed as a chrominance map according to the position of pixels. Then, the global broad learning system is trained to refine the chrominance map. There are no requirements for users in our approach, and the training of our framework is an order of magnitude faster than that of the traditional methods based on deep neural networks. To increase the user's subjective initiative, our system allows users to increase training data without retraining the system. Substantial experimental results have shown that our approach outperforms state-of-the-art methods.
|
11
|
Liu H, Li X, Zhang S, Tian Q. Adaptive Hashing With Sparse Matrix Factorization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:4318-4329. [PMID: 31899436 DOI: 10.1109/tnnls.2019.2954856] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/10/2023]
Abstract
Hashing offers a desirable and effective solution for efficiently retrieving the nearest neighbors from large-scale data because of its low storage and computation costs. One of the most appealing techniques for hashing learning is matrix factorization. However, most hashing methods focus only on building the mapping relationships between the Euclidean and Hamming spaces and, unfortunately, underestimate the naturally sparse structures of the data. In addition, parameter tuning is always a challenging and head-scratching problem for sparse hashing learning. To address these problems, in this article, we propose a novel hashing method termed adaptively sparse matrix factorization hashing (SMFH), which exploits sparse matrix factorization to explore the parsimonious structures of the data. Moreover, SMFH adopts an orthogonal transformation to minimize the quantization loss while deriving the binary codes. The most distinguished property of SMFH is that it is adaptive and parameter-free, that is, SMFH can automatically generate sparse representations and does not require human involvement to tune the regularization parameters for the sparse models. Empirical studies on four publicly available benchmark data sets show that the proposed method can achieve promising performance and is competitive with a variety of state-of-the-art hashing methods.
|
12
|
Deng C, Yang E, Liu T, Tao D. Two-Stream Deep Hashing With Class-Specific Centers for Supervised Image Search. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2189-2201. [PMID: 31514156 DOI: 10.1109/tnnls.2019.2929068] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 06/10/2023]
Abstract
Hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. Recent supervised hashing research has shown that deep learning-based methods can significantly outperform nondeep methods. Most existing supervised deep hashing methods exploit supervisory signals to generate similar and dissimilar image pairs for training. However, natural images can have large intraclass and small interclass variations, which may degrade the accuracy of hash codes. To address this problem, we propose a novel two-stream ConvNet architecture, which learns hash codes with class-specific representation centers. Our basic idea is that if we can learn a unified binary representation for each class as a center and encourage hash codes of images to be close to the corresponding centers, the intraclass variation will be greatly reduced. Accordingly, we design a neural network that leverages label information and outputs a unified binary representation for each class. Moreover, we also design an image network to learn hash codes from images and force these hash codes to be close to the corresponding class-specific centers. These two neural networks are then seamlessly incorporated to create a unified, end-to-end trainable framework. Extensive experiments on three popular benchmarks corroborate that our proposed method outperforms current state-of-the-art methods.
|
13
|
Lin M, Ji R, Chen S, Sun X, Lin CW. Similarity-Preserving Linkage Hashing for Online Image Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 29:5289-5300. [PMID: 32217477 DOI: 10.1109/tip.2020.2981879] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Online image hashing aims to update hash functions on-the-fly along with newly arriving data streams, which has found broad applications in computer vision and beyond. To this end, most existing methods update hash functions simply using discrete labels or pairwise similarity to explore intra-class relationships, which, however, often deteriorates search performance when facing a domain gap or semantic shift. One reason is that they ignore the particular semantic relationships among different classes, which should be taken into account in updating hash functions. Besides, the common characteristics between the label vectors (which can be regarded as a sort of binary code) and the to-be-learned binary hash codes have been left unexploited. In this paper, we present a novel online hashing method, termed Similarity Preserving Linkage Hashing (SPLH), which not only utilizes pairwise similarity to learn the intra-class relationships, but also fully exploits a latent linkage space to capture the inter-class relationships and the common characteristics between label vectors and to-be-learned hash codes. Specifically, SPLH first maps the independent discrete label vectors and binary hash codes into a linkage space, through which the relative semantic distance between data points can be assessed precisely. As a result, the pairwise similarities within the newly arriving data stream are exploited to learn the latent semantic space to benefit binary code learning. To learn the model parameters effectively, we further propose an alternating optimization algorithm. Extensive experiments conducted on three widely-used datasets demonstrate the superior performance of SPLH over several state-of-the-art online hashing methods.
|
14
|
Da C, Meng G, Xiang S, Ding K, Xu S, Yang Q, Pan C. Nonlinear Asymmetric Multi-Valued Hashing. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:2660-2676. [PMID: 30176580 DOI: 10.1109/tpami.2018.2867866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Most existing hashing methods resort to binary codes for large scale similarity search, owing to the high efficiency of computation and storage. However, binary codes lack enough capability in similarity preservation, resulting in less desirable performance. To address this issue, we propose Nonlinear Asymmetric Multi-Valued Hashing (NAMVH) supported by two distinct non-binary embeddings. Specifically, a real-valued embedding is used for representing the newly-coming query by an ideally nonlinear transformation. Besides, a multi-integer-embedding is employed for compressing the whole database, which is modeled by Binary Sparse Representation (BSR) with fixed sparsity. With these two non-binary embeddings, NAMVH preserves more precise similarities between data points and enables access to the incremental extension with database samples evolving dynamically. To perform meaningful asymmetric similarity computation for efficient semantic search, these embeddings are jointly learnt by preserving the pairwise label-based similarity. Technically, this results in a mixed integer programming problem, which is efficiently solved by a well-designed alternative optimization method. Extensive experiments on seven large scale datasets demonstrate that our approach not only outperforms the existing binary hashing methods in search accuracy, but also retains their query and storage efficiency.
|
15
|
Yang E, Deng C, Li C, Liu W, Li J, Tao D. Shared Predictive Cross-Modal Deep Quantization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5292-5303. [PMID: 29994640 DOI: 10.1109/tnnls.2018.2793863] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 06/08/2023]
Abstract
With explosive growth of data volume and ever-increasing diversity of data modalities, cross-modal similarity search, which conducts nearest neighbor search across different modalities, has been attracting increasing interest. This paper presents a deep compact code learning solution for efficient cross-modal similarity search. Many recent studies have proven that quantization-based approaches perform generally better than hashing-based approaches on single-modal similarity search. In this paper, we propose a deep quantization approach, which is among the early attempts of leveraging deep neural networks into quantization-based cross-modal similarity search. Our approach, dubbed shared predictive deep quantization (SPDQ), explicitly formulates a shared subspace across different modalities and two private subspaces for individual modalities, and representations in the shared subspace and the private subspaces are learned simultaneously by embedding them to a reproducing kernel Hilbert space, where the mean embedding of different modality distributions can be explicitly compared. In addition, in the shared subspace, a quantizer is learned to produce the semantics preserving compact codes with the help of label alignment. Thanks to this novel network architecture in cooperation with supervised quantization training, SPDQ can preserve intramodal and intermodal similarities as much as possible and greatly reduce quantization error. Experiments on two popular benchmarks corroborate that our approach outperforms state-of-the-art methods.
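Comparing the mean embeddings of two distributions in a reproducing kernel Hilbert space is commonly done with the maximum mean discrepancy (MMD). The sketch below computes the standard biased estimate of squared MMD with a Gaussian kernel; the abstract does not state which kernel or estimator SPDQ uses, so the kernel choice, the `bandwidth` parameter, and the toy feature values are assumptions for illustration only.

```python
import math


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical toy features for two modalities.
image_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
text_feats = [[2.0, 2.1], [1.9, 2.0], [2.1, 1.8]]

same = mmd_squared(image_feats, image_feats)
different = mmd_squared(image_feats, text_feats)
```

The estimate is zero when both sample sets coincide and grows as the two feature distributions move apart, which is what makes it usable as an alignment term between modalities.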
|