1
Jiang K, Wong WK, Fang X, Li J, Qin J, Xie S. Random Online Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025;36:677-691. PMID: 38048245. DOI: 10.1109/tnnls.2023.3330975.
Abstract
In the past decades, supervised cross-modal hashing methods have attracted considerable attention due to their high search efficiency on large-scale multimedia databases. Many of these methods leverage the semantic correlations among heterogeneous modalities by constructing a similarity matrix or by building a common semantic space with collective matrix factorization. However, the similarity matrix may sacrifice scalability, and existing methods cannot preserve sufficient semantic information in the hash codes. Meanwhile, matrix factorization methods cannot embed the main modality-specific information into the hash codes. To address these issues, we propose a novel supervised cross-modal hashing method called random online hashing (ROH) in this article. ROH introduces a linear bridging strategy that simplifies the pairwise-similarity factorization problem into a linear optimization problem. Specifically, a bridging matrix is introduced to establish a bidirectional linear relation between hash codes and labels, which preserves more semantic similarity in the hash codes and significantly reduces the semantic distances between hash codes of samples with similar labels. Additionally, a novel maximum eigenvalue direction (MED) embedding method is proposed to identify the direction of the maximum eigenvalue of the original features and preserve critical information in the modality-specific hash codes. Finally, to handle real-time data dynamically, an online structure is adopted that processes newly arriving data chunks without requiring pairwise constraints. Extensive experimental results on three benchmark datasets demonstrate that the proposed ROH outperforms several state-of-the-art cross-modal hashing methods.
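To make the two ideas concrete, here is a minimal numpy sketch, not the authors' algorithm: a bidirectional linear bridge between labels and hash codes (solved here as two ridge regressions) and a maximum-eigenvalue-direction projection. All data, dimensions, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r, c = 500, 64, 32, 10                   # samples, feature dim, code length, classes
X = rng.standard_normal((n, d))                # toy modality features
L = (rng.random((n, c)) > 0.8).astype(float)   # toy multi-hot labels
B = np.sign(rng.standard_normal((n, r)))       # toy hash codes (+/-1)

# Bidirectional linear bridge between codes and labels (ridge regression):
# P maps labels -> codes, Q maps codes -> labels.
lam = 1.0
P = np.linalg.solve(L.T @ L + lam * np.eye(c), L.T @ B)   # L @ P approximates B
Q = np.linalg.solve(B.T @ B + lam * np.eye(r), B.T @ L)   # B @ Q approximates L

# Maximum-eigenvalue direction of the centered feature covariance:
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
med = eigvecs[:, -1]                           # direction of the largest eigenvalue

# Project features onto the MED to obtain one modality-specific bit per sample.
bits = np.sign(Xc @ med)
print(P.shape, Q.shape, bits.shape)
```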
2
Fan W, Zhang C, Li H, Jia X, Wang G. Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation. IEEE Transactions on Neural Networks and Learning Systems 2025;36:260-273. PMID: 37023166. DOI: 10.1109/tnnls.2023.3263221.
Abstract
Hashing methods have sparked a great revolution in cross-modal retrieval due to their low storage and computation costs. Benefiting from the sufficient semantic information of labeled data, supervised hashing methods have shown better performance than unsupervised ones. Nevertheless, annotating training samples is expensive and labor-intensive, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, three-stage semisupervised hashing (TS3H), is proposed in this article, in which both labeled and unlabeled data are handled seamlessly. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages, as the name implies, all of which are conducted individually to make the optimization cost-effective and precise. Specifically, classifiers for the different modalities are first learned from the provided supervised information to predict the labels of the unlabeled data. Then, hash code learning is achieved with a simple but efficient scheme that unifies the provided and newly predicted labels. To capture the discriminative information and preserve the semantic similarities, we leverage pairwise relations to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by mapping the training samples to the generated hash codes. The new approach is compared with state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experimental results verify its efficiency and superiority.
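The three-stage decomposition is easy to picture as a pipeline. The following sketch mimics the stage structure only; the linear classifiers, the random label-to-code projection standing in for the paper's code-learning scheme, and all toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lab, n_unlab, d1, d2, c, r = 200, 300, 64, 48, 10, 32
X1 = rng.standard_normal((n_lab + n_unlab, d1))   # image features
X2 = rng.standard_normal((n_lab + n_unlab, d2))   # text features
Y_lab = (rng.random((n_lab, c)) > 0.8).astype(float)

def ridge(A, T, lam=1.0):
    """Regularized least-squares solver reused in all three stages."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ T)

# Stage 1: learn linear classifiers per modality, predict pseudolabels.
W1 = ridge(X1[:n_lab], Y_lab)
W2 = ridge(X2[:n_lab], Y_lab)
Y_pseudo = ((X1[n_lab:] @ W1 + X2[n_lab:] @ W2) / 2 > 0.5).astype(float)
Y_all = np.vstack([Y_lab, Y_pseudo])              # unified label matrix

# Stage 2: derive hash codes from the unified labels via a random
# label-to-code projection (a simple stand-in for the paper's scheme).
R = rng.standard_normal((c, r))
B = np.sign(Y_all @ R + 1e-9)

# Stage 3: modality-specific hash functions by regressing features to codes.
H1, H2 = ridge(X1, B), ridge(X2, B)
print(np.sign(X1 @ H1)[:2])                       # query-time encoding
```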
3
Liu H, Zhou W, Zhang H, Li G, Zhang S, Li X. Bit Reduction for Locality-Sensitive Hashing. IEEE Transactions on Neural Networks and Learning Systems 2024;35:12470-12481. PMID: 37037245. DOI: 10.1109/tnnls.2023.3263195.
Abstract
Locality-sensitive hashing (LSH) has gained ever-increasing popularity in similarity search for large-scale data. Its search performance is competitive when the number of generated hash bits is large, but this large bit budget in turn hinders wide application. The first purpose of this work is to introduce a novel hash-bit reduction schema for hashing techniques to derive shorter binary codes, an issue that has not yet received sufficient attention. The second purpose, which also shows briefly how the reduction schema works, is to present an effective bit reduction method for LSH under this schema. Specifically, after the hash bits are generated by LSH, they are placed into a bit pool as candidates. Mutual information and data labels are then exploited to measure the correlation and structural properties of the hash bits, respectively. Eventually, highly correlated and redundant hash bits can be identified and removed accordingly, without greatly degrading performance. The advantages of our reduction method are that it not only reduces the number of hash bits effectively but also boosts the retrieval performance of LSH, making it more appealing and practical in real-world applications. Comprehensive experiments were conducted on three public real-world datasets. The experimental results with representative bit selection methods and state-of-the-art hashing algorithms demonstrate that the proposed method has encouraging and competitive performance.
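A minimal sketch of this reduction schema follows, assuming random-projection LSH, sklearn's mutual_info_score as the bit-correlation measure, and a greedy drop rule chosen for illustration rather than the paper's exact criterion.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(2)
n, d, n_bits, keep = 1000, 32, 64, 32
X = rng.standard_normal((n, d))

# Candidate bit pool from random-projection LSH.
W = rng.standard_normal((d, n_bits))
bits = (X @ W > 0).astype(int)                    # n x n_bits binary matrix

# Pairwise mutual information between bits; a high value marks a redundant pair.
mi = np.zeros((n_bits, n_bits))
for i in range(n_bits):
    for j in range(i + 1, n_bits):
        mi[i, j] = mi[j, i] = mutual_info_score(bits[:, i], bits[:, j])

# Greedy reduction: repeatedly drop the bit with the largest total MI
# against the bits still in the pool.
selected = list(range(n_bits))
while len(selected) > keep:
    sub = mi[np.ix_(selected, selected)]
    worst = selected[int(np.argmax(sub.sum(axis=1)))]
    selected.remove(worst)

reduced = bits[:, selected]                       # shorter, less redundant codes
print(reduced.shape)
```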
4
Liang X, Yang E, Yang Y, Deng C. Multi-Relational Deep Hashing for Cross-Modal Search. IEEE Transactions on Image Processing 2024;33:3009-3020. PMID: 38625760. DOI: 10.1109/tip.2024.3385656.
Abstract
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervision, which learns relevant information by splicing partial similarities between data pairs; this captures data similarity only locally and incompletely, resulting in suboptimal retrieval performance. In this paper, we propose a novel Multi-Relational Deep Hashing (MRDH) approach, which can fully bridge the modality gap by comprehensively modeling the similarity relationships between data in different modalities. In more detail, to investigate the inter-modal relationships, we constrain the consistency of cross-modal pairwise similarities to maintain semantic similarity across modalities. Moreover, to further capture complete similarity information, we design a new similarity metric, which we term cross-modal global similarity, by encouraging the hash codes of similar data pairs from different modalities to approach a common center and the hash codes of dissimilar pairs to converge to different centers. This enables our model to generate more discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of our method for cross-modal hashing retrieval.
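The cross-modal global similarity idea, pulling codes of both modalities toward shared class centers, can be sketched as a simple center-based loss. The softmax-over-centers form below is an assumption for illustration, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, c = 256, 32, 10
Bi = np.tanh(rng.standard_normal((n, r)))        # relaxed image codes
Bt = np.tanh(rng.standard_normal((n, r)))        # relaxed text codes
y = rng.integers(0, c, n)                        # single-label ground truth
centers = np.sign(rng.standard_normal((c, r)))   # one binary center per class

def global_similarity_loss(Bi, Bt, y, centers):
    """Pull codes of both modalities toward their own class center and
    away from the other centers (softmax cross-entropy over centers)."""
    loss = 0.0
    for B in (Bi, Bt):
        logits = B @ centers.T / np.sqrt(r)      # similarity to every center
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        loss -= logp[np.arange(len(y)), y].mean()
    return loss

print(global_similarity_loss(Bi, Bt, y, centers))
```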
5
Bai C, Zeng C, Ma Q, Zhang J. Graph Convolutional Network Discrete Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2024;35:4756-4767. PMID: 35604998. DOI: 10.1109/tnnls.2022.3174970.
Abstract
With the rapid development of deep neural networks, cross-modal hashing has made great progress. However, the information carried by different types of data is asymmetric: a sufficiently high-resolution image can reproduce a real-world scene almost completely, whereas text usually carries personal emotion and is less objective, so the information in an image is generally much richer than in text. Although most existing methods unify the semantic feature extraction and hash function learning modules for end-to-end learning, they ignore this issue and do not use information-rich modalities to support information-poor ones, leading to suboptimal results. Furthermore, previous methods learn hash functions in a relaxed way, which causes nontrivial quantization losses. To address these issues, we propose a new method called graph convolutional network-based discrete hashing (GCDH). This method uses a graph convolutional network (GCN) to bridge the information gap between different types of data. The GCN represents each label as a word embedding, with the embeddings regarded as a set of interdependent object classifiers. From these classifiers, we obtain predicted labels to enhance feature representations across modalities. In addition, we use an efficient discrete optimization strategy to learn the discrete binary codes without relaxation. Extensive experiments conducted on three commonly used datasets demonstrate that the proposed GCDH outperforms current state-of-the-art cross-modal hashing methods.
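The label-embedding GCN can be sketched in a few lines: a two-layer graph convolution over a label co-occurrence graph maps word embeddings to per-class visual classifiers. Everything below (the toy adjacency, layer sizes, and weights) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
c, e, f = 10, 50, 64                   # classes, word-embedding dim, visual feature dim
E = rng.standard_normal((c, e))        # label word embeddings (e.g., GloVe)
A = (rng.random((c, c)) > 0.7).astype(float)   # toy label co-occurrence adjacency
A = np.maximum(A, A.T) + np.eye(c)             # symmetric, with self-loops

# Normalized adjacency: D^{-1/2} A D^{-1/2}
Dinv = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = Dinv @ A @ Dinv

# Two-layer GCN maps label embeddings to per-class visual classifiers.
W1 = rng.standard_normal((e, 128)) * 0.1
W2 = rng.standard_normal((128, f)) * 0.1
H = np.maximum(A_hat @ E @ W1, 0)      # first layer with ReLU
classifiers = A_hat @ H @ W2           # c x f: one classifier per label

# Predicted label scores for one visual feature, used to enhance its representation.
x = rng.standard_normal(f)
scores = classifiers @ x
print(scores.shape)
```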
6
Li B, Li Z. Large-Scale Cross-Modal Hashing with Unified Learning and Multi-Object Regional Correlation Reasoning. Neural Netw 2024;171:276-292. PMID: 38103437. DOI: 10.1016/j.neunet.2023.12.018.
Abstract
To exploit the rich information contained in multi-modal data while remaining efficient, deep cross-modal hash retrieval (DCMHR) is an attractive solution. Currently, however, most DCMHR methods have two key limitations: first, their classification is conditioned only on the objects in different regions, considered separately; second, they either do not learn unified hash codes during training or lack an efficient training process. To solve these two problems, this paper designs Large-Scale Cross-Modal Hashing with Unified Learning and Multi-Object Regional Correlation Reasoning (HUMOR). For the related labels predicted by ImgNet, HUMOR uses multiple instance learning (MIL) to reason about the correlation of these labels. When the reasoned regional correlation is low, the labels are rectified through a "reduce-add" procedure, ordered from max to min (global precedence) or from min to max (regional precedence). HUMOR then conducts unified learning on the hash loss and the classification loss, adopting a four-step iterative algorithm to optimize the unified hash codes and reduce model bias. Experiments on two benchmark datasets show that the average performance of this method is higher than that of most DCMHR methods. The results demonstrate the effectiveness and novelty of our method.
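The "reduce-add" rectification is described only loosely in the abstract, so the sketch below shows just the generic MIL ingredient: max-pooling per-region label scores into image-level scores and ranking them by support. It should not be read as HUMOR's actual procedure; all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n_regions, c = 8, 10
region_scores = rng.random((n_regions, c))   # toy per-region label scores

# MIL-style aggregation: an image-level label is supported if any region
# supports it (max pooling over the bag of regions).
image_scores = region_scores.max(axis=0)

# Ranking by support, max-to-min, as a crude stand-in for "global precedence".
order = np.argsort(-image_scores)
print(order, image_scores[order])
```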
Affiliation(s)
- Bo Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China; School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin 541004, China.
- Zhixin Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, China.
7
Peng SJ, He Y, Liu X, Cheung YM, Xu X, Cui Z. Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2024;35:2194-2207. PMID: 35830398. DOI: 10.1109/tnnls.2022.3188569.
Abstract
Fine-grained image-text retrieval has been a hot research topic for bridging vision and language, and its main challenge is learning the semantic correspondence across different modalities. Existing methods mainly focus on learning global semantic correspondence or intramodal relation correspondence in separate data representations, but rarely consider the intermodal relations that interactively provide complementary hints for fine-grained semantic correlation learning. To address this issue, we propose a relation-aggregated cross-graph (RACG) model that explicitly learns fine-grained semantic correspondence by aggregating both intramodal and intermodal relations, which can be well utilized to guide the feature correspondence learning process. More specifically, we first build semantic-embedded graphs to explore both fine-grained objects and their relations in each media type, aiming not only to characterize object appearance in each modality but also to capture the intrinsic relation information that differentiates intramodal discrepancies. Then, a cross-graph relation encoder is newly designed to explore intermodal relations across modalities, which can mutually boost cross-modal correlations to learn more precise intermodal dependencies. Besides, a feature reconstruction module and multihead similarity alignment are efficiently leveraged to optimize node-level semantic correspondence, whereby relation-aggregated cross-modal embeddings between image and text are discriminatively obtained to benefit various image-text retrieval tasks with high retrieval performance. Extensive experiments on benchmark datasets quantitatively and qualitatively verify the advantages of the proposed framework for fine-grained image-text retrieval and show its competitive performance against the state of the art.
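A cross-graph relation encoder can be approximated by cross-attention between the node sets of the two graphs. The sketch below is such an approximation under assumed shapes, not the RACG architecture itself.

```python
import numpy as np

rng = np.random.default_rng(6)
ni, nt, d = 6, 9, 32                     # image regions, text tokens, node dim
V = rng.standard_normal((ni, d))         # image-graph node features
T = rng.standard_normal((nt, d))         # text-graph node features

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Cross-graph attention: each image node aggregates the text nodes it is
# most related to, and vice versa, mixing intermodal relations into both graphs.
att_vt = softmax(V @ T.T / np.sqrt(d), axis=1)   # ni x nt attention weights
att_tv = softmax(T @ V.T / np.sqrt(d), axis=1)   # nt x ni attention weights
V_agg = V + att_vt @ T                           # relation-aggregated image nodes
T_agg = T + att_tv @ V                           # relation-aggregated text nodes

# A coarse image-text similarity from the aggregated node sets.
sim = float(np.mean(V_agg.mean(axis=0) * T_agg.mean(axis=0)))
print(sim)
```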
8
Zhang M, Li J, Zheng X. Semantic embedding based online cross-modal hashing method. Sci Rep 2024;14:736. PMID: 38184671. PMCID: PMC10771426. DOI: 10.1038/s41598-023-50242-w.
Abstract
Hashing has been extensively utilized in cross-modal retrieval due to its high efficiency in handling large-scale, high-dimensional data. However, most existing cross-modal hashing methods operate as offline learning models, which learn hash codes in a batch-based manner and prove inefficient for streaming data. Recently, several online cross-modal hashing methods have been proposed to address the streaming-data scenario. Nevertheless, these methods fail to fully leverage semantic information and to optimize hash codes accurately in a discrete fashion. As a result, neither the accuracy nor the efficiency of online cross-modal hashing methods is ideal. To address these issues, this paper introduces the Semantic Embedding-based Online Cross-modal Hashing (SEOCH) method, which integrates semantic information exploitation and online learning into a unified framework. To exploit the semantic information, we map the semantic labels to a latent semantic space and construct a semantic similarity matrix to preserve the similarity between new data and existing data in the Hamming space. Moreover, we employ a discrete optimization strategy to enhance the efficiency of cross-modal retrieval for online hashing. Extensive experiments on two publicly available multi-label datasets demonstrate the superiority of the SEOCH method.
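A minimal sketch of the online step follows, assuming a fixed random label-to-code embedding for the latent semantic space and a single closed-form sign update in place of the paper's discrete optimization; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
r, c = 32, 10
G = rng.standard_normal((c, r))          # toy latent semantic space for labels

def encode_chunk(Y_new, Y_old, B_old):
    """Assign codes to a newly arriving chunk so that Hamming similarity
    tracks label similarity with the already-hashed database."""
    S = 2.0 * (Y_new @ Y_old.T > 0) - 1.0        # +1 similar, -1 dissimilar
    # Discrete step: sign of label embedding plus similarity-weighted votes
    # from the existing codes (one closed-form pass, no relaxation).
    return np.sign(Y_new @ G + S @ B_old + 1e-9)

# Existing database and a new chunk, with toy multi-hot labels.
Y_old = (rng.random((500, c)) > 0.8).astype(float)
B_old = np.sign(Y_old @ G + 1e-9)
Y_new = (rng.random((50, c)) > 0.8).astype(float)
B_new = encode_chunk(Y_new, Y_old, B_old)
print(B_new.shape)
```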
Affiliation(s)
- Meijia Zhang
- School of Data Science and Computer Science, Shandong Women's University, Jinan, 250300, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, 250022, China
- Junzheng Li
- Network Information Management Center, Shandong Management University, Jinan, 250357, China
- Xiyuan Zheng
- School of Data Science and Computer Science, Shandong Women's University, Jinan, 250300, China.
9
Sun Y, Wang X, Peng D, Ren Z, Shen X. Hierarchical Hashing Learning for Image Set Classification. IEEE Transactions on Image Processing 2023;32:1732-1744. PMID: 37028051. DOI: 10.1109/tip.2023.3251025.
Abstract
With the development of video networks, image set classification (ISC) has received a lot of attention and can be used in various practical applications, such as video-based recognition and action recognition. Although existing ISC methods have obtained promising performance, they often have extremely high complexity. Owing to its advantages in storage space and complexity cost, learning to hash is a powerful solution. However, existing hashing methods often ignore the complex structural information and hierarchical semantics of the original features. They usually adopt a single-layer hashing strategy that transforms high-dimensional data into short binary codes in one step. This sudden drop in dimension can lose advantageous discriminative information. In addition, they do not take full advantage of the intrinsic semantic knowledge of whole gallery sets. To tackle these problems, we propose a novel Hierarchical Hashing Learning (HHL) method for ISC. Specifically, a coarse-to-fine hierarchical hashing scheme is proposed that utilizes a two-layer hash function to gradually refine the beneficial discriminative information in a layer-wise fashion. Besides, to alleviate the effects of redundant and corrupted features, we impose the $\ell_{2,1}$ norm on the layer-wise hash functions. Moreover, we adopt a bidirectional semantic representation with an orthogonal constraint to adequately retain the intrinsic semantic information of all samples in the whole image sets. Comprehensive experiments demonstrate that HHL achieves significant improvements in accuracy and running time. We will release the demo code at https://github.com/sunyuan-cs.
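The two-layer, coarse-to-fine idea and the $\ell_{2,1}$ regularizer can be sketched directly. The random weights and tanh first layer below are placeholders for the learned hash functions, not the HHL training procedure.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, h, r = 400, 128, 64, 32            # samples, input dim, mid dim, bits
X = rng.standard_normal((n, d))

# Two-layer, coarse-to-fine hash function: d -> h -> r instead of d -> r,
# so the dimension drops gradually rather than in one step.
W1 = rng.standard_normal((d, h)) * 0.1
W2 = rng.standard_normal((h, r)) * 0.1
Z = np.tanh(X @ W1)                      # first-layer (coarse) projection
B = np.sign(Z @ W2 + 1e-9)               # second-layer (fine) binary codes

def l21_norm(W):
    """Row-wise l2,1 norm: sum of the l2 norms of the rows, which drives
    entire rows (i.e., redundant input features) toward zero."""
    return np.linalg.norm(W, axis=1).sum()

reg = l21_norm(W1) + l21_norm(W2)        # regularizer on both hash layers
print(B.shape, reg)
```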
10
Xie Y, Zeng X, Wang T, Yi Y. Online deep hashing for both uni-modal and cross-modal retrieval. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.07.039.
11
Lin L, Shu X. Gaussian similarity preserving for cross-modal hashing. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.04.125.