1
|
Shen X, Chen Y, Liu W, Zheng Y, Sun QS, Pan S. Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:7997-8009. [PMID: 39028597 DOI: 10.1109/tnnls.2024.3421583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/21/2024]
Abstract
Cross-modal hashing encodes different modalities of multimodal data into a low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., "ocean" and "cloud," often co-occur. However, existing cross-modal hashing methods overlook label dependency, which is crucial for improving performance. To fill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates a word embedding of each label and develops a label encoder to learn highly correlated label embeddings via a graph convolutional network (GCN). In addition, GCMLH develops a feature encoder for each modality and a feature fusion module to generate highly semantic features via GCN. GCMLH uses a teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., the label encoder and the feature fusion module, to the student module, i.e., the feature encoder, such that the learned hash codes can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-art methods.
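To illustrate why retrieval in a Hamming space is fast, here is a minimal sketch that ranks database codes by Hamming distance to a query code. It is not GCMLH itself: the function names and the 4-bit toy codes are hypothetical, and each hash code is assumed to be stored as a Python integer so that an XOR followed by a bit count gives the distance.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two integer-encoded hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return database indices sorted by increasing Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical toy data: an image query code and three text codes (4 bits each).
image_query = 0b1011
text_codes = [0b0100, 0b1111, 0b1011]

ranking = rank_by_hamming(image_query, text_codes)
distances = [hamming_distance(image_query, c) for c in text_codes]
```

Because both modalities are mapped into the same code space, an image query can be compared directly against text codes, and each comparison costs only a few bit operations.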
|
2
|
Fan W, Zhang C, Li H, Jia X, Wang G. Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:260-273. [PMID: 37023166 DOI: 10.1109/tnnls.2023.3263221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Hashing methods have sparked a great revolution in cross-modal retrieval due to the low cost of storage and computation. Benefiting from the sufficient semantic information of labeled data, supervised hashing methods have shown better performance compared with unsupervised ones. Nevertheless, it is expensive and labor intensive to annotate the training samples, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, i.e., three-stage semisupervised hashing (TS3H) is proposed in this article, where both labeled and unlabeled data are seamlessly handled. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages as the name implies, in which all of the stages are conducted individually to make the optimization cost-effective and precise. Specifically, the classifiers of different modalities are learned via the provided supervised information to predict the labels of unlabeled data at first. Then, hash code learning is achieved with a simple but efficient scheme by unifying the provided and the newly predicted labels. To capture the discriminative information and preserve the semantic similarities, we leverage pairwise relations to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by transforming the training samples to the generated hash codes. The new approach is compared with the state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experiment results verify its efficiency and superiority.
|
3
|
Liang X, Yang E, Yang Y, Deng C. Multi-Relational Deep Hashing for Cross-Modal Search. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:3009-3020. [PMID: 38625760 DOI: 10.1109/tip.2024.3385656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2024]
Abstract
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervisions, which involves learning the relevant information by splicing partial similarity between data pairs; notably, this approach only captures the data similarity locally and incompletely, resulting in sub-optimal retrieval performance. In this paper, we propose a novel Multi-Relational Deep Hashing (MRDH) approach, which can fully bridge the modality gap by comprehensively modeling the similarity relationship between data in different modalities. In more detail, to investigate the inter-modal relationships, we constrain the consistency of cross-modal pairwise similarities to maintain the semantic similarity across modalities. Moreover, to further capture complete similarity information, we design a new similarity metric, which we term cross-modal global similarity, by encouraging hash codes of similar data pairs from different modalities to approach a common center and hash codes for dissimilar pairs to converge to different centers. Adopting this approach enables our model to generate more discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of our method on cross-modal hashing retrieval.
|
4
|
Hoang T, Do TT, Nguyen TV, Cheung NM. Multimodal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:6289-6302. [PMID: 34982698 DOI: 10.1109/tnnls.2021.3135420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this article, we adopt the mutual information (MI) maximization approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed cross-modal info-max hashing (CMIMH). First, to learn informative representations that can preserve both intramodal and intermodal similarities, we leverage recent advances in estimating the variational lower bound of MI to maximize the MI between the binary representations and input features and between binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations are modeled by multivariate Bernoulli distributions, we can learn binary representations, which can preserve both intramodal and intermodal similarities, effectively in a mini-batch manner with gradient descent. Furthermore, we find that trying to minimize the modality gap by learning similar binary representations for the same instance from different modalities could result in less informative representations. Hence, balancing between reducing the modality gap and losing modality-private information is important for cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
|
5
|
EDMH: Efficient discrete matrix factorization hashing for multi-modal similarity retrieval. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2023.103301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
|
6
|
Hu P, Zhu H, Lin J, Peng D, Zhao YP, Peng X. Unsupervised Contrastive Cross-Modal Hashing. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:3877-3889. [PMID: 35617190 DOI: 10.1109/tpami.2022.3177356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. To be exact, i) to address the performance degradation issue caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable in CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not involve binary-continuous relaxation like most existing methods, thus enjoying better retrieval performance; ii) to alleviate the influence brought by false-negative pairs (FNPs), we propose a Cross-modal Ranking Learning loss (CRL) which utilizes the discrimination from all negative pairs instead of only the hard ones, where an FNP refers to a within-class pair that was wrongly treated as a negative pair. Thanks to such a global strategy, CRL endows our method with better performance because CRL will not overuse the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate the effectiveness of the proposed method, we carry out experiments on five widely used datasets and compare with 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
|
7
|
CCAH: A CLIP-Based Cycle Alignment Hashing Method for Unsupervised Vision-Text Retrieval. INT J INTELL SYST 2023. [DOI: 10.1155/2023/7992047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Due to the advantages of low storage cost and fast retrieval efficiency, deep hashing methods are widely used in cross-modal retrieval. Images are usually accompanied by corresponding text descriptions rather than labels. Therefore, unsupervised methods have received wide attention. However, due to the modal divide and semantic differences, existing unsupervised methods cannot adequately bridge the modal differences, leading to suboptimal retrieval results. In this paper, we propose CLIP-based cycle alignment hashing for unsupervised vision-text retrieval (CCAH), which aims to exploit the semantic link between the original features of modalities and the reconstructed features. Firstly, we design a modal cyclic interaction method that aligns semantically within intramodality, where one modal feature reconstructs another modal feature, thus taking full account of the semantic similarity between intramodal and intermodal relationships. Secondly, we introduce GAT into cross-modal retrieval tasks: we consider the influence of text neighbour nodes and add attention mechanisms to capture the global features of text modalities. Thirdly, we extract fine-grained image features using the CLIP visual encoder. Finally, hash encoding is learned through hash functions. Experiments on three widely used datasets demonstrate that our proposed CCAH achieves satisfactory results in total retrieval accuracy. Our code can be found at: https://github.com/CQYIO/CCAH.git.
|
8
|
Deep Feature Pyramid Hashing for Efficient Image Retrieval. INFORMATION 2022. [DOI: 10.3390/info14010006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022] Open
Abstract
Thanks to the success of deep learning, deep hashing has recently evolved as a leading method for large-scale image retrieval. Most existing hashing methods use the last layer to extract semantic information from the input image. However, these methods have deficiencies because semantic features extracted from the last layer lack local information, which might impact the global system's performance. To this end, Deep Feature Pyramid Hashing (DFPH) is proposed in this study, which can fully utilize images' multi-level visual and semantic information. Our architecture applies a new feature pyramid network designed for deep hashing to the VGG-19 model, so that the model is able to learn hash codes from various feature scales and then fuse them to create the final binary hash codes. Experimental results on two widely used image retrieval datasets demonstrate the superiority of our method.
|
9
|
Zhang D, Wu XJ, Chen G. ONION: Online Semantic Autoencoder Hashing for Cross-Modal Retrieval. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3572032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Cross-modal hashing (CMH) has recently received increasing attention with the merit of speed and storage in performing large-scale cross-media similarity search. However, most existing cross-media approaches utilize the batch-based mode to update hash functions, without the ability to efficiently handle the online streaming multimedia data. Online hashing can effectively address the above issue by using the online learning scheme to incrementally update the hash functions. Nevertheless, the existing online CMH approaches still suffer from several challenges, e.g., 1) how to efficiently and effectively utilize the supervision information, 2) how to learn more powerful hash functions, and 3) how to solve the binary constraints. To mitigate these limitations, we present a novel online hashing approach named ONline SemantIc AutOencoder HashiNg (ONION). Specifically, it leverages the semantic autoencoder scheme to establish the correlations between binary codes and labels, delivering the power to obtain more discriminative hash codes. Besides, the proposed ONION directly utilizes the label inner product to build the connection between existing data and newly coming data. Therefore, the optimization is less sensitive to the newly arriving data. Equipping a discrete optimization scheme designed to solve the binary constraints, the quantization errors can be dramatically reduced. Furthermore, the hash functions are learned by the proposed autoencoder strategy, making the hash functions more powerful. Extensive experiments on three large-scale databases demonstrate that the performance of our ONION is superior to several recent competitive online and offline cross-media algorithms.
Affiliation(s)
- Xiao-Jun Wu
- School of Artificial Intelligence and Computer Science, Jiangnan University, China
|
10
|
Yang F, Zhang QX, Ding XJ, Ma FM, Cao J, Tong DY. Semantic preserving asymmetric discrete hashing for cross-modal retrieval. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04282-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
11
|
Liu X, Wang X, Cheung YM. FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6306-6320. [PMID: 33979294 DOI: 10.1109/tnnls.2021.3076684] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/12/2023]
Abstract
Cross-modal hashing, favored for its effectiveness and efficiency, has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless, most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes, while often involving a time-consuming training procedure for handling large-scale datasets. To tackle these issues, we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data, so as to minimize the quantization loss of mapping such data to Hamming space, and propose an efficient fast discriminative discrete hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically, FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels and utilizes the ε-dragging technique to provide provable large semantic margins. Accordingly, the discriminative power of semantic information can be explicitly captured and maximized. Moreover, an orthogonal transformation scheme is further proposed to map the nonlinear embedding data into the semantic subspace, which can well guarantee the semantic consistency between the data feature and its semantic representation. Consequently, an efficient closed-form solution is derived for discriminative hash code learning, which is very computationally efficient. In addition, an effective and stable online learning strategy is presented for optimizing modality-specific projection functions, featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates the bi-Lipschitz continuity, runs sufficiently fast, and also significantly improves the retrieval performance over the state-of-the-art methods. The source code is released at https://github.com/starxliu/FDDH.
|
12
|
Wang Y, Chen ZD, Luo X, Li R, Xu XS. Fast Cross-Modal Hashing With Global and Local Similarity Embedding. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:10064-10077. [PMID: 33750723 DOI: 10.1109/tcyb.2021.3059886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Recently, supervised cross-modal hashing has attracted much attention and achieved promising performance. To learn hash functions and binary codes, most methods globally exploit the supervised information, for example, preserving an at-least-one pairwise similarity into hash codes or reconstructing the label matrix with binary codes. However, due to the hardness of the discrete optimization problem, they are usually time consuming on large-scale datasets. In addition, they neglect the class correlation in supervised information. From another point of view, they only explore the global similarity of data but overlook the local similarity hidden in the data distribution. To address these issues, we present an efficient supervised cross-modal hashing method, that is, fast cross-modal hashing (FCMH). It leverages not only global similarity information but also the local similarity in a group. Specifically, training samples are partitioned into groups; thereafter, the local similarity in each group is extracted. Moreover, the class correlation in labels is also exploited and embedded into the learning of binary codes. In addition, to solve the discrete optimization problem, we further propose an efficient discrete optimization algorithm with a well-designed group updating scheme, making its computational complexity linear to the size of the training set. In light of this, it is more efficient and scalable to large-scale datasets. Extensive experiments on three benchmark datasets demonstrate that FCMH outperforms some state-of-the-art cross-modal hashing approaches in terms of both retrieval accuracy and learning efficiency.
|
13
|
Shu Z, Yong K, Yu J, Gao S, Mao C, Yu Z. Discrete asymmetric zero-shot hashing with application to cross-modal retrieval. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
15
|
Hou C, Li Z, Wu J. Unsupervised hash retrieval based on multiple similarity matrices and text self-attention mechanism. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02804-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
|
17
|
Cao Z, Zhang Y, Guan J, Zhou S, Chen G. Link Weight Prediction Using Weight Perturbation and Latent Factor. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1785-1797. [PMID: 32525807 DOI: 10.1109/tcyb.2020.2995595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Link weight prediction is an important subject in network science and machine learning. Its applications to social network analysis, network modeling, and bioinformatics are ubiquitous. Although this subject has attracted considerable attention recently, the performance and interpretability of existing prediction models have not been well balanced. This article focuses on an unsupervised mixed strategy for link weight prediction. Here, the target attribute is the link weight, which represents the correlation or strength of the interaction between a pair of nodes. The input of the model is the weighted adjacency matrix without any preprocessing, as widely adopted in the existing models. Extensive observations on a large number of networks show that the new scheme is competitive to the state-of-the-art algorithms concerning both root-mean-square error and Pearson correlation coefficient metrics. Analytic and simulation results suggest that combining the weight consistency of the network and the link weight-associated latent factors of the nodes is a very effective way to solve the link weight prediction problem.
|
18
|
Yu E, Ma J, Sun J, Chang X, Zhang H, Hauptmann AG. Deep Discrete Cross-Modal Hashing with Multiple Supervision. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.11.035] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
|
19
|
Zhang D, Wu XJ, Yin HF, Kittler J. MOON: Multi-hash codes joint learning for cross-media retrieval. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.07.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023]
|
20
|
Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:4982-4993. [PMID: 33119532 DOI: 10.1109/tcyb.2020.3027614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.
|
22
|
Yang Z, Yang L, Huang W, Sun L, Long J. Enhanced Deep Discrete Hashing with semantic-visual similarity for image retrieval. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2021.102648] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
|
23
|
Li M, Li Q, Tang L, Peng S, Ma Y, Yang D. Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:5107034. [PMID: 34326867 PMCID: PMC8310450 DOI: 10.1155/2021/5107034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Accepted: 07/08/2021] [Indexed: 11/18/2022]
Abstract
Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires a lot of manual annotation of the data. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which can reconstruct the similarity matrix using the hidden correlation information of the pretrained unsupervised teacher model, and the reconstructed similarity matrix can be used to guide the supervised student model. Specifically, firstly, the teacher model adopts an unsupervised semantic alignment hashing method, which can construct a modal fusion similarity matrix. Secondly, under the supervision of teacher model distillation information, the student model can generate more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of our proposed method achieves a significant improvement, which reflects its effectiveness in large-scale cross-modal data retrieval.
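For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of the standard definition: average precision per query over a ranked list, then the mean over queries. The function names and toy relevance lists are hypothetical, and the exact evaluation protocol used in the article (for example, any cutoff on the ranked list) is not stated here, so treat this only as an illustration of the metric.

```python
def average_precision(relevance):
    """Average precision for one ranked result list.

    `relevance` holds 1 (relevant) or 0 (irrelevant) per rank, best rank first.
    Precision is accumulated at every rank where a relevant item appears.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    # A query with no relevant results contributes 0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


# Hypothetical toy example: two queries with ranked relevance judgements.
toy_map = mean_average_precision([[1, 0, 1], [0, 1]])
```

In the toy example the first query scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.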
Affiliation(s)
- Mingyong Li
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Qiqi Li
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Lirong Tang
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Shuang Peng
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Yan Ma
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Degang Yang
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
|
26
|
Yang Z, Yang L, Raymond OI, Zhu L, Huang W, Liao Z, Long J. NSDH: A Nonlinear Supervised Discrete Hashing framework for large-scale cross-modal retrieval. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106818] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]
|
27
|
Fang Y, Li B, Li X, Ren Y. Unsupervised cross-modal similarity via Latent Structure Discrete Hashing Factorization. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106857] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/18/2023]
|
28
|
Meng M, Wang H, Yu J, Chen H, Wu J. Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:986-1000. [PMID: 33232233 DOI: 10.1109/tip.2020.3038365] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Hashing-based techniques have provided attractive solutions to cross-modal similarity search when addressing vast quantities of multimedia data. However, existing cross-modal hashing (CMH) methods face two critical limitations: 1) there is no previous work that simultaneously exploits the consistent or modality-specific information of multi-modal data; 2) the discriminative capabilities of pairwise similarity are usually neglected due to the computational cost and storage overhead. Moreover, to tackle the discrete constraints, a relaxation-based strategy is typically adopted to relax the discrete problem to a continuous one, which severely suffers from large quantization errors and leads to sub-optimal solutions. To overcome the above limitations, in this article, we present a novel supervised CMH method, namely Asymmetric Supervised Consistent and Specific Hashing (ASCSH). Specifically, we explicitly decompose the mapping matrices into the consistent and modality-specific ones to sufficiently exploit the intrinsic correlation between different modalities. Meanwhile, a novel discrete asymmetric framework is proposed to fully explore the supervised information, in which the pairwise similarity and semantic labels are jointly formulated to guide the hash code learning process. Unlike existing asymmetric methods, the discrete asymmetric structure developed is capable of solving the binary constraint problem discretely and efficiently without any relaxation. To validate the effectiveness of the proposed approach, extensive experiments on three widely used datasets are conducted, and encouraging results demonstrate the superiority of ASCSH over other state-of-the-art CMH methods.
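The quantization error that relaxation-based strategies incur can be made concrete with a small sketch. It assumes the common setup in which a relaxed, real-valued code is binarized with the sign function and the error is the squared distance between the two; the function names and toy values are hypothetical, and this is not the ASCSH optimization itself.

```python
def binarize(values):
    """Map each real-valued entry to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def quantization_error(values):
    """Squared Euclidean distance between a relaxed code and its binarized version."""
    codes = binarize(values)
    return sum((v - c) ** 2 for v, c in zip(values, codes))


# Hypothetical relaxed code: entries near 0 are far from both -1 and +1,
# so they dominate the error once the code is binarized.
relaxed = [0.9, -0.2, 0.1, -1.0]
binary = binarize(relaxed)
error = quantization_error(relaxed)
```

Here the entries -0.2 and 0.1 account for almost all of the error, which is why methods that solve for the binary codes directly, without relaxation, aim to avoid this loss.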
|
29
|
Scalable deep asymmetric hashing via unequal-dimensional embeddings for image similarity search. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.036] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
|
30
|
Qiang H, Wan Y, Liu Z, Xiang L, Meng X. Discriminative deep asymmetric supervised hashing for cross-modal retrieval. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106188] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 10/24/2022]
|
31
|
Hoang T, Do TT, Nguyen TV, Cheung NM. Unsupervised Deep Cross-modality Spectral Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; PP:8391-8406. [PMID: 32784139 DOI: 10.1109/tip.2020.3014727] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach which decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to binary codes obtained from the first step, we leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.
|
33
|
Yao T, Han Y, Wang R, Kong X, Yan L, Fu H, Tian Q. Efficient discrete supervised hashing for large-scale cross-modal retrieval. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.12.086] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/25/2022]
|
34
|
Ma L, Li H, Meng F, Wu Q, Ngan KN. Discriminative deep metric learning for asymmetric discrete hashing. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
|
35
|
Affiliation(s)
- Qing Xie
- Department of Management and Marketing, Faculty of Economics and Management, University Putra Malaysia, Serdang, Malaysia
- School of Economics and Management, Anshun University, Anshun, China
|