1. Jiang K, Wong WK, Fang X, Li J, Qin J, Xie S. Random Online Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:677-691. [PMID: 38048245] [DOI: 10.1109/tnnls.2023.3330975]
Abstract
In the past decades, supervised cross-modal hashing methods have attracted considerable attention due to their high search efficiency on large-scale multimedia databases. Many of these methods leverage semantic correlations among heterogeneous modalities by constructing a similarity matrix or by building a common semantic space with collective matrix factorization. However, the similarity matrix may sacrifice scalability and cannot preserve sufficient semantic information in the hash codes, while matrix factorization methods cannot embed the main modality-specific information into the hash codes. To address these issues, we propose a novel supervised cross-modal hashing method called random online hashing (ROH). ROH adopts a linear bridging strategy that simplifies the pairwise-similarity factorization problem into a linear optimization problem. Specifically, a bridging matrix is introduced to establish a bidirectional linear relation between hash codes and labels, which preserves richer semantic similarities in the hash codes and significantly reduces the semantic distances between hash codes of samples with similar labels. Additionally, a novel maximum eigenvalue direction (MED) embedding method is proposed to identify the maximum-eigenvalue direction of the original features and preserve critical information in the modality-specific hash codes. Finally, to handle real-time data dynamically, an online structure is adopted to deal with newly arriving data chunks without considering pairwise constraints. Extensive experimental results on three benchmark datasets demonstrate that the proposed ROH outperforms several state-of-the-art cross-modal hashing methods.
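The linear bridging idea can be pictured as a pair of ridge regressions that tie the label matrix and (relaxed) hash codes together. The sketch below is only an illustration under that assumption; the function name `fit_bridge`, the regularizer `lam`, and the toy data are ours, not the paper's exact objective or optimizer.

```python
import numpy as np

def fit_bridge(src, tgt, lam=1e-2):
    """Closed-form ridge regression: P minimizing ||src @ P - tgt||^2 + lam * ||P||^2."""
    d = src.shape[1]
    return np.linalg.solve(src.T @ src + lam * np.eye(d), src.T @ tgt)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(200, 10)).astype(float)   # toy multi-label matrix
codes = np.sign(rng.standard_normal((200, 32)))              # toy relaxed hash codes

P_fwd = fit_bridge(labels, codes)        # labels -> codes direction
P_bwd = fit_bridge(codes, labels)        # codes -> labels direction (the bidirectional link)
pred_codes = np.sign(labels @ P_fwd)
print(pred_codes.shape, (codes @ P_bwd).shape)   # (200, 32) (200, 10)
```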
2. Fan W, Zhang C, Li H, Jia X, Wang G. Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:260-273. [PMID: 37023166] [DOI: 10.1109/tnnls.2023.3263221]
Abstract
Hashing methods have sparked a great revolution in cross-modal retrieval due to their low storage and computation costs. Benefiting from the rich semantic information of labeled data, supervised hashing methods show better performance than unsupervised ones. Nevertheless, annotating training samples is expensive and labor intensive, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, three-stage semisupervised hashing (TS3H), is proposed in this article, in which both labeled and unlabeled data are handled seamlessly. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages, as the name implies, each conducted individually to make the optimization cost-effective and precise. First, classifiers for the different modalities are learned from the provided supervised information to predict labels for the unlabeled data. Then, hash code learning is achieved with a simple but efficient scheme that unifies the provided and newly predicted labels. To capture discriminative information and preserve semantic similarities, pairwise relations are leveraged to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by mapping the training samples to the generated hash codes. The new approach is compared with state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experimental results verify its efficiency and superiority.
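A minimal sketch of the three-stage structure (modality classifiers, then unified labels, then modality-specific hash functions) is given below, using closed-form ridge regressions and a random label-to-code projection as stand-ins; the function names and the Stage-2 mapping are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def ridge(X, T, lam=1e-2):
    """Closed-form ridge regression W with X @ W ~ T."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)

def three_stage_hash(Xl_img, Xl_txt, Yl, Xu_img, Xu_txt, n_bits=32, seed=0):
    # Stage 1: per-modality classifiers on labeled data, then pseudo-labels for unlabeled data.
    W_img, W_txt = ridge(Xl_img, Yl), ridge(Xl_txt, Yl)
    Yu = (((Xu_img @ W_img) + (Xu_txt @ W_txt)) / 2 > 0.5).astype(float)
    # Stage 2: hash codes from the unified (provided + predicted) labels.
    Y_all = np.vstack([Yl, Yu])
    R = np.random.default_rng(seed).standard_normal((Y_all.shape[1], n_bits))
    B = np.sign(Y_all @ R + 1e-9)                 # toy label-to-code mapping
    # Stage 3: modality-specific hash functions regressed onto the learned codes.
    X_img, X_txt = np.vstack([Xl_img, Xu_img]), np.vstack([Xl_txt, Xu_txt])
    return ridge(X_img, B), ridge(X_txt, B), B
```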
3. Zhang B, Zhang Y, Li J, Chen J, Akutsu T, Cheung YM, Cai H. Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:387-399. [PMID: 39316491] [DOI: 10.1109/tpami.2024.3467130]
Abstract
Hashing technology has exhibited great potential for cross-modal retrieval due to its appealing retrieval efficiency and storage effectiveness. Most current supervised cross-modal retrieval methods rely heavily on accurate semantic supervision, which is intractable to annotate as sample sizes keep growing. By comparison, existing unsupervised methods rely on accurate sample-similarity preservation strategies with intensive computational costs to compensate for the lack of semantic guidance, which causes these methods to lose the power to bridge the semantic gap. Furthermore, both kinds of approaches need to search for the nearest samples among all samples in a large search space, which is laborious. To address these issues, this paper proposes an unsupervised dual deep hashing (UDDH) method with semantic-index and content-code for cross-modal retrieval. Deep hashing networks are utilized to extract deep features and jointly encode the dual hashing codes in a collaborative manner, with a common semantic index and modality content codes, to simultaneously bridge the semantic and heterogeneity gaps for cross-modal retrieval. The dual deep hashing architecture, comprising a head code on the semantic index and tail codes on the modality content, enhances retrieval efficiency: a query only needs to be compared against database samples sharing the same semantic index, which greatly shrinks the search space. UDDH integrates deep feature extraction, binary optimization, the common semantic index, and the modality content codes within a unified model, allowing collaborative optimization to enhance overall performance. Extensive experiments demonstrate the retrieval superiority of the proposed approach over state-of-the-art baselines.
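The coarse-to-fine lookup enabled by the head/tail split can be sketched as follows; the code assumes binary {0,1} codes and invented variable names, and is only meant to show how matching the semantic-index head first shrinks the search space before Hamming ranking on the content code.

```python
import numpy as np

def dual_code_search(q_head, q_tail, db_head, db_tail, topk=5):
    """Coarse-to-fine lookup: keep only database items whose semantic-index head
    code matches the query, then rank those by Hamming distance on the tail code."""
    cand = np.where((db_head == q_head).all(axis=1))[0]
    if cand.size == 0:                      # no shared index: fall back to the full set
        cand = np.arange(db_head.shape[0])
    ham = (db_tail[cand] != q_tail).sum(axis=1)
    return cand[np.argsort(ham)[:topk]]

rng = np.random.default_rng(1)
db_head = rng.integers(0, 2, size=(1000, 8))     # shared semantic-index codes
db_tail = rng.integers(0, 2, size=(1000, 32))    # modality content codes
print(dual_code_search(db_head[3], db_tail[3], db_head, db_tail))
```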
4. Zou Q, Cheng S, Du A, Chen J. Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval. Entropy (Basel) 2024; 26:911. [PMID: 39593856] [PMCID: PMC11592578] [DOI: 10.3390/e26110911]
Abstract
Deep hashing technology, known for its low-cost storage and rapid retrieval, has become a focal point in cross-modal retrieval research as multimodal data continue to grow. However, existing supervised methods often overlook noisy labels and multiscale features in different modal datasets, leading to higher information entropy in the generated hash codes and features, which reduces retrieval performance. The variation in text annotation information across datasets further increases the information entropy during text feature extraction, resulting in suboptimal outcomes. Consequently, reducing the information entropy in text feature extraction, supplementing text feature information, and enhancing the retrieval efficiency of large-scale media data are critical challenges in cross-modal retrieval research. To tackle these challenges, this paper introduces the Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval (TEGAH) framework. TEGAH incorporates a deep text feature extraction network and a multiscale label region fusion network to minimize information entropy and optimize feature extraction. Additionally, a graph-attention-based modal feature fusion network is designed to efficiently integrate multimodal information, enhance the affinity of the network for different modalities, and retain more semantic information. Extensive experiments on three multilabel datasets demonstrate that the TEGAH framework significantly outperforms state-of-the-art cross-modal hashing methods.
Affiliation(s)
- Shuli Cheng: College of Computer Science and Technology, Xinjiang University, Urumqi 830046, China (affiliation shared with Q.Z., A.D., and J.C.)
5. Cui J, He Z, Huang Q, Fu Y, Li Y, Wen J. Structure-aware contrastive hashing for unsupervised cross-modal retrieval. Neural Netw 2024; 174:106211. [PMID: 38447425] [DOI: 10.1016/j.neunet.2024.106211]
Abstract
Cross-modal hashing has attracted a lot of attention and achieved remarkable success in large-scale cross-media similarity retrieval applications because of its superior computational efficiency and low storage overhead. However, constructing similarity relationships among samples in unsupervised cross-modal hashing is challenging because of the lack of manual annotation. Most existing unsupervised methods directly use the representations extracted from the backbone of each modality to construct instance similarity matrices, leading to inaccurate similarity matrices and suboptimal hash codes. To address this issue, a novel unsupervised hashing model, named Structure-aware Contrastive Hashing for Unsupervised Cross-modal Retrieval (SACH), is proposed in this paper. Specifically, we concurrently employ both the high-dimensional representations and the discriminative representations learned by the network to construct a more informative semantic correlation matrix across modalities. Moreover, we design a multimodal structure-aware alignment network to minimize the heterogeneity gap in the high-order semantic space of each modality, effectively reducing disparities within heterogeneous data sources and enhancing the consistency of semantic information across modalities. Extensive experimental results on two widely used datasets demonstrate the superiority of the proposed SACH method in cross-modal retrieval tasks over existing state-of-the-art methods.
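One way to picture the use of both high-dimensional and discriminative representations is a weighted blend of their cosine similarities; the sketch below is a plausible reading with an assumed mixing weight `alpha`, not SACH's actual construction.

```python
import numpy as np

def cosine(A, B):
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
    return A @ B.T

def fused_affinity(img_raw, txt_raw, img_disc, txt_disc, alpha=0.5):
    """Blend intra-modal similarity from raw backbone features with cross-modal
    similarity from the learned discriminative features into one affinity matrix."""
    s_raw = (cosine(img_raw, img_raw) + cosine(txt_raw, txt_raw)) / 2
    s_disc = cosine(img_disc, txt_disc)
    return alpha * s_raw + (1.0 - alpha) * s_disc
```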
Affiliation(s)
- Jinrong Cui: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Zhipeng He: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Qiong Huang: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China; Guangzhou Key Laboratory of Intelligent Agriculture, Guangzhou, China
- Yulu Fu: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Yuting Li: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Jie Wen: Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, 518055, China
6. Peng SJ, He Y, Liu X, Cheung YM, Xu X, Cui Z. Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2194-2207. [PMID: 35830398] [DOI: 10.1109/tnnls.2022.3188569]
Abstract
Fine-grained image-text retrieval has been a hot research topic for bridging vision and language, and its main challenge is how to learn the semantic correspondence across different modalities. Existing methods mainly focus on learning the global semantic correspondence or intramodal relation correspondence in separate data representations, but rarely consider the intermodal relations that provide complementary hints for fine-grained semantic correlation learning. To address this issue, we propose a relation-aggregated cross-graph (RACG) model that explicitly learns the fine-grained semantic correspondence by aggregating both intramodal and intermodal relations, which can be well utilized to guide the feature correspondence learning process. More specifically, we first build semantic-embedded graphs to explore both fine-grained objects and their relations in the different media types, aiming not only to characterize the object appearance in each modality but also to capture the intrinsic relation information that differentiates intramodal discrepancies. Then, a cross-graph relation encoder is newly designed to explore the intermodal relations across different modalities, which can mutually boost the cross-modal correlations to learn more precise intermodal dependencies. Besides, a feature reconstruction module and multihead similarity alignment are efficiently leveraged to optimize the node-level semantic correspondence, whereby relation-aggregated cross-modal embeddings between image and text are discriminatively obtained to benefit various image-text retrieval tasks. Extensive experiments on benchmark datasets quantitatively and qualitatively verify the advantages of the proposed framework for fine-grained image-text retrieval and show its competitive performance against the state of the art.
7. Zhang Z, Peng Q, Fu S, Wang W, Cheung YM, Zhao Y, Yu S, You X. A Componentwise Approach to Weakly Supervised Semantic Segmentation Using Dual-Feedback Network. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7541-7554. [PMID: 35120009] [DOI: 10.1109/tnnls.2022.3144194]
Abstract
Recent weakly supervised semantic segmentation methods generate pseudolabels to recover the position information lost in weak labels for training the segmentation network. Unfortunately, those pseudolabels often contain mislabeled regions and inaccurate boundaries due to the incomplete recovery of position information, which in turn limits the quality of the resulting segmentation. In this article, we decompose the position information into two components, high-level semantic information and low-level physical information, and develop a componentwise approach that recovers each component independently. Specifically, we propose a simple yet effective pseudolabel updating mechanism that iteratively corrects mislabeled regions inside objects to precisely refine the high-level semantic information. To reconstruct the low-level physical information, we utilize a customized superpixel-based random walk mechanism to trim the boundaries. Finally, we design a novel network architecture, namely a dual-feedback network (DFN), to integrate the two mechanisms into a unified model. Experiments on benchmark datasets show that DFN outperforms existing state-of-the-art methods in terms of mean intersection-over-union (mIoU).
8. Zheng C, Zhuang Q, Peng SJ. Efficient motion capture data recovery via relationship-aggregated graph network and temporal pattern reasoning. Mathematical Biosciences and Engineering 2023; 20:11313-11327. [PMID: 37322983] [DOI: 10.3934/mbe.2023501]
Abstract
Human motion capture (mocap) data is of crucial importance to realistic character animation, and the missing-marker problem caused by markers falling off or being occluded often limits its performance in real-world applications. Although great progress has been made in mocap data recovery, it remains a challenging task, primarily due to the articulated complexity and long-term dependencies of human movements. To tackle these concerns, this paper proposes an efficient mocap data recovery approach using a Relationship-aggregated Graph Network and Temporal Pattern Reasoning (RGN-TPR). The RGN comprises two tailored graph encoders, a local graph encoder (LGE) and a global graph encoder (GGE). By dividing the human skeletal structure into several parts, the LGE encodes the high-level semantic node features and their semantic relationships in each local part, while the GGE aggregates the structural relationships between different parts for whole-skeleton data representation. Further, the TPR utilizes a self-attention mechanism to exploit intra-frame interactions and employs a temporal transformer to capture long-term dependencies, whereby discriminative spatio-temporal features can be obtained for efficient motion recovery. Extensive experiments on public datasets qualitatively and quantitatively verify the superiority of the proposed learning framework for mocap data recovery and show its improved performance over the state of the art.
Affiliation(s)
- Chuanqin Zheng: Information Center, Xiamen Medical College, Xiamen, China
- Qingshuang Zhuang: Information Center, Xiamen Medical College, Xiamen, China; Department of Artificial Intelligence, Huaqiao University, Xiamen, China
- Shu-Juan Peng: Department of Artificial Intelligence, Huaqiao University, Xiamen, China
9. Qian S, Xue D, Fang Q, Xu C. Integrating Multi-Label Contrastive Learning With Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:4794-4811. [PMID: 35788462] [DOI: 10.1109/tpami.2022.3188547]
Abstract
With the growing amount of multimodal data, cross-modal retrieval has attracted more and more attention and become a hot research topic. To date, most existing techniques convert multimodal data into a common representation space where semantic similarities between samples can be easily measured across multiple modalities. However, these approaches may suffer from the following limitations: 1) they overcome the modality gap by introducing a loss in the common representation space, which may not be sufficient to eliminate the heterogeneity of various modalities; 2) they treat labels as independent entities and ignore label relationships, which is not conducive to establishing semantic connections across multimodal data; 3) they ignore the non-binary values of label similarity in multi-label scenarios, which may lead to inefficient alignment of representation similarity with label similarity. To tackle these problems, in this article we propose two models to learn discriminative and modality-invariant representations for cross-modal retrieval. First, dual generative adversarial networks are built to project multimodal data into a common representation space. Second, to model label relation dependencies and develop inter-dependent classifiers, we employ multi-hop graph neural networks (consisting of a Probabilistic GNN and an Iterative GNN), where a layer aggregation mechanism is used to exploit propagation information from various hops. Third, we propose a novel soft multi-label contrastive loss for cross-modal retrieval, with a soft positive sampling probability, which aligns the representation similarity with the label similarity. Additionally, to adapt to incomplete-modal learning, which has wider applications, we propose a modal reconstruction mechanism to generate missing features. Extensive experiments on three widely used benchmark datasets, i.e., NUS-WIDE, MIRFlickr, and MS-COCO, show the superiority of our proposed method.
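The non-binary label similarity and the soft positive weighting can be illustrated as below; the Jaccard-style affinity and the InfoNCE-style weighting are assumptions chosen for clarity, not the paper's exact loss.

```python
import numpy as np

def soft_label_similarity(Y):
    """Non-binary label affinity for multi-label data: |Li ∩ Lj| / |Li ∪ Lj|."""
    inter = Y @ Y.T
    union = Y.sum(1, keepdims=True) + Y.sum(1) - inter
    return inter / np.maximum(union, 1.0)

def soft_contrastive_loss(z_img, z_txt, S, tau=0.1):
    """InfoNCE-style cross-modal loss whose positive distribution follows the
    soft label similarity S rather than a hard one-to-one pairing."""
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_txt = z_txt / np.linalg.norm(z_txt, axis=1, keepdims=True)
    logits = z_img @ z_txt.T / tau
    m = logits.max(axis=1, keepdims=True)
    logp = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    P = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)   # soft positive distribution
    return float(-(P * logp).sum(axis=1).mean())
```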
10. Sun Y, Wang X, Peng D, Ren Z, Shen X. Hierarchical Hashing Learning for Image Set Classification. IEEE Transactions on Image Processing 2023; 32:1732-1744. [PMID: 37028051] [DOI: 10.1109/tip.2023.3251025]
Abstract
With the development of video networks, image set classification (ISC) has received a lot of attention and can be used for various practical applications, such as video-based recognition and action recognition. Although existing ISC methods have obtained promising performance, they often have extremely high complexity. Owing to its advantages in storage space and complexity cost, learning to hash becomes a powerful solution. However, existing hashing methods often ignore the complex structural information and hierarchical semantics of the original features. They usually adopt a single-layer hashing strategy to transform high-dimensional data into short binary codes in one step, and this sudden drop in dimension can result in the loss of advantageous discriminative information. In addition, they do not take full advantage of the intrinsic semantic knowledge of the whole gallery sets. To tackle these problems, in this paper we propose a novel Hierarchical Hashing Learning (HHL) method for ISC. Specifically, a coarse-to-fine hierarchical hashing scheme is proposed that utilizes a two-layer hash function to gradually refine the beneficial discriminative information in a layer-wise fashion. Besides, to alleviate the effects of redundant and corrupted features, we impose the $\ell_{2,1}$ norm on the layer-wise hash function. Moreover, we adopt a bidirectional semantic representation with an orthogonal constraint to adequately preserve the intrinsic semantic information of all samples in the whole image sets. Comprehensive experiments demonstrate that HHL achieves significant improvements in accuracy and running time. We will release the demo code at https://github.com/sunyuan-cs.
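The two ingredients named here, a layer-wise (two-layer) hash function and an $\ell_{2,1}$ row-sparsity penalty, can be sketched as follows; the tanh relaxation and the variable names are illustrative assumptions, not the paper's optimization.

```python
import numpy as np

def l21_norm(W):
    """Row-wise l2,1 norm: the sum of the l2 norms of the rows, which drives
    whole rows (i.e., redundant or corrupted features) toward zero."""
    return float(np.linalg.norm(W, axis=1).sum())

def two_layer_codes(X, W1, W2):
    """Coarse-to-fine hashing: project to a wider intermediate layer first,
    then down to the final code length, instead of one sudden dimension drop."""
    H = np.tanh(X @ W1)        # first-layer relaxed codes
    return np.sign(H @ W2)     # second-layer binary codes
```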
11. Hu P, Zhu H, Lin J, Peng D, Zhao YP, Peng X. Unsupervised Contrastive Cross-Modal Hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:3877-3889. [PMID: 35617190] [DOI: 10.1109/tpami.2022.3177356]
Abstract
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. To be exact, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable within CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not involve the binary-continuous relaxation used by most existing methods and thus enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), i.e., within-class pairs wrongly treated as negative pairs, we propose a Cross-modal Ranking Learning loss (CRL) that utilizes the discrimination from all negative pairs instead of only the hard ones. Thanks to such a global strategy, CRL endows our method with better performance because it does not overuse the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely used datasets and compare against 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
12. Discrete matrix factorization cross-modal hashing with multi-similarity consistency. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00950-z]
Abstract
Recently, matrix factorization-based hashing has gained wide attention because of its strong subspace learning ability and high search efficiency. However, some problems need to be further addressed. First, uniform hash codes can be generated by collective matrix factorization, but they often incur serious loss, degrading the quality of the hash codes. Second, most of these methods simply preserve the absolute similarity in the hash codes, failing to capture the inherent semantic affinity among training data. To overcome these obstacles, we propose a Discrete Multi-similarity Consistent Matrix Factorization Hashing (DMCMFH). Specifically, an individual subspace is first learned by matrix factorization and multi-similarity consistency for each modality. Then, the subspaces are aligned by a shared semantic space to generate homogeneous hash codes. Finally, an iterative discrete optimization scheme is presented to reduce the quantization loss. We conduct quantitative experiments on three datasets: MSCOCO, MIRFlickr-25K and NUS-WIDE. Compared with supervised baseline methods, DMCMFH achieves gains of 0.22%, 3.00% and 0.79% on the image-query-text tasks for the three datasets respectively, and gains of 0.21%, 1.62% and 0.50% on the text-query-image tasks.
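Collective matrix factorization with a shared latent factor is the core ingredient referred to above; the alternating-least-squares sketch below uses a continuous relaxation whose sign plays the role of the unified codes, which is an illustration rather than DMCMFH's discrete, multi-similarity-consistent optimization.

```python
import numpy as np

def collective_mf(X_img, X_txt, n_bits=16, n_iter=50, seed=0):
    """Alternating least squares: both modalities share one latent factor V,
    whose sign gives unified (relaxed) hash codes."""
    rng = np.random.default_rng(seed)
    n = X_img.shape[0]
    V = rng.standard_normal((n, n_bits))
    for _ in range(n_iter):
        U_img = np.linalg.lstsq(V, X_img, rcond=None)[0].T   # (d_img, n_bits)
        U_txt = np.linalg.lstsq(V, X_txt, rcond=None)[0].T   # (d_txt, n_bits)
        # Update the shared factor against the stacked reconstruction targets.
        A = np.vstack([U_img, U_txt])                        # (d_img + d_txt, n_bits)
        X = np.hstack([X_img, X_txt])                        # (n, d_img + d_txt)
        V = np.linalg.lstsq(A, X.T, rcond=None)[0].T         # (n, n_bits)
    return np.sign(V)
```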
13. Williams-Lekuona M, Cosma G, Phillips I. A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval. J Imaging 2022; 8:328. [PMID: 36547493] [PMCID: PMC9785405] [DOI: 10.3390/jimaging8120328]
Abstract
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community because the computational efficiency of hash-based methods lets them deal with large amounts of data. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during training. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval, where paired data are limited or unavailable for training. This paper (1) provides an overview of CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
14. Xie Y, Zeng X, Wang T, Yi Y, Xu L. Deep online cross-modal hashing by a co-training mechanism. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109888]
15. Liu M, Yang Z, Li L, Li Z, Xie S. Auto-weighted collective matrix factorization with graph dual regularization for multi-view clustering. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110145]
16. Fang X, Jiang K, Han N, Teng S, Zhou G, Xie S. Average Approximate Hashing-Based Double Projections Learning for Cross-Modal Retrieval. IEEE Transactions on Cybernetics 2022; 52:11780-11793. [PMID: 34106872] [DOI: 10.1109/tcyb.2021.3081615]
Abstract
Cross-modal retrieval has attracted considerable attention for searching large-scale multimedia databases because of its efficiency and effectiveness. As a powerful tool for data analysis, matrix factorization is commonly used to learn hash codes for cross-modal retrieval, but it still has several shortcomings. First, most of these methods focus only on preserving the locality of data and ignore other factors, such as preserving the reconstruction residual of data during matrix factorization. Second, the energy loss of the data is not considered when cross-modal data are projected into a common semantic space. Third, cross-modal data are directly projected into a unified semantic space, which is not reasonable since data from different modalities have different properties. This article proposes a novel method called average approximate hashing (AAH) to address these problems by: 1) integrating locality and residual preservation into a graph embedding framework by using the label information; 2) projecting data from different modalities into different semantic spaces and then making the two spaces approximate each other so that a unified hash code can be obtained; and 3) introducing a principal component analysis (PCA)-like projection matrix into the graph embedding framework to guarantee that the projected data preserve the main energy of the data. AAH obtains the final hash codes by an average approximate strategy, that is, by using the mean of the projected data of the different modalities as the hash codes. Experiments on standard databases show that the proposed AAH outperforms several state-of-the-art cross-modal hashing methods.
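The PCA-like projection and the average approximate strategy can be sketched as follows; the eigendecomposition-based projection and the simple sign-of-mean rule are illustrative stand-ins for the paper's learned projections.

```python
import numpy as np

def pca_like_projection(X, n_bits):
    """PCA-style projection matrix: the top eigenvectors of the feature covariance,
    so the projected data keeps the main energy of the original features."""
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    return vecs[:, np.argsort(vals)[::-1][:n_bits]]

def average_approximate_codes(X_img, X_txt, P_img, P_txt):
    """Project each modality into its own semantic space, then take the sign of
    the average projection as the unified hash code for a paired sample."""
    Z_img, Z_txt = X_img @ P_img, X_txt @ P_txt
    return np.sign((Z_img + Z_txt) / 2.0)
```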
17. Liu X, Wang X, Cheung YM. FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6306-6320. [PMID: 33979294] [DOI: 10.1109/tnnls.2021.3076684]
Abstract
Cross-modal hashing, favored for its effectiveness and efficiency, has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless, most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes and often involve a time-consuming training procedure for handling large-scale datasets. To tackle these issues, we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data so as to minimize the quantization loss of mapping such data to the Hamming space, and propose an efficient fast discriminative discrete hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically, FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels and utilizes the ε-dragging technique to provide provably large semantic margins. Accordingly, the discriminative power of semantic information can be explicitly captured and maximized. Moreover, an orthogonal transformation scheme is further proposed to map the nonlinear embedding data into the semantic subspace, which well guarantees the semantic consistency between the data feature and its semantic representation. Consequently, an efficient closed-form solution is derived for discriminative hash code learning, which is very computationally efficient. In addition, an effective and stable online learning strategy is presented for optimizing modality-specific projection functions, featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates bi-Lipschitz continuity, runs sufficiently fast, and significantly improves retrieval performance over state-of-the-art methods. The source code is released at https://github.com/starxliu/FDDH.
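The ε-dragging idea, enlarging regression margins by moving label targets apart before regression, can be sketched as below; the specific form of the dragging matrix and the scalar `eps` are assumptions for illustration, not FDDH's exact construction.

```python
import numpy as np

def epsilon_drag(Y, M, eps=1.0):
    """ε-dragging of regression targets: push positive label entries upward and
    negative entries downward by nonnegative amounts M, enlarging the margin
    between classes before hash codes are regressed onto the targets.

    Y : (n, c) label matrix with entries in {0, 1}
    M : (n, c) nonnegative dragging amounts (learned in the real method)
    """
    direction = np.where(Y > 0, 1.0, -1.0)
    return Y + eps * direction * np.abs(M)
```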
18. Online supervised collective matrix factorization hashing for cross-modal retrieval. Appl Intell 2022. [DOI: 10.1007/s10489-022-04189-6]
19. Shu Z, Yong K, Yu J, Gao S, Mao C, Yu Z. Discrete asymmetric zero-shot hashing with application to cross-modal retrieval. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.037]
21. Yang F, Ding X, Liu Y, Ma F, Cao J. Scalable semantic-enhanced supervised hashing for cross-modal retrieval. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109176]
22. Liu M, Yang Z, Han W, Chen J, Sun W. Semi-supervised multi-view binary learning for large-scale image clustering. Appl Intell 2022. [DOI: 10.1007/s10489-022-03205-z]
24. Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE Transactions on Cybernetics 2021; 51:4982-4993. [PMID: 33119532] [DOI: 10.1109/tcyb.2020.3027614]
Abstract
Thanks to their low storage cost and high query speed, cross-view hashing (CVH) methods have been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, making it difficult to handle data with an increasing or large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach that consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, each MHN independently projects all samples into the discriminative Hamming space, which is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced by the flexible inputs and is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method enjoys high computational efficiency and the capacity to handle an increasing number of views using only a few labels or the number of classes. For a newly arriving view, we only need to add a view-specific network to our model, avoiding retraining the entire model on the new and previous views. Extensive experiments are carried out on five widely used multiview databases against 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity to handle newly arriving views.
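The decoupled paradigm, fixed shared target codes plus independently fitted view-specific mappings, can be sketched with simple ridge regressions; the functions below are illustrative stand-ins, not the SHAM/MHN networks themselves.

```python
import numpy as np

def ridge(X, T, lam=1e-2):
    """Closed-form ridge regression W with X @ W ~ T."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)

def fit_views_independently(views, target_codes):
    """Decoupled paradigm: the shared target Hamming space (here: fixed target
    codes) is learned once, then every view fits its own projection against it.
    A newly arriving view needs only one extra call, with no retraining of the
    existing views."""
    return {name: ridge(X, target_codes) for name, X in views.items()}
```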
25. Liu X, Cheung YM, Hu Z, He Y, Zhong B. Adversarial Tri-Fusion Hashing Network for Imbalanced Cross-Modal Retrieval. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. [DOI: 10.1109/tetci.2020.3007143]
26. Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images. Remote Sensing 2021. [DOI: 10.3390/rs13132524]
Abstract
Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. Among recent research, the self-attention module has attracted intense interest in many fields, including building extraction. However, most current deep learning models equipped with self-attention still overlook the effectiveness of reconstruction bias. By tipping the balance between the encoding and decoding abilities, i.e., making the decoding network much more complex than the encoding network, the semantic segmentation ability can be reinforced. To remedy the lack of work combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn attention weights over the inputs, so the network pays more attention to positions where salient regions may occur. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets, the WHU and Massachusetts Building datasets, achieving IoU scores of 89.39% and 73.49%, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method achieves satisfactory results.
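For reference, single-head scaled dot-product self-attention over flattened spatial positions, the generic operation this module builds on, looks like the following sketch (toy shapes and names assumed):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over flattened spatial
    positions: every position is re-expressed as a weighted sum of all positions,
    so salient regions can influence the whole feature map."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V
```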
27. Fang Y, Li B, Li X, Ren Y. Unsupervised cross-modal similarity via Latent Structure Discrete Hashing Factorization. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106857]
28. Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval. Entropy 2020; 22:1266. [PMID: 33287034] [PMCID: PMC7712897] [DOI: 10.3390/e22111266]
Abstract
Deep hashing is the mainstream approach for large-scale cross-modal retrieval due to its high retrieval speed and low storage cost, but reconstructing modal semantic information remains very challenging. To further address the problem of semantic reconstruction in unsupervised cross-modal retrieval, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH) method. The algorithm combines spatial and channel semantic information and mines modal semantic information based on adaptive self-encoding and a joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensors that capture high-order context semantics, which assists the backbone network in capturing important contextual modal semantic information. (2) From an optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In the feature reconstruction layer, we use two bottleneck auto-encoders to achieve visual-text modal interaction. (3) For metric learning, we design a new loss function to optimize the model parameters, which preserves the correlation between the image and text modalities. DSPRH is evaluated on MIRFlickr-25K and NUS-WIDE, and the experimental results show that it achieves better performance on retrieval tasks.
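Global covariance pooling, used here for channel semantics, is a standard second-order pooling step; a minimal sketch, with an assumed (h, w, c) feature-map layout:

```python
import numpy as np

def global_covariance_pooling(feature_map):
    """Second-order pooling: flatten the spatial grid and return the channel
    covariance matrix as a global image descriptor.

    feature_map : (h, w, c) array of convolutional features for one image
    """
    h, w, c = feature_map.shape
    F = feature_map.reshape(h * w, c)
    F = F - F.mean(axis=0, keepdims=True)
    return F.T @ F / (h * w - 1)
```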
29. Zhang D, Wu XJ, Yu J. Learning latent hash codes with discriminative structure preserving for cross-modal retrieval. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00893-6]