1. Wang Z, Gao Z, Yang Y, Wang G, Jiao C, Shen HT. Geometric Matching for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:5509-5521. PMID: 38652629. DOI: 10.1109/tnnls.2024.3381347.
Abstract
Despite its significant progress, cross-modal retrieval still suffers from one-to-many matching cases, where a given query may correspond to multiple semantically related instances in another modality. However, existing approaches usually map heterogeneous data into the learned space as deterministic point vectors. Despite their remarkable performance in matching the most similar instance, such deterministic point embeddings cannot adequately represent the rich semantics of one-to-many correspondence. To address this limitation, we extend a deterministic point into a closed geometry and develop geometric representation learning methods for cross-modal retrieval. A set of points inside such a geometry can thus be semantically related to many candidates, which lets us effectively capture the semantic uncertainty. We then introduce two types of geometric matching for one-to-many correspondence: point-to-rectangle matching (dubbed P2RM) and rectangle-to-rectangle matching (termed R2RM). The former treats all retrieved candidates as rectangles with zero volume (equivalent to points) and the query as a box, while the latter encodes all heterogeneous data into rectangles. Semantic similarity among heterogeneous data can therefore be evaluated by the Euclidean distance from a point to a rectangle or by the volume of intersection between two rectangles. Additionally, both strategies can be easily applied to off-the-shelf approaches and further improve the retrieval performance of baselines. Under various evaluation metrics, extensive experiments and ablation studies on several commonly used datasets, two for image-text matching and two for video-text retrieval, demonstrate the effectiveness and superiority of our methods.
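
The two similarity measures described above reduce to simple geometry. The following minimal sketch (our illustration, not the authors' code) computes the P2RM-style Euclidean distance from a query point to an axis-aligned rectangle and the R2RM-style intersection volume of two rectangles, with boxes given by min/max corner vectors:

```python
import numpy as np

def point_to_box_distance(p, box_min, box_max):
    # Clamp the point onto the box; the residual offset is the shortest path.
    closest = np.clip(p, box_min, box_max)
    return np.linalg.norm(p - closest)

def box_intersection_volume(a_min, a_max, b_min, b_max):
    # Per-dimension overlap length, zero where the boxes are disjoint.
    overlap = np.maximum(0.0, np.minimum(a_max, b_max) - np.maximum(a_min, b_min))
    return float(np.prod(overlap))

# Toy 2-D example: a point query against a unit box, then two overlapping boxes.
print(point_to_box_distance(np.array([3.0, 0.0]),
                            np.array([0.0, 0.0]), np.array([1.0, 1.0])))   # 2.0
print(box_intersection_volume(np.array([0.0, 0.0]), np.array([2.0, 2.0]),
                              np.array([1.0, 1.0]), np.array([3.0, 3.0])))  # 1.0
```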

2. Fan W, Zhang C, Li H, Jia X, Wang G. Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:260-273. PMID: 37023166. DOI: 10.1109/tnnls.2023.3263221.
Abstract
Hashing methods have sparked a great revolution in cross-modal retrieval due to their low storage and computation costs. Benefiting from the rich semantic information of labeled data, supervised hashing methods have shown better performance than unsupervised ones. Nevertheless, annotating training samples is expensive and labor intensive, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, three-stage semisupervised hashing (TS3H), is proposed in this article, in which both labeled and unlabeled data are handled seamlessly. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages, as the name implies, all of which are conducted individually to make the optimization cost-effective and precise. Specifically, classifiers for the different modalities are first learned from the provided supervised information to predict the labels of the unlabeled data. Then, hash code learning is achieved with a simple but efficient scheme that unifies the provided and newly predicted labels. To capture discriminative information and preserve semantic similarities, we leverage pairwise relations to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by transforming the training samples to the generated hash codes. The new approach is compared with state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experimental results verify its efficiency and superiority.
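
As a rough illustration of the three-stage flow, consider the hypothetical sketch below (not the paper's objective: a single classifier stands in for the modality-specific classifiers, and a random label-to-bit projection stands in for TS3H's similarity-preserving code learning):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ts3h_sketch(X1_lab, y_lab, X1_unlab, X2_lab, X2_unlab, n_bits=16, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: learn a classifier on labeled data and pseudolabel the rest.
    clf = LogisticRegression(max_iter=1000).fit(X1_lab, y_lab)
    y_all = np.concatenate([y_lab, clf.predict(X1_unlab)])
    # Stage 2: derive hash codes from the unified (given + predicted) labels.
    P = rng.standard_normal((y_all.max() + 1, n_bits))
    B = np.sign(P[y_all])
    # Stage 3: modality-specific linear hash functions by least squares.
    X1, X2 = np.vstack([X1_lab, X1_unlab]), np.vstack([X2_lab, X2_unlab])
    W1, *_ = np.linalg.lstsq(X1, B, rcond=None)
    W2, *_ = np.linalg.lstsq(X2, B, rcond=None)
    return W1, W2, B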

3. Jin L, Li Z, Pan Y, Tang J. Relational Consistency Induced Self-Supervised Hashing for Image Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:1482-1494. PMID: 37995167. DOI: 10.1109/tnnls.2023.3333294.
Abstract
This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space. By uncovering the semantic structure of the data, meaningful data-to-prototype and data-to-data relationships are jointly constructed. The data-to-prototype relationships are captured by constraining the prototype assignments generated from different augmented views of an image to be the same. Meanwhile, these data-to-prototype relationships are preserved to learn informative compact hash codes by matching them with these reliable prototypes. To accomplish this, a novel dual prototype contrastive loss is proposed to maximize the agreement of prototype assignments in the latent feature space and Hamming space. The data-to-data relationships are captured by enforcing the distribution of pairwise similarities in the latent feature space and Hamming space to be consistent, which makes the learned hash codes preserve meaningful similarity relationships. Extensive experimental results on four widely used image retrieval datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods. Besides, the proposed method achieves promising performance in out-of-domain retrieval tasks, which shows its good generalization ability. The source code and models are available at https://github.com/IMAG-LuJin/RCSH.

4. Sun Y, Zhao Z, Tong H, Sun B, Liu Y, Ren N, You S. Machine Learning Models for Inverse Design of the Electrochemical Oxidation Process for Water Purification. Environmental Science & Technology 2023; 57:17990-18000. PMID: 37189261. DOI: 10.1021/acs.est.2c08771.
Abstract
In this study, a machine learning (ML) framework is developed for the target-oriented inverse design of the electrochemical oxidation (EO) process for water purification. The XGBoost model exhibited the best performance for predicting the reaction rate (k) when trained on a dataset of pollutant characteristics and reaction conditions, with an external R² (R²ext) of 0.84 and an external RMSE (RMSEext) of 0.79. Based on 315 data points collected from the literature, the current density, pollutant concentration, and gap energy (Egap) were identified as the most impactful parameters for the inverse design of the EO process. In particular, adding reaction conditions as model input features provided more available information and increased the sample size of the dataset, improving model accuracy. Feature importance analysis with Shapley additive explanations (SHAP) was performed to reveal the data patterns and interpret the features. The ML-based inverse design of the EO process was then generalized to a random case of tailoring optimum conditions, with phenol and 2,4-dichlorophenol (2,4-DCP) serving as model pollutants. The predicted k values were close to the experimentally verified values, with relative errors below 5%. This study provides a paradigm shift from the conventional trial-and-error mode to a data-driven mode for advancing research and development of the EO process through a time-saving, labor-effective, and environmentally friendly target-oriented strategy, making electrochemical water purification more efficient, economical, and sustainable in the context of global carbon peaking and carbon neutrality.
Affiliation(s)
- Ye Sun: State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Zhiyuan Zhao: State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Hailong Tong: State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China; State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Baiming Sun: State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Yanbiao Liu: College of Environmental Science and Engineering, Textile Pollution Controlling Engineering Center of the Ministry of Ecology and Environment, Donghua University, Shanghai 201620, China
- Nanqi Ren: State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Shijie You: State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
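
As an illustration of the modeling loop summarized in the abstract above, the sketch below trains a gradient-boosted regressor on tabular descriptors and inspects it with SHAP. The file and column names are hypothetical, not from the paper:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("eo_dataset.csv")                      # hypothetical data file
X = df[["current_density", "pollutant_conc", "e_gap"]]  # hypothetical columns
y = np.log10(df["k"])                                   # rate data, log scale

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=400, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
print("R2_ext:", r2_score(y_te, pred))
print("RMSE_ext:", mean_squared_error(y_te, pred) ** 0.5)

# SHAP values indicate which inputs drive the predicted k, mirroring the
# feature-importance analysis described in the abstract.
shap_values = shap.TreeExplainer(model).shap_values(X_te)
```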

5. Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y. Soft Person Reidentification Network Pruning via Blockwise Adjacent Filter Decaying. IEEE Transactions on Cybernetics 2022; 52:13293-13307. PMID: 34910650. DOI: 10.1109/tcyb.2021.3130047.
Abstract
Deep learning has shown significant success in person re-identification (re-id) tasks. However, most existing works focus on discriminative feature learning and impose complex neural networks, which suffer from low inference efficiency. In fact, feature extraction time is also crucial for real-world applications, and lightweight models are needed. Prevailing pruning methods usually focus on compact classification models, but these methods are suboptimal for compacting re-id models, which usually produce continuous features and are sensitive to network pruning. The key to pruning re-id models is to retain the original filter distribution in the continuous features as much as possible. In this work, we propose a blockwise adjacent filter decaying method to fill this gap. Specifically, given a trained model, we first evaluate the redundancy of filters based on their adjacency relationships so as to preserve the original filter distribution. Second, previous layerwise pruning methods ignore that discriminative information is enhanced block by block, so we propose a blockwise filter pruning strategy to better exploit the block relations in the pretrained model. Third, we propose a novel filter decaying policy that progressively reduces the scale of redundant filters. Unlike conventional soft filter pruning, which directly sets the filter values to zero, the proposed filter decaying keeps the pretrained knowledge as much as possible. We evaluate our method on three popular person re-identification datasets: 1) Market-1501; 2) DukeMTMC-reID; and 3) MSMT17_V1. The proposed method outperforms existing state-of-the-art pruning methods. After pruning over 91.9% of the parameters on DukeMTMC-reID, the Rank-1 accuracy drops by only 3.7%, demonstrating its effectiveness for compacting person re-identification models.
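
The "decay instead of zero" idea can be sketched in a few lines of PyTorch (our simplified illustration; the cosine-similarity redundancy test below is a stand-in for the paper's blockwise adjacency criterion):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decay_redundant_filters(conv: nn.Conv2d, keep_ratio=0.7, decay=0.9):
    W = conv.weight.data                 # (out_channels, in_channels, k, k)
    flat = W.flatten(1)
    # Treat filters that closely resemble their neighbor as redundant.
    sim = F.cosine_similarity(flat[:-1], flat[1:], dim=1)
    n_prune = int((1 - keep_ratio) * (W.size(0) - 1))
    redundant = sim.topk(n_prune).indices
    W[redundant] *= decay                # shrink progressively, don't hard-zero

conv = nn.Conv2d(64, 128, 3)
for epoch in range(10):                  # called once per training epoch
    decay_redundant_filters(conv)
```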

6. Shu Z, Yong K, Zhang D, Yu J, Yu Z, Wu XJ. Robust supervised matrix factorization hashing with application to cross-modal retrieval. Neural Computing and Applications 2022. DOI: 10.1007/s00521-022-08006-6.

7. Fast unsupervised consistent and modality-specific hashing for multimedia retrieval. Neural Computing and Applications 2022. DOI: 10.1007/s00521-022-08008-4.

8. Yang F, Zhang QX, Ding XJ, Ma FM, Cao J, Tong DY. Semantic preserving asymmetric discrete hashing for cross-modal retrieval. Applied Intelligence 2022. DOI: 10.1007/s10489-022-04282-w.

9. Zhang G, Wei S, Pang H, Qiu S, Zhao Y. Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. IEEE Transactions on Image Processing 2022; 31:5976-5988. PMID: 36094980. DOI: 10.1109/tip.2022.3204213.
Abstract
Composed image retrieval aims to retrieve the desired images given a reference image and a text piece. To handle this task, two important subprocesses should be modeled reasonably. One is to erase details of the reference image that are irrelevant to the text piece, and the other is to replenish the image with the desired details specified by the text piece. Existing methods neglect to distinguish between the two subprocesses and implicitly put them together when solving the composed image retrieval task. To model the two subprocesses explicitly and in order, we propose a novel composed image retrieval method containing three key components: a Multi-semantic Dynamic Suppression module (MDS), a Text-semantic Complementary Selection module (TCS), and Semantic Space Alignment constraints (SSA). Concretely, MDS erases irrelevant details of the reference image by suppressing its semantic features. TCS selects and enhances the semantic features of the text piece and then replenishes the reference image with them. Finally, to facilitate the erasure and replenishment subprocesses, SSA aligns the semantics of the two modalities' features in the final space. Extensive experiments on three benchmark datasets (Shoes, FashionIQ, and Fashion200K) show the superior performance of our approach against state-of-the-art methods.

10. Qin J, Fei L, Zhang Z, Wen J, Xu Y, Zhang D. Joint Specifics and Consistency Hash Learning for Large-Scale Cross-Modal Retrieval. IEEE Transactions on Image Processing 2022; 31:5343-5358. PMID: 35925845. DOI: 10.1109/tip.2022.3195059.
Abstract
With the dramatic increase in the amount of multimedia data, cross-modal similarity retrieval has become one of the most popular yet challenging problems. Hashing offers a promising solution for large-scale cross-modal data search by embedding high-dimensional data into a low-dimensional, similarity-preserving Hamming space. However, most existing cross-modal hashing methods seek a semantic representation shared by multiple modalities, which cannot fully preserve and fuse the discriminative modality-specific features and heterogeneous similarity for cross-modal search. In this paper, we propose a joint specifics and consistency hash learning method for cross-modal retrieval. Specifically, we introduce an asymmetric learning framework to fully exploit the label information for discriminative hash code learning, in which 1) each individual modality can be better converted into a meaningful subspace with specific information, 2) multiple subspaces are semantically connected to capture consistent information, and 3) the integration complexity of different subspaces is overcome so that the learned collaborative binary codes can merge the specifics with the consistency. We then introduce an alternating iterative optimization to tackle the specifics and consistency hashing learning problem, making it scalable for large-scale cross-modal retrieval. Extensive experiments on five widely used benchmark databases clearly demonstrate the effectiveness and efficiency of our proposed method on both one-cross-one and one-cross-two retrieval tasks.

11. Zhang Z, Li Z, Wei K, Pan S, Deng C. A survey on multimodal-guided visual content synthesis. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.04.126.

12. Li Q, Tian X, Ng WW, Pelillo M. Hashing-based affinity matrix for dominant set clustering. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.06.067.

13. Li X, Pu B. Regularized supervised novelty detection and its application in activity monitoring. Applied Intelligence 2022. DOI: 10.1007/s10489-022-03782-z.

14. Jing T, Xia H, Hamm J, Ding Z. Augmented Multimodality Fusion for Generalized Zero-Shot Sketch-Based Visual Retrieval. IEEE Transactions on Image Processing 2022; 31:3657-3668. PMID: 35576409. DOI: 10.1109/tip.2022.3173815.
Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) has attracted great attention recently due to the potential application of sketch-based retrieval under zero-shot scenarios, where the categories of query sketches and gallery photos are not observed in the training stage. However, the more general and practical scenario, in which the query sketches and gallery photos contain both seen and unseen categories, remains insufficiently explored. This problem is defined as generalized zero-shot sketch-based image retrieval (GZS-SBIR), and it is the focus of this work. To this end, we propose a novel Augmented Multi-modality Fusion (AMF) framework to generalize seen concepts to unobserved ones efficiently. Specifically, a novel knowledge discovery module named cross-domain augmentation is designed in both the visual and semantic spaces to mimic novel knowledge unseen in the training stage, which is the key to handling the GZS-SBIR challenge. Moreover, a triplet domain alignment module is proposed to couple the cross-domain distributions between photo and sketch in the visual space. To enhance the robustness of our model, we explore embedding propagation to refine both visual and semantic features by removing undesired noise. Eventually, visual-semantic fusion representations are concatenated for further domain discrimination and task-specific recognition, which tends to trigger cross-domain alignment in both the visual and semantic feature spaces. Experimental evaluations are conducted on popular ZS-SBIR benchmarks as well as a new evaluation protocol designed for GZS-SBIR from the DomainNet dataset with more diverse sub-domains, and the promising results demonstrate the superiority of the proposed solution over other baselines. The source code is available at https://github.com/scottjingtt/AMF_GZS_SBIR.git.

15.

16. Zhu H, Li L, Wu J, Zhao S, Ding G, Shi G. Personalized Image Aesthetics Assessment via Meta-Learning With Bilevel Gradient Optimization. IEEE Transactions on Cybernetics 2022; 52:1798-1811. PMID: 32525805. DOI: 10.1109/tcyb.2020.2984670.
Abstract
Typical image aesthetics assessment (IAA) is modeled for the generic aesthetics perceived by an "average" user. However, such generic aesthetics models neglect the fact that aesthetic preferences vary significantly from user to user. It is therefore essential to tackle personalized IAA (PIAA). Since PIAA is a typical small sample learning (SSL) problem, existing PIAA models are usually built by fine-tuning well-established generic IAA (GIAA) models, which serve as prior knowledge. Nevertheless, this kind of prior knowledge, based on "average aesthetics," fails to capture the aesthetic diversity of different people. In order to learn the prior knowledge shared when different people judge aesthetics, that is, to learn how people judge image aesthetics, we propose a PIAA method based on meta-learning with bilevel gradient optimization (BLG-PIAA), which is trained directly on individual aesthetic data and generalizes to unknown users quickly. The proposed approach consists of two phases: 1) meta-training and 2) meta-testing. In meta-training, the aesthetics assessment of each user is regarded as a task, and the training set of each task is divided into two sets: 1) a support set and 2) a query set. Unlike traditional methods that train a GIAA model based on average aesthetics, we train an aesthetic meta-learner model by bilevel gradient updating from the support set to the query set using many users' PIAA tasks. In meta-testing, the aesthetic meta-learner model is fine-tuned using a small amount of aesthetic data from a target user to obtain the PIAA model. The experimental results show that the proposed method outperforms state-of-the-art PIAA metrics, and the learned prior model of BLG-PIAA can be quickly adapted to unseen PIAA tasks.
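
The bilevel update is MAML-style: an inner gradient step on a user's support set, then an outer loss on the query set that is backpropagated through the inner step. A generic PyTorch sketch (not the authors' implementation) follows:

```python
import torch
from torch.func import functional_call

def bilevel_step(model, loss_fn, support, query, inner_lr=0.01):
    xs, ys = support
    xq, yq = query
    params = dict(model.named_parameters())
    # Inner step: adapt to the user's support set, keeping the graph.
    inner_loss = loss_fn(model(xs), ys)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    # Outer step: evaluate the adapted parameters on the query set.
    return loss_fn(functional_call(model, adapted, (xq,)), yq)

model = torch.nn.Linear(8, 1)
loss_fn = torch.nn.MSELoss()
xs, xq = torch.randn(5, 8), torch.randn(5, 8)
ys, yq = torch.randn(5, 1), torch.randn(5, 1)
bilevel_step(model, loss_fn, (xs, ys), (xq, yq)).backward()  # meta-gradient
```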

17. Dual-ISM: Duality-Based Image Sequence Matching for Similar Image Search. Applied Sciences (Basel) 2022. DOI: 10.3390/app12031609.
Abstract
In this paper, we propose the duality-based image sequence matching method, called Dual-ISM, a subsequence matching method for searching for similar images. We first extract feature points from the given image data and arrange the feature vectors as a single data sequence. Next, the feature vectors are grouped into disjoint windows, and a low-dimensional transformation is applied. The query image is subjected to the same low-dimensional transformation, and every window of the data sequence whose distance to the transformed query is within the tolerance ε is added to the candidate set. Finally, similar images are retrieved from the candidate set using distance calculations based on the original feature vectors.
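
The filter-and-refine flow can be sketched as follows (our illustration; PCA stands in for the unspecified low-dimensional transform):

```python
import numpy as np
from sklearn.decomposition import PCA

def dual_ism_search(seq_windows, query_window, eps, n_components=8):
    pca = PCA(n_components=n_components).fit(seq_windows)
    low_seq = pca.transform(seq_windows)
    low_q = pca.transform(query_window[None, :])[0]
    # Filter: keep windows within eps of the query in the reduced space.
    cand = np.where(np.linalg.norm(low_seq - low_q, axis=1) <= eps)[0]
    # Refine: exact distances on the original feature vectors.
    dists = np.linalg.norm(seq_windows[cand] - query_window, axis=1)
    return cand[np.argsort(dists)]

windows = np.random.rand(100, 32)   # 100 disjoint windows of 32-D features
query = np.random.rand(32)
print(dual_ism_search(windows, query, eps=1.0)[:5])
```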

18. Zhu J, Shu Y, Zhang J, Wang X, Wu S. Triplet-object loss for large scale deep image retrieval. International Journal of Machine Learning and Cybernetics 2022. DOI: 10.1007/s13042-021-01330-8.

19. Xiang X, Zhang Y, Jin L, Li Z, Tang J. Sub-Region Localized Hashing for Fine-Grained Image Retrieval. IEEE Transactions on Image Processing 2021; 31:314-326. PMID: 34871171. DOI: 10.1109/tip.2021.3131042.
Abstract
Fine-grained image hashing is challenging due to the difficulty of capturing the discriminative local information needed to generate hash codes. On the one hand, existing methods usually extract local features with a dense attention mechanism focused on dense local regions, which cannot capture diverse local information for fine-grained hashing. On the other hand, hash codes of the same class suffer from the large intra-class variation of fine-grained images. To address these problems, this work proposes a novel sub-Region Localized Hashing (sRLH) method to learn intra-class compact and inter-class separable hash codes that also contain diverse subtle local information for efficient fine-grained image retrieval. Specifically, to localize diverse local regions, a sub-region localization module is developed that learns discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map. Unlike localizing dense local regions, these peaks guide the sub-region localization module to capture multifarious local discriminative information by paying close attention to dispersed local regions. To mitigate intra-class variations, hash codes of the same class are enforced to approach one common binary center. Meanwhile, Gram-Schmidt orthogonalization is performed on the binary centers to make the hash codes inter-class separable. Extensive experimental results on four widely used fine-grained image retrieval datasets demonstrate the superiority of sRLH over several state-of-the-art methods. The source code of sRLH will be released at https://github.com/ZhangYajie-NJUST/sRLH.git.
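
The two code-level constraints (pull codes toward a shared class center; keep centers orthogonal via Gram-Schmidt) can be illustrated with a real-valued relaxation (our sketch, not the sRLH implementation):

```python
import torch

def gram_schmidt(centers):
    ortho = []
    for c in centers:
        for b in ortho:
            c = c - (c @ b) * b          # remove components along earlier centers
        ortho.append(c / c.norm().clamp_min(1e-8))
    return torch.stack(ortho)

def center_loss(codes, labels, centers):
    # Mean squared distance of each relaxed code to its class center.
    return ((codes - centers[labels]) ** 2).sum(dim=1).mean()

centers = gram_schmidt(torch.randn(10, 32))  # 10 classes, 32-bit codes
codes = torch.tanh(torch.randn(64, 32))      # relaxed binary codes from a network
labels = torch.randint(0, 10, (64,))
print(center_loss(codes, labels, centers))
```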

20. Unsupervised feature disentanglement for video retrieval in minimally invasive surgery. Medical Image Analysis 2021; 75:102296. PMID: 34781159. DOI: 10.1016/j.media.2021.102296.
Abstract
In this paper, we propose a novel method of Unsupervised Disentanglement of Scene and Motion (UDSM) representations for minimally invasive surgery video retrieval within large databases, which has the potential to advance intelligent and efficient surgical teaching systems. To extract more discriminative video representations, two designed encoders with a triplet ranking loss and an adversarial learning mechanism are established to respectively capture the spatial and temporal information for achieving disentangled features from each frame with promising interpretability. In addition, the long-range temporal dependencies are improved in an integrated video level using a temporal aggregation module and then a set of compact binary codes that carries representative features is yielded to realize fast retrieval. The entire framework is trained in an unsupervised scheme, i.e., purely learning from raw surgical videos without using any annotation. We construct two large-scale minimally invasive surgery video datasets based on the public dataset Cholec80 and our in-house dataset of laparoscopic hysterectomy, to establish the learning process and validate the effectiveness of our proposed method qualitatively and quantitatively on the surgical video retrieval task. Extensive experiments show that our approach significantly outperforms the state-of-the-art video retrieval methods on both datasets, revealing a promising future for injecting intelligence in the next generation of surgical teaching systems.

21. Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE Transactions on Cybernetics 2021; 51:4982-4993. PMID: 33119532. DOI: 10.1109/tcyb.2020.3027614.
Abstract
Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.

22. Yuan M, Qin B, Li J, Qian J, Xin Y. Hidden multi-distance loss-based full-convolution hashing. Applied Soft Computing 2021. DOI: 10.1016/j.asoc.2021.107508.

23. Yang Z, Yang L, Huang W, Sun L, Long J. Enhanced Deep Discrete Hashing with semantic-visual similarity for image retrieval. Information Processing & Management 2021. DOI: 10.1016/j.ipm.2021.102648.

24. Hu W, Wu L, Jian M, Chen Y, Yu H. Cosine metric supervised deep hashing with balanced similarity. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.03.093.

25. Quadruplet-Based Deep Cross-Modal Hashing. Computational Intelligence and Neuroscience 2021; 2021:9968716. PMID: 34306059. PMCID: PMC8270718. DOI: 10.1155/2021/9968716.
Abstract
Recently, benefiting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn increasing attention. To preserve the semantic similarities of cross-modal instances during hash mapping, most existing deep cross-modal hashing methods learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relations across modalities. To solve this problem, in this paper we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the effectiveness of the quadruplet loss in cross-modal hashing.
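
A generic quadruplet loss (our sketch of the standard formulation, not necessarily QDCMH's exact loss) adds a second margin between two negatives on top of the usual triplet term:

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(a, p, n1, n2, m1=1.0, m2=0.5):
    d_ap = F.pairwise_distance(a, p)    # anchor-positive
    d_an = F.pairwise_distance(a, n1)   # anchor-negative
    d_nn = F.pairwise_distance(n1, n2)  # negative-negative (no shared anchor)
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()

a, p, n1, n2 = (torch.randn(8, 32) for _ in range(4))
print(quadruplet_loss(a, p, n1, n2))
```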

26. Ma L, Li X, Shi Y, Huang L, Huang Z, Wu J. Learning discrete class-specific prototypes for deep semantic hashing. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.02.057.

27. Li H, Pang J, Tao D, Yu Z. Cross adversarial consistency self-prediction learning for unsupervised domain adaptation person re-identification. Information Sciences 2021. DOI: 10.1016/j.ins.2021.01.016.

28. Gui J, Cao Y, Qi H, Li K, Ye J, Liu C, Xu X. Fast kNN Search in Weighted Hamming Space With Multiple Tables. IEEE Transactions on Image Processing 2021; 30:3985-3994. PMID: 33780338. DOI: 10.1109/tip.2021.3066907.
Abstract
Hashing methods have been widely used in approximate nearest neighbor (ANN) search for big data due to their low storage requirements and high search efficiency. These methods usually map ANN search for big data into the k-nearest neighbor (kNN) search problem in Hamming space. However, Hamming distance calculation ignores bit-level distinctions, leading to ambiguous rankings. To further increase search accuracy, various bit-level weights have been proposed for ranking hash codes in weighted Hamming space. Nevertheless, existing ranking methods in weighted Hamming space are almost all based on exhaustive linear scan, which is time consuming and unsuitable for large datasets. Although Multi-Index Hashing, a sub-linear search method, has been proposed, it relies on Hamming distance rather than weighted Hamming distance. To address this issue, we propose an exact kNN search approach with Multiple Tables in Weighted Hamming space, named WHMT, in which the distribution of bit-level weights is incorporated into the multi-index construction. With WHMT, we can obtain the optimal candidate set for exact kNN search in weighted Hamming space without exhaustive linear scan. Experimental results show that WHMT achieves a speedup of up to 69.8 times over the linear scan baseline without losing accuracy in weighted Hamming space.
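
The gain from bit-level weights is easy to see on a toy example: two database codes tied under plain Hamming distance are separated once per-bit weights are summed over the differing positions (illustrative only):

```python
import numpy as np

def weighted_hamming(q, db, w):
    diff = q[None, :] != db              # (n, bits) mask of differing bits
    return (diff * w[None, :]).sum(axis=1)

q = np.array([1, 0, 1, 1], dtype=np.uint8)
db = np.array([[1, 0, 0, 1],             # differs from q only in bit 2
               [0, 0, 1, 1]],            # differs from q only in bit 0
              dtype=np.uint8)
w = np.array([0.9, 0.1, 0.5, 0.4])       # learned bit-level weights
print(weighted_hamming(q, db, w))        # [0.5, 0.9]: plain Hamming ties at 1
```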

29. Yang Z, Yang L, Raymond OI, Zhu L, Huang W, Liao Z, Long J. NSDH: A Nonlinear Supervised Discrete Hashing framework for large-scale cross-modal retrieval. Knowledge-Based Systems 2021. DOI: 10.1016/j.knosys.2021.106818.

30. Fang Y, Li B, Li X, Ren Y. Unsupervised cross-modal similarity via Latent Structure Discrete Hashing Factorization. Knowledge-Based Systems 2021. DOI: 10.1016/j.knosys.2021.106857.

31. Shao H, Zhong D, Du X. A deep biometric hash learning framework for three advanced hand-based biometrics. IET Biometrics 2021. DOI: 10.1049/bme2.12014.
Affiliation(s)
- Huikai Shao: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Dexing Zhong: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China; State Key Lab. for Novel Software Technology, Nanjing University, Nanjing, China; Pazhou Lab, Guangzhou, China
- Xuefeng Du: School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China

32. Xiao X, Chen Y, Gong YJ, Zhou Y. Prior Knowledge Regularized Multiview Self-Representation and its Applications. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1325-1338. PMID: 32310792. DOI: 10.1109/tnnls.2020.2984625.
Abstract
To learn the self-representation matrices/tensor that encodes the intrinsic structure of the data, existing multiview self-representation models consider only the multiview features and, thus, impose equal membership preference across samples. However, this is inappropriate in real scenarios since the prior knowledge, e.g., explicit labels, semantic similarities, and weak-domain cues, can provide useful insights into the underlying relationship of samples. Based on this observation, this article proposes a prior knowledge regularized multiview self-representation (P-MVSR) model, in which the prior knowledge, multiview features, and high-order cross-view correlation are jointly considered to obtain an accurate self-representation tensor. The general concept of "prior knowledge" is defined as the complement of multiview features, and the core of P-MVSR is to take advantage of the membership preference, which is derived from the prior knowledge, to purify and refine the discovered membership of the data. Moreover, P-MVSR adopts the same optimization procedure to handle different prior knowledge and, thus, provides a unified framework for weakly supervised clustering and semisupervised classification. Extensive experiments on real-world databases demonstrate the effectiveness of the proposed P-MVSR model.

33. Feng H, Wang N, Tang J, Chen J, Chen F. Multi-granularity feature learning network for deep hashing. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.10.028.

34. Meng M, Wang H, Yu J, Chen H, Wu J. Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval. IEEE Transactions on Image Processing 2020; 30:986-1000. PMID: 33232233. DOI: 10.1109/tip.2020.3038365.
Abstract
Hashing-based techniques have provided attractive solutions to cross-modal similarity search when addressing vast quantities of multimedia data. However, existing cross-modal hashing (CMH) methods face two critical limitations: 1) no previous work simultaneously exploits the consistent and modality-specific information of multi-modal data; 2) the discriminative capabilities of pairwise similarities are usually neglected due to computational cost and storage overhead. Moreover, to tackle the discrete constraints, a relaxation-based strategy is typically adopted to relax the discrete problem to a continuous one, which suffers from large quantization errors and leads to sub-optimal solutions. To overcome these limitations, in this article we present a novel supervised CMH method, Asymmetric Supervised Consistent and Specific Hashing (ASCSH). Specifically, we explicitly decompose the mapping matrices into consistent and modality-specific ones to fully exploit the intrinsic correlation between different modalities. Meanwhile, a novel discrete asymmetric framework is proposed to fully explore the supervised information, in which pairwise similarities and semantic labels are jointly formulated to guide the hash code learning process. Unlike existing asymmetric methods, the proposed discrete asymmetric structure can solve the binary constraint problem discretely and efficiently without any relaxation. To validate the effectiveness of the proposed approach, extensive experiments on three widely used datasets are conducted, and the encouraging results demonstrate the superiority of ASCSH over other state-of-the-art CMH methods.

35. Peng X, Feng J, Zhou JT, Lei Y, Yan S. Deep Subspace Clustering. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5509-5521. PMID: 32078567. DOI: 10.1109/tnnls.2020.2968848.
Abstract
In this article, we propose a deep extension of sparse subspace clustering (SSC), termed deep subspace clustering with L1-norm (DSC-L1). Regularized by the unit-sphere distribution assumption for the learned deep features, DSC-L1 can infer a new data affinity matrix by simultaneously satisfying the sparsity principle of SSC and the nonlinearity given by neural networks. One appealing advantage of DSC-L1 is that, when original real-world data do not meet the class-specific linear subspace distribution assumption, DSC-L1 can employ neural networks to make the assumption valid through nonlinear transformations. Moreover, we prove that our neural network can sufficiently approximate the minimizer under mild conditions. To the best of our knowledge, this could be one of the first deep-learning-based subspace clustering methods. Extensive experiments on four real-world datasets show that the proposed method is significantly superior to 17 existing subspace clustering methods on handcrafted features and raw data.

36. Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval. Neural Networks 2020; 134:143-162. PMID: 33310483. DOI: 10.1016/j.neunet.2020.11.011.
Abstract
Information retrieval across different modalities has become a significant issue with many promising applications. However, the inconsistent feature representations of various multimedia data cause the "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge the "heterogeneity gap," popular methods attempt to project the original data into a common representation space, which demands a model with strong fitting ability. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link different modalities. GRL consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN model finds a latent space for each modality in which cosine similarity suitably describes their similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and directly embed the graph vertices into a common representation space. In this way, GRL bypasses the most challenging issue by using the cross-modal graph as an intermediary to bridge the "heterogeneity gap" among modalities, which is simple but effective. Extensive experimental results on six widely used datasets indicate that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods.

37. Kan S, Cen Y, Cen Y, Vladimir M, Li Y, He Z. Zero-Shot Learning to Index on Semantic Trees for Scalable Image Retrieval. IEEE Transactions on Image Processing 2020; 30:501-516. PMID: 33186117. DOI: 10.1109/tip.2020.3036779.
Abstract
In this study, we develop a new approach, called zero-shot learning to index on semantic trees (LTI-ST), for efficient image indexing and scalable image retrieval. Our method learns a binary semantic tree from training images to model the inherent correlation structure between visual representations, and this tree can be effectively transferred to new test images from unknown classes. Based on the predicted correlation structure, we construct an efficient indexing scheme for the whole test image set. Unlike existing image indexing methods, the proposed LTI-ST method has two unique characteristics. First, it does not need to analyze the test images in the query database to construct the index structure; instead, the structure is directly predicted by a network learned from the training set. This zero-shot capability is critical for flexible, distributed, and scalable implementation and deployment of image indexing and retrieval services at large scale. Second, unlike existing distance-based indexing methods, our index structure is learned using the LTI-ST deep neural network with binary encoding and decoding on a hierarchical semantic tree. Extensive experimental results on benchmark datasets and ablation studies demonstrate that the proposed LTI-ST method outperforms existing indexing methods by a large margin while providing the above new capabilities, which are highly desirable in practice.

38. Zeng H, Zhang H, Zhu L. Label consistent locally linear embedding based cross-modal hashing. Information Processing & Management 2020. DOI: 10.1016/j.ipm.2019.102136.

39.

40.

41. Deep multilevel similarity hashing with fine-grained features for multi-label image retrieval. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.125.

42.

43. Scalable deep asymmetric hashing via unequal-dimensional embeddings for image similarity search. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.06.036.

44. Yang E, Yao D, Cao B, Guan H, Yap PT, Shen D, Liu M. Deep Disentangled Hashing with Momentum Triplets for Neuroimage Search. Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020; 12261:191-201. PMID: 34746936. PMCID: PMC8570551. DOI: 10.1007/978-3-030-59710-8_19.
Abstract
Neuroimaging has been widely used in computer-aided clinical diagnosis and treatment, and the rapid increase of neuroimage repositories introduces great challenges for efficient neuroimage search. Existing image search methods often use a triplet loss to capture high-order relationships between samples. However, we find that the traditional triplet loss struggles to push positive and negative sample pairs apart so that their Hamming distance discrepancies become larger than a small fixed value. This can reduce the discriminative ability of the learned hash codes and degrade search performance. To address this issue, we propose a deep disentangled momentum hashing (DDMH) framework for neuroimage search. Specifically, we first investigate the original triplet loss and find that this loss function is determined by the inner products of hash code pairs. Accordingly, we disentangle hash code norms from hash code directions and analyze the role of each part. By decoupling the loss function from the hash code norm, we propose a unique disentangled triplet loss, which can effectively push positive and negative sample pairs to the desired Hamming distance discrepancies for hash codes of different lengths. We further develop a momentum triplet strategy to address the problem of insufficient triplet samples caused by the small batch sizes used for 3D neuroimages. With the proposed disentangled triplet loss and the momentum triplet strategy, we design an end-to-end trainable deep hashing framework for neuroimage search. Comprehensive empirical evidence on three neuroimage datasets shows that DDMH outperforms several state-of-the-art methods in neuroimage search.
Affiliation(s)
- Erkun Yang: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dongren Yao: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, University of Chinese Academy of Sciences, Beijing 100190, China
- Bing Cao: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hao Guan: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Pew-Thian Yap: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dinggang Shen: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Mingxia Liu: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
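
One simple reading of the norm/direction disentanglement described in the abstract above (our sketch, not DDMH's exact loss) computes the triplet margin on direction-normalized codes, so the code norm cannot absorb the margin:

```python
import torch
import torch.nn.functional as F

def disentangled_triplet(a, p, n, margin=0.4):
    # Normalize so only code directions enter the margin constraint.
    a, p, n = F.normalize(a, dim=1), F.normalize(p, dim=1), F.normalize(n, dim=1)
    # For unit-norm relaxed codes, the inner product maps monotonically to
    # Hamming distance, so the margin now constrains direction only.
    return F.relu((a * n).sum(1) - (a * p).sum(1) + margin).mean()

a, p, n = (torch.randn(16, 48) for _ in range(3))
print(disentangled_triplet(a, p, n))
```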

45. Deng C, Xu X, Wang H, Yang M, Tao D. Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval. IEEE Transactions on Image Processing 2020; PP:8892-8902. PMID: 32915736. DOI: 10.1109/tip.2020.3020383.
Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task that involves searching natural images using free-hand sketches under the zero-shot scenario. Most previous methods project the sketch and image features into a low-dimensional common space for efficient retrieval and meanwhile align the projected features to their semantic features (e.g., category-level word vectors) in order to transfer knowledge from seen to unseen classes. However, the projection and alignment are always coupled; as a result, the alignment is often insufficient, which leads to unsatisfactory zero-shot retrieval performance. To address this issue, we propose a novel progressive cross-modal semantic network. More specifically, it first explicitly aligns the sketch and image features to semantic features, and then projects the aligned features into a common space for subsequent retrieval. We further employ a cross-reconstruction loss to encourage the aligned features to capture complete knowledge about the two modalities, along with a multi-modal Euclidean loss that guarantees similarity between the retrieval features of a sketch-image pair. Extensive experiments conducted on two popular large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art competitors to a remarkable extent: by more than 3% on the Sketchy dataset and about 6% on the TU-Berlin dataset in terms of retrieval accuracy.

46.
Abstract
Existing learning-based unsupervised hashing methods usually use a pre-trained network to extract features and then use the extracted feature vectors to construct a similarity matrix, which guides the generation of hash codes through gradient descent. Existing research shows that gradient-descent-based algorithms cause the hash codes of paired images to be updated toward each other's positions during training. For unsupervised training, this situation causes large fluctuations in the hash codes during training and limits the learning efficiency of the hash codes. In this paper, we propose a method named Deep Unsupervised Hashing with Gradient Attention (UHGA) to solve this problem. UHGA mainly comprises the following steps: (1) a pre-trained network model is used to extract image features; (2) the cosine distances of corresponding feature pairs are computed, and a similarity matrix is constructed from these distances to guide the generation of hash codes; (3) a gradient attention mechanism is added during hash code training to attend to the gradients. Experiments on two existing public datasets show that our proposed method obtains more discriminative hash codes.
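
Steps (1) and (2) amount to regressing hash-code inner products onto a similarity target built from pre-trained features; the sketch below shows that objective (our illustration, using cosine similarity as the target; the gradient attention of step (3) is omitted):

```python
import torch
import torch.nn.functional as F

def similarity_guided_loss(features, codes):
    feat = F.normalize(features, dim=1)
    S = feat @ feat.t()                    # cosine-similarity target matrix
    B = torch.tanh(codes)                  # relaxed hash bits in (-1, 1)
    return ((B @ B.t()) / B.size(1) - S).pow(2).mean()

features = torch.randn(32, 512)            # pre-trained network features
codes = torch.randn(32, 64, requires_grad=True)
similarity_guided_loss(features, codes).backward()
```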

47. Li H, He X, Yu Z, Luo J. Noise-robust image fusion with low-rank sparse decomposition guided by external patch prior. Information Sciences 2020. DOI: 10.1016/j.ins.2020.03.009.

48. Deng C, Yang E, Liu T, Tao D. Two-Stream Deep Hashing With Class-Specific Centers for Supervised Image Search. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2189-2201. PMID: 31514156. DOI: 10.1109/tnnls.2019.2929068.
Abstract
Hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. Recent supervised hashing research has shown that deep learning-based methods can significantly outperform nondeep methods. Most existing supervised deep hashing methods exploit supervisory signals to generate similar and dissimilar image pairs for training. However, natural images can have large intraclass and small interclass variations, which may degrade the accuracy of hash codes. To address this problem, we propose a novel two-stream ConvNet architecture, which learns hash codes with class-specific representation centers. Our basic idea is that if we can learn a unified binary representation for each class as a center and encourage hash codes of images to be close to the corresponding centers, the intraclass variation will be greatly reduced. Accordingly, we design a neural network that leverages label information and outputs a unified binary representation for each class. Moreover, we also design an image network to learn hash codes from images and force these hash codes to be close to the corresponding class-specific centers. These two neural networks are then seamlessly incorporated to create a unified, end-to-end trainable framework. Extensive experiments on three popular benchmarks corroborate that our proposed method outperforms current state-of-the-art methods.

49.

50.