1. Xi Z, Qu Z, Lu W, Luo X, Cao X. Invisible DNN Watermarking Against Model Extraction Attack. IEEE TRANSACTIONS ON CYBERNETICS 2024; PP:800-811. [PMID: 40030655] [DOI: 10.1109/tcyb.2024.3514838]
Abstract
Deep neural network (DNN) models are widely used in various fields, such as pattern recognition and natural language processing, and provide considerable commercial value to their owners. Embedding a digital watermark in a model allows the legitimate owner to detect unauthorized use of the model. However, existing DNN watermarking methods are vulnerable to model extraction attacks because the watermark task and the original model task are independent. In this article, a novel collaborative DNN watermarking framework is proposed to defend against model extraction attacks by establishing cooperation between watermark generation and watermark embedding. Specifically, the trigger samples are not only imperceptible, ensuring perceptual stealth, but also infused with target-label information to guide the subsequent feature association. During watermark embedding, the feature representation of trigger samples is forced to be similar to that of task-distribution samples via feature coupling. Consequently, trigger samples from this framework are recognized by the stolen model as task-distribution samples, so the ownership of the model can be successfully verified. Extensive experiments on CIFAR10, CIFAR100, and ImageNet demonstrate the effectiveness and superior performance of the proposed watermarking framework against various model extraction attacks.
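A minimal sketch of the feature-coupling idea described above, in PyTorch: the classification losses for the task set and the trigger set are combined with a term that pulls trigger-sample features toward the mean feature of target-class task samples. The accessors (model.features, model.classifier) and the weight lambda_couple are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def watermark_loss(model, task_x, task_y, trigger_x, trigger_y, lambda_couple=0.1):
    task_feat = model.features(task_x)      # features of task-distribution samples
    trig_feat = model.features(trigger_x)   # features of trigger samples
    # Standard classification terms for the task and the watermark trigger set.
    ce = F.cross_entropy(model.classifier(task_feat), task_y) \
       + F.cross_entropy(model.classifier(trig_feat), trigger_y)
    # Feature coupling: make trigger features resemble the mean feature of
    # target-class task samples (assumes all triggers share one target label
    # and that the batch contains samples of that class), so a model extracted
    # from the watermarked one still reproduces the trigger behavior.
    target_feat = task_feat[task_y == trigger_y[0]].mean(dim=0)
    couple = F.mse_loss(trig_feat, target_feat.expand_as(trig_feat))
    return ce + lambda_couple * couple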
2. Yang E, Deng C, Liu M. Deep Bayesian Quantization for Supervised Neuroimage Search. MACHINE LEARNING IN MEDICAL IMAGING. MLMI (WORKSHOP) 2023; 14349:396-406. [PMID: 38390519] [PMCID: PMC10883338] [DOI: 10.1007/978-3-031-45676-3_40]
Abstract
Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to low computation and storage costs, hashing-based search techniques have been widely adopted for building image retrieval systems. However, these methods often suffer from nonnegligible quantization loss, which can degrade overall search performance. To address this issue, this paper presents a compact coding solution named Deep Bayesian Quantization (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieves superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines deep representation learning and compact representation quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue of traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce quantization losses. By utilizing precomputed lookup tables, the proposed DBQ enables efficient and effective similarity search. Extensive experiments conducted on 2,008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-art methods.
Affiliation(s)
- Erkun Yang: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Xidian University, Xi'an, China
- Mingxia Liu: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
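A minimal sketch of the precomputed-lookup-table search that quantization-based methods like DBQ rely on at query time, assuming a generic product-quantization layout; codebook shapes and names are illustrative, not taken from the paper.

import numpy as np

def lookup_search(query, codebooks, db_codes):
    # codebooks: (M subspaces, K centroids, d sub-dim); query: (M*d,)
    M, K, d = codebooks.shape
    sub_queries = query.reshape(M, d)
    # table[m, k] = squared distance from the query's m-th subvector to centroid k.
    table = ((sub_queries[:, None, :] - codebooks) ** 2).sum(-1)   # (M, K)
    # Asymmetric distance of each database item = sum of its M table entries;
    # db_codes: (N, M) integer centroid indices per database item.
    return table[np.arange(M), db_codes].sum(-1)                   # (N,)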
3. Xu Q, Yang Z, Zhao Y, Cao X, Huang Q. Rethinking Label Flipping Attack: From Sample Masking to Sample Thresholding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:7668-7685. [PMID: 37819793] [DOI: 10.1109/tpami.2022.3220849]
Abstract
Nowadays, machine learning (ML) and deep learning (DL) methods have become fundamental building blocks for a wide range of AI applications. The popularity of these methods also leaves them widely exposed to malicious attacks, which may cause severe security concerns. To understand the security properties of ML/DL methods, researchers have recently turned their focus to adversarial attack algorithms that can corrupt the model or the clean data owned by the victim with imperceptible perturbations. In this paper, we study the Label Flipping Attack (LFA) problem, where the attacker aims to corrupt an ML/DL model's performance by flipping a small fraction of the labels in the training data. Prior art along this direction formulates LFA as a combinatorial optimization problem, which limits its scalability to deep learning models. To this end, we propose a novel minimax problem that provides an efficient reformulation of the sample selection process in LFA. In the new optimization problem, the sample selection operation can be implemented with a single thresholding parameter, which leads to a novel training algorithm called Sample Thresholding. Since the objective function is differentiable and the model complexity does not depend on the sample size, we can apply Sample Thresholding to attack deep learning models. Moreover, since the victim's behavior is not predictable in a poisoning attack setting, we have to employ surrogate models to simulate the true model employed by the victim. Given this, we provide a theoretical analysis of such a surrogate paradigm and show that the performance gap between the true model employed by the victim and the surrogate model is small under mild conditions. On top of this paradigm, we extend Sample Thresholding to the crowdsourced ranking task, where labels collected from annotators are vulnerable to adversarial attacks. Finally, experimental analyses on three real-world datasets speak to the efficacy of our method.
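A minimal sketch of the thresholding step as described: given a per-sample attack-utility score, flip only the labels whose score clears the budget-determined threshold. The scoring function is left abstract, and the binary-label flip is an illustrative assumption.

import torch

def threshold_flip(scores, labels, budget):
    """scores: (N,) attack utility per sample; flip at most `budget` labels."""
    # Threshold = the budget-th largest score, so exactly the top-`budget`
    # samples are selected without any combinatorial search.
    tau = torch.topk(scores, budget).values.min()
    flip_mask = scores >= tau
    flipped = labels.clone()
    flipped[flip_mask] = 1 - flipped[flip_mask]   # binary-label example
    return flipped, flip_mask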
4. Hu P, Zhu H, Lin J, Peng D, Zhao YP, Peng X. Unsupervised Contrastive Cross-Modal Hashing. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:3877-3889. [PMID: 35617190] [DOI: 10.1109/tpami.2022.3177356]
Abstract
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. Specifically, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable within CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not involve the binary-continuous relaxation used by most existing methods, and thus enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), we propose a Cross-modal Ranking Learning loss (CRL) that utilizes the discrimination from all negative pairs instead of only the hard ones, where an FNP is a within-class pair wrongly treated as negative. Thanks to this global strategy, CRL endows our method with better performance because it does not overuse FNPs while ignoring true-negative pairs. To the best of our knowledge, the proposed method could be one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely used datasets against 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
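A minimal sketch of contrastive learning over hash codes, assuming a generic straight-through binarization in place of the paper's momentum optimizer (whose exact form the abstract does not give); the temperature and normalization choices are also illustrative.

import torch
import torch.nn.functional as F

def binarize(h):
    # Forward: hard sign; backward: identity (straight-through), so the
    # hashing operation stays inside the differentiable CL pipeline.
    b = torch.sign(h)
    return h + (b - h).detach()

def contrastive_hash_loss(img_codes, txt_codes, temperature=0.1):
    b_i = F.normalize(binarize(img_codes), dim=1)
    b_t = F.normalize(binarize(txt_codes), dim=1)
    logits = b_i @ b_t.t() / temperature          # cross-modal similarities
    targets = torch.arange(len(b_i), device=logits.device)
    # Symmetric InfoNCE: matched image-text pairs are the positives.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2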
5. Chen B, Feng Y, Dai T, Bai J, Jiang Y, Xia ST, Wang X. Adversarial Examples Generation for Deep Product Quantization Networks on Image Retrieval. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:1388-1404. [PMID: 35380957] [DOI: 10.1109/tpami.2022.3165024]
Abstract
Deep product quantization networks (DPQNs) have been successfully used in image retrieval tasks, thanks to their powerful feature extraction ability and high efficiency in encoding high-dimensional visual features. Recent studies show that deep neural networks (DNNs) are vulnerable to inputs with small, maliciously designed perturbations (a.k.a. adversarial examples) in classification. However, little effort has been devoted to investigating how adversarial examples affect DPQNs, which poses a potential safety hazard when deploying DPQNs in a commercial search engine. To this end, we propose an adversarial example generation framework that crafts adversarial query images for DPQN-based retrieval systems. Unlike adversarial generation for the classic image classification task, which heavily relies on ground-truth labels, we instead perturb the probability distribution of centroid assignments for a clean query, inducing effective non-targeted attacks on DPQNs in both white-box and black-box settings. Moreover, we extend the non-targeted attack to a targeted attack via a novel sample space averaging scheme (S²AS), for which a theoretical guarantee is also obtained. Extensive experiments show that our methods can create adversarial examples that successfully mislead the target DPQNs and significantly degrade retrieval performance under a wide variety of experimental settings. The source code is available at https://github.com/Kira0096/PQAG.
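A minimal sketch of the label-free, non-targeted idea described above: maximize, under an L∞ budget, the divergence between the clean query's centroid-assignment distribution and the perturbed one. model.assignment_probs is a hypothetical accessor, and the PGD step sizes are illustrative.

import torch
import torch.nn.functional as F

def pqag_attack(model, x, eps=8/255, alpha=2/255, steps=10):
    with torch.no_grad():
        p_clean = model.assignment_probs(x)        # (B, M, K) soft assignments
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        p_adv = model.assignment_probs(x + delta)
        # KL(p_clean || p_adv): gradient *ascent* pushes the perturbed
        # assignment distribution away from the clean one.
        loss = F.kl_div(p_adv.log(), p_clean, reduction='batchmean')
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad = None
    return (x + delta).clamp(0, 1).detach()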
6. U-Turn: Crafting Adversarial Queries with Opposite-Direction Features. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01737-y]
7. Chen Y, Gao R, Liu F, Zhao D. ModuleNet: Knowledge-Inherited Neural Architecture Search. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:11661-11671. [PMID: 34097629] [DOI: 10.1109/tcyb.2021.3078573]
Abstract
Although neural architecture search (NAS) can improve deep models, it typically neglects the valuable knowledge held by existing models. The heavy computation and time costs of NAS also mean that we should not search from scratch, but should make every attempt to reuse existing knowledge. In this article, we discuss what kind of knowledge in a model can and should be used for a new architecture design. We then propose a new NAS algorithm, ModuleNet, which can fully inherit knowledge from existing convolutional neural networks. To make full use of existing models, we decompose them into different modules that keep their weights, forming a knowledge base. We then sample and search for a new architecture over this knowledge base. Unlike previous search algorithms, and benefiting from the inherited knowledge, our method can directly search for architectures in the macrospace with the NSGA-II algorithm without tuning the parameters in these modules. Experiments show that our strategy can efficiently evaluate the performance of a new architecture even without tuning the weights in its convolutional layers. With the help of the inherited knowledge, our search results consistently achieve better performance on various datasets (CIFAR10, CIFAR100, and ImageNet) than the original architectures.
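A minimal sketch of what searching over a knowledge base of frozen modules could look like; the random sampling, the two-objective fitness, and the num_params attribute are all illustrative assumptions (the actual NSGA-II machinery and module shape matching are omitted).

import random

def sample_architecture(knowledge_base, depth):
    # knowledge_base: modules decomposed from pretrained CNNs, weights frozen.
    # A real implementation must also check input/output shape compatibility.
    return [random.choice(knowledge_base) for _ in range(depth)]

def fitness(arch, eval_fn):
    # Two objectives for NSGA-II: maximize accuracy, minimize parameter count,
    # with module weights kept frozen during evaluation.
    accuracy = eval_fn(arch)
    n_params = sum(m.num_params for m in arch)   # num_params: assumed attribute
    return accuracy, -n_params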
8. Zhu L, Wang T, Li J, Zhang Z, Shen J, Wang X. Efficient Query-based Black-Box Attack against Cross-modal Hashing Retrieval. ACM T INFORM SYST 2022. [DOI: 10.1145/3559758]
Abstract
Deep cross-modal hashing retrieval models inherit the vulnerability of deep neural networks: they are vulnerable to adversarial attacks, especially in the form of subtle perturbations to the inputs. Although many adversarial attack methods have been proposed to probe the robustness of hashing retrieval models, they still suffer from two problems: 1) most are based on white-box settings, which are usually unrealistic in practical applications, and 2) iterative optimization for the generation of adversarial examples results in heavy computation. To address these problems, we propose an Efficient Query-based Black-Box Attack (EQB²A) against deep cross-modal hashing retrieval, which can efficiently generate adversarial examples for the black-box attack. Specifically, by sending a few query requests to the attacked retrieval system, cross-modal retrieval model stealing is performed based on the neighbor relationship between the retrieved results and the query, thus obtaining knockoffs to substitute for the attacked system. A multi-modal knockoffs-driven adversarial generation method is then proposed to achieve efficient adversarial example generation. Once the network training converges, EQB²A can efficiently generate adversarial examples by forward propagation given only benign images. Experiments show that EQB²A achieves superior attacking performance under the black-box setting.
Affiliation(s)
- Lei Zhu: Shandong Normal University; Peng Cheng Laboratory, China
- Jingjing Li: University of Electronic Science and Technology of China, China
- Zheng Zhang: Harbin Institute of Technology, Shenzhen, China
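A minimal sketch of the model-stealing step described above: query the victim retrieval system, treat its top-ranked returns as neighbors of the query, and train a knockoff encoder to reproduce that neighborhood structure with a triplet-style objective. victim_retrieve, the margin, and the encoder interface are illustrative assumptions.

import torch
import torch.nn.functional as F

def knockoff_step(knockoff, query, victim_retrieve, optimizer, margin=0.5):
    # The victim only exposes rankings: top-ranked items act as positives,
    # low-ranked items as negatives for the substitute model.
    neighbors, non_neighbors = victim_retrieve(query)
    q = F.normalize(knockoff(query), dim=1)
    pos = F.normalize(knockoff(neighbors), dim=1)
    neg = F.normalize(knockoff(non_neighbors), dim=1)
    # Pull retrieved neighbors toward the query, push low-ranked items away.
    loss = F.relu(margin - (q * pos).sum(1) + (q * neg).sum(1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()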
9. Qu X, Ong YS, Gupta A. Frame-Correlation Transfers Trigger Economical Attacks on Deep Reinforcement Learning Policies. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:7577-7590. [PMID: 33417576] [DOI: 10.1109/tcyb.2020.3041265]
Abstract
Adversarial attack can be deemed a necessary evaluation procedure before the deployment of any reinforcement learning (RL) policy. Most existing approaches for generating adversarial attacks are gradient-based and exhaustive, i.e., they perturb every pixel of every frame. In contrast, recent advances show that gradient-free selective perturbations (attacking only selected pixels and frames) can be a more realistic adversary. However, these attacks treat every frame in isolation, ignoring the relationship between neighboring states of a Markov decision process; the resulting high computational complexity limits their real-world plausibility given the tight time constraints in RL. This article therefore presents the first study of how transferability across frames can be exploited to craft minimal yet powerful attacks on image-based RL. To this end, we introduce three types of frame-correlation transfers (FCTs) (anterior case transfer, random projection-based transfer, and principal components-based transfer) with varying degrees of computational complexity, generating adversaries via a genetic algorithm. We empirically demonstrate the tradeoff between the complexity and potency of the transfer mechanism on four fully trained state-of-the-art policies across six Atari games. Our FCTs dramatically speed up attack generation compared to existing methods, often reducing the required computation time to nearly zero, shedding light on the real threat of real-time attacks in RL.
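A minimal sketch of the cheapest of the three transfers (anterior case transfer), under the assumption that it amounts to reusing the previous frame's successful perturbation as the seed of the genetic search; evaluate and genetic_refine are hypothetical stand-ins for the policy-fooling test and the GA step.

def attack_episode(frames, evaluate, genetic_refine, budget=20):
    perturbation = None
    for frame in frames:
        # Transfer: if the anterior frame's perturbation still fools the
        # policy on this frame, reuse it at zero additional search cost.
        if perturbation is not None and evaluate(frame, perturbation):
            yield perturbation
            continue
        # Otherwise refine with a genetic search seeded by the old solution.
        perturbation = genetic_refine(frame, seed=perturbation, iters=budget)
        yield perturbation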
10. A Fast Method for Protecting Users’ Privacy in Image Hash Retrieval System. MACHINES 2022. [DOI: 10.3390/machines10040278]
Abstract
Effective search engines based on deep neural networks (DNNs) can be used to search over very large image collections, as with the Google Images search engine. However, illegal use of search engines can lead to serious compromises of privacy. Influenced by factors such as economic interests and service providers, hackers and other malicious parties can steal and tamper with the image data uploaded by users, causing privacy leakage in image hash retrieval. Previous work exploited adversarial attacks to protect user privacy with an approximation strategy in the white-box setting, although that method converges slowly. In this study, we use a penalty norm that imposes a strict constraint to quantize the feature of a query image into binary code via a non-convex optimization process. Moreover, we exploit a forward-backward strategy to overcome the vanishing gradient caused by the quantization function. We evaluate our method on two widely used datasets and show attractive performance with fast convergence. Compared with other image privacy protection methods, ours performs best in terms of both privacy protection and image quality.
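A minimal sketch of a forward-backward (straight-through) treatment of the quantization step as the abstract describes it: binarize on the forward pass, pass the gradient through unchanged on the backward pass. The exact penalty-norm objective is not given in the abstract, so only the gradient trick is sketched.

import torch

class ForwardBackwardSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h):
        return torch.sign(h)   # forward: hard binary code

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out        # backward: identity, so gradients do not vanish

binary_code = ForwardBackwardSign.apply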
11. Du T, Ji S, Wang B, He S, Li J, Li B, Wei T, Jia Y, Beyah R, Wang T. DetectSec: Evaluating the robustness of object detection models to adversarial attacks. INT J INTELL SYST 2022. [DOI: 10.1002/int.22851]
Affiliation(s)
- Tianyu Du: Institute of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Shouling Ji: Institute of Computer Science and Technology, Zhejiang University, Hangzhou, China; Department of Information Science, Binjiang Institute of Zhejiang University, Hangzhou, China
- Bo Wang: Institute of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Sirui He: Institute of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Jinfeng Li: Institute of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Bo Li: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, USA
- Tao Wei: Department of Foundational Security, Ant Group, Hangzhou, China
- Yunhan Jia: Department of Security and Privacy Technology, ByteDance AI Lab, Palo Alto, California, USA
- Raheem Beyah: Institute of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Ting Wang: Department of Information Sciences and Technology, The Pennsylvania State University, State College, Pennsylvania, USA
12. Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:4982-4993. [PMID: 33119532] [DOI: 10.1109/tcyb.2020.3027614]
Abstract
Thanks to low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, making it difficult to handle data with an increasing or large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) consisting of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). Specifically, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or only the number of classes, the so-called flexible inputs. After that, each MHN independently projects all samples into this discriminative Hamming space, which is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced by the flexible inputs and is then used to guide view-specific hashing in an independent fashion. Thanks to this independent/decoupled paradigm, our method enjoys high computational efficiency and can handle an increasing number of views using only a few labels or the number of classes. For a newly arriving view, we only need to add a view-specific network to our model, avoiding retraining the entire model on the new and previous views. Extensive experiments are carried out on five widely used multiview databases against 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity to handle newly arriving views.
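A minimal sketch of the decoupled paradigm: once SHAM has produced a target hash code per class, each view-specific network is trained on its own to regress samples onto those fixed targets. sham_targets and the MSE fitting objective are illustrative assumptions.

import torch
import torch.nn.functional as F

def train_view(view_net, loader, sham_targets, optimizer):
    # sham_targets: (num_classes, code_len) codes in {-1, +1} precomputed by
    # SHAM; each view is fit independently, so adding a new view never
    # requires retraining the others.
    for x, y in loader:
        target = sham_targets[y]                 # (B, code_len)
        pred = torch.tanh(view_net(x))           # relaxed codes in (-1, 1)
        loss = F.mse_loss(pred, target.float())  # fit the shared Hamming space
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()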
13. Quadruplet-Based Deep Cross-Modal Hashing. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:9968716. [PMID: 34306059] [PMCID: PMC8270718] [DOI: 10.1155/2021/9968716]
Abstract
Recently, benefiting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction of deep neural networks, deep cross-modal hashing retrieval has drawn increasing attention. To preserve the semantic similarities of cross-modal instances during hash mapping, most existing deep cross-modal hashing methods learn deep hashing networks with a pairwise or triplet loss. However, these losses may not fully explore the similarity relations across modalities. To solve this problem, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing method (QDCMH). Extensive experiments on two benchmark cross-modal retrieval datasets show that our method achieves state-of-the-art performance and demonstrate the effectiveness of the quadruplet loss in cross-modal hashing.
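A minimal sketch of a standard quadruplet loss of the kind the abstract names: one anchor-positive distance is compared against an anchor-negative distance and against the distance between two unrelated negatives, with two margins. The margin values and the Euclidean relaxation are illustrative choices, not necessarily the paper's exact formulation.

import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, pos, neg1, neg2, m1=1.0, m2=0.5):
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    # Term 1: positive pair closer than the anchor-negative pair by margin m1.
    # Term 2: positive pair also closer than a pair of unrelated negatives,
    # which constrains the similarity structure beyond a plain triplet loss.
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()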
14. Ai S, Voundi Koe AS, Huang T. Adversarial perturbation in remote sensing image recognition. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107252]
15. Wang D, Li C, Wen S, Han QL, Nepal S, Zhang X, Xiang Y. Daedalus: Breaking Nonmaximum Suppression in Object Detection via Adversarial Examples. IEEE TRANSACTIONS ON CYBERNETICS 2021; PP:7427-7440. [PMID: 33400667] [DOI: 10.1109/tcyb.2020.3041481]
Abstract
This article demonstrates that nonmaximum suppression (NMS), which is commonly used in object detection (OD) tasks to filter redundant detection results, is no longer secure. Because NMS is an integral part of OD systems, thwarting its functionality can have unexpected or even lethal consequences for such systems. We propose an adversarial example attack, named Daedalus, that triggers malfunctioning of NMS in OD models: it compresses the dimensions of detection boxes to evade NMS, so the final detection output contains extremely dense false positives. This can be fatal for many OD applications, such as autonomous vehicles and surveillance systems. The attack generalizes across OD models, so it can cripple a variety of OD applications. Furthermore, robust adversarial examples can be crafted by using an ensemble of popular detection models as substitutes. Given the pervasive model reuse in real-world OD scenarios, Daedalus examples crafted on such an ensemble can launch attacks without knowing the parameters of the victim models. The experimental results demonstrate that the attack effectively stops NMS from filtering redundant bounding boxes: Daedalus increases the false positive rate in detection results to 99.9% and reduces the mean average precision scores to 0, while keeping the distortion of the original inputs low. The attack can also be practically launched against real-world OD systems via printed posters.
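A minimal sketch of one simplified variant of the attack objective: raise every candidate box's confidence while shrinking its width and height, so boxes stop overlapping and NMS no longer filters them. The YOLO-style output layout (x, y, w, h, confidence) is a simplifying assumption, and the full Daedalus loss has additional terms.

import torch

def daedalus_loss(detections):
    # detections: (..., >=5) with (x, y, w, h, confidence) per candidate box.
    conf = detections[..., 4]
    w, h = detections[..., 2], detections[..., 3]
    # Maximize confidence, minimize box area: many tiny, non-overlapping,
    # high-confidence boxes survive NMS as dense false positives.
    return ((1 - conf) ** 2 + (w * h) ** 2).mean()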
16. Xu X, Wang T, Yang Y, Zuo L, Shen F, Shen HT. Cross-Modal Attention With Semantic Consistence for Image-Text Matching. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5412-5425. [PMID: 32071004] [DOI: 10.1109/tnnls.2020.2967597]
Abstract
The task of image-text matching refers to measuring the visual-semantic similarity between an image and a sentence. Recently, fine-grained matching methods that explore the local alignment between image regions and sentence words have shown promise in inferring image-text correspondence by aggregating pairwise region-word similarity. However, local alignment is hard to achieve, as some important image regions may be inaccurately detected or even missing, while some words with high-level semantics cannot strictly correspond to a single image region. To tackle these problems, we stress the importance of exploiting the global semantic consistency between image regions and sentence words as a complement to local alignment. In this article, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistency (CASC) for image-text matching. CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistency. It directly extracts semantic labels from the available sentence corpus without additional labor cost, which further provides a global similarity constraint on the aggregated region-word similarity obtained from local alignment. Extensive experiments on the Flickr30k and Microsoft COCO (MSCOCO) datasets demonstrate the effectiveness of CASC in preserving global semantic consistency along with local alignment, and further show its superior image-text matching performance compared with more than 15 state-of-the-art methods.
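A minimal sketch of the local-alignment half of such methods: attend each word over the image regions and aggregate the region-word cosine similarities into a single image-sentence score. Shapes and the softmax temperature are illustrative assumptions, not CASC's exact formulation.

import torch
import torch.nn.functional as F

def local_alignment_score(regions, words, temperature=9.0):
    r = F.normalize(regions, dim=-1)   # (R, D) region features
    w = F.normalize(words, dim=-1)     # (T, D) word features
    sim = w @ r.t()                    # (T, R) word-region cosine similarity
    attn = F.softmax(temperature * sim, dim=1)
    attended = attn @ r                # (T, D) attended region context per word
    # Image-sentence score = mean word-to-attended-region similarity.
    return F.cosine_similarity(w, attended, dim=-1).mean()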
17. Deng C, Yang E, Liu T, Tao D. Two-Stream Deep Hashing With Class-Specific Centers for Supervised Image Search. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2189-2201. [PMID: 31514156] [DOI: 10.1109/tnnls.2019.2929068]
Abstract
Hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. Recent supervised hashing research has shown that deep learning-based methods can significantly outperform nondeep methods. Most existing supervised deep hashing methods exploit supervisory signals to generate similar and dissimilar image pairs for training. However, natural images can have large intraclass and small interclass variations, which may degrade the accuracy of hash codes. To address this problem, we propose a novel two-stream ConvNet architecture, which learns hash codes with class-specific representation centers. Our basic idea is that if we can learn a unified binary representation for each class as a center and encourage hash codes of images to be close to the corresponding centers, the intraclass variation will be greatly reduced. Accordingly, we design a neural network that leverages label information and outputs a unified binary representation for each class. Moreover, we also design an image network to learn hash codes from images and force these hash codes to be close to the corresponding class-specific centers. These two neural networks are then seamlessly incorporated to create a unified, end-to-end trainable framework. Extensive experiments on three popular benchmarks corroborate that our proposed method outperforms current state-of-the-art methods.
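A minimal sketch of the class-center idea described above: each class has one binary center, and image hash codes are pulled toward the center of their class, which directly shrinks intraclass variation. The MSE pull and tensor layouts are illustrative assumptions; in the paper the centers themselves are produced by a second network from label information.

import torch
import torch.nn.functional as F

def center_hash_loss(codes, labels, centers):
    # codes: (B, L) relaxed codes in (-1, 1), e.g., via tanh;
    # centers: (C, L) class-specific binary centers in {-1, +1}.
    target = centers[labels]
    return F.mse_loss(codes, target.float())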
18. Xie D, Deng C, Li C, Liu X, Tao D. Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING 2020; 29:3626-3637. [PMID: 31940536] [DOI: 10.1109/tip.2020.2963957]
Abstract
Owing to its low storage cost and high query efficiency, cross-modal hashing has received increasing attention recently. Because they fail to bridge the inherent gap between modalities, most existing cross-modal hashing methods have limited capacity to explore the semantic consistency between different modalities, leading to unsatisfactory search performance. To address this problem, we propose a novel deep hashing method, Multi-Task Consistency-Preserving Adversarial Hashing (CPAH), to fully explore the semantic consistency and correlation between modalities for efficient cross-modal retrieval. First, we design a consistency refined module (CR) that divides the representation of each modality into two independent parts: a modality-common and a modality-private representation. Then, a multi-task adversarial learning module (MA) is presented to bring the modality-common representations of different modalities close to each other in both feature distribution and semantic consistency. Finally, compact and powerful hash codes are generated from the modality-common representation. Comprehensive evaluations on three representative cross-modal benchmark datasets show that our method is superior to state-of-the-art cross-modal hashing methods.
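A minimal sketch of the common/private split: two projection heads factor each modality's feature into a modality-common and a modality-private part, and the common parts of paired images and texts are aligned. The adversarial module is reduced here to a plain alignment loss, and all names are illustrative assumptions.

import torch
import torch.nn.functional as F

class ConsistencyRefined(torch.nn.Module):
    # Splits a modality feature into modality-common and modality-private parts.
    def __init__(self, dim, common_dim, private_dim):
        super().__init__()
        self.common = torch.nn.Linear(dim, common_dim)
        self.private = torch.nn.Linear(dim, private_dim)

    def forward(self, feat):
        return self.common(feat), self.private(feat)

def common_alignment_loss(img_common, txt_common):
    # Stand-in for the multi-task adversarial module: directly match the
    # modality-common representations of paired images and texts.
    return F.mse_loss(img_common, txt_common)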
19. Deng C, Yang E, Liu T, Li J, Liu W, Tao D. Unsupervised Semantic-Preserving Adversarial Hashing for Image Search. IEEE TRANSACTIONS ON IMAGE PROCESSING 2019; 28:4032-4044. [PMID: 30872226] [DOI: 10.1109/tip.2019.2903661]
Abstract
Hashing plays a pivotal role in nearest-neighbor searching for large-scale image retrieval. Recently, deep learning-based hashing methods have achieved promising performance. However, most of these deep methods involve discriminative models, which require large-scale, labeled training datasets, thus hindering their real-world applications. In this paper, we propose a novel strategy to exploit the semantic similarity of the training data and design an efficient generative adversarial framework to learn binary hash codes in an unsupervised manner. Specifically, our model consists of three different neural networks: an encoder network to learn hash codes from images, a generative network to generate images from hash codes, and a discriminative network to distinguish between pairs of hash codes and images. By adversarially training these networks, we successfully learn mutually coherent encoder and generative networks, and can output efficient hash codes from the encoder network. We also propose a novel strategy, which utilizes both feature and neighbor similarities, to construct a semantic similarity matrix, then use this matrix to guide the hash code learning process. Integrating the supervision of this semantic similarity matrix into the adversarial learning framework can efficiently preserve the semantic information of training data in Hamming space. The experimental results on three widely used benchmarks show that our method not only significantly outperforms several state-of-the-art unsupervised hashing methods, but also achieves comparable performance with popular supervised hashing methods.
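A minimal sketch of a similarity matrix built from both feature and neighbor similarities, as the abstract describes: cosine similarity is mixed with a shared-nearest-neighbor score. The mixing weight alpha and the neighborhood size k are illustrative assumptions, not the paper's values.

import torch
import torch.nn.functional as F

def semantic_similarity(features, k=10, alpha=0.5):
    f = F.normalize(features, dim=1)
    cos = f @ f.t()                         # direct feature similarity
    knn = torch.zeros_like(cos)
    idx = cos.topk(k, dim=1).indices
    knn.scatter_(1, idx, 1.0)               # k-nearest-neighbor indicator
    neighbor = (knn @ knn.t()) / k          # shared-neighbor similarity
    # The mixed matrix supervises hash learning in Hamming space.
    return alpha * cos + (1 - alpha) * neighbor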