1
Shen X, Chen Y, Liu W, Zheng Y, Sun QS, Pan S. Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval. IEEE Trans Neural Netw Learn Syst 2025;36:7997-8009. [PMID: 39028597] [DOI: 10.1109/tnnls.2024.3421583]
Abstract
Cross-modal hashing encodes different modalities of multimodal data into a low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., "ocean" and "cloud," often co-occur. However, existing cross-modal hashing methods overlook label dependency, which is crucial for improving performance. To fill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates a word embedding for each label and develops a label encoder to learn highly correlated label embeddings via a graph convolutional network (GCN). In addition, GCMLH develops a feature encoder for each modality and a feature fusion module to generate highly semantic features via GCN. GCMLH uses a teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., the label encoder and feature fusion module, to the student module, i.e., the feature encoder, such that the learned hash codes can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-art methods.
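As an illustration of the label-encoder idea, the sketch below propagates label word embeddings over a label co-occurrence graph with a two-layer GCN, in the spirit of (but not reproducing) GCMLH; the layer sizes, activations, and the normalized-adjacency stand-in are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGCNEncoder(nn.Module):
    """Two-layer GCN that turns per-label word embeddings into correlated
    label embeddings (relaxed label hash codes). `adj` is a normalized
    label co-occurrence adjacency matrix (hypothetical construction)."""
    def __init__(self, embed_dim, hidden_dim, code_len):
        super().__init__()
        self.w1 = nn.Linear(embed_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, code_len, bias=False)

    def forward(self, word_emb, adj):
        # Graph convolution: mix each label's embedding with those of its
        # co-occurring neighbors, then transform.
        h = F.relu(self.w1(adj @ word_emb))
        return torch.tanh(self.w2(adj @ h))  # relaxed codes in (-1, 1)

# Toy usage: 5 labels, 300-d word embeddings, 64-bit codes.
emb = torch.randn(5, 300)
adj = torch.softmax(torch.randn(5, 5), dim=1)  # stand-in for co-occurrence stats
print(LabelGCNEncoder(300, 512, 64)(emb, adj).shape)  # torch.Size([5, 64])
```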
2
Yuan L, Wang T, Zhang X, Tay FEH, Jie Z, Tian Y, Liu W, Feng J. Learnable Central Similarity Quantization for Efficient Image and Video Retrieval. IEEE Trans Neural Netw Learn Syst 2024;35:18717-18730. [PMID: 38090871] [DOI: 10.1109/tnnls.2023.3321148]
Abstract
Data-dependent hashing methods aim to learn hash functions from the pairwise or triplet relationships among the data, which often leads to low efficiency and a low collision rate because only the local distribution of the data is captured. To address this limitation, we propose central similarity, in which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers. As a new global similarity metric, central similarity can improve the efficiency and retrieval accuracy of hash learning. By introducing a new concept, hash centers, we principally formulate the computation of the proposed central similarity metric, where the hash centers refer to a set of points scattered in the Hamming space with sufficient mutual distance between each other. To construct well-separated hash centers, we provide two efficient methods: 1) leveraging the Hadamard matrix and Bernoulli distributions to generate data-independent hash centers and 2) learning data-dependent hash centers from data representations. Based on the proposed similarity metric and hash centers, we propose central similarity quantization (CSQ), which optimizes the central similarity between data points with respect to their hash centers, instead of the local similarity, to generate a high-quality deep hash function. We further improve CSQ with data-dependent hash centers, dubbed CSQ with learnable centers (CSQLC). The proposed CSQ and CSQLC are generic and applicable to image and video hashing scenarios. We conduct extensive experiments on large-scale image and video retrieval tasks, and the proposed CSQ yields noticeably boosted retrieval performance, i.e., 3%-20% in mean average precision (mAP) over the previous state-of-the-art methods, which also demonstrates that our methods can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs.
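The Hadamard-based construction mentioned in the abstract can be sketched directly: rows of a Hadamard matrix are mutually orthogonal, so any two of the resulting ±1 centers lie half a code length apart in Hamming distance. The fallback Bernoulli sampling and the row-selection policy below are assumptions, not the paper's exact recipe.

```python
import numpy as np
from scipy.linalg import hadamard

def hash_centers(num_classes: int, num_bits: int, seed: int = 0) -> np.ndarray:
    """Generate well-separated {-1, 1} hash centers.
    Rows of a Hadamard matrix are mutually orthogonal, so any two rows
    are at Hamming distance num_bits / 2. Falls back to Bernoulli(0.5)
    sampling when more centers are needed than Hadamard rows available."""
    rng = np.random.default_rng(seed)
    H = hadamard(num_bits)                  # num_bits must be a power of 2
    centers = np.vstack([H, -H])            # up to 2 * num_bits centers
    if num_classes <= len(centers):
        return centers[:num_classes]
    extra = rng.choice([-1, 1], size=(num_classes - len(centers), num_bits))
    return np.vstack([centers, extra])

C = hash_centers(num_classes=10, num_bits=64)
# Orthogonal rows -> inner product 0 -> Hamming distance (64 - 0) / 2 = 32.
print(C.shape, (64 - C[0] @ C[1]) / 2)  # (10, 64) 32.0
```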
3
Yang E, Deng C, Liu M. Deep Bayesian Quantization for Supervised Neuroimage Search. Mach Learn Med Imaging 2023;14349:396-406. [PMID: 38390519] [PMCID: PMC10883338] [DOI: 10.1007/978-3-031-45676-3_40]
Abstract
Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to their low computation and storage costs, hashing-based search techniques have been widely adopted for building image retrieval systems. However, these methods often suffer from nonnegligible quantization loss, which can degrade overall search performance. To address this issue, this paper presents a compact coding solution, namely Deep Bayesian Quantization (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieve superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines deep representation learning and compact representation quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue of traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce quantization losses. By utilizing pre-computed lookup tables, the proposed DBQ enables efficient and effective similarity search. Extensive experiments conducted on 2,008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-art methods.
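The abstract does not spell out DBQ's search procedure, so the following is a generic sketch of how pre-computed lookup tables serve quantization-based retrieval (product-quantization-style asymmetric distances); the codebook shapes and random data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, sub = 8, 256, 16          # 8 codebooks, 256 codewords each, 16-d subvectors
codebooks = rng.normal(size=(M, K, sub))          # learned offline in practice
db_codes = rng.integers(0, K, size=(10000, M))    # each item -> M codeword ids

def search(query: np.ndarray, topk: int = 5) -> np.ndarray:
    """Asymmetric distance computation via lookup tables: per codebook,
    pre-compute the distance from the query subvector to all K codewords
    once, then score every database item with just M table lookups."""
    q = query.reshape(M, sub)
    # lut[m, k] = squared distance from query subvector m to codeword k
    lut = ((codebooks - q[:, None, :]) ** 2).sum(-1)          # (M, K)
    dists = lut[np.arange(M)[:, None], db_codes.T].sum(0)     # (N,)
    return np.argsort(dists)[:topk]

print(search(rng.normal(size=M * sub)))
```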
Affiliation(s)
- Erkun Yang
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Xidian University, Xi'an, China
- Mingxia Liu
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
4
Zhao S, Hu M, Cai Z, Liu F. Dynamic Modeling Cross-Modal Interactions in Two-Phase Prediction for Entity-Relation Extraction. IEEE Trans Neural Netw Learn Syst 2023;34:1122-1131. [PMID: 34432639] [DOI: 10.1109/tnnls.2021.3104971]
Abstract
Joint extraction of entities and their relations benefits from the close interaction between named entities and their relation information. Therefore, how to effectively model such cross-modal interactions is critical for the final performance. Previous works have used simple methods, such as label-feature concatenation, to perform coarse-grained semantic fusion among cross-modal instances but fail to capture fine-grained correlations over token and label spaces, resulting in insufficient interactions. In this article, we propose a dynamic cross-modal attention network (CMAN) for joint entity and relation extraction. The network is carefully constructed by stacking multiple attention units in depth to dynamically model dense interactions over the token-label spaces, in which two basic attention units and a novel two-phase prediction are proposed to explicitly capture fine-grained correlations across different modalities (e.g., token-to-token and label-to-token). Experimental results on the CoNLL04 dataset show that our model obtains state-of-the-art results by achieving 91.72% F1 on entity recognition and 73.46% F1 on relation classification. On the ADE and DREC datasets, our model surpasses existing approaches by more than 2.1% and 2.54% F1 on relation classification. Extensive analyses further confirm the effectiveness of our approach.
5
U-Turn: Crafting Adversarial Queries with Opposite-Direction Features. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01737-y]
6
Shu Z, Yong K, Zhang D, Yu J, Yu Z, Wu XJ. Robust supervised matrix factorization hashing with application to cross-modal retrieval. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08006-6]
7
Liu X, Wang X, Cheung YM. FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval. IEEE Trans Neural Netw Learn Syst 2022;33:6306-6320. [PMID: 33979294] [DOI: 10.1109/tnnls.2021.3076684]
Abstract
Cross-modal hashing, favored for its effectiveness and efficiency, has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless, most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes and often involve a time-consuming training procedure when handling large-scale datasets. To tackle these issues, we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data, so as to minimize the quantization loss of mapping such data to the Hamming space, and propose an efficient fast discriminative discrete hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically, FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels and utilizes the ε-dragging technique to provide provably large semantic margins. Accordingly, the discriminative power of semantic information can be explicitly captured and maximized. Moreover, an orthogonal transformation scheme is further proposed to map the nonlinearly embedded data into the semantic subspace, which well guarantees the semantic consistency between the data features and their semantic representations. Consequently, an efficient closed-form solution is derived for discriminative hash code learning, which is very computationally efficient. In addition, an effective and stable online learning strategy is presented for optimizing modality-specific projection functions, featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates bi-Lipschitz continuity, runs sufficiently fast, and significantly improves retrieval performance over state-of-the-art methods. The source code is released at https://github.com/starxliu/FDDH.
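The ε-dragging technique referenced here comes from discriminative least-squares regression: binary label targets are allowed to move only in class-consistent directions, so regression margins can only grow. Below is a minimal sketch of one common closed-form dragging step, stated as an assumption rather than FDDH's exact update.

```python
import numpy as np

def epsilon_dragging_targets(Y: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """Relax binary label targets via epsilon-dragging.
    Y    : (n, c) one-hot label matrix.
    pred : (n, c) current regression outputs (e.g., X @ W).
    True-class entries may only be dragged upward (> 1) and wrong-class
    entries only downward (< 0), which enlarges inter-class margins."""
    B = np.where(Y > 0, 1.0, -1.0)        # allowed dragging directions
    R = pred - Y
    E = np.maximum(B * R, 0.0)            # optimal nonnegative drag amounts
    return Y + B * E                      # relaxed regression targets

Y = np.array([[1., 0.], [0., 1.]])
pred = np.array([[1.3, -0.2], [0.1, 0.8]])
print(epsilon_dragging_targets(Y, pred))
# [[ 1.3 -0.2]   -> targets follow predictions that increase the margin
#  [ 0.   1. ]]  -> but never move in a margin-shrinking direction
```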
8
Wang Y, Xiao Y, Lu J, Tan B, Cao Z, Zhang Z, Zhou JT. Discriminative Multi-View Dynamic Image Fusion for Cross-View 3-D Action Recognition. IEEE Trans Neural Netw Learn Syst 2022;33:5332-5345. [PMID: 33852396] [DOI: 10.1109/tnnls.2021.3070179]
Abstract
Dramatic imaging viewpoint variation is the critical challenge in action recognition for depth video. To address this, one feasible way is to enhance the view-tolerance of the visual feature while still maintaining strong discriminative capacity. The multi-view dynamic image (MVDI) is a recently proposed 3-D action representation that compactly encodes human motion information and 3-D visual cues. However, it is still view-sensitive. To improve its performance, we propose a discriminative MVDI fusion method via multi-instance learning (MIL). Specifically, the dynamic images (DIs) from different observation viewpoints are regarded as the instances for 3-D action characterization. After being encoded using Fisher vectors (FVs), they are aggregated by sum-pooling to yield the representative 3-D action signature. Our insight is that viewpoint aggregation helps to enhance view-tolerance, while FV maps the raw DI feature into a higher-dimensional feature space to promote discriminative power. Meanwhile, a discriminative viewpoint instance discovery method is also proposed to discard the viewpoint instances unfavorable for action characterization. Wide-ranging experiments on five datasets demonstrate that our proposition can significantly enhance the performance of cross-view 3-D action recognition. It is also applicable to cross-view 3-D object recognition. The source code is available at https://github.com/3huo/ActionView.
9
Xu L, Zeng X, Zheng B, Li W. Multi-Manifold Deep Discriminative Cross-Modal Hashing for Medical Image Retrieval. IEEE Trans Image Process 2022;31:3371-3385. [PMID: 35507618] [DOI: 10.1109/tip.2022.3171081]
Abstract
Benefiting from low storage cost and high retrieval efficiency, hash learning has become a widely used retrieval technology for approximate nearest neighbor search. Within it, cross-modal medical hashing has attracted increasing attention for facilitating efficient clinical decisions. However, two main challenges remain: weak multi-manifold structure preservation across multiple modalities and weak discriminability of hash codes. Specifically, existing cross-modal hashing methods focus on pairwise relations within two modalities and ignore the underlying multi-manifold structures across more than two modalities. Moreover, little consideration is given to discriminability, i.e., that any pair of hash codes should be different. In this paper, we propose a novel hashing method named multi-manifold deep discriminative cross-modal hashing (MDDCH) for large-scale medical image retrieval. The key point is a multi-modal manifold similarity that integrates multiple sub-manifolds defined on heterogeneous data to preserve the correlation among instances, which can be measured by a three-step connection on the corresponding hetero-manifold. We then propose a discriminative term that makes each hash code produced by the hash functions different, which improves the discriminative performance of the hash codes. Besides, we introduce a Gaussian-binary Restricted Boltzmann Machine to directly output hash codes without using any continuous relaxation. Experiments on three benchmark datasets (AIBL, Brain, and SPLP) show that our proposed MDDCH achieves comparable performance to recent state-of-the-art hashing methods. Additionally, diagnostic evaluation by professional physicians shows that all the retrieved medical images describe the same object and illness as the queried image.
10
Zhu L, Zheng C, Lu X, Cheng Z, Nie L, Zhang H. Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval. ACM Trans Inf Syst 2022. [DOI: 10.1145/3477180]
Abstract
Multi-modal hashing supports efficient multimedia retrieval well. However, existing methods still suffer from two problems: (1) Fixed multi-modal fusion. They combine the multi-modal features with fixed weights for hash learning, which cannot adaptively capture the variations of online streaming multimedia content. (2) Binary optimization challenge. To generate binary hash codes, existing methods adopt either two-step relaxed optimization, which causes significant quantization errors, or direct discrete optimization, which consumes considerable computation and storage cost. To address these problems, we first propose a Supervised Multi-modal Hashing with Online Query-adaption method. A self-weighted fusion strategy is designed to adaptively preserve the multi-modal features in the hash codes by exploiting their complementarity. Besides, the hash codes are efficiently learned under the supervision of pair-wise semantic labels to enhance their discriminative capability while avoiding the challenging symmetric similarity matrix factorization. Further, we propose an efficient Unsupervised Multi-modal Hashing with Online Query-adaption method with an adaptive multi-modal quantization strategy, in which the hash codes are directly learned without reliance on specific objective formulations. Finally, in both methods, we design a parameter-free online hashing module to adaptively capture query variations at the online retrieval stage. Experiments validate the superiority of our proposed methods.
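The paper's exact objective is not given in the abstract; the snippet below sketches a common closed-form self-weighted fusion rule consistent with the description, where each modality's weight is derived from its current fitting loss (the exponent p is an assumed hyperparameter).

```python
import numpy as np

def self_weighted_fusion(losses: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Closed-form weights minimizing sum_m w_m**p * L_m subject to
    sum_m w_m = 1, w_m >= 0: modalities that currently fit the hash codes
    better (smaller loss L_m) automatically receive larger weights."""
    w = losses ** (1.0 / (1.0 - p))       # proportional to L_m^(1/(1-p))
    return w / w.sum()

# Toy: the image modality fits twice as well as text -> larger weight.
print(self_weighted_fusion(np.array([0.5, 1.0])))  # [0.667 0.333]
```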
Affiliation(s)
- Lei Zhu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Chaoqun Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Xu Lu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Zhiyong Cheng
- Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
- Liqiang Nie
- School of Computer Science and Technology, Shandong University, Qingdao, Shandong, China
- Huaxiang Zhang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
11
Xie L, Guo W, Wei H, Tang Y, Tao D. Efficient Unsupervised Dimension Reduction for Streaming Multiview Data. IEEE Trans Cybern 2022;52:1772-1784. [PMID: 32525809] [DOI: 10.1109/tcyb.2020.2996684]
Abstract
Multiview learning has received substantial attention over the past decade due to its powerful capacity for integrating various types of information. Conventional unsupervised multiview dimension reduction (UMDR) methods are usually conducted in an offline manner and may fail in many real-world applications where data arrive sequentially and the data distribution changes periodically. Moreover, their high memory consumption and expensive retraining time costs are hard to sustain in large-scale scenarios. To remedy these drawbacks, we propose an online UMDR (OUMDR) framework. OUMDR aims to seek a low-dimensional and informative consensus representation for streaming multiview data. View-specific weights are also learned in this article to reflect the contributions of different views to the final consensus representation. A specific model called OUMDR-E is developed by introducing the exclusive group LASSO (EG-LASSO) to explore the intraview and interview correlations. We then develop an efficient iterative algorithm with limited memory and time cost requirements for optimization, where the convergence of each update is theoretically guaranteed. We evaluate the proposed approach in video-based expression recognition applications. The experimental results demonstrate the superiority of our approach in terms of both effectiveness and efficiency.
12
Liu B, Zheng Q, Wang Y, Zhang M, Dong J, Wang X. FeatInter: Exploring Fine-Grained Object Features for Video-Text Retrieval. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.094]
13
Multimodal graph inference network for scene graph generation. Appl Intell 2021. [DOI: 10.1007/s10489-021-02304-7]
14
Qin Y, Wu H, Zhang X, Feng G. Semi-Supervised Structured Subspace Learning for Multi-View Clustering. IEEE Trans Image Process 2021;31:1-14. [PMID: 34807827] [DOI: 10.1109/tip.2021.3128325]
Abstract
Multi-view clustering aims at simultaneously obtaining a consensus underlying subspace across multiple views and conducting clustering on the learned consensus subspace, and it has attracted considerable interest in image processing. In this paper, we propose the Semi-supervised Structured Subspace Learning algorithm for clustering data points from Multiple sources (SSSL-M). We explicitly extend traditional multi-view clustering in a semi-supervised manner and then build an anti-block-diagonal indicator matrix with a small amount of supervisory information to pursue the block-diagonal structure of the shared affinity matrix. SSSL-M regularizes multiple view-specific affinity matrices into a shared affinity matrix based on reconstruction, through a unified framework consisting of backward encoding networks and a self-expressive mapping. The shared affinity matrix is comprehensive and can flexibly encode complementary information from the multiple view-specific affinity matrices. In this manner, enhanced structural consistency of the affinity matrices from different views can be achieved, and the intrinsic relationships among them can be effectively reflected. Technically, we formulate the proposed model as an optimization problem, which can be solved by an alternating optimization scheme. Experimental results on seven different benchmark datasets demonstrate that our method obtains better clustering results than state-of-the-art approaches.
15
Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE Trans Cybern 2021;51:4982-4993. [PMID: 33119532] [DOI: 10.1109/tcyb.2020.3027614]
Abstract
Thanks to its low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, which makes it difficult to handle data with an increasing or large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach that consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, each MHN independently projects all samples into the discriminative Hamming space, which is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced by the flexible inputs and is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method enjoys high computational efficiency and the capacity to handle an increasing number of views using only a few labels or the number of classes. For a newly arriving view, we only need to add a view-specific network to our model, avoiding retraining the entire model on the new and previous views. Extensive experiments are carried out on five widely used multiview databases against 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity to handle newly arriving views.
16
Hu W, Wu L, Jian M, Chen Y, Yu H. Cosine metric supervised deep hashing with balanced similarity. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.093]
17
Ren Z, Li X, Mukherjee M, Huang Y, Sun Q, Huang Z. Robust multi-view graph clustering in latent energy-preserving embedding space. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.05.025]
18
Liu X, Cheung YM, Hu Z, He Y, Zhong B. Adversarial Tri-Fusion Hashing Network for Imbalanced Cross-Modal Retrieval. IEEE Trans Emerg Top Comput Intell 2021. [DOI: 10.1109/tetci.2020.3007143]
19
Wang J, Xu S, Zheng F, Lu K, Song J, Shao L. Learning Efficient Hash Codes for Fast Graph-Based Data Similarity Retrieval. IEEE Trans Image Process 2021;30:6321-6334. [PMID: 34224353] [DOI: 10.1109/tip.2021.3093387]
Abstract
Traditional operations, e.g., graph edit distance (GED), are no longer suitable for processing the massive quantities of graph-structured data now available, due to their irregular structures and high computational complexities. With the advent of graph neural networks (GNNs), the problems of graph representation and graph similarity search have drawn particular attention in the field of computer vision. However, GNNs have been less studied for efficient and fast retrieval after graph representation. To represent graph-based data while maintaining fast retrieval, we introduce an efficient hash model with graph neural networks (HGNN) for a newly designed task (i.e., fast graph-based data retrieval). Due to its flexibility, HGNN can be implemented in both an unsupervised and a supervised manner. Specifically, by adopting a graph neural network and hash learning algorithms, HGNN can effectively learn a similarity-preserving graph representation and compute pair-wise similarity, or provide classification, via low-dimensional compact hash codes. To the best of our knowledge, our model is the first to address graph hashing representation in the Hamming space. Our experimental results reach prediction accuracy comparable to full-precision methods and even outperform traditional models in some cases. In real-world applications, using hash codes can greatly benefit systems with smaller memory capacities and accelerate the retrieval speed of graph-structured data. Hence, we believe the proposed HGNN has great potential in further research.
20
Quadruplet-Based Deep Cross-Modal Hashing. Comput Intell Neurosci 2021;2021:9968716. [PMID: 34306059] [PMCID: PMC8270718] [DOI: 10.1155/2021/9968716]
Abstract
Recently, benefiting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn more and more attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relations across modalities. To solve this problem, in this paper, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the effectiveness of the quadruplet loss in cross-modal hashing.
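QDCMH's precise loss is not reproduced in the abstract, so the sketch below shows the standard quadruplet formulation on relaxed hash codes: a triplet term plus a second term that separates the positive pair from an unrelated negative pair. Margins and sampling are assumptions.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(a, p, n1, n2, m1=2.0, m2=1.0):
    """Generic quadruplet loss on relaxed (real-valued) hash codes.
    Beyond the triplet term (anchor/positive vs. anchor/negative), a second
    term pushes the positive pair closer than an unrelated negative pair
    (n1, n2), tightening intra-class and widening inter-class distances."""
    d_ap = F.pairwise_distance(a, p)
    d_an = F.pairwise_distance(a, n1)
    d_nn = F.pairwise_distance(n1, n2)
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()

# Toy 16-bit relaxed codes for a batch of 4 quadruplets.
a, p, n1, n2 = (torch.tanh(torch.randn(4, 16)) for _ in range(4))
print(quadruplet_loss(a, p, n1, n2))
```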
21
Dong J, Long Z, Mao X, Lin C, He Y, Ji S. Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.114]
22
Li H, Pang J, Tao D, Yu Z. Cross adversarial consistency self-prediction learning for unsupervised domain adaptation person re-identification. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.01.016]
23
Chen Y, Huang R, Chang H, Tan C, Xue T, Ma B. Cross-Modal Knowledge Adaptation for Language-Based Person Search. IEEE Trans Image Process 2021;30:4057-4069. [PMID: 33788687] [DOI: 10.1109/tip.2021.3068825]
Abstract
In this paper, we present a method named Cross-Modal Knowledge Adaptation (CMKA) for language-based person search. We argue that image and text information are not equally important in determining a person's identity. In other words, an image carries image-specific information such as lighting conditions and background, while text contains more modality-agnostic information that is more beneficial to cross-modal matching. Based on this consideration, we propose CMKA to adapt the knowledge of the image to the knowledge of the text. Specifically, text-to-image guidance is obtained at different levels: individuals, lists, and classes. By combining these levels of knowledge adaptation, the image-specific information is suppressed and the common space of image and text is better constructed. We conduct experiments on the CUHK-PEDES dataset. The experimental results show that the proposed CMKA outperforms state-of-the-art methods.
24
Yao D, Sui J, Wang M, Yang E, Jiaerken Y, Luo N, Yap PT, Liu M, Shen D. A Mutual Multi-Scale Triplet Graph Convolutional Network for Classification of Brain Disorders Using Functional or Structural Connectivity. IEEE Trans Med Imaging 2021;40:1279-1289. [PMID: 33444133] [PMCID: PMC8238125] [DOI: 10.1109/tmi.2021.3051604]
Abstract
Brain connectivity alterations associated with mental disorders have been widely reported in both functional MRI (fMRI) and diffusion MRI (dMRI). However, extracting useful information from the vast amount of information afforded by brain networks remains a great challenge. By capturing network topology, graph convolutional networks (GCNs) have been shown to be superior in learning network representations tailored to identifying specific brain disorders. Existing graph construction techniques generally rely on a specific brain parcellation to define regions-of-interest (ROIs) for network construction, often limiting the analysis to a single spatial scale. In addition, most methods focus on the pairwise relationships between ROIs and ignore high-order associations between subjects. In this letter, we propose a mutual multi-scale triplet graph convolutional network (MMTGCN) to analyze functional and structural connectivity for brain disorder diagnosis. We first employ several templates with different scales of ROI parcellation to construct coarse-to-fine brain connectivity networks for each subject. Then, a triplet GCN (TGCN) module is developed to learn functional/structural representations of brain connectivity networks at each scale, with the triplet relationship among subjects explicitly incorporated into the learning process. Finally, we propose a template mutual learning strategy to train the different-scale TGCNs collaboratively for disease classification. Experimental results on 1,160 subjects from three datasets with fMRI or dMRI data demonstrate that our MMTGCN outperforms several state-of-the-art methods in identifying three types of brain disorders.
25
26
Fang Y, Li B, Li X, Ren Y. Unsupervised cross-modal similarity via Latent Structure Discrete Hashing Factorization. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106857]
27
Shao H, Zhong D, Du X. A deep biometric hash learning framework for three advanced hand-based biometrics. IET Biometrics 2021. [DOI: 10.1049/bme2.12014]
Affiliation(s)
- Huikai Shao
- School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Dexing Zhong
- School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- State Key Lab. for Novel Software Technology, Nanjing University, Nanjing, China
- Pazhou Lab, Guangzhou, China
- Xuefeng Du
- School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
28
Xiao X, Chen Y, Gong YJ, Zhou Y. Prior Knowledge Regularized Multiview Self-Representation and its Applications. IEEE Trans Neural Netw Learn Syst 2021;32:1325-1338. [PMID: 32310792] [DOI: 10.1109/tnnls.2020.2984625]
Abstract
To learn the self-representation matrices/tensor that encode the intrinsic structure of the data, existing multiview self-representation models consider only the multiview features and thus impose equal membership preference across samples. However, this is inappropriate in real scenarios, since prior knowledge, e.g., explicit labels, semantic similarities, and weak-domain cues, can provide useful insights into the underlying relationships of samples. Based on this observation, this article proposes a prior knowledge regularized multiview self-representation (P-MVSR) model, in which the prior knowledge, multiview features, and high-order cross-view correlation are jointly considered to obtain an accurate self-representation tensor. The general concept of "prior knowledge" is defined as the complement of the multiview features, and the core of P-MVSR is to take advantage of the membership preference derived from the prior knowledge to purify and refine the discovered membership of the data. Moreover, P-MVSR adopts the same optimization procedure to handle different kinds of prior knowledge and thus provides a unified framework for weakly supervised clustering and semisupervised classification. Extensive experiments on real-world databases demonstrate the effectiveness of the proposed P-MVSR model.
29
Zhao W, Guan Z, Luo H, Peng J, Fan J. Deep Multiple Instance Hashing for Fast Multi-Object Image Search. IEEE Trans Image Process 2021;30:7995-8007. [PMID: 34554911] [DOI: 10.1109/tip.2021.3112011]
Abstract
Multi-keyword queries are widely supported in text search engines. However, an analogue in image retrieval systems, the multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple separate steps. In this work, we propose a weakly supervised Deep Multiple Instance Hashing (DMIH) approach for multi-object image retrieval. Our DMIH approach, which leverages a popular CNN model to build an end-to-end relation between a raw image and the binary hash codes of its multiple objects, can support multi-object queries effectively and integrate object detection with hashing learning seamlessly. We treat object detection as a binary multiple instance learning (MIL) problem, with instances automatically extracted from multi-scale convolutional feature maps. We also design a conditional random field (CRF) module to capture both the semantic and spatial relations among different class labels. For hashing training, we sample image pairs to learn their semantic relationships in terms of the hash codes of the most probable proposals for the owned labels, as guided by the object predictors. The two objectives benefit each other in a multi-task learning scheme. Finally, a two-level inverted index method is proposed to further speed up the retrieval of multi-object queries. Our DMIH approach outperforms state-of-the-art methods on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
30
Xu X, Wang T, Yang Y, Zuo L, Shen F, Shen HT. Cross-Modal Attention With Semantic Consistence for Image-Text Matching. IEEE Trans Neural Netw Learn Syst 2020;31:5412-5425. [PMID: 32071004] [DOI: 10.1109/tnnls.2020.2967597]
Abstract
The task of image-text matching refers to measuring the visual-semantic similarity between an image and a sentence. Recently, fine-grained matching methods that explore the local alignment between image regions and sentence words have shown advances in inferring the image-text correspondence by aggregating pairwise region-word similarity. However, the local alignment is hard to achieve, as some important image regions may be inaccurately detected or even missing. Meanwhile, some words with high-level semantics cannot strictly correspond to a single image region. To tackle these problems, we address the importance of exploiting the global semantic consistency between image regions and sentence words as a complement to the local alignment. In this article, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistency (CASC) for image-text matching. The proposed CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistency. It directly extracts semantic labels from the available sentence corpus without additional labor cost, which further provides a global similarity constraint on the aggregated region-word similarity obtained by the local alignment. Extensive experiments on the Flickr30k and Microsoft COCO (MSCOCO) datasets demonstrate the effectiveness of the proposed CASC in preserving global semantic consistency along with the local alignment and further show its superior image-text matching performance compared with more than 15 state-of-the-art methods.
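The local-alignment branch that CASC builds on aggregates pairwise region-word similarities; a compact sketch of that generic mechanism follows, with the temperature and mean-pooling choices being assumptions rather than CASC's exact design.

```python
import torch
import torch.nn.functional as F

def region_word_similarity(regions: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
    """Aggregate pairwise region-word cosine similarities into one
    image-sentence score: each word attends over image regions, and the
    attended similarities are averaged over words."""
    r = F.normalize(regions, dim=-1)          # (R, d) image region features
    w = F.normalize(words, dim=-1)            # (W, d) word features
    sim = w @ r.t()                           # (W, R) cosine similarities
    attn = torch.softmax(sim * 10.0, dim=-1)  # temperature 10 is an assumption
    return (attn * sim).sum(-1).mean()        # per-word attended sim -> mean

print(region_word_similarity(torch.randn(36, 256), torch.randn(12, 256)))
```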
31
Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval. Neural Netw 2020;134:143-162. [PMID: 33310483] [DOI: 10.1016/j.neunet.2020.11.011]
Abstract
Information retrieval among different modalities has become a significant issue with many promising applications. However, the inconsistent feature representations of various multimedia data cause the "heterogeneity gap" among modalities, which is a challenge in cross-modal retrieval. To bridge the "heterogeneity gap," popular methods attempt to project the original data into a common representation space, which demands great fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method for bridging the heterogeneity gap that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the different modalities. The GRL approach consists of two subnetworks: the Feature Transfer Learning Network (FTLN) and the Graph Representation Learning Network (GRLN). First, the FTLN model finds a latent space for each modality in which cosine similarity is suitable for describing their similarity. Then, we build a cross-modal graph to reconstruct the original data and their relationships. Finally, we abandon the features in the latent space and directly embed the graph vertices into a common representation space. In this way, the proposed method bypasses the most challenging issue by utilizing a cross-modal graph as an intermediary agent to bridge the "heterogeneity gap" among different modalities, which is simple but effective. Extensive experimental results on six widely used datasets indicate that the proposed GRL outperforms other state-of-the-art cross-modal retrieval methods.
32
33
34
Deep multilevel similarity hashing with fine-grained features for multi-label image retrieval. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.125]
35
A Deep Spatial Context Guided Framework for Infant Brain Subcortical Segmentation. Med Image Comput Comput Assist Interv 2020;12267:646-656. [PMID: 33564753] [DOI: 10.1007/978-3-030-59728-3_63]
Abstract
Accurate subcortical segmentation of infant brain magnetic resonance (MR) images is crucial for studying early subcortical structural growth patterns and diagnosing related diseases. However, dynamic intensity changes, low tissue contrast, and the small size of subcortical structures in infant brain MR images make subcortical segmentation a challenging task. In this paper, we propose a spatial context guided, coarse-to-fine deep convolutional neural network (CNN) based framework for accurate infant subcortical segmentation. At the coarse stage, we propose a signed distance map (SDM) learning UNet (SDM-UNet) to predict SDMs from the original multi-modal images, including T1w, T2w, and T1w/T2w images. By doing this, the spatial context information contained in the ground-truth SDMs, including the relative position information across different structures and the shape information of the segmented structures, is used to supervise the SDM-UNet, remedying the adverse influence of the low tissue contrast in infant brain MR images and generating high-quality SDMs. To improve robustness to outliers, a correntropy-based loss is introduced in SDM-UNet to penalize the difference between the ground-truth and predicted SDMs during training. At the fine stage, the predicted SDMs, which contain spatial context information about the subcortical structures, are combined with the multi-modal images and fed into a multi-source and multi-path UNet (M2-UNet) to deliver refined segmentation. We validate our method on an infant brain MR image dataset with 24 scans by evaluating the Dice ratio between our segmentation and the manual delineation. Compared to four state-of-the-art methods, our method consistently achieves better performance in both qualitative and quantitative evaluations.
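The correntropy-based loss mentioned for outlier robustness has a standard Gaussian-kernel form, sketched below; the kernel width sigma is an assumed hyperparameter.

```python
import torch

def correntropy_loss(pred: torch.Tensor, target: torch.Tensor, sigma: float = 1.0):
    """Correntropy-induced loss: unlike MSE, the Gaussian kernel saturates
    for large residuals, so outlier voxels in the predicted SDMs cannot
    dominate the gradient."""
    err2 = (pred - target) ** 2
    return (1.0 - torch.exp(-err2 / (2.0 * sigma ** 2))).mean()

pred, target = torch.randn(2, 1, 8, 8, 8), torch.randn(2, 1, 8, 8, 8)
print(correntropy_loss(pred, target))
```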
36
Li J, Li M, Lu G, Zhang B, Yin H, Zhang D. Similarity and diversity induced paired projection for cross-modal retrieval. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.06.032]
37
Yang E, Yao D, Cao B, Guan H, Yap PT, Shen D, Liu M. Deep Disentangled Hashing with Momentum Triplets for Neuroimage Search. Med Image Comput Comput Assist Interv 2020;12261:191-201. [PMID: 34746936] [PMCID: PMC8570551] [DOI: 10.1007/978-3-030-59710-8_19]
Abstract
Neuroimaging has been widely used in computer-aided clinical diagnosis and treatment, and the rapid growth of neuroimage repositories introduces great challenges for efficient neuroimage search. Existing image search methods often use a triplet loss to capture high-order relationships between samples. However, we find that the traditional triplet loss struggles to push positive and negative sample pairs apart by Hamming distance discrepancies larger than a small fixed value. This may reduce the discriminative ability of the learned hash codes and degrade the performance of image search. To address this issue, in this work we propose a deep disentangled momentum hashing (DDMH) framework for neuroimage search. Specifically, we first investigate the original triplet loss and find that this loss function is determined by the inner products of hash code pairs. Accordingly, we disentangle hash code norms and hash code directions and analyze the role of each part. By decoupling the loss function from the hash code norm, we propose a unique disentangled triplet loss, which can effectively push positive and negative sample pairs apart by the desired Hamming distance discrepancies for hash codes of different lengths. We further develop a momentum triplet strategy to address the problem of insufficient triplet samples caused by the small batch size used for 3D neuroimages. With the proposed disentangled triplet loss and the momentum triplet strategy, we design an end-to-end trainable deep hashing framework for neuroimage search. Comprehensive empirical evidence on three neuroimage datasets shows that DDMH outperforms several state-of-the-art methods in neuroimage search.
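The disentangling argument can be made concrete: for K-bit codes u, v in {-1, 1}^K, the Hamming distance is d_H(u, v) = (K - <u, v>)/2, so a triplet loss over inner products is entangled with code norms; normalizing to code directions removes that factor. A minimal sketch of this idea (not DDMH's exact loss):

```python
import torch
import torch.nn.functional as F

def disentangled_triplet(anchor, pos, neg, margin: float = 0.2):
    """Triplet loss on hash-code *directions* only. Since d_H = (K - <u, v>) / 2
    for K-bit codes, the loss depends on inner products; L2-normalizing removes
    the code-norm factor so the margin acts uniformly across code lengths
    (the motivation stated in the DDMH abstract)."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(pos, dim=-1)
    n = F.normalize(neg, dim=-1)
    return F.relu((a * n).sum(-1) - (a * p).sum(-1) + margin).mean()

a, p, n = (torch.tanh(torch.randn(8, 64)) for _ in range(3))
print(disentangled_triplet(a, p, n))
```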
Affiliation(s)
- Erkun Yang
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dongren Yao
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, University of Chinese Academy of Sciences, Beijing 100190, China
- Bing Cao
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hao Guan
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Pew-Thian Yap
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dinggang Shen
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Mingxia Liu
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
38
Gao L, Zhang Y, Zou F, Shao J, Lai J. Unsupervised urban scene segmentation via domain adaptation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.117]
39
40
Cross-Modal Search for Social Networks via Adversarial Learning. Comput Intell Neurosci 2020;2020:7834953. [PMID: 32733547] [PMCID: PMC7369674] [DOI: 10.1155/2020/7834953]
Abstract
Cross-modal search has become a research hotspot in recent years. In contrast to traditional cross-modal search, cross-modal information search on social networks is restricted by data quality, with arbitrary text and low-resolution visual features. In addition, the semantic sparseness of cross-modal data from social networks causes the text and visual modalities to mislead each other. In this paper, we propose a cross-modal search method for social network data that capitalizes on adversarial learning (cross-modal search with adversarial learning: CMSAL). We adopt self-attention-based neural networks to generate modality-oriented representations for further intermodal correlation learning. A search module is implemented based on adversarial learning, through which the discriminator is designed to measure the distribution of generated features from intramodal and intermodal perspectives. Experiments on real-world datasets from Sina Weibo and Wikipedia, which have similar properties to social networks, show that the proposed method outperforms state-of-the-art cross-modal search methods.
41
Huang Y, Zheng F, Cong R, Huang W, Scott MR, Shao L. MCMT-GAN: Multi-Task Coherent Modality Transferable GAN for 3D Brain Image Synthesis. IEEE Trans Image Process 2020;PP:8187-8198. [PMID: 32746245] [DOI: 10.1109/tip.2020.3011557]
Abstract
The ability to synthesize multi-modality data is highly desirable for many computer-aided medical applications, e.g., clinical diagnosis and neuroscience research, since rich imaging cohorts offer diverse and complementary information for unraveling human tissues. However, collecting acquisitions can be limited by adverse factors such as patient discomfort, expensive cost, and scanner unavailability. In this paper, we propose a multi-task coherent modality transferable GAN (MCMT-GAN) to address this issue for brain MRI synthesis in an unsupervised manner. By combining a bidirectional adversarial loss, cycle-consistency loss, domain-adapted loss, and manifold regularization in a volumetric space, MCMT-GAN is robust for multi-modality brain image synthesis with visually high fidelity. In addition, we complement the discriminators with collaboratively working segmentors, which ensures the usefulness of our results for the segmentation task. Experiments on various cross-modality synthesis tasks show that our method produces visually impressive results that can substitute for clinical post-processing and also exceed state-of-the-art methods.
42
43
Li H, He X, Yu Z, Luo J. Noise-robust image fusion with low-rank sparse decomposition guided by external patch prior. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.009]
44
Wang Q, Dai W, Ma X, Shang Z. Driving amount based stochastic configuration network for industrial process modeling. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.029]
45
Deng C, Yang E, Liu T, Tao D. Two-Stream Deep Hashing With Class-Specific Centers for Supervised Image Search. IEEE Trans Neural Netw Learn Syst 2020;31:2189-2201. [PMID: 31514156] [DOI: 10.1109/tnnls.2019.2929068]
Abstract
Hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. Recent supervised hashing research has shown that deep learning-based methods can significantly outperform non-deep methods. Most existing supervised deep hashing methods exploit supervisory signals to generate similar and dissimilar image pairs for training. However, natural images can have large intraclass and small interclass variations, which may degrade the accuracy of hash codes. To address this problem, we propose a novel two-stream ConvNet architecture that learns hash codes with class-specific representation centers. Our basic idea is that if we can learn a unified binary representation for each class as a center and encourage the hash codes of images to be close to their corresponding centers, the intraclass variation will be greatly reduced. Accordingly, we design a neural network that leverages label information and outputs a unified binary representation for each class. Moreover, we also design an image network to learn hash codes from images and force these hash codes to be close to the corresponding class-specific centers. These two neural networks are then seamlessly incorporated to create a unified, end-to-end trainable framework. Extensive experiments on three popular benchmarks corroborate that our proposed method outperforms current state-of-the-art methods.
46
47
Crafting adversarial example with adaptive root mean square gradient on deep neural networks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.084]
48
Bai X, Zhu L, Liang C, Li J, Nie X, Chang X. Multi-view feature selection via Nonnegative Structured Graph Learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.044]
49
Yao T, Han Y, Wang R, Kong X, Yan L, Fu H, Tian Q. Efficient discrete supervised hashing for large-scale cross-modal retrieval. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.12.086]
50
Yang E, Liu T, Deng C, Tao D. Adversarial Examples for Hamming Space Search. IEEE Trans Cybern 2020;50:1473-1484. [PMID: 30561358] [DOI: 10.1109/tcyb.2018.2882908]
Abstract
Due to its strong representation learning ability and its facilitation of joint learning for representations and hash codes, deep learning-to-hash has achieved promising results and is becoming increasingly popular for large-scale approximate nearest neighbor search. However, recent studies highlight the vulnerability of deep image classifiers to adversarial examples, which also introduces profound security concerns for deep retrieval systems. Accordingly, in order to study the robustness of modern deep hashing models to adversarial perturbations, we propose hash adversary generation (HAG), a novel method of crafting adversarial examples for Hamming space search. The main goal of HAG is to generate imperceptibly perturbed examples as queries whose nearest neighbors from a targeted hashing model are semantically irrelevant to the original queries. Extensive experiments prove that HAG can successfully craft adversarial examples with small perturbations that mislead targeted hashing models. The transferability of these perturbations under a variety of settings is also verified. Moreover, by combining heterogeneous perturbations, we further provide a simple yet effective method of constructing adversarial examples for black-box attacks.
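A minimal sketch of the attack pattern HAG describes: iteratively perturb a query inside a small L-infinity ball so that its relaxed code moves away from the original hash code. The optimizer, step sizes, and the stand-in network below are assumptions, not the paper's settings.

```python
import torch

def craft_adversarial_query(model, x, eps=8/255, alpha=1/255, steps=20):
    """PGD-style crafting of an adversarial query for Hamming-space search:
    maximize the distance between the perturbed query's relaxed code and the
    original code while keeping the perturbation imperceptible (L-inf ball)."""
    with torch.no_grad():
        code = torch.sign(model(x))           # original query's hash code
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # push the relaxed code (tanh output) away from the original code
        loss = (torch.tanh(model(x_adv)) * code).sum()
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()   # descend on similarity
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project to L-inf ball
        x_adv = x_adv.clamp(0, 1)                      # keep a valid image
    return x_adv.detach()

# Toy usage with a stand-in "hashing network" (hypothetical architecture).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 48))
x = torch.rand(1, 3, 32, 32)
x_adv = craft_adversarial_query(model, x)
print((torch.sign(model(x_adv)) != torch.sign(model(x))).float().mean())
```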