1. Fan W, Zhang C, Li H, Jia X, Wang G. Three-Stage Semisupervised Cross-Modal Hashing With Pairwise Relations Exploitation. IEEE Transactions on Neural Networks and Learning Systems 2025;36:260-273. PMID: 37023166. DOI: 10.1109/tnnls.2023.3263221.
Abstract
Hashing methods have sparked a great revolution in cross-modal retrieval due to the low cost of storage and computation. Benefiting from the sufficient semantic information of labeled data, supervised hashing methods have shown better performance than unsupervised ones. Nevertheless, annotating training samples is expensive and labor intensive, which restricts the feasibility of supervised methods in real applications. To deal with this limitation, a novel semisupervised hashing method, i.e., three-stage semisupervised hashing (TS3H), is proposed in this article, in which both labeled and unlabeled data are seamlessly handled. Different from other semisupervised approaches that learn the pseudolabels, hash codes, and hash functions simultaneously, the new approach is decomposed into three stages, as the name implies, all of which are conducted individually to make the optimization cost-effective and precise. Specifically, classifiers for the different modalities are first learned from the provided supervised information to predict the labels of the unlabeled data. Then, hash code learning is achieved with a simple but efficient scheme that unifies the provided and the newly predicted labels. To capture discriminative information and preserve semantic similarities, pairwise relations are leveraged to supervise both classifier learning and hash code learning. Finally, the modality-specific hash functions are obtained by transforming the training samples to the generated hash codes. The new approach is compared with state-of-the-art shallow and deep cross-modal hashing (DCMH) methods on several widely used benchmark databases, and the experimental results verify its efficiency and superiority.
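The final stage described above, obtaining modality-specific hash functions by mapping training samples to the generated hash codes, is commonly realized as a closed-form ridge regression followed by binarization. The sketch below illustrates that generic idea only, not the paper's exact formulation; the regularizer `lam` and the toy data are assumptions:

```python
import numpy as np

def learn_hash_function(X, B, lam=1.0):
    """Closed-form ridge regression from features X (n x d) to
    target hash codes B (n x k, entries in {-1, +1})."""
    d = X.shape[1]
    # W solves (X^T X + lam*I) W = X^T B
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)

def hash_codes(X, W):
    """Binarize the linear projection to obtain {-1, +1} codes."""
    return np.sign(X @ W)

# Toy usage: random 16-D features regressed onto random 8-bit codes.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))
B = np.sign(rng.standard_normal((100, 8)))
W = learn_hash_function(X, B)
codes = hash_codes(X, W)
print(codes.shape)  # (100, 8)
```

At query time, each modality applies its own learned `W` before comparison in Hamming space.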
2. Yuan L, Wang T, Zhang X, Tay FEH, Jie Z, Tian Y, Liu W, Feng J. Learnable Central Similarity Quantization for Efficient Image and Video Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2024;35:18717-18730. PMID: 38090871. DOI: 10.1109/tnnls.2023.3321148.
Abstract
Data-dependent hashing methods aim to learn hash functions from the pairwise or triplet relationships among the data, which often leads to low efficiency and a low collision rate because only the local distribution of the data is captured. To address this limitation, we propose central similarity, in which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers. As a new global similarity metric, central similarity can improve the efficiency and retrieval accuracy of hash learning. By introducing a new concept, hash centers, we formulate the computation of the proposed central similarity metric, where the hash centers refer to a set of points scattered in the Hamming space with sufficient mutual distance between each other. To construct well-separated hash centers, we provide two efficient methods: 1) leveraging the Hadamard matrix and Bernoulli distributions to generate data-independent hash centers and 2) learning data-dependent hash centers from data representations. Based on the proposed similarity metric and hash centers, we propose central similarity quantization (CSQ), which optimizes the central similarity between data points with respect to their hash centers instead of the local similarity, so as to generate a high-quality deep hash function. We further improve CSQ with data-dependent hash centers, dubbed CSQ with learnable center (CSQLC). The proposed CSQ and CSQLC are generic and applicable to both image and video hashing scenarios. We conduct extensive experiments on large-scale image and video retrieval tasks; the proposed CSQ yields noticeably boosted retrieval performance, i.e., 3%-20% in mean average precision (mAP) over previous state-of-the-art methods, which also demonstrates that our methods generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs.
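The first center-construction method mentioned above rests on a standard property of Hadamard matrices: their rows are mutually orthogonal, so any two rows disagree in exactly half of their entries. The sketch below illustrates that general idea via the Sylvester construction; it is not the paper's exact procedure, and the function names are assumptions:

```python
import numpy as np

def hadamard(k):
    """Sylvester construction of a k x k Hadamard matrix (k a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H

def hash_centers(num_classes, k):
    """Take rows of [H; -H] as {0,1} hash centers. For num_classes <= k the
    centers come from orthogonal rows of H, so any two of them differ in
    exactly k/2 bits (maximally separated in Hamming space)."""
    H = hadamard(k)
    centers = np.vstack([H, -H])[:num_classes]
    return (centers > 0).astype(np.uint8)

C = hash_centers(10, 16)
# any two of these 10 centers differ in exactly 16/2 = 8 bits
```

When `num_classes` exceeds `2k`, data-independent centers of this kind run out, which is one motivation for sampling additional centers from Bernoulli distributions or learning them from data, as the abstract describes.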
3. Hoang T, Do TT, Nguyen TV, Cheung NM. Multimodal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing. IEEE Transactions on Neural Networks and Learning Systems 2023;34:6289-6302. PMID: 34982698. DOI: 10.1109/tnnls.2021.3135420.
Abstract
In this article, we adopt a mutual information (MI) maximization approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed cross-modal info-max hashing (CMIMH). First, to learn informative representations that preserve both intramodal and intermodal similarities, we leverage recent advances in estimating variational lower bounds of MI to maximize the MI between the binary representations and the input features, and between the binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations follow multivariate Bernoulli distributions, we can learn binary representations that preserve both intramodal and intermodal similarities, effectively and in a mini-batch manner with gradient descent. Furthermore, we find that minimizing the modality gap by learning similar binary representations for the same instance across modalities can result in less informative representations. Hence, balancing the reduction of the modality gap against the loss of modality-private information is important for cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
4. Li Y, Ge M, Li M, Li T, Xiang S. CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval. Sensors 2023;23:3439. PMID: 37050499. PMCID: PMC10099083. DOI: 10.3390/s23073439.
Abstract
With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied due to its advantages in storage, retrieval efficiency, and label independence. However, two obstacles remain for existing unsupervised methods: (1) they cannot fully capture the complementary and co-occurrence information of multi-modal data and therefore suffer from inaccurate similarity measures; (2) they suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted during hash code binarization. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similar information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, this paper proposes an adaptive graph attention network to assist the learning of hash codes, which uses an attention mechanism to learn adaptive graph similarity across modalities. It further aggregates the intrinsic neighborhood information of neighboring data nodes through a graph convolutional network to generate more discriminative hash codes. Finally, this paper employs an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods in unsupervised multi-modal retrieval tasks.
Collapse
Affiliation(s)
- Yewen Li: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Mingyuan Ge: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Mingyong Li: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Tiansong Li: School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Sen Xiang: School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
5. Hu P, Zhu H, Lin J, Peng D, Zhao YP, Peng X. Unsupervised Contrastive Cross-Modal Hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:3877-3889. PMID: 35617190. DOI: 10.1109/tpami.2022.3177356.
Abstract
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. To be exact, i) to address the performance degradation caused by binary optimization for hashing, we propose a novel momentum optimizer that makes the hashing operation learnable within CL, thus making off-the-shelf deep cross-modal hashing possible. In other words, our method does not involve the binary-continuous relaxation used by most existing methods, and thus enjoys better retrieval performance; ii) to alleviate the influence of false-negative pairs (FNPs), i.e., within-class pairs that are wrongly treated as negative pairs, we propose a Cross-modal Ranking Learning loss (CRL) that utilizes the discrimination from all negative pairs instead of only the hard ones. Thanks to such a global strategy, CRL endows our method with better performance because it does not overuse the FNPs while ignoring the true-negative pairs. To the best of our knowledge, the proposed method is one of the first successful contrastive hashing methods. To demonstrate its effectiveness, we carry out experiments on five widely used datasets against 13 state-of-the-art methods. The code is available at https://github.com/penghu-cs/UCCH.
6. Liu X, Wang X, Cheung YM. FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2022;33:6306-6320. PMID: 33979294. DOI: 10.1109/tnnls.2021.3076684.
Abstract
Cross-modal hashing, favored for its effectiveness and efficiency, has received wide attention for facilitating efficient retrieval across different modalities. Nevertheless, most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes, and often involve a time-consuming training procedure for handling large-scale datasets. To tackle these issues, we formulate the learning of similarity-preserving hash codes in terms of orthogonally rotating the semantic data so as to minimize the quantization loss of mapping such data to the Hamming space, and propose an efficient fast discriminative discrete hashing (FDDH) approach for large-scale cross-modal retrieval. More specifically, FDDH introduces an orthogonal basis to regress the targeted hash codes of training examples to their corresponding semantic labels, and utilizes the ε-dragging technique to provide provably large semantic margins. Accordingly, the discriminative power of semantic information can be explicitly captured and maximized. Moreover, an orthogonal transformation scheme is further proposed to map the nonlinearly embedded data into the semantic subspace, which well guarantees the semantic consistency between a data feature and its semantic representation. Consequently, a closed-form solution is derived for discriminative hash code learning, which is computationally very efficient. In addition, an effective and stable online learning strategy is presented for optimizing the modality-specific projection functions, featuring adaptivity to different training sizes and streaming data. The proposed FDDH approach theoretically approximates bi-Lipschitz continuity, runs sufficiently fast, and significantly improves the retrieval performance over state-of-the-art methods. The source code is released at https://github.com/starxliu/FDDH.
7. Zhang D, Wu XJ. Scalable Discrete Matrix Factorization and Semantic Autoencoder for Cross-Media Retrieval. IEEE Transactions on Cybernetics 2022;52:5947-5960. PMID: 33326392. DOI: 10.1109/tcyb.2020.3032017.
Abstract
Hashing methods have attracted great attention in multimedia tasks due to their effectiveness and efficiency. However, most existing methods generate binary codes by relaxing the binary constraints, which may cause large quantization error. In addition, most supervised cross-modal approaches preserve the similarity relationship by constructing a large n×n similarity matrix, which requires huge computation and makes these methods unscalable. To address the above challenges, this article presents a novel algorithm, called the scalable discrete matrix factorization and semantic autoencoder method (SDMSA). SDMSA is a two-stage method. In the first stage, a matrix factorization scheme is utilized to learn the latent semantic information, and the label matrix is incorporated into the loss function instead of the similarity matrix; the binary codes can then be generated from the latent representations. During optimization, manipulating a large n×n similarity matrix is avoided, and the hash codes can be generated directly. In the second stage, a novel hash function learning scheme based on the autoencoder is proposed. The encoder-decoder paradigm aims to learn projections: feature vectors are projected to code vectors by the encoder, and the code vectors are projected back to the original feature vectors by the decoder. This encoder-decoder scheme ensures that the embedding preserves both the semantic and the feature information. Specifically, two algorithms, SDMSA-lin and SDMSA-ker, are developed under the SDMSA framework. Owing to the merits of SDMSA, more semantically meaningful binary hash codes can be obtained. Extensive experiments on several databases show that SDMSA-lin and SDMSA-ker achieve promising performance.
8. Zhu L, Zheng C, Lu X, Cheng Z, Nie L, Zhang H. Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval. ACM Transactions on Information Systems 2022. DOI: 10.1145/3477180.
Abstract
Multi-modal hashing supports efficient multimedia retrieval. However, existing methods still suffer from two problems: (1) Fixed multi-modal fusion: they combine the multi-modal features with fixed weights for hash learning, which cannot adaptively capture the variations of online streaming multimedia content. (2) The binary optimization challenge: to generate binary hash codes, existing methods adopt either two-step relaxed optimization, which causes significant quantization errors, or direct discrete optimization, which consumes considerable computation and storage. To address these problems, we first propose a Supervised Multi-modal Hashing with Online Query-adaption method. A self-weighted fusion strategy is designed to adaptively preserve the multi-modal features in hash codes by exploiting their complementarity. Besides, the hash codes are efficiently learned under the supervision of pair-wise semantic labels to enhance their discriminative capability while avoiding the challenging symmetric similarity matrix factorization. Further, we propose an efficient Unsupervised Multi-modal Hashing with Online Query-adaption method with an adaptive multi-modal quantization strategy, in which the hash codes are learned directly without reliance on specific objective formulations. Finally, in both methods, we design a parameter-free online hashing module to adaptively capture query variations at the online retrieval stage. Experiments validate the superiority of the proposed methods.
Collapse
Affiliation(s)
- Lei Zhu: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Chaoqun Zheng: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Xu Lu: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Zhiyong Cheng: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
- Liqiang Nie: School of Computer Science and Technology, Shandong University, Qingdao, Shandong, China
- Huaxiang Zhang: School of Information Science and Engineering, Shandong Normal University, Jinan, China
9. Liu C, Yu J, Huang Y, Huang F. Time–frequency fusion learning for photoplethysmography biometric recognition. IET Biometrics 2022. DOI: 10.1049/bme2.12070.
Affiliation(s)
- Jijiang Yu: School of Computer, Heze University, Heze, China
- Yuwen Huang: School of Computer, Heze University, Heze, China; School of Software, Shandong University, Jinan, China
10. Object-Level Visual-Text Correlation Graph Hashing for Unsupervised Cross-Modal Retrieval. Sensors 2022;22:2921. PMID: 35458906. PMCID: PMC9029824. DOI: 10.3390/s22082921.
Abstract
The core of cross-modal hashing methods is to map high-dimensional features into binary hash codes, which can then efficiently exploit the Hamming distance metric to enhance retrieval efficiency. Recent work emphasizes the advantages of unsupervised cross-modal hashing, since it relies only on the relevance of paired data, making it more applicable to real-world applications. However, two problems have still not been fully considered: intra-modality correlation and inter-modality correlation. Intra-modality correlation describes the complex overall concept of a single modality and provides semantic relevance for retrieval tasks, while inter-modality correlation refers to the relationship between different modalities. From our observation and hypothesis, the dependency relationships within a modality and between different modalities can be constructed at the object level, which can further improve cross-modal hashing retrieval accuracy. To this end, we propose an Object-level Visual-text Correlation Graph Hashing (OVCGH) approach to mine the fine-grained object-level similarity in cross-modal data while suppressing noise interference. Specifically, a novel intra-modality correlation graph is designed to learn graph-level representations of different modalities, obtaining the dependency relationships of image region to image region and tag to tag in an unsupervised manner. Then, we design a visual-text dependency building module that captures correlated semantic information between modalities by modeling the dependency relationship between image object regions and text tags. Extensive experiments on two widely used datasets verify the effectiveness of our proposed approach.
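The Hamming-distance retrieval step that the abstract above (and most entries in this list) relies on is typically implemented with packed binary codes and XOR plus popcount, which is why binary hash codes are so cheap to compare. A minimal generic sketch, not tied to any particular paper; function names and sizes are illustrative:

```python
import numpy as np

def pack(codes):
    """Pack a {0,1} code matrix (n x k, k a multiple of 8) into bytes."""
    return np.packbits(codes.astype(np.uint8), axis=1)

def hamming_distances(query, database):
    """XOR the packed query against every packed database code and
    count differing bits with a 256-entry popcount lookup table."""
    popcount = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    return popcount[np.bitwise_xor(database, query)].sum(axis=1)

# Toy usage: 1000 random 64-bit codes, one query, ranked by distance.
rng = np.random.default_rng(0)
db = pack(rng.integers(0, 2, size=(1000, 64)))
q = pack(rng.integers(0, 2, size=(1, 64)))
ranking = np.argsort(hamming_distances(q, db))  # nearest codes first
```

Because the comparison is a byte-wise XOR and table lookup, scanning millions of codes stays fast even on commodity hardware, which is the storage/efficiency advantage these abstracts cite.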
11. Yu Z, Wu S, Dou Z, Bakker EM. Deep hashing with self-supervised asymmetric semantic excavation and margin-scalable constraint. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.01.082.
12. Khan A, Hayat S, Ahmad M, Wen J, Farooq MU, Fang M, Jiang W. Cross-modal retrieval based on deep regularized hashing constraints. International Journal of Intelligent Systems 2022. DOI: 10.1002/int.22853.
Affiliation(s)
- Asad Khan: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Sakander Hayat: School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, China
- Muhammad Ahmad: Department of Computer Science, National University of Computer and Emerging Sciences (NUCES-FAST), Faisalabad Campus, Chiniot, Pakistan
- Jinyu Wen: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Muhammad Umar Farooq: School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
- Meie Fang: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Wenchao Jiang: School of Computers, Guangdong University of Technology, Guangzhou, China
13. Semantic-guided autoencoder adversarial hashing for large-scale cross-modal retrieval. Complex & Intelligent Systems 2022. DOI: 10.1007/s40747-021-00615-3.
Abstract
With the vigorous development of mobile Internet technology and the popularization of smart devices, the amount of multimedia data has exploded and its forms have become increasingly diversified. People's demand for information is no longer satisfied by single-modal data retrieval, and cross-modal retrieval has become a research hotspot in recent years. Due to the strong feature learning ability of deep learning, cross-modal deep hashing has been extensively studied. However, the similarity of different modalities is difficult to measure directly because of their different distributions and representations. Therefore, it is urgent to eliminate the modality gap and improve retrieval accuracy. Some previous work has introduced GANs into cross-modal hashing to reduce semantic differences between modalities. However, most existing GAN-based cross-modal hashing methods suffer from issues such as unstable network training and vanishing gradients, which hinder the elimination of modality differences. To solve this issue, this paper proposes a novel Semantic-guided Autoencoder Adversarial Hashing method for cross-modal retrieval (SAAH). First, two kinds of adversarial autoencoder networks, under the guidance of semantic multi-labels, maximize the semantic relevance of instances and maintain cross-modal invariance. Second, under the supervision of semantics, the adversarial module guides the feature learning process and maintains the modality relations. In addition, to maintain the inter-modal correlation of all similar pairs, this paper uses two types of loss functions to maintain the similarity. To verify the effectiveness of the proposed method, sufficient experiments were conducted on three widely used cross-modal datasets (MIRFLICKR, NUS-WIDE, and MS COCO); compared with several representative advanced cross-modal retrieval methods, SAAH achieved leading retrieval performance.
14. Luo J, Wo Y, Wu B, Han G. Learning sufficient scene representation for unsupervised cross-modal retrieval. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.07.078.
15. Shen X, Zhang H, Li L, Zhang Z, Chen D, Liu L. Clustering-driven Deep Adversarial Hashing for scalable unsupervised cross-modal retrieval. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.06.087.
16. Li M, Li Q, Tang L, Peng S, Ma Y, Yang D. Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model. Computational Intelligence and Neuroscience 2021;2021:5107034. PMID: 34326867. PMCID: PMC8310450. DOI: 10.1155/2021/5107034.
Abstract
Cross-modal hashing encodes heterogeneous multimedia data into compact binary codes to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires extensive manual annotation of the data. In contrast, unsupervised deep hashing struggles to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which reconstructs the similarity matrix using the hidden correlation information of a pretrained unsupervised teacher model; the reconstructed similarity matrix is then used to guide a supervised student model. Specifically, first, the teacher model adopts an unsupervised semantic alignment hashing method, which constructs a modality-fusion similarity matrix. Second, under the supervision of the teacher model's distillation information, the student model generates more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that, compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of the proposed method achieves a significant improvement, fully reflecting its effectiveness in large-scale cross-modal data retrieval.
Collapse
Affiliation(s)
- Mingyong Li: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Qiqi Li: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Lirong Tang: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Shuang Peng: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Yan Ma: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Degang Yang: College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
18. Yang Z, Yang L, Raymond OI, Zhu L, Huang W, Liao Z, Long J. NSDH: A Nonlinear Supervised Discrete Hashing framework for large-scale cross-modal retrieval. Knowledge-Based Systems 2021. DOI: 10.1016/j.knosys.2021.106818.
19. Fang Y, Li B, Li X, Ren Y. Unsupervised cross-modal similarity via Latent Structure Discrete Hashing Factorization. Knowledge-Based Systems 2021. DOI: 10.1016/j.knosys.2021.106857.
20. Liu X, Hu Z, Ling H, Cheung YM. MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021;43:964-981. PMID: 31514125. DOI: 10.1109/tpami.2019.2940446.
Abstract
Hashing has recently sparked a great revolution in cross-modal retrieval because of its low storage cost and high query speed. Recent cross-modal hashing methods often learn unified or equal-length hash codes to represent the multi-modal data and make them intuitively comparable. However, such unified or equal-length hash representations inherently sacrifice representation scalability, because the data from different modalities may not have one-to-one correspondence and could be encoded more efficiently by hash codes of unequal lengths. To mitigate these problems, this paper explores a related and relatively unexplored problem: encoding heterogeneous data with varying hash lengths and generalizing cross-modal retrieval to various challenging scenarios. To this end, a generalized and flexible cross-modal hashing framework, termed Matrix Tri-Factorization Hashing (MTFH), is proposed to work seamlessly in various settings, including paired or unpaired multi-modal data and equal or varying hash length encoding scenarios. More specifically, MTFH exploits an efficient objective function to flexibly learn the modality-specific hash codes with different length settings, while synchronously learning two semantic correlation matrices that semantically correlate the different hash representations and make the heterogeneous data comparable. As a result, the derived hash codes are more semantically meaningful for various challenging cross-modal retrieval tasks. Extensive experiments on public benchmark datasets highlight the superiority of MTFH under various retrieval scenarios and show its competitive performance against the state-of-the-art.
21. Wu G, Lin Z, Ding G, Ni Q, Han J. On Aggregation of Unsupervised Deep Binary Descriptor with Weak Bits. IEEE Transactions on Image Processing 2020:9266-9278. PMID: 32976101. DOI: 10.1109/tip.2020.3025437.
Abstract
Despite the thrilling success achieved by existing binary descriptors, most of them are still mired in three limitations: 1) vulnerability to geometric transformations; 2) inability to preserve the manifold structure when learning binary codes; and 3) no guarantee of finding the true match if multiple candidates happen to have the same Hamming distance to a given query. Together, these make binary descriptors less effective for large-scale visual recognition tasks. In this paper, we propose a novel learning-based feature descriptor, namely the Unsupervised Deep Binary Descriptor (UDBD), which learns transformation-invariant binary descriptors by projecting the original data and their transformed sets into a joint binary space. Moreover, we introduce an ℓ2,1-norm loss term in the binary embedding process to gain robustness against data noise and reduce the probability of mistakenly flipping bits of the binary descriptor; on top of that, a graph constraint is used to preserve the original manifold structure in the binary space. Furthermore, a weak-bit mechanism is adopted to find the real match among candidates sharing the same minimum Hamming distance, thus enhancing matching performance. Extensive experimental results on public datasets show the superiority of UDBD in terms of matching and retrieval accuracy over the state-of-the-art.
|
22
|
Hoang T, Do TT, Nguyen TV, Cheung NM. Unsupervised Deep Cross-modality Spectral Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; PP:8391-8406. [PMID: 32784139 DOI: 10.1109/tip.2020.3014727] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach that decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to the binary codes obtained in the first step, we leverage a powerful CNN for images and propose a CNN-based deep architecture for the text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.
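The spectral flavour of the first step can be sketched in miniature: build an affinity graph, take the leading non-trivial eigenvectors of its Laplacian, and binarize them by sign. This is a generic single-modality toy, not DCSH's actual cross-modal objective; the RBF affinity and the function name are assumptions.

```python
import numpy as np

def spectral_binary_codes(X, n_bits, sigma=1.0):
    """Toy spectral hashing step: RBF affinity graph -> bottom
    non-trivial Laplacian eigenvectors -> sign binarization.
    (Illustrative only; DCSH couples modalities and learns deep
    hash functions on top of such embeddings.)"""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))          # affinity matrix
    L = np.diag(W.sum(1)) - W                   # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)              # ascending eigenvalues
    Y = vecs[:, 1:n_bits + 1]                   # skip the constant eigenvector
    return np.where(Y >= 0, 1, -1)
```

On two well-separated point clusters, a one-bit code produced this way assigns opposite signs to the two clusters.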
|
23
|
Wang L, Qian X, Zhang Y, Shen J, Cao X. Enhancing Sketch-Based Image Retrieval by CNN Semantic Re-ranking. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:3330-3342. [PMID: 30892258 DOI: 10.1109/tcyb.2019.2894498] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper introduces a convolutional neural network (CNN) semantic re-ranking system to enhance the performance of sketch-based image retrieval (SBIR). Distinguished from existing approaches, the proposed system can leverage the category information provided by CNNs to support effective similarity measurement between images. To achieve effective classification of query sketches and high-quality initial retrieval results, one CNN model is trained for the classification of sketches and another for that of natural images. Through training dual CNN models, the semantic information of both the sketches and the natural images is captured by deep learning. To measure the category similarity between images, a category similarity measurement method is proposed, and the category information is then used for re-ranking. The re-ranking operation first infers the retrieval category of the query sketch and then uses the category similarity measurement to score each initial retrieval result against the query sketch. Finally, the initial retrieval results are re-ranked. Experiments on different types of SBIR datasets demonstrate the effectiveness of the proposed re-ranking method, and comparisons with other re-ranking algorithms show its superiority. Further, compared with the baseline systems, the proposed re-ranking approach achieves significantly higher precision in the top-ten results across different SBIR methods and datasets.
|
24
|
Cheng M, Jing L, Ng MK. Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval. ACM T INFORM SYST 2020. [DOI: 10.1145/3389547] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
With the quick development of social websites, there are more opportunities to have different media types (such as text, image, and video) describing the same topic across large-scale heterogeneous data sources. To efficiently identify inter-media correlations for multimedia retrieval, unsupervised cross-modal hashing (UCMH) has gained increased interest due to its significant reduction in computation and storage. However, most UCMH methods assume that the data from different modalities are well paired. As a result, existing UCMH methods may not achieve satisfactory performance when only partially paired data are given. In this article, we propose a new type of UCMH method called robust unsupervised cross-modal hashing (RUCMH). The major contribution lies in jointly learning modality-specific hash functions, exploring the correlations among modalities with partial or even no pairwise correspondence, and preserving the information of the original features as much as possible. The learning process is modeled as a joint minimization problem, and the corresponding optimization algorithm is presented. A series of experiments is conducted on four real-world datasets (Wiki, MIRFlickr, NUS-WIDE, and MS-COCO). The results demonstrate that RUCMH can significantly outperform state-of-the-art unsupervised cross-modal hashing methods, especially in the partially paired case, which validates the effectiveness of RUCMH.
Affiliation(s)
- Miaomiao Cheng, Beijing Jiaotong University, Beijing Key Lab of Traffic Data Analysis and Mining, Beijing, China
- Liping Jing, Beijing Jiaotong University, Beijing Key Lab of Traffic Data Analysis and Mining, Beijing, China
- Michael K. Ng, The University of Hong Kong, Department of Mathematics, Hong Kong, China
|
25
|
Fang Y, Ren Y, Park JH. Semantic-enhanced discrete matrix factorization hashing for heterogeneous modal matching. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105381] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
26
|
|
27
|
Zhang J, Peng Y, Yuan M. SCH-GAN: Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:489-502. [PMID: 30273169 DOI: 10.1109/tcyb.2018.2868826] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Cross-modal hashing maps heterogeneous multimedia data into a common Hamming space to realize fast and flexible cross-modal retrieval. Supervised cross-modal hashing methods have achieved considerable progress by incorporating semantic side information. However, they heavily rely on large-scale labeled cross-modal training data, which are hard to obtain since multiple modalities are involved. They also ignore the rich information contained in the large amount of unlabeled data across different modalities, which can help to model the correlations between modalities. To address these problems, in this paper, we propose a novel semi-supervised cross-modal hashing approach based on a generative adversarial network (SCH-GAN). The main contributions can be summarized as follows: 1) we propose a novel generative adversarial network for cross-modal hashing, in which the generative model tries to select margin examples of one modality from unlabeled data when given a query of another modality (e.g., given a text query, retrieve images, and vice versa), while the discriminative model tries to distinguish the selected examples from true positive examples of the query; these two models play a minimax game so that the generative model can promote the hashing performance of the discriminative model; and 2) we propose a reinforcement learning-based algorithm to drive the training of the proposed SCH-GAN, in which the generative model takes the correlation score predicted by the discriminative model as a reward and tries to select examples close to the margin to promote the discriminative model. Extensive experiments on three widely used datasets verify the effectiveness of our proposed approach in comparison with nine state-of-the-art methods.
|
28
|
Ji Z, Sun Y, Yu Y, Pang Y, Han J. Attribute-Guided Network for Cross-Modal Zero-Shot Hashing. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:321-330. [PMID: 30990194 DOI: 10.1109/tnnls.2019.2904991] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Zero-shot hashing (ZSH) aims at learning a hashing model that is trained only on instances from seen categories but can generalize well to those of unseen categories. Typically, this is achieved by utilizing a semantic embedding space to transfer knowledge from the seen domain to the unseen domain. Existing efforts mainly focus on single-modal retrieval tasks, especially image-based image retrieval (IBIR). However, as a highlighted research topic in the field of hashing, cross-modal retrieval is more common in real-world applications. To address the cross-modal ZSH (CMZSH) retrieval task, we propose a novel attribute-guided network (AgNet), which can perform not only IBIR but also text-based image retrieval (TBIR). In particular, AgNet aligns data from different modalities into a semantically rich attribute space, which bridges the gap caused by modality heterogeneity and the zero-shot setting. We also design an effective strategy that exploits the attributes to guide the generation of hash codes for image and text within the same network. Extensive experimental results on three benchmark data sets (AwA, SUN, and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks.
|
29
|
Cognitive Intelligence Assisted Fog-Cloud Architecture for Generalized Anxiety Disorder (GAD) Prediction. J Med Syst 2019; 44:7. [DOI: 10.1007/s10916-019-1495-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2019] [Accepted: 11/03/2019] [Indexed: 10/25/2022]
|
30
|
Wang S, Dou Z, Chen D, Yu H, Li Y, Pan P. Multimodal Multiclass Boosting and its Application to Cross-modal Retrieval. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
31
|
Han X, Song X, Yao Y, Xu XS, Nie L. Neural Compatibility Modeling with Probabilistic Knowledge Distillation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:871-882. [PMID: 31478851 DOI: 10.1109/tip.2019.2936742] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In modern society, clothing matching plays a pivotal role in people's daily life, as suitable outfits can directly beautify their appearance. Nevertheless, how to make a suitable outfit has become a daily headache for many people, especially those who do not have much sense of aesthetics. In light of this, many research efforts have been dedicated to the task of complementary clothing matching and have achieved great success relying on advanced data-driven neural networks. However, most existing methods overlook the rich and valuable knowledge accumulated by human beings in the fashion domain, especially the rules regarding clothing matching, like "coats go with dresses" and "silk tops cannot go with chiffon bottoms". Towards this end, in this work, we propose a knowledge-guided neural compatibility modeling scheme, which is able to incorporate rich fashion domain knowledge to enhance the performance of compatibility modeling in the context of clothing matching. To better integrate the huge and implicit fashion domain knowledge into data-driven neural networks, we present a probabilistic knowledge distillation (PKD) method, which is able to encode vast knowledge rules in a probabilistic manner. Extensive experiments on two real-world datasets have verified the guidance of rules from different sources and demonstrated the effectiveness and portability of our model. As a byproduct, we release the code and the involved parameters to benefit the research community.
|
32
|
Unsupervised cross-modal retrieval via Multi-modal graph regularized Smooth Matrix Factorization Hashing. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.02.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
33
|
|
34
|
Wu G, Han J, Guo Y, Liu L, Ding G, Ni Q, Shao L. Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 28:1993-2007. [PMID: 30452370 DOI: 10.1109/tip.2018.2882155] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper proposes a deep hashing framework, namely Unsupervised Deep Video Hashing (UDVH), for large-scale video similarity search, with the aim of learning compact yet effective binary codes. Our UDVH produces the hash codes in a self-taught manner by jointly integrating discriminative video representation with optimal code learning, where an efficient alternating approach is adopted to optimize the objective function. The key differences from most existing video hashing methods lie in: 1) UDVH is an unsupervised hashing method that generates hash codes by cooperatively utilizing feature clustering and a specifically designed binarization that preserves the original neighborhood structure in the binary space; 2) a specific rotation is developed and applied to the video features so that the variance of each dimension is balanced, thus facilitating the subsequent quantization step. Extensive experiments performed on three popular video datasets show that UDVH is overwhelmingly better than state-of-the-art methods in terms of various evaluation metrics, which makes it practical in real-world applications.
|
35
|
Zhu L, Huang Z, Li Z, Xie L, Shen HT. Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5264-5276. [PMID: 29994644 DOI: 10.1109/tnnls.2018.2797248] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Unsupervised hashing can desirably support scalable content-based image retrieval for its appealing advantages of semantic label independence, memory, and search efficiency. However, the learned hash codes are embedded with limited discriminative semantics due to the intrinsic limitation of image representation. To address the problem, in this paper, we propose a novel hashing approach, dubbed as discrete semantic transfer hashing (DSTH). The key idea is to directly augment the semantics of discrete image hash codes by exploring auxiliary contextual modalities. To this end, a unified hashing framework is formulated to simultaneously preserve visual similarities of images and perform semantic transfer from contextual modalities. Furthermore, to guarantee direct semantic transfer and avoid information loss, we explicitly impose the discrete constraint, bit-uncorrelation constraint, and bit-balance constraint on hash codes. A novel and effective discrete optimization method based on augmented Lagrangian multiplier is developed to iteratively solve the optimization problem. The whole learning process has linear computation complexity and desirable scalability. Experiments on three benchmark data sets demonstrate the superiority of DSTH compared with several state-of-the-art approaches.
|
36
|
|
37
|
|
38
|
Deng C, Chen Z, Liu X, Gao X, Tao D. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:3893-3903. [PMID: 29993656 DOI: 10.1109/tip.2018.2821921] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Given the benefits of its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention. In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However, almost all existing cross-modal hashing methods cannot obtain powerful hash codes because they ignore the relative similarity between heterogeneous data, which contains richer semantic information, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we utilize triplet labels, which describe the relative relationships among three instances, as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative ability of the hash codes. Finally, graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in the Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets.
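The triplet supervision described above can be pictured with the standard margin-based triplet loss on relaxed (real-valued) codes: push the positive closer to the query than the negative by at least a margin. This is a generic sketch of the triplet idea, not TDH's full inter-/intra-modal objective; the margin value and function name are assumptions.

```python
import numpy as np

def triplet_hash_loss(q, pos, neg, margin=2.0):
    """Generic margin-based triplet loss on relaxed hash codes:
    zero when the positive is closer to the query than the
    negative by at least `margin` in squared distance."""
    d_pos = float(np.sum((q - pos) ** 2))
    d_neg = float(np.sum((q - neg) ** 2))
    return max(0.0, margin + d_pos - d_neg)
```

A well-ordered triplet (positive near, negative far) incurs zero loss; an inverted one is penalized by the full margin plus the distance gap.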
|
39
|
Song K, Nie F, Han J, Li X. Rank-k 2-D Multinomial Logistic Regression for Matrix Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3524-3537. [PMID: 28816682 DOI: 10.1109/tnnls.2017.2731999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The amount of matrix data has increased rapidly nowadays, so how to classify matrix data efficiently is an important issue. In this paper, by identifying the shortcomings of 2-D linear discriminant analysis and 2-D logistic regression, a novel 2-D framework named rank-k 2-D multinomial logistic regression (2DMLR-RK) is proposed. The 2DMLR-RK is designed for the multiclass matrix classification problem. In the proposed framework, each category is modeled by a left projection matrix and a right projection matrix of rank k. The left projection matrices capture the row information of the matrix data, and the right projection matrices acquire the column information. We identify that the parameter k plays the role of balancing the learning capacity and the generalization of the 2DMLR-RK. In addition, we develop an effective framework for solving the proposed nonconvex optimization problem, and the convergence, initialization, and computational complexity are discussed. Extensive experiments on various types of data sets are conducted. Compared with 1-D methods, 2DMLR-RK not only achieves better classification accuracy but also costs less computation time. Compared with other state-of-the-art 2-D methods, 2DMLR-RK achieves better performance for matrix data classification.
|
40
|
Koutaki G, Shirai K, Ambai M. Hadamard Coding for Supervised Discrete Hashing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:5378-5392. [PMID: 30010571 DOI: 10.1109/tip.2018.2855427] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, we propose a learning-based supervised discrete hashing method. Binary hashing is widely used for large-scale image retrieval as well as video and document searches, because the compact binary code representation is essential for data storage and well suited to query searches using bit operations. The recently proposed supervised discrete hashing (SDH) method efficiently solves mixed-integer programming problems by alternating optimization and the discrete cyclic coordinate descent (DCC) method. Based on some preliminary experiments, we show that the SDH method can be simplified without performance degradation. We analyze the simplified model and provide a mathematically exact solution thereof; we reveal that the exact binary code is provided by a Hadamard matrix. Therefore, we name our method Hadamard-coded supervised discrete hashing (HC-SDH). In contrast to SDH, our model does not require an alternating optimization algorithm and does not depend on initial values. HC-SDH is also easier to implement than iterative quantization (ITQ). Experimental results on a large-scale database show that Hadamard coding outperforms conventional SDH in terms of precision, recall, and computational time. On the large datasets SUN-397 and ImageNet, HC-SDH provides superior mean average precision (mAP) and top-accuracy compared with conventional SDH methods of the same code length and with FastHash. The training time of HC-SDH is 170 times faster than that of conventional SDH, and the testing time, including encoding, is seven times faster than that of FastHash, which encodes using a binary tree.
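The Hadamard matrix at the heart of the abstract above is easy to construct: Sylvester's recursion doubles the order at each step, and the resulting rows are mutually orthogonal ±1 vectors, so any two distinct rows sit at Hamming distance exactly n/2. This sketch shows only the construction, under the assumption that class codes are taken from such rows; it is not HC-SDH's full training procedure.

```python
import numpy as np

def sylvester_hadamard(n):
    """Hadamard matrix of order n (n a power of two) via Sylvester's
    construction: H_{2m} = [[H_m, H_m], [H_m, -H_m]]. Rows are
    mutually orthogonal ±1 vectors, so distinct rows differ in
    exactly n/2 positions -- well-separated binary class codes."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H
```

For example, order 8 gives eight 8-bit codes with pairwise Hamming distance 4.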
|
41
|
Lin Z, Ding G, Han J, Shao L. End-to-End Feature-Aware Label Space Encoding for Multilabel Classification With Many Classes. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2472-2487. [PMID: 28500009 DOI: 10.1109/tnnls.2017.2691545] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
To make the problem of multilabel classification with many classes more tractable, in recent years, academia has seen efforts devoted to performing label space dimension reduction (LSDR). Specifically, LSDR encodes high-dimensional label vectors into low-dimensional code vectors lying in a latent space, so as to train predictive models at much lower costs. With respect to the prediction, it performs classification for any unseen instance by recovering a label vector from its predicted code vector via a decoding process. In this paper, we propose a novel method, namely End-to-End Feature-aware label space Encoding (E2FE), to perform LSDR. Instead of requiring an encoding function like most previous works, E2FE directly learns a code matrix formed by code vectors of the training instances in an end-to-end manner. Another distinct property of E2FE is its feature awareness attributable to the fact that the code matrix is learned by jointly maximizing the recoverability of the label space and the predictability of the latent space. Based on the learned code matrix, E2FE further trains predictive models to map instance features into code vectors, and also learns a linear decoding matrix for efficiently recovering the label vector of any unseen instance from its predicted code vector. Theoretical analyses show that both the code matrix and the linear decoding matrix in E2FE can be efficiently learned. Moreover, similar to previous works, E2FE can be specified to learn an encoding function. And it can also be extended with kernel tricks to handle nonlinear correlations between the feature space and the latent space. Comprehensive experiments conducted on diverse benchmark data sets with many classes show consistent performance gains of E2FE over the state-of-the-art methods.
|
42
|
Qian X, Li C, Lan K, Hou X, Li Z, Han J. POI Summarization by Aesthetics Evaluation From Crowd Source Social Media. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:1178-1189. [PMID: 29220319 DOI: 10.1109/tip.2017.2769454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Place-of-Interest (POI) summarization by aesthetics evaluation can recommend a set of POI images to the user and is significant in image retrieval. In this paper, we propose a system that summarizes a collection of POI images with regard to both aesthetics and the diversity of the camera distribution. First, we generate visual albums with a coarse-to-fine POI clustering approach and then build 3D models for each album from the images collected from social media. Second, based on the 3D-to-2D projection relationship, we select candidate photos in terms of the proposed crowd-sourced saliency model. Third, to improve the performance of the aesthetic measurement model, we propose a crowd-sourced saliency detection approach that explores the distribution of salient regions in the 3D model. We then measure the composition aesthetics of each image and exploit the crowd-sourced salient features to yield a saliency map, based on which we propose an adaptive image adoption approach. Finally, we combine diversity and aesthetics to recommend aesthetic pictures. Experimental results show that the proposed POI summarization approach returns images with diverse camera distributions and high aesthetics.
|
43
|
Guo Y, Ding G, Han J. Robust Quantization for General Similarity Search. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:949-963. [PMID: 29757738 DOI: 10.1109/tip.2017.2766445] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Recent years have witnessed the emergence of vector quantization (VQ) techniques for efficient similarity search. VQ partitions the feature space into a set of codewords and encodes data points as integer indices using the codewords. The distance between data points can then be efficiently approximated by simple memory lookup operations. Through this compact quantization, the storage cost and search complexity are significantly reduced, thereby facilitating efficient large-scale similarity search. However, the performance of several celebrated VQ approaches degrades significantly when dealing with noisy data. In addition, they can barely facilitate a wide range of applications, as the distortion measurement is limited to the ℓ2 norm. To address the shortcomings of the squared Euclidean (ℓ2,2-norm) loss function employed by VQ approaches, in this paper we propose a novel robust and general VQ framework, named RGVQ, to enhance both the robustness and the generalization of VQ approaches. Specifically, an ℓp,q-norm loss function is proposed to conduct ℓp-norm similarity search, rather than ℓ2-norm search, and the q-th order loss is used to enhance robustness. Although changing the loss function to the ℓp,q norm makes VQ approaches more robust and generic, it poses the challenge of minimizing a non-smooth, non-convex, orthogonality-constrained ℓp,q-norm function. To solve this problem, we propose a novel and efficient optimization scheme, specify it for VQ approaches, and theoretically prove its convergence. Extensive experiments on benchmark data sets demonstrate that the proposed RGVQ improves on the original VQ for several approaches, especially when searching for similarity in noisy data.
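The encode-then-lookup pattern the abstract describes can be sketched with plain k-means-style VQ: assign each point to its nearest codeword, then approximate pairwise distances by indexing a precomputed codeword-to-codeword table. This is the baseline ℓ2 setting that RGVQ generalizes to an ℓp,q loss; the function names and the tiny codebook are illustrative assumptions.

```python
import numpy as np

def vq_encode(X, codebook):
    """Assign each point to its nearest codeword under l2 distortion
    (the plain VQ baseline; RGVQ replaces this loss with an lp,q norm)."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)

def lookup_distance(codes_q, codes_db, table):
    """Approximate pairwise distances by indexing a precomputed
    codeword-to-codeword distance table -- the memory-lookup trick."""
    return table[np.ix_(codes_q, codes_db)]
```

With a two-codeword codebook, encoding maps each point to its nearest codeword index, and lookups return the precomputed inter-codeword distance instead of recomputing it.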
|
44
|
|
45
|
Cheng G, Zhou P, Han J. Duplex Metric Learning for Image Set Classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:281-292. [PMID: 28991740 DOI: 10.1109/tip.2017.2760512] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Image set classification has attracted much attention because of its broad applications. Despite the success made so far, the problems of intra-class diversity and inter-class similarity still remain two major challenges. To explore a possible solution to these challenges, this paper proposes a novel approach, termed duplex metric learning (DML), for image set classification. The proposed DML consists of two progressive metric learning stages with different objectives used for feature learning and image classification, respectively. The metric learning regularization is not only used to learn powerful feature representations but also well explored to train an effective classifier. At the first stage, we first train a discriminative stacked autoencoder (DSAE) by layer-wisely imposing a metric learning regularization term on the neurons in the hidden layers and meanwhile minimizing the reconstruction error to obtain new feature mappings in which similar samples are mapped closely to each other and dissimilar samples are mapped farther apart. At the second stage, we discriminatively train a classifier and simultaneously fine-tune the DSAE by optimizing a new objective function, which consists of a classification error term and a metric learning regularization term. Finally, two simple voting strategies are devised for image set classification based on the learnt classifier. In the experiments, we extensively evaluate the proposed framework for the tasks of face recognition, object recognition, and face verification on several commonly-used data sets and state-of-the-art results are achieved in comparison with existing methods.
|
46
|
|
47
|
Bhatia M, Sood SK. A comprehensive health assessment framework to facilitate IoT-assisted smart workouts: A predictive healthcare perspective. COMPUT IND 2017. [DOI: 10.1016/j.compind.2017.06.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
48
|
Ding G, Zhou J, Guo Y, Lin Z, Zhao S, Han J. Large-scale image retrieval with Sparse Embedded Hashing. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.055] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
49
|
Qian X, Lu D, Wang Y, Zhu L, Tang YY, Wang M. Image Re-Ranking Based on Topic Diversity. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:3734-3747. [PMID: 28463199 DOI: 10.1109/tip.2017.2699623] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Social media sharing websites allow users to annotate images with free tags, which significantly contributes to the development of web image retrieval. Tag-based image search is an important way to find images shared by users in social networks; however, making the top-ranked results both relevant and diverse is challenging. In this paper, we propose a topic-diverse ranking approach for tag-based image retrieval that aims to improve topic coverage. First, we construct a tag graph based on the similarity between each pair of tags. Then, a community detection method is applied to mine the topic community of each tag. After that, inter-community and intra-community ranking are introduced to obtain the final retrieval results. In the inter-community ranking process, an adaptive random walk model is employed to rank the communities based on the multiple information sources of each topic community. Besides, we build an inverted index structure for the images to accelerate the search process. Experimental results on the Flickr and NUS-WIDE datasets show the effectiveness of the proposed approach.
|
50
|
Shi Y, Chen G, Li J. Malicious Domain Name Detection Based on Extreme Machine Learning. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9666-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|