1. Shen X, Wu W, Wang X, Zheng Y. Multiple Riemannian Kernel Hashing for Large-Scale Image Set Classification and Retrieval. IEEE Transactions on Image Processing 2024; 33:4261-4273. [PMID: 38954580] [DOI: 10.1109/tip.2024.3419414]
Abstract
Conventional image set methods typically learn from small- to medium-sized image set datasets. However, when applied to large-scale image set applications such as classification and retrieval, they face two primary challenges: 1) effectively modeling complex image sets; and 2) efficiently performing the tasks. To address these issues, we propose a novel Multiple Riemannian Kernel Hashing (MRKH) method that leverages the strengths of Riemannian manifolds and hashing for effective and efficient image set representation. MRKH considers multiple heterogeneous Riemannian manifolds to represent each image set. It introduces a multiple kernel learning framework designed to effectively combine statistics from multiple manifolds, and constructs kernels from a small set of anchor points, enabling efficient scalability to large-scale applications. In addition, MRKH exploits inter- and intra-modal semantic structure to enhance discrimination. Instead of employing a continuous feature to represent each image set, MRKH learns a hash code for each image set, thereby achieving efficient computation and storage. We present an iterative algorithm with a theoretical convergence guarantee to optimize MRKH, whose computational complexity is linear in the size of the dataset. Extensive experiments on five image set benchmark datasets, including three large-scale ones, demonstrate that the proposed method outperforms state-of-the-art methods in accuracy and efficiency, particularly in large-scale image set classification and retrieval.
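For readers who want to see the anchor-based kernel mechanism in code, the sketch below is a minimal illustration of combining anchor-point RBF kernels from several views with fixed weights; it is not the authors' implementation, and the names, dimensions, and random anchor selection are assumptions made here for illustration.

```python
import numpy as np

def anchor_kernel(X, anchors, gamma=1.0):
    """RBF kernel between n samples and m anchors: returns an (n, m) matrix."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, m = 1000, 50                       # samples, anchors per view
views = [rng.normal(size=(n, 32)),    # stand-ins for per-manifold statistics
         rng.normal(size=(n, 64))]    # already mapped to Euclidean descriptors

# One anchor-based kernel per view; anchors chosen by random sampling here
kernels = []
for X in views:
    anchors = X[rng.choice(n, m, replace=False)]
    kernels.append(anchor_kernel(X, anchors))

# Combine views with non-negative weights that sum to 1 (learned in practice)
weights = np.array([0.6, 0.4])
K_fused = sum(w * K for w, K in zip(weights, kernels))
print(K_fused.shape)                  # (1000, 50)
```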
2. Liang X, Yang E, Yang Y, Deng C. Multi-Relational Deep Hashing for Cross-Modal Search. IEEE Transactions on Image Processing 2024; 33:3009-3020. [PMID: 38625760] [DOI: 10.1109/tip.2024.3385656]
Abstract
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervision, which captures only partial similarity between data pairs; this approach models data similarity locally and incompletely, resulting in sub-optimal retrieval performance. In this paper, we propose a novel Multi-Relational Deep Hashing (MRDH) approach, which can fully bridge the modality gap by comprehensively modeling the similarity relationships between data in different modalities. In more detail, to investigate inter-modal relationships, we constrain the consistency of cross-modal pairwise similarities to maintain semantic similarity across modalities. Moreover, to capture more complete similarity information, we design a new similarity measure, termed cross-modal global similarity, by encouraging the hash codes of similar data pairs from different modalities to approach a common center and the hash codes of dissimilar pairs to converge to different centers. This enables our model to generate more discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of our method for cross-modal hashing retrieval.
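The "common center" idea can be pictured with the following hedged PyTorch sketch: relaxed image and text codes of the same class are pulled toward a shared learnable class center. This is my own simplification, not the MRDH code; the center-separation term and the networks producing the codes are omitted.

```python
import torch
import torch.nn.functional as F

num_classes, code_len = 10, 32
centers = torch.nn.Parameter(torch.randn(num_classes, code_len))  # one center per class

def center_similarity_loss(img_codes, txt_codes, labels):
    """Pull relaxed codes of both modalities toward their shared class center."""
    c = F.normalize(centers[labels], dim=1)
    img = F.normalize(torch.tanh(img_codes), dim=1)   # tanh as a relaxation of sign()
    txt = F.normalize(torch.tanh(txt_codes), dim=1)
    # Cosine similarity to the shared center should be high for both modalities
    return (2 - (img * c).sum(1) - (txt * c).sum(1)).mean()

# Toy usage with random tensors standing in for network outputs
img_codes = torch.randn(8, code_len, requires_grad=True)
txt_codes = torch.randn(8, code_len, requires_grad=True)
labels = torch.randint(0, num_classes, (8,))
loss = center_similarity_loss(img_codes, txt_codes, labels)
loss.backward()
print(float(loss))
```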
3. Shi D, Zhu L, Li J, Cheng Z, Zhang Z. Flexible Multiview Spectral Clustering With Self-Adaptation. IEEE Transactions on Cybernetics 2023; 53:2586-2599. [PMID: 34910658] [DOI: 10.1109/tcyb.2021.3131749]
Abstract
Multiview spectral clustering (MVSC) has achieved state-of-the-art clustering performance on multiview data. Most existing approaches first simply concatenate multiview features or combine multiple view-specific graphs to construct a unified fusion graph, and then perform spectral embedding and cluster label discretization with k-means to obtain the final clustering results. They suffer from an important drawback: all views are treated as fixed when fusing multiple graphs and as equally important when handling the out-of-sample extension, so they cannot adaptively differentiate the discriminative capabilities of multiview features. To alleviate these problems, we propose a flexible MVSC with self-adaptation (FMSCS) method in this article. A self-adaptive learning scheme is designed for structured graph construction, multiview graph fusion, and out-of-sample extension. Specifically, we learn a fusion graph with a desirable clustering structure by adaptively exploiting the complementarity of different view features under the guidance of a proper rank constraint. Meanwhile, we flexibly learn multiple projection matrices to handle the out-of-sample extension by adaptively adjusting the view combination weights according to the specific contents of unseen data. Finally, we derive an alternating optimization strategy with guaranteed convergence to iteratively solve the formulated unified learning model. Extensive experiments demonstrate the superiority of our proposed method compared with state-of-the-art MVSC approaches. For reproducibility, we provide the code and testing datasets at https://github.com/shidan0122/FMICS.
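As a rough illustration of adaptive view weighting during graph fusion, the sketch below reweights each view-specific affinity graph inversely to its distance from the current fused graph and iterates; it is a simplification under my own assumptions (the rank constraint and spectral embedding of FMSCS are omitted).

```python
import numpy as np

def fuse_graphs(graphs, iters=20, eps=1e-12):
    """Adaptive-weight fusion of view-specific affinity matrices (each n x n)."""
    V = len(graphs)
    weights = np.full(V, 1.0 / V)
    S = sum(w * G for w, G in zip(weights, graphs))      # initial fused graph
    for _ in range(iters):
        # Weight each view inversely to its disagreement with the fused graph
        dists = np.array([np.linalg.norm(G - S) for G in graphs])
        weights = 1.0 / (2.0 * dists + eps)
        weights /= weights.sum()
        S = sum(w * G for w, G in zip(weights, graphs))
    return S, weights

rng = np.random.default_rng(1)
n = 60
graphs = [np.abs(rng.normal(size=(n, n))) for _ in range(3)]
graphs = [(G + G.T) / 2 for G in graphs]                 # symmetrize toy affinities
S, w = fuse_graphs(graphs)
print(w)                                                 # adapted view weights
```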
4. Sun Y, Wang X, Peng D, Ren Z, Shen X. Hierarchical Hashing Learning for Image Set Classification. IEEE Transactions on Image Processing 2023; 32:1732-1744. [PMID: 37028051] [DOI: 10.1109/tip.2023.3251025]
Abstract
With the development of video networks, image set classification (ISC) has received considerable attention and can be used for various practical applications, such as video-based recognition and action recognition. Although existing ISC methods have obtained promising performance, they often have extremely high complexity. Owing to its advantages in storage space and computational cost, learning to hash has become a powerful solution. However, existing hashing methods often ignore the complex structural information and hierarchical semantics of the original features. They usually adopt a single-layer hashing strategy to transform high-dimensional data into short binary codes in one step, and this sudden drop in dimension can lose advantageous discriminative information. In addition, they do not take full advantage of the intrinsic semantic knowledge of whole gallery sets. To tackle these problems, in this paper, we propose a novel Hierarchical Hashing Learning (HHL) method for ISC. Specifically, a coarse-to-fine hierarchical hashing scheme is proposed that utilizes a two-layer hash function to gradually refine the beneficial discriminative information in a layer-wise fashion. Besides, to alleviate the effects of redundant and corrupted features, we impose the $\ell_{2,1}$ norm on the layer-wise hash function. Moreover, we adopt a bidirectional semantic representation with an orthogonal constraint to adequately preserve the intrinsic semantic information of all samples in the whole image sets. Comprehensive experiments demonstrate that HHL achieves significant improvements in accuracy and running time. We will release the demo code at https://github.com/sunyuan-cs.
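To make the two-layer hashing scheme and the $\ell_{2,1}$ penalty concrete, here is a toy sketch; the projection matrices, dimensions, and tanh nonlinearity are illustrative assumptions, not the HHL release.

```python
import numpy as np

def l21_norm(W):
    """ell_{2,1} norm: sum of the ell_2 norms of the rows of W."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

rng = np.random.default_rng(0)
d, h, b, n = 512, 128, 32, 200           # input dim, hidden dim, code length, samples
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, h)) * 0.01      # first-layer projection (coarse)
W2 = rng.normal(size=(h, b)) * 0.01      # second-layer projection (fine)

H = np.tanh(X @ W1)                      # intermediate representation
B = np.sign(H @ W2)                      # final binary codes in {-1, +1}
reg = l21_norm(W1) + l21_norm(W2)        # row-sparsity penalty on both layers
print(B.shape, round(reg, 2))
```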
5. A Framework for Image Captioning Based on Relation Network and Multilevel Attention Mechanism. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11106-y]
6. Fast unsupervised consistent and modality-specific hashing for multimedia retrieval. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08008-4]
7. Wang Y, Chen ZD, Luo X, Li R, Xu XS. Fast Cross-Modal Hashing With Global and Local Similarity Embedding. IEEE Transactions on Cybernetics 2022; 52:10064-10077. [PMID: 33750723] [DOI: 10.1109/tcyb.2021.3059886]
Abstract
Recently, supervised cross-modal hashing has attracted much attention and achieved promising performance. To learn hash functions and binary codes, most methods globally exploit the supervised information, for example, by preserving an at-least-one pairwise similarity in the hash codes or by reconstructing the label matrix with binary codes. However, due to the hardness of the discrete optimization problem, they are usually time-consuming on large-scale datasets. In addition, they neglect the class correlation in the supervised information. From another point of view, they only explore the global similarity of the data but overlook the local similarity hidden in the data distribution. To address these issues, we present an efficient supervised cross-modal hashing method, fast cross-modal hashing (FCMH). It leverages not only global similarity information but also the local similarity within a group. Specifically, training samples are partitioned into groups, and the local similarity in each group is extracted. Moreover, the class correlation in the labels is also exploited and embedded into the learning of the binary codes. To solve the discrete optimization problem, we further propose an efficient discrete optimization algorithm with a well-designed group updating scheme, making its computational complexity linear in the size of the training set. In light of this, it is more efficient and scalable to large-scale datasets. Extensive experiments on three benchmark datasets demonstrate that FCMH outperforms several state-of-the-art cross-modal hashing approaches in terms of both retrieval accuracy and learning efficiency.
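A hedged sketch of the group-wise local similarity idea follows: samples are partitioned into groups and a label-based similarity matrix is built only within each group, so each matrix stays small. The random partition and the ±1 similarity definition are assumptions for illustration; FCMH's actual grouping and optimization differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_groups = 12, 3
labels = rng.integers(0, 4, size=n)              # toy single-label annotations
groups = np.array_split(rng.permutation(n), num_groups)

local_sims = []
for idx in groups:
    lg = labels[idx]
    # Within-group similarity: +1 if two samples share a label, else -1
    S = np.where(lg[:, None] == lg[None, :], 1, -1)
    local_sims.append((idx, S))

for idx, S in local_sims:
    print(idx, S.shape)                          # each group yields a small matrix
```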
8. Song D, Nie WZ, Li WH, Kankanhalli M, Liu AA. Monocular Image-Based 3-D Model Retrieval: A Benchmark. IEEE Transactions on Cybernetics 2022; 52:8114-8127. [PMID: 33531330] [DOI: 10.1109/tcyb.2021.3051016]
Abstract
Monocular image-based 3-D model retrieval aims to search for relevant 3-D models from a dataset given a single RGB image captured in the real world, which can significantly benefit several applications, such as self-service checkout and online shopping. To help advance this promising yet challenging research topic, we built a novel dataset and organized the first international contest on monocular image-based 3-D model retrieval. Moreover, we conduct a thorough analysis of the state-of-the-art methods. Existing methods can be classified into supervised and unsupervised methods. The supervised methods can be analyzed with respect to several important aspects, such as the strategies for domain adaptation, view fusion, loss function, and similarity measure. The unsupervised methods focus on solving this problem with unlabeled data and domain adaptation. Seven popular metrics are employed to evaluate performance, and accordingly, we provide a thorough analysis and guidance for future work. To the best of our knowledge, this is the first benchmark for monocular image-based 3-D model retrieval, which aims to support related research in multiview feature learning, domain adaptation, and information retrieval.
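Because the benchmark is evaluated with retrieval metrics, a compact generic implementation of one standard metric, mean average precision (mAP), is included below; this is not the contest's official evaluation script.

```python
import numpy as np

def mean_average_precision(sims, query_labels, gallery_labels):
    """sims: (Q, G) similarity scores; labels: class ids per query/gallery item."""
    aps = []
    for q in range(sims.shape[0]):
        order = np.argsort(-sims[q])                       # rank gallery by score
        rel = (gallery_labels[order] == query_labels[q]).astype(float)
        if rel.sum() == 0:
            continue
        precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append((precision_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))

rng = np.random.default_rng(0)
sims = rng.normal(size=(5, 100))                           # toy query-gallery scores
qL = rng.integers(0, 10, size=5)
gL = rng.integers(0, 10, size=100)
print(mean_average_precision(sims, qL, gL))
```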
9. PBGN: Phased Bidirectional Generation Network in Text-to-Image Synthesis. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10866-x]
10. Shi Y, Nie X, Liu X, Zou L, Yin Y. Supervised Adaptive Similarity Matrix Hashing. IEEE Transactions on Image Processing 2022; 31:2755-2766. [PMID: 35320101] [DOI: 10.1109/tip.2022.3158092]
Abstract
Compact hash codes can facilitate large-scale multimedia retrieval, significantly reducing storage and computation. Most hashing methods learn hash functions based on a data similarity matrix, which is predefined using supervised labels or a chosen distance metric. However, such a predefined similarity matrix cannot accurately reflect the real similarity relationships among images, which degrades the retrieval performance of hashing methods, especially on multi-label datasets and zero-shot datasets that depend heavily on similarity relationships. Toward this end, this study proposes a new supervised hashing method called supervised adaptive similarity matrix hashing (SASH) via feature-label space consistency. SASH not only learns the similarity matrix adaptively but also extracts label correlations by maintaining consistency between the feature space and the label space. This correlation information is then used to optimize the similarity matrix. Experiments on three large standard benchmark datasets (including two multi-label datasets) and three large zero-shot benchmark datasets show that SASH compares favorably with several state-of-the-art techniques.
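The feature-label space consistency idea can be roughly pictured as blending a label-derived similarity matrix with a feature-derived one rather than fixing either; the sketch below uses a fixed blending weight, which is a simplification of my own (SASH learns the similarity matrix jointly with the hash codes).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 100, 64, 5
X = rng.normal(size=(n, d))
Y = (rng.random(size=(n, c)) < 0.3).astype(float)          # toy multi-label matrix

# Label-space similarity: cosine similarity of label vectors
Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12)
S_label = Yn @ Yn.T

# Feature-space similarity: cosine similarity of features
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S_feat = Xn @ Xn.T

alpha = 0.7                                                # trade-off (learned in SASH)
S = alpha * S_label + (1 - alpha) * S_feat
print(S.shape, S.min().round(2), S.max().round(2))
```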
11. Cheema U, Ahmad M, Han D, Moon S. Heterogeneous Visible-Thermal and Visible-Infrared Face Recognition Using Cross-Modality Discriminator Network and Unit-Class Loss. Computational Intelligence and Neuroscience 2022; 2022:4623368. [PMID: 35310577] [PMCID: PMC8933114] [DOI: 10.1155/2022/4623368]
Abstract
Heterogeneous face recognition (HFR) aims to match face images across different imaging domains, such as visible-to-infrared and visible-to-thermal. Recently, the increasing utility of non-visible imaging has broadened the application prospects of HFR in areas such as biometrics, security, and surveillance. HFR is a challenging variant of face recognition due to the differences between imaging domains. While current research has proposed image preprocessing, feature extraction, or common subspace projection for HFR, optimizing these multi-stage methods is challenging because each step must be optimized separately and the error accumulates over the stages. In this paper, we propose a unified end-to-end Cross-Modality Discriminator Network (CMDN) for HFR. The proposed network uses a Deep Relational Discriminator module to learn deep feature relations for cross-domain face matching. Simultaneously, the CMDN is used to extract modality-independent embedding vectors for face images. The CMDN parameters are optimized using a novel Unit-Class Loss that shows higher stability and accuracy than other popular metric-learning loss functions. Experimental results on five popular HFR datasets demonstrate that the proposed method achieves significant improvements over existing state-of-the-art methods.
Affiliation(s)
- Usman Cheema, Mobeen Ahmad, Dongil Han, Seungbin Moon: Department of Computer Engineering, Sejong University, Seoul, Republic of Korea
12. Xiang X, Zhang Y, Jin L, Li Z, Tang J. Sub-Region Localized Hashing for Fine-Grained Image Retrieval. IEEE Transactions on Image Processing 2021; 31:314-326. [PMID: 34871171] [DOI: 10.1109/tip.2021.3131042]
Abstract
Fine-grained image hashing is challenging because of the difficulty of capturing the discriminative local information needed to generate hash codes. On the one hand, existing methods usually extract local features with a dense attention mechanism that focuses on dense local regions, which cannot provide diverse local information for fine-grained hashing. On the other hand, hash codes of the same class suffer from the large intra-class variation of fine-grained images. To address these problems, this work proposes a novel sub-Region Localized Hashing (sRLH) method that learns intra-class compact and inter-class separable hash codes that also contain diverse subtle local information for efficient fine-grained image retrieval. Specifically, to localize diverse local regions, a sub-region localization module is developed that learns discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map. Different from localizing dense local regions, these peaks guide the sub-region localization module to capture multifarious local discriminative information by attending to dispersive local regions. To mitigate intra-class variations, hash codes of the same class are enforced to approach one common binary center. Meanwhile, Gram-Schmidt orthogonalization is performed on the binary centers to make the hash codes inter-class separable. Extensive experimental results on four widely used fine-grained image retrieval datasets demonstrate the superiority of sRLH over several state-of-the-art methods. The source code of sRLH will be released at https://github.com/ZhangYajie-NJUST/sRLH.git.
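The Gram-Schmidt step for inter-class separable binary centers can be illustrated in a few lines; this toy version (random initial vectors, sign() binarization) is my own sketch, not the sRLH code.

```python
import numpy as np

def orthogonal_binary_centers(num_classes, code_len, seed=0):
    """Gram-Schmidt on random vectors, then sign() to get binary class centers."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(num_classes, code_len))
    U = np.zeros_like(V)
    for i in range(num_classes):
        v = V[i].copy()
        for j in range(i):                       # remove components along earlier centers
            v -= (v @ U[j]) * U[j]
        U[i] = v / np.linalg.norm(v)
    return np.sign(U)

C = orthogonal_binary_centers(num_classes=10, code_len=48)
# Orthogonal real centers tend to give low-correlation binary centers after sign();
# the print shows how far C @ C.T deviates from a scaled identity
print(C.shape, np.abs(C @ C.T - 48 * np.eye(10)).max())
```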
13. Jiang Z, Lian Z, Wang J. Dual Attention Triplet Hashing Network for Image Retrieval. Front Neurorobot 2021; 15:728161. [PMID: 34733150] [PMCID: PMC8560054] [DOI: 10.3389/fnbot.2021.728161]
Abstract
In recent years, learning-based hashing techniques have proven efficient for large-scale image retrieval. However, most of the hash codes learned by deep hashing methods contain repetitive and correlated information, which limits retrieval performance. In this paper, we propose a Dual Attention Triplet Hashing Network (DATH). DATH is implemented with a two-stream ConvNet architecture: the first neural network focuses on spatial semantic relevance, and the second focuses on channel semantic correlation. These two networks are incorporated into an end-to-end trainable framework. At the same time, to make better use of label information, DATH combines a triplet likelihood loss and a classification loss to optimize the network. Experimental results show that DATH achieves state-of-the-art performance on benchmark datasets.
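As a hedged sketch of combining a triplet likelihood loss with a classification loss on relaxed hash codes, the PyTorch snippet below replaces the two-stream attention networks with random tensors standing in for their outputs; the margin, code length, and exact loss form are illustrative assumptions, not the DATH implementation.

```python
import torch
import torch.nn.functional as F

def triplet_likelihood_loss(anchor, positive, negative, margin=8.0):
    """Negative log-likelihood that the anchor lies closer to the positive code."""
    d_pos = ((anchor - positive) ** 2).sum(1)
    d_neg = ((anchor - negative) ** 2).sum(1)
    return F.softplus(d_pos - d_neg + margin).mean()

code_len, num_classes, batch = 48, 10, 16
codes = torch.tanh(torch.randn(3 * batch, code_len, requires_grad=True))  # relaxed codes
a, p, n = codes[:batch], codes[batch:2 * batch], codes[2 * batch:]

classifier = torch.nn.Linear(code_len, num_classes)
labels = torch.randint(0, num_classes, (batch,))
loss = triplet_likelihood_loss(a, p, n) + F.cross_entropy(classifier(a), labels)
loss.backward()
print(float(loss))
```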
Affiliation(s)
- Zhukai Jiang, Zhichao Lian, Jinping Wang: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
14. Multi-modal discrete tensor decomposition hashing for efficient multimedia retrieval. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.125]
15. Feng H, Wang N, Tang J. Deep Weibull hashing with maximum mean discrepancy quantization for image retrieval. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.090]
17. Weighted-Attribute Triplet Hashing for Large-Scale Similar Judicial Case Matching. Computational Intelligence and Neuroscience 2021; 2021:6650962. [PMID: 33953738] [PMCID: PMC8064799] [DOI: 10.1155/2021/6650962]
Abstract
Similar judicial case matching aims to accurately select the judicial document most similar to a target document from multiple candidates. The core of similar judicial case matching is computing the similarity between two case documents. With such techniques, legal professionals can promptly find and assess similar cases in a candidate set, which also benefits the development of judicial systems. However, judicial case documents are not only long but also structurally complex, and the number of judicial cases is growing rapidly, which makes it difficult to find the most similar document in a large corpus. In this study, we present a novel similar judicial case matching model that obtains the weights of judicial feature attributes based on hash learning and performs fast similarity matching using binary codes. The proposed model extracts the judicial feature attribute vector using the bidirectional encoder representations from transformers (BERT) model and subsequently obtains the weighted judicial feature attributes by learning the hash function. We further impose triplet constraints to ensure that the similarity of judicial case data is well preserved when projected into the Hamming space. Comprehensive experimental results on public datasets show that the proposed method is superior in the task of similar judicial case matching and is suitable for large-scale similar judicial case matching.
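The fast matching step with binary codes reduces to Hamming-distance ranking, which can be done with bit packing and XOR plus popcount; this generic sketch assumes the BERT-derived, attribute-weighted representations have already been binarized, and it is not the paper's code.

```python
import numpy as np

def hamming_rank(query_bits, db_bits):
    """Rank database items by Hamming distance to the query (bits in {0,1})."""
    q = np.packbits(query_bits)
    db = np.packbits(db_bits, axis=1)
    # XOR then count differing bits per row = Hamming distance
    dists = np.unpackbits(np.bitwise_xor(db, q), axis=1).sum(axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(0)
code_len, n_docs = 64, 10000
db_bits = rng.integers(0, 2, size=(n_docs, code_len), dtype=np.uint8)
query_bits = rng.integers(0, 2, size=code_len, dtype=np.uint8)

order, dists = hamming_rank(query_bits, db_bits)
print(order[:5], dists[order[:5]])      # five most similar case documents
```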
18. Yang Z, Yang L, Raymond OI, Zhu L, Huang W, Liao Z, Long J. NSDH: A Nonlinear Supervised Discrete Hashing framework for large-scale cross-modal retrieval. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106818]
19. Feng H, Wang N, Tang J, Chen J, Chen F. Multi-granularity feature learning network for deep hashing. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.028]
20. Qiang H, Wan Y, Liu Z, Xiang L, Meng X. Discriminative deep asymmetric supervised hashing for cross-modal retrieval. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106188]