1. Jin L, Li Z, Pan Y, Tang J. Relational Consistency Induced Self-Supervised Hashing for Image Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025;36:1482-1494. [PMID: 37995167] [DOI: 10.1109/tnnls.2023.3333294]
Abstract
This article proposes a new hashing framework named relational consistency induced self-supervised hashing (RCSH) for large-scale image retrieval. To capture the potential semantic structure of data, RCSH explores the relational consistency between data samples in different spaces, which learns reliable data relationships in the latent feature space and then preserves the learned relationships in the Hamming space. The data relationships are uncovered by learning a set of prototypes that group similar data samples in the latent feature space. By uncovering the semantic structure of the data, meaningful data-to-prototype and data-to-data relationships are jointly constructed. The data-to-prototype relationships are captured by constraining the prototype assignments generated from different augmented views of an image to be the same. Meanwhile, these data-to-prototype relationships are preserved to learn informative compact hash codes by matching them with these reliable prototypes. To accomplish this, a novel dual prototype contrastive loss is proposed to maximize the agreement of prototype assignments in the latent feature space and Hamming space. The data-to-data relationships are captured by enforcing the distribution of pairwise similarities in the latent feature space and Hamming space to be consistent, which makes the learned hash codes preserve meaningful similarity relationships. Extensive experimental results on four widely used image retrieval datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods. Besides, the proposed method achieves promising performance in out-of-domain retrieval tasks, which shows its good generalization ability. The source code and models are available at https://github.com/IMAG-LuJin/RCSH.
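The dual prototype contrastive loss is described only at a high level above. As a rough NumPy sketch of one plausible reading (a symmetric cross-entropy that matches the soft prototype assignments of two augmented views, applied once in the latent feature space and once in the relaxed Hamming space), one might write the following; the temperature, normalization, and prototype count are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def prototype_assignments(z, prototypes, temp=0.1):
    # soft assignment of each embedding to the learned prototypes
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return softmax(z @ p.T / temp)

def assignment_consistency_loss(view1, view2, prototypes, temp=0.1):
    """Symmetric cross-entropy pushing the prototype assignments of two
    augmented views of the same image to agree; applying it on latent
    features and on relaxed hash codes gives a "dual" loss."""
    q1 = prototype_assignments(view1, prototypes, temp)
    q2 = prototype_assignments(view2, prototypes, temp)
    return -0.5 * ((q1 * np.log(q2 + 1e-12)).sum(1)
                   + (q2 * np.log(q1 + 1e-12)).sum(1)).mean()

# toy usage: 8 images, two augmented views, 32-d features, 10 prototypes
rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((8, 32)), rng.standard_normal((8, 32))
prototypes = rng.standard_normal((10, 32))
print(assignment_consistency_loss(z1, z2, prototypes))
```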
2. Liang X, Yang E, Yang Y, Deng C. Multi-Relational Deep Hashing for Cross-Modal Search. IEEE Transactions on Image Processing 2024;33:3009-3020. [PMID: 38625760] [DOI: 10.1109/tip.2024.3385656]
Abstract
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervisions, which involves learning the relevant information by splicing partial similarity between data pairs; notably, this approach only captures the data similarity locally and incompletely, resulting in sub-optimal retrieval performance. In this paper, we propose a novel Multi-Relational Deep Hashing (MRDH) approach, which can fully bridge the modality gap by comprehensively modeling the similarity relationship between data in different modalities. In more detail, to investigate the inter-modal relationships, we constrain the consistency of cross-modal pairwise similarities to maintain the semantic similarity across modalities. Moreover, to further capture complete similarity information, we design a new similarity metric, which we term cross-modal global similarity, by encouraging hash codes of similar data pairs from different modalities to approach a common center and hash codes for dissimilar pairs to converge to different centers. Adopting this approach enables our model to generate more discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of our method on cross-modal hashing retrieval.
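The cross-modal global similarity component (hash codes of similar image/text pairs drawn toward a common class center, dissimilar pairs pushed toward different centers) can be sketched minimally as below; the squared-distance form and margin choice are assumptions for illustration, not MRDH's exact loss:

```python
import numpy as np

def center_similarity_loss(h_img, h_txt, labels, centers):
    """Pull relaxed hash codes of both modalities toward their class
    center; push them at least a margin away from all other centers."""
    n, bits = h_img.shape
    own = centers[labels]                                   # (n, bits)
    pull = ((h_img - own) ** 2).mean() + ((h_txt - own) ** 2).mean()
    margin = bits / 2.0                                     # assumed margin
    d_img = ((h_img[:, None, :] - centers[None]) ** 2).sum(-1)   # (n, c)
    d_txt = ((h_txt[:, None, :] - centers[None]) ** 2).sum(-1)
    mask = np.ones_like(d_img, dtype=bool)
    mask[np.arange(n), labels] = False                      # drop own class
    push = np.maximum(0.0, margin - d_img[mask]).mean() \
         + np.maximum(0.0, margin - d_txt[mask]).mean()
    return pull + push

rng = np.random.default_rng(0)
h_i = np.tanh(rng.standard_normal((6, 16)))   # relaxed image codes
h_t = np.tanh(rng.standard_normal((6, 16)))   # relaxed text codes
centers = np.sign(rng.standard_normal((3, 16)))
print(center_similarity_loss(h_i, h_t, rng.integers(0, 3, 6), centers))
```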
3. Yang E, Deng C, Liu M. Deep Bayesian Quantization for Supervised Neuroimage Search. Machine Learning in Medical Imaging (MLMI Workshop) 2023;14349:396-406. [PMID: 38390519] [PMCID: PMC10883338] [DOI: 10.1007/978-3-031-45676-3_40]
Abstract
Neuroimage retrieval plays a crucial role in providing physicians with access to previous similar cases, which is essential for case-based reasoning and evidence-based medicine. Due to low computation and storage costs, hashing-based search techniques have been widely adopted for establishing image retrieval systems. However, these methods often suffer from nonnegligible quantization loss, which can degrade the overall search performance. To address this issue, this paper presents a compact coding solution named Deep Bayesian Quantization (DBQ), which focuses on deep compact quantization that can estimate continuous neuroimage representations and achieve superior performance over existing hashing solutions. Specifically, DBQ seamlessly combines deep representation learning and compact representation quantization within a novel Bayesian learning framework, where a proxy embedding-based likelihood function is developed to alleviate the sampling issue of traditional similarity supervision. Additionally, a Gaussian prior is employed to reduce the quantization loss. By utilizing pre-computed lookup tables, the proposed DBQ enables efficient and effective similarity search. Extensive experiments conducted on 2,008 structural MRI scans from three benchmark neuroimage datasets demonstrate that our method outperforms previous state-of-the-art methods.
Affiliation(s)
- Erkun Yang: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Xidian University, Xi'an, China
- Mingxia Liu: Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
4. Zhao S, Hu M, Cai Z, Liu F. Dynamic Modeling Cross-Modal Interactions in Two-Phase Prediction for Entity-Relation Extraction. IEEE Transactions on Neural Networks and Learning Systems 2023;34:1122-1131. [PMID: 34432639] [DOI: 10.1109/tnnls.2021.3104971]
Abstract
Joint extraction of entities and their relations benefits from the close interaction between named entities and their relation information. Therefore, how to effectively model such cross-modal interactions is critical for the final performance. Previous works have used simple methods, such as label-feature concatenation, to perform coarse-grained semantic fusion among cross-modal instances but fail to capture fine-grained correlations over token and label spaces, resulting in insufficient interactions. In this article, we propose a dynamic cross-modal attention network (CMAN) for joint entity and relation extraction. The network is carefully constructed by stacking multiple attention units in depth to dynamically model dense interactions over the token-label space, in which two basic attention units and a novel two-phase prediction are proposed to explicitly capture fine-grained correlations across different modalities (e.g., token-to-token and label-to-token). Experimental results on the CoNLL04 dataset show that our model obtains state-of-the-art results, achieving 91.72% F1 on entity recognition and 73.46% F1 on relation classification. On the ADE and DREC datasets, our model surpasses existing approaches by more than 2.1% and 2.54% F1 on relation classification. Extensive analyses further confirm the effectiveness of our approach.
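As a toy illustration of the two basic attention units (token-to-token and label-to-token), the sketch below uses plain scaled dot-product attention; all dimensions and the final fusion are invented for illustration and do not reproduce CMAN's stacked architecture or two-phase prediction:

```python
import numpy as np

def attention(q, k, v):
    # scaled dot-product attention with a row-wise softmax
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
tokens = rng.standard_normal((12, 64))   # 12 token features
labels = rng.standard_normal((5, 64))    # 5 candidate label embeddings

tok2tok = attention(tokens, tokens, tokens)   # token-to-token unit
lab2tok = attention(labels, tokens, tokens)   # label-to-token unit
# stacking such units in depth and fusing their outputs is the CMAN idea;
# mean-pool plus concatenation here is just an illustrative fusion choice
fused = np.concatenate([tok2tok.mean(axis=0), lab2tok.mean(axis=0)])
print(fused.shape)   # (128,)
```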
5. Fan Z, Shi L, Liu Q, Li Z, Zhang Z. Discriminative Fisher Embedding Dictionary Transfer Learning for Object Recognition. IEEE Transactions on Neural Networks and Learning Systems 2023;34:64-78. [PMID: 34170834] [DOI: 10.1109/tnnls.2021.3089566]
Abstract
In transfer learning models, the source domain samples and target domain samples usually share the same class labels but have different distributions. In general, existing transfer learning algorithms ignore the interclass differences and intraclass similarities across domains. To address these problems, this article proposes a transfer learning algorithm based on discriminative Fisher embedding and adaptive maximum mean discrepancy (AMMD) constraints, called discriminative Fisher embedding dictionary transfer learning (DFEDTL). First, combining the label information of the source domain and part of the target domain, we construct a discriminative Fisher embedding model to preserve the interclass differences and intraclass similarities of training samples in transfer learning. Second, an AMMD model is constructed using atoms and profiles, which can adaptively minimize the distribution differences between the source domain and the target domain. The proposed method has three advantages: 1) using the Fisher criterion, we construct a discriminative Fisher embedding model between source domain samples and target domain samples, which encourages samples from the same class to have similar coding coefficients; 2) instead of using the training samples to design the maximum mean discrepancy (MMD), we construct the AMMD model based on the relationship between the dictionary atoms and profiles, so the source domain samples can adapt to the target domain samples; and 3) the dictionary is learned on the combination of source and target samples, which avoids the classification error caused by the differences among samples and reduces tedious and expensive data annotation. Extensive experiments on five public image classification datasets show that the proposed method obtains better classification performance than several state-of-the-art dictionary and transfer learning methods. The code is available at https://github.com/shilinrui/DFEDTL.
6. Discrete matrix factorization cross-modal hashing with multi-similarity consistency. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00950-z]
Abstract
Recently, matrix factorization-based hashing has gained wide attention because of its strong subspace learning ability and high search efficiency. However, some problems need to be further addressed. First, uniform hash codes can be generated by collective matrix factorization, but this often causes serious information loss, degrading the quality of the hash codes. Second, most such methods simply preserve absolute similarity in the hash codes, failing to capture the inherent semantic affinity among training data. To overcome these obstacles, we propose a Discrete Multi-similarity Consistent Matrix Factorization Hashing (DMCMFH). Specifically, an individual subspace is first learned by matrix factorization and multi-similarity consistency for each modality. Then, the subspaces are aligned by a shared semantic space to generate homogeneous hash codes. Finally, an iterative discrete optimization scheme is presented to reduce the quantization loss. We conduct quantitative experiments on three datasets: MSCOCO, Mirflickr25K and NUS-WIDE. Compared with supervised baseline methods, DMCMFH achieves increases of 0.22%, 3.00% and 0.79% on the image-query-text tasks for the three datasets respectively, and increases of 0.21%, 1.62% and 0.50% on the text-query-image tasks.
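For context, the collective matrix factorization baseline that such methods refine (modality-specific factors sharing one latent matrix whose signs give unified hash codes) can be written in a few lines. This is a relaxed toy version (alternating ridge-regularized least squares plus a final sign step), not DMCMFH's multi-similarity-consistent discrete optimization:

```python
import numpy as np

def cmf_hashing(X1, X2, bits, iters=50, lam=1.0, mu=1e-2, seed=0):
    """Toy collective matrix factorization: X_m ≈ U_m V with a shared
    latent factor V; sign(V) yields the unified hash codes."""
    rng = np.random.default_rng(seed)
    U1 = rng.standard_normal((X1.shape[0], bits))
    U2 = rng.standard_normal((X2.shape[0], bits))
    V = rng.standard_normal((bits, X1.shape[1]))
    I = np.eye(bits)
    for _ in range(iters):
        # alternating ridge-regularized least-squares updates
        U1 = X1 @ V.T @ np.linalg.inv(V @ V.T + mu * I)
        U2 = X2 @ V.T @ np.linalg.inv(V @ V.T + mu * I)
        A = U1.T @ U1 + lam * U2.T @ U2 + mu * I
        V = np.linalg.solve(A, U1.T @ X1 + lam * U2.T @ X2)
    return np.sign(V)          # (bits, n) codes; exact zeros are rare

rng = np.random.default_rng(1)
X_img, X_txt = rng.standard_normal((128, 200)), rng.standard_normal((300, 200))
print(cmf_hashing(X_img, X_txt, bits=16).shape)   # (16, 200)
```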
7. Williams-Lekuona M, Cosma G, Phillips I. A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval. J Imaging 2022;8:328. [PMID: 36547493] [PMCID: PMC9785405] [DOI: 10.3390/jimaging8120328]
Abstract
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community due to their capability to deal with large amounts of data thanks to the computational efficiency of hash-based methods. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during the training process. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval where paired data is limited or not available to train the model. This paper provides (1) an overview of the CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
8. Wang X, Zheng Z, He Y, Yan F, Zeng Z, Yang Y. Soft Person Reidentification Network Pruning via Blockwise Adjacent Filter Decaying. IEEE Transactions on Cybernetics 2022;52:13293-13307. [PMID: 34910650] [DOI: 10.1109/tcyb.2021.3130047]
Abstract
Deep learning has shown significant successes in person reidentification (re-id) tasks. However, most existing works focus on discriminative feature learning and impose complex neural networks, suffering from low inference efficiency. In fact, feature extraction time is also crucial for real-world applications, and lightweight models are needed. Prevailing pruning methods usually pay attention to compact classification models. However, these methods are suboptimal for compacting re-id models, which usually produce continuous features and are sensitive to network pruning. The key point of pruning re-id models is how to retain the original filter distribution in continuous features as much as possible. In this work, we propose a blockwise adjacent filter decaying method to fill this gap. Specifically, given a trained model, we first evaluate the redundancy of filters based on their adjacency relationships to preserve the original filter distribution. Second, previous layerwise pruning methods ignore that discriminative information is enhanced block-by-block; therefore, we propose a blockwise filter pruning strategy to better utilize the block relations in the pretrained model. Third, we propose a novel filter decaying policy to progressively reduce the scale of redundant filters. Different from conventional soft filter pruning that directly sets the filter values to zero, the proposed filter decaying keeps the pretrained knowledge as much as possible. We evaluate our method on three popular person reidentification datasets: 1) Market-1501; 2) DukeMTMC-reID; and 3) MSMT17_V1. The proposed method shows superior performance to existing state-of-the-art pruning methods. After pruning over 91.9% of the parameters on DukeMTMC-reID, the Rank-1 accuracy drops by only 3.7%, demonstrating its effectiveness for compacting person reidentification models.
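A minimal sketch of adjacency-based redundancy scoring plus soft decaying might look as follows; the cosine-similarity criterion, prune ratio, and decay factor are illustrative assumptions rather than the paper's exact blockwise procedure:

```python
import numpy as np

def adjacent_redundancy(filters):
    """Score each filter by its cosine similarity to its most similar
    neighbor in the same layer: high similarity = high redundancy."""
    f = filters.reshape(filters.shape[0], -1)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)        # ignore self-similarity
    return sim.max(axis=1)

def decay_step(filters, prune_ratio=0.3, decay=0.7):
    """Soft pruning: instead of zeroing redundant filters at once,
    shrink them by a decay factor each epoch so that pretrained
    knowledge fades out gradually."""
    scores = adjacent_redundancy(filters)
    k = int(len(scores) * prune_ratio)
    idx = np.argsort(scores)[-k:]         # the most redundant filters
    filters[idx] *= decay
    return filters, idx

w = np.random.default_rng(0).standard_normal((64, 32, 3, 3))  # one conv layer
w, pruned = decay_step(w)
print(len(pruned), "filters decayed")
```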
10. Quan D, Wang S, Huyan N, Chanussot J, Wang R, Liang X, Hou B, Jiao L. Element-Wise Feature Relation Learning Network for Cross-Spectral Image Patch Matching. IEEE Transactions on Neural Networks and Learning Systems 2022;33:3372-3386. [PMID: 33544676] [DOI: 10.1109/tnnls.2021.3052756]
Abstract
Recently, the majority of successful matching approaches have been based on convolutional neural networks, which focus on learning invariant and discriminative features for individual image patches based on image content. However, the image patch matching task is essentially to predict the matching relationship of patch pairs, that is, matching (similar) or non-matching (dissimilar). Therefore, we consider that feature relation (FR) learning is more important than individual feature learning for the image patch matching problem. Motivated by this, we propose an element-wise FR learning network for image patch matching, which transforms the image patch matching task into an image relationship-based pattern classification problem and dramatically improves generalization performance on image matching. Meanwhile, the proposed element-wise learning methods encourage full interaction between feature information and can naturally learn FRs. Moreover, we propose to aggregate FRs from multiple levels, which integrates multiscale FRs for more precise matching. Experimental results demonstrate that our proposal achieves superior performance on cross-spectral and single-spectral image patch matching, and good generalization on image patch retrieval.
11. Xie Y, Zeng X, Wang T, Yi Y. Online deep hashing for both uni-modal and cross-modal retrieval. Inf Sci 2022. [DOI: 10.1016/j.ins.2022.07.039]
12. Zhang Z, Li Z, Wei K, Pan S, Deng C. A survey on multimodal-guided visual content synthesis. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.126]
13. Yu G, Liu X, Wang J, Domeniconi C, Zhang X. Flexible Cross-Modal Hashing. IEEE Transactions on Neural Networks and Learning Systems 2022;33:304-314. [PMID: 33052870] [DOI: 10.1109/tnnls.2020.3027729]
Abstract
Hashing has been widely adopted for large-scale data retrieval in many domains due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition, existing methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly paired data, whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the structure of each cluster and, thus, to find the potential correspondence between clusters (and the samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantization loss in a unified objective function. An alternating optimization technique is also proposed to coordinate the correspondence and the hash functions, reinforcing the reciprocal effects of the two objectives. Experiments on public multimodal datasets show that FlexCMH achieves significantly better results than state-of-the-art methods and indeed offers a high degree of flexibility for practical cross-modal hashing tasks.
14. Xiang X, Zhang Y, Jin L, Li Z, Tang J. Sub-Region Localized Hashing for Fine-Grained Image Retrieval. IEEE Transactions on Image Processing 2021;31:314-326. [PMID: 34871171] [DOI: 10.1109/tip.2021.3131042]
Abstract
Fine-grained image hashing is challenging due to the difficulty of capturing discriminative local information to generate hash codes. On the one hand, existing methods usually extract local features with a dense attention mechanism that focuses on dense local regions, which cannot capture diverse local information for fine-grained hashing. On the other hand, hash codes of the same class suffer from the large intra-class variation of fine-grained images. To address the above problems, this work proposes a novel sub-Region Localized Hashing (sRLH) method to learn intra-class compact and inter-class separable hash codes that also contain diverse subtle local information for efficient fine-grained image retrieval. Specifically, to localize diverse local regions, a sub-region localization module is developed to learn discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map. Different from localizing dense local regions, these peaks guide the sub-region localization module to capture multifarious local discriminative information by paying close attention to dispersed local regions. To mitigate intra-class variations, hash codes of the same class are enforced to approach one common binary center. Meanwhile, Gram-Schmidt orthogonalization is performed on the binary centers to make the hash codes inter-class separable. Extensive experimental results on four widely used fine-grained image retrieval datasets demonstrate the superiority of sRLH over several state-of-the-art methods. The source code of sRLH will be released at https://github.com/ZhangYajie-NJUST/sRLH.git.
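The binary-center construction can be illustrated compactly: Gram-Schmidt orthogonalization of random vectors followed by binarization yields well-separated class centers. sRLH's exact procedure may differ; this is an assumed minimal version that requires the code length to be at least the class count:

```python
import numpy as np

def orthogonal_binary_centers(num_classes, bits, seed=0):
    """Gram-Schmidt on random vectors, then sign(), to obtain
    well-separated binary class centers (assumes bits >= num_classes)."""
    rng = np.random.default_rng(seed)
    basis = []
    for _ in range(num_classes):
        v = rng.standard_normal(bits)
        for b in basis:
            v -= (v @ b) * b            # remove components along the basis
        v /= np.linalg.norm(v)
        basis.append(v)
    centers = np.sign(np.stack(basis))
    centers[centers == 0] = 1           # resolve rare exact zeros
    return centers                      # (num_classes, bits) in {-1, +1}

c = orthogonal_binary_centers(num_classes=10, bits=32)
print((c @ c.T).round())   # off-diagonal entries stay small after sign()
```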
15. Hu P, Peng X, Zhu H, Lin J, Zhen L, Peng D. Joint Versus Independent Multiview Hashing for Cross-View Retrieval. IEEE Transactions on Cybernetics 2021;51:4982-4993. [PMID: 33119532] [DOI: 10.1109/tcyb.2020.3027614]
Abstract
Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.
16. Hu W, Wu L, Jian M, Chen Y, Yu H. Cosine metric supervised deep hashing with balanced similarity. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.093]
17. Quadruplet-Based Deep Cross-Modal Hashing. Comput Intell Neurosci 2021;2021:9968716. [PMID: 34306059] [PMCID: PMC8270718] [DOI: 10.1155/2021/9968716]
Abstract
Recently, benefitting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn more and more attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods usually learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relation across modalities. To solve this problem, in this paper, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the efficiency of the quadruplet loss in cross-modal hashing.
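The abstract does not spell the loss out, but the canonical quadruplet loss it builds on has a standard two-term form: a triplet-style margin term plus a second margin measured against the distance between two unrelated negatives. Below is a minimal sketch on relaxed hash codes; the margins and squared distance are illustrative choices, not necessarily QDCMH's cross-modal weighting:

```python
import numpy as np

def quadruplet_hash_loss(a, p, n1, n2, m1=2.0, m2=1.0):
    """Quadruplet margin loss: the anchor-positive distance must undercut
    both the anchor-negative distance (by m1) and the distance between
    two unrelated negatives (by m2)."""
    d = lambda x, y: ((x - y) ** 2).sum(-1)
    l1 = np.maximum(0.0, d(a, p) - d(a, n1) + m1)    # triplet-style term
    l2 = np.maximum(0.0, d(a, p) - d(n1, n2) + m2)   # extra quadruplet term
    return (l1 + l2).mean()

rng = np.random.default_rng(0)
a, p, n1, n2 = (np.tanh(rng.standard_normal((8, 32))) for _ in range(4))
print(quadruplet_hash_loss(a, p, n1, n2))
```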
18. Ma L, Li X, Shi Y, Huang L, Huang Z, Wu J. Learning discrete class-specific prototypes for deep semantic hashing. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.057]
19. Xiao X, Chen Y, Gong YJ, Zhou Y. Prior Knowledge Regularized Multiview Self-Representation and its Applications. IEEE Transactions on Neural Networks and Learning Systems 2021;32:1325-1338. [PMID: 32310792] [DOI: 10.1109/tnnls.2020.2984625]
Abstract
To learn the self-representation matrices/tensor that encodes the intrinsic structure of the data, existing multiview self-representation models consider only the multiview features and, thus, impose equal membership preference across samples. However, this is inappropriate in real scenarios since the prior knowledge, e.g., explicit labels, semantic similarities, and weak-domain cues, can provide useful insights into the underlying relationship of samples. Based on this observation, this article proposes a prior knowledge regularized multiview self-representation (P-MVSR) model, in which the prior knowledge, multiview features, and high-order cross-view correlation are jointly considered to obtain an accurate self-representation tensor. The general concept of "prior knowledge" is defined as the complement of multiview features, and the core of P-MVSR is to take advantage of the membership preference, which is derived from the prior knowledge, to purify and refine the discovered membership of the data. Moreover, P-MVSR adopts the same optimization procedure to handle different prior knowledge and, thus, provides a unified framework for weakly supervised clustering and semisupervised classification. Extensive experiments on real-world databases demonstrate the effectiveness of the proposed P-MVSR model.
20. Chen Z, Fang Z, Sheng V, Zhao J, Fan W, Edwards A, Zhang K. Adaptive Robust Local Online Density Estimation for Streaming Data. Int J Mach Learn Cybern 2021;12:1803-1824. [PMID: 34149955] [DOI: 10.1007/s13042-021-01275-y]
Abstract
Accurate online density estimation is crucial to numerous applications that are prevalent with streaming data. Existing online approaches for density estimation somewhat lack prompt adaptability and robustness when facing concept-drifting and noisy streaming data, resulting in delayed or even deteriorated approximations. To alleviate this issue, in this work, we first propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimation by factoring in the statistical hallmarks of the latest learned distribution and any potential distributional changes that could be introduced by each incoming instance. A robust variant of ALoKDE, i.e., R-ALoKDE, is further developed to effectively handle data streams with varied types/levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE, and also derive their theoretical error bounds regarding bias, variance, MSE and MISE. Extensive comparative studies on various artificial and real-world (noisy) streaming data demonstrate the efficacies of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).
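As a very rough illustration of the two ingredients (a statistical drift test plus weighted local online estimation), the toy 1-D estimator below decays sample weights exponentially and sharply down-weights history when a crude mean-shift test fires. The z-score test, window size, and decay constants are invented stand-ins for ALoKDE's actual statistical test and adaptive weighting:

```python
import numpy as np

class OnlineKDE:
    """Toy 1-D online kernel density estimator over a sliding window."""
    def __init__(self, bandwidth=0.3, window=500, decay=0.999, z_thresh=3.0):
        self.h, self.window = bandwidth, window
        self.decay, self.z_thresh = decay, z_thresh
        self.x, self.w = [], []

    def update(self, xi):
        if len(self.x) > 30:
            mu, sd = np.mean(self.x), np.std(self.x) + 1e-12
            if abs(xi - mu) / sd > self.z_thresh:     # crude drift signal
                self.w = [wi * 0.1 for wi in self.w]  # forget history fast
        self.x.append(xi)
        self.w.append(1.0)
        self.w = [wi * self.decay for wi in self.w]   # exponential decay
        self.x, self.w = self.x[-self.window:], self.w[-self.window:]

    def pdf(self, q):
        x, w = np.asarray(self.x), np.asarray(self.w)
        k = np.exp(-0.5 * ((q - x) / self.h) ** 2) / (self.h * np.sqrt(2 * np.pi))
        return float((w * k).sum() / w.sum())

est = OnlineKDE()
for v in np.random.default_rng(0).standard_normal(1000):
    est.update(float(v))
print(est.pdf(0.0))   # should be near the standard normal density at 0
```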
Affiliation(s)
- Zhong Chen: Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, USA
- Zhide Fang: Biostatistics, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA
- Victor Sheng: Department of Computer Science, Texas Tech University, Lubbock, TX, USA
- Jiabin Zhao: Cisco Services Technology Group, San Jose, CA, USA
- Wei Fan: Tencent Medical AI Lab, Palo Alto, CA, USA
- Andrea Edwards: Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, USA
- Kun Zhang: Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, USA
21. Zhao W, Guan Z, Luo H, Peng J, Fan J. Deep Multiple Instance Hashing for Fast Multi-Object Image Search. IEEE Transactions on Image Processing 2021;30:7995-8007. [PMID: 34554911] [DOI: 10.1109/tip.2021.3112011]
Abstract
Multi-keyword query is widely supported in text search engines. However, its analogue in image retrieval systems, the multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple separate steps. In this work, we propose a weakly supervised Deep Multiple Instance Hashing (DMIH) approach for multi-object image retrieval. Our DMIH approach, which leverages a popular CNN model to build an end-to-end relation between a raw image and the binary hash codes of its multiple objects, can support multi-object queries effectively and integrate object detection with hash learning seamlessly. We treat object detection as a binary multiple instance learning (MIL) problem, and such instances are automatically extracted from multi-scale convolutional feature maps. We also design a conditional random field (CRF) module to capture both the semantic and spatial relations among different class labels. For hashing training, we sample image pairs to learn their semantic relationships in terms of hash codes of the most probable proposals for owned labels as guided by object predictors. The two objectives benefit each other in a multi-task learning scheme. Finally, a two-level inverted index method is proposed to further speed up the retrieval of multi-object queries. Our DMIH approach outperforms state-of-the-art methods on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
22. Xia L, Li R. Multi-stream neural network fused with local information and global information for HOI detection. Appl Intell 2020. [DOI: 10.1007/s10489-020-01794-1]
23. IoT-enabled directed acyclic graph in Spark cluster. Journal of Cloud Computing: Advances, Systems and Applications 2020. [DOI: 10.1186/s13677-020-00195-6]
Abstract
Real-time data streaming fetches live sensory segments of the dataset in a heterogeneous distributed computing environment. This process assembles data chunks at a rapid encapsulation rate through a streaming technique that bundles sensor segments into multiple micro-batches and extracts them into a repository. Recently, the acquisition process has been enhanced with an additional feature for exchanging IoT devices' datasets, comprising two components: (i) sensory data and (ii) metadata. The body of sensory data includes record information, and the metadata part consists of logs, heterogeneous events, and routing path tables for transmitting micro-batch streams into the repository. The real-time acquisition procedure uses a Directed Acyclic Graph (DAG) to extract live query outcomes from in-place micro-batches through MapReduce stages and returns a result set. However, a few bottlenecks affect performance during the execution process: (i) only homogeneous micro-batches can be formed, (ii) dataset diversification is complex, (iii) heterogeneous data tuples are hard to process, and (iv) only linear DAG workflows are supported. As a result, it incurs large processing latency and the additional cost of extracting event-enabled IoT datasets. Thus, even a Spark cluster that processes Resilient Distributed Datasets (RDDs) at a fast pace in random-access memory (RAM) falls short of the expected robustness when processing IoT streams in a distributed computing environment. This paper presents an IoT-enabled Directed Acyclic Graph (I-DAG) technique that labels micro-batches at the stage of building a stream event and arranges stream elements with event labels. In the next step, heterogeneous stream events are processed through the I-DAG workflow, whose non-linear DAG operation extracts query results in a Spark cluster. The performance evaluation shows that I-DAG resolves the homogeneous stream-event issue and provides an effective heterogeneous stream-event solution for IoT-enabled datasets in Spark clusters.
24. Kan S, Cen Y, Cen Y, Vladimir M, Li Y, He Z. Zero-Shot Learning to Index on Semantic Trees for Scalable Image Retrieval. IEEE Transactions on Image Processing 2020;30:501-516. [PMID: 33186117] [DOI: 10.1109/tip.2020.3036779]
Abstract
In this study, we develop a new approach, called zero-shot learning to index on semantic trees (LTI-ST), for efficient image indexing and scalable image retrieval. Our method learns to model the inherent correlation structure between visual representations using a binary semantic tree from training images which can be effectively transferred to new test images from unknown classes. Based on predicted correlation structure, we construct an efficient indexing scheme for the whole test image set. Unlike existing image index methods, our proposed LTI-ST method has the following two unique characteristics. First, it does not need to analyze the test images in the query database to construct the index structure. Instead, it is directly predicted by a network learnt from the training set. This zero-shot capability is critical for flexible, distributed, and scalable implementation and deployment of the image indexing and retrieval services at large scales. Second, unlike the existing distance-based index methods, our index structure is learnt using the LTI-ST deep neural network with binary encoding and decoding on a hierarchical semantic tree. Our extensive experimental results on benchmark datasets and ablation studies demonstrate that the proposed LTI-ST method outperforms existing index methods by a large margin while providing the above new capabilities which are highly desirable in practice.
25. Li H, He X, Yu Z, Luo J. Noise-robust image fusion with low-rank sparse decomposition guided by external patch prior. Inf Sci 2020. [DOI: 10.1016/j.ins.2020.03.009]
26. Semantic image segmentation with shared decomposition convolution and boundary reinforcement structure. Appl Intell 2020. [DOI: 10.1007/s10489-020-01671-x]
27. Zheng X, Chen X, Lu X. A Joint Relationship Aware Neural Network for Single-Image 3D Human Pose Estimation. IEEE Transactions on Image Processing 2020;29:4747-4758. [PMID: 32070954] [DOI: 10.1109/tip.2020.2972104]
Abstract
This paper studies the task of 3D human pose estimation from a single RGB image, which is challenging without depth information. Recently, many deep learning methods have been proposed and achieve great improvements thanks to their strong representation learning. However, most existing methods ignore the relationship between joint features. In this paper, a joint relationship aware neural network is proposed to take both global and local joint relationships into consideration. First, a whole feature block representing all human body joints is extracted by a convolutional neural network. A Dual Attention Module (DAM) is applied to the whole feature block to generate attention weights. By exploiting the attention module, the global relationship among all joints is encoded. Second, the weighted whole feature block is divided into individual joint features. To capture salient joint features, the individual joint features are refined by individual DAMs. Finally, a joint angle prediction constraint is proposed to model local joint relationships. Quantitative and qualitative experiments on 3D human pose estimation benchmarks demonstrate the effectiveness of the proposed method.
28. Yao T, Yan L, Ma Y, Yu H, Su Q, Wang G, Tian Q. Fast discrete cross-modal hashing with semantic consistency. Neural Netw 2020;125:142-152. [PMID: 32088568] [DOI: 10.1016/j.neunet.2020.01.035]
Abstract
Supervised cross-modal hashing has attracted widespread attention for large-scale retrieval tasks due to its promising retrieval performance. However, most existing works suffer from some of the following issues. First, most of them only leverage the pairwise similarity matrix to learn hash codes, which may result in class information loss. Second, the pairwise similarity matrix generally leads to high computational complexity and memory cost. Third, most of them relax the discrete constraints during optimization, which generally results in a large cumulative quantization error and consequently inferior hash codes. To address the above problems, we present a Fast Discrete Cross-modal Hashing method in this paper, FDCH for short. Specifically, it first leverages both class labels and the pairwise similarity matrix to learn a shared Hamming space where semantic consistency can be better preserved. Then we propose an asymmetric hash code learning model to avoid the challenging issue of symmetric matrix factorization. Finally, an effective and efficient discrete optimization scheme is designed to generate discrete hash codes directly, and the computational complexity and memory cost caused by the pairwise similarity matrix are reduced from O(n²) to O(n), where n denotes the size of the training set. Extensive experiments conducted on three real-world datasets highlight the superiority of FDCH over several cross-modal hashing methods and demonstrate its effectiveness and efficiency.
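The O(n²)-to-O(n) reduction rests on a standard factorization trick: when the pairwise similarity matrix factors through the (normalized) label matrix as S = L̄L̄ᵀ, any product S·V needed during optimization can be computed as L̄(L̄ᵀV) without ever materializing the n×n matrix. A small sketch under that assumption (shapes and normalization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, bits = 10000, 20, 64
L = rng.integers(0, 2, (n, c)).astype(float)          # multi-label matrix
Lb = L / (np.linalg.norm(L, axis=1, keepdims=True) + 1e-12)
V = rng.standard_normal((n, bits))                    # some factor matrix

# S = Lb @ Lb.T is n x n and is never built; S @ V costs O(n*c*bits)
SV_fast = Lb @ (Lb.T @ V)

# sanity check on a 100-row slice against the explicit similarity matrix
S_slice = Lb[:100] @ Lb.T                             # (100, n) slice only
assert np.allclose(S_slice @ V, SV_fast[:100])
print(SV_fast.shape)                                  # (10000, 64)
```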
Affiliation(s)
- Tao Yao: Department of Information and Electrical Engineering, Ludong University, Yantai, 264000, China; Yantai Research Institute of New Generation Information Technology, Southwest Jiaotong University, 264000, China
- Lianshan Yan: Yantai Research Institute of New Generation Information Technology, Southwest Jiaotong University, 264000, China
- Yilan Ma: Department of Information and Electrical Engineering, Ludong University, Yantai, 264000, China
- Hong Yu: Department of Information and Electrical Engineering, Ludong University, Yantai, 264000, China
- Qingtang Su: Department of Information and Electrical Engineering, Ludong University, Yantai, 264000, China
- Gang Wang: Department of Information and Electrical Engineering, Ludong University, Yantai, 264000, China
- Qi Tian: Huawei Noah's Ark Lab, 518129, China