1. Ali R, Zahran O, El-samie FEA, Eldin SS. Efficient Blind Signal Separation Algorithms for Wireless Multimedia Communication Systems. DOI: 10.21203/rs.3.rs-2869492/v1
Abstract: This paper studies the problem of multi-user blind signal separation (BSS) in wireless communications. Existing separation algorithms operate on quadrature phase shift keying (QPSK) signals. This work presents two proposed algorithms to enhance BSS performance. The first applies wavelet denoising to remove noise from the received signals in the time domain. It adopts different modulation techniques, such as minimum shift keying (MSK), QPSK, and Gaussian minimum shift keying (GMSK), and then uses several BSS algorithms, including independent component analysis (ICA), principal component analysis (PCA), and the multi-user kurtosis (MUK) algorithm. The second proposed algorithm transfers the BSS problem to a transform domain and uses wavelet denoising to reduce the effect of noise on the received mixture. BSS with the Discrete Sine Transform (DST) and the Discrete Cosine Transform (DCT) was investigated and compared with time-domain performance. Mean square error (MSE) and signal-to-noise ratio (SNR) were used as the evaluation metrics. Simulation results show that, in the time domain, MUK with QPSK gives the best performance and that wavelet denoising enhances BSS performance under all conditions. Signal separation in the transform domain outperforms separation in the time domain, owing to the energy compaction of these transforms and the noise reduction from their averaging effect.
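The separation step evaluated in entry 1 can be illustrated with a minimal NumPy sketch of symmetric FastICA using the tanh nonlinearity. This is a generic textbook variant for illustration only, not the authors' implementation; the function names and parameters are our own.

```python
import numpy as np

def whiten(X):
    # Center the mixtures (rows = signals) and decorrelate them
    # to unit variance via an eigendecomposition of the covariance.
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    return E @ np.diag(d ** -0.5) @ E.T @ X

def fastica(X, n_iter=200, seed=0):
    # Symmetric FastICA fixed-point iteration with g = tanh.
    Z = whiten(X)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Z.shape[0], Z.shape[0]))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        g_prime = 1.0 - G ** 2
        W_new = G @ Z.T / Z.shape[1] - np.diag(g_prime.mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^{-1/2} W
        d, E = np.linalg.eigh(W_new @ W_new.T)
        W = E @ np.diag(d ** -0.5) @ E.T @ W_new
    return W @ Z  # estimated sources (up to sign and permutation)
```

A typical usage mixes a sinusoid and a square wave with a 2x2 matrix and checks that the recovered components each correlate strongly with one source.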
2. Feng Y, Yuan Y, Lu X. Person Reidentification via Unsupervised Cross-View Metric Learning. IEEE Transactions on Cybernetics 2021; 51:1849-1859. PMID: 31021787. DOI: 10.1109/tcyb.2019.2909480
Abstract: Person reidentification (Re-ID) aims to match observations of individuals across multiple nonoverlapping camera views. Recently, metric learning-based methods have played important roles in addressing this task. However, metrics are mostly learned in a supervised manner, so their performance relies heavily on the quantity and quality of manual annotations. Moreover, metric learning-based algorithms generally project person features into a common subspace in which the extracted features are shared by all views, which may cause information loss because view-specific features are neglected. These algorithms also assume that person samples from different views follow the same distribution, whereas such samples are more likely to obey different distributions owing to changes in viewing conditions. To this end, this paper proposes an unsupervised cross-view metric learning method based on the properties of the data distributions. Specifically, person samples in each view are taken from a mixture of two distributions: one models properties common to all camera views and the other focuses on view-specific properties. Based on this, we introduce a shared mapping to explore the shared features. Meanwhile, we construct view-specific mappings to extract view-related features and project them into a common subspace. As a result, samples in the transformed subspace follow the same distribution and are equipped with comprehensive representations. These mappings are learned in an unsupervised manner by clustering samples in the projected space. Experimental results on five cross-view datasets validate the effectiveness of the proposed method.
3. Wang J, Li Y, Zhang Y, Miao Z, Zhang R. A heterogeneous branch and multi-level classification network for person re-identification. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.05.007
4. Chang Z, Qin Z, Fan H, Su H, Yang H, Zheng S, Ling H. Weighted bilinear coding over salient body parts for person re-identification. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.05.009
5. Optimized Mahalanobis-Taguchi System for High-Dimensional Small Sample Data Classification. Computational Intelligence and Neuroscience 2020; 2020:4609423. PMID: 32405295. PMCID: PMC7199641. DOI: 10.1155/2020/4609423
Abstract: The Mahalanobis-Taguchi system (MTS) is a multivariate data diagnosis and prediction technology that is widely used to optimize large-sample or unbalanced data but is rarely applied to high-dimensional small-sample data. This paper discusses optimizing the MTS for classifying high-dimensional small-sample data from two aspects: the instability of the inverse of the covariance matrix and the instability of feature selection. First, based on regularization and smoothing techniques, a modified Mahalanobis metric is proposed for computing the Mahalanobis distance, aimed at reducing the influence of inverse-matrix instability under small-sample conditions. Second, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into the MTS to address the instability of feature selection. Using the mRMR algorithm and the signal-to-noise ratio (SNR), a two-stage feature selection method is proposed: the mRMR algorithm first removes noise and redundant variables, and an orthogonal table with the SNR then screens the combinations of variables that contribute most to classification. The feasibility and simplicity of the optimized MTS are shown on five datasets from the UCI repository. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method improves the effectiveness of feature selection for the MTS. Finally, the optimized MTS is applied to email classification on the Spambase dataset; the results show that it outperforms the classical MTS and three other machine learning algorithms.
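The covariance-instability issue that entry 5 targets can be seen in a small sketch: when there are fewer samples than dimensions, the sample covariance is singular, but shrinking it toward a scaled identity makes the Mahalanobis distance computable again. This is a generic shrinkage sketch, not necessarily the paper's RS-MD formulation; the function name and `lam` parameter are illustrative.

```python
import numpy as np

def regularized_mahalanobis(x, X_ref, lam=0.1):
    # Shrink the sample covariance toward a scaled identity so its
    # inverse stays stable when samples are few and dimensions high.
    mu = X_ref.mean(axis=0)
    Xc = X_ref - mu
    S = Xc.T @ Xc / (len(X_ref) - 1)
    dim = S.shape[0]
    S_reg = (1 - lam) * S + lam * (np.trace(S) / dim) * np.eye(dim)
    d = x - mu
    # Solve instead of explicitly inverting the regularized covariance.
    return float(np.sqrt(d @ np.linalg.solve(S_reg, d)))
```

With 5 reference samples in 20 dimensions the raw covariance is singular, yet the regularized distance is finite and positive.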
6. Tang Y, Yang X, Wang N, Song B, Gao X. CGAN-TM: A novel domain-to-domain transferring method for person re-identification. IEEE Transactions on Image Processing 2020; 29:5641-5651. PMID: 32286985. DOI: 10.1109/tip.2020.2985545
Abstract: Person re-identification (re-ID) is a technique that aims to recognize a person across different cameras. Although some supervised methods have achieved favorable performance, they are far from practical application owing to the lack of labeled data, so unsupervised person re-ID methods are urgently needed. The common approach in existing unsupervised methods is to first train a model on a labeled source image dataset in a supervised manner and then transfer the source image domain to the target image domain. However, images may lose their identity information after translation, and the distributions of the two domains remain far apart. To solve these problems, we propose an image domain-to-domain translation method for unsupervised person re-ID that preserves pedestrian identity information and pulls the domain distributions closer. Our work exploits CycleGAN to transfer the labeled source image domain to the unlabeled target image domain. Specifically, a Self-labeled Triplet Net is proposed to maintain pedestrian identity information, and maximum mean discrepancy is introduced to pull the domain distributions closer. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art unsupervised methods on DukeMTMC-reID and Market-1501.
7. Wu F, Jing XY, Dong X, Hu R, Yue D, Wang L, Ji YM, Wang R, Chen G. Intraspectrum Discrimination and Interspectrum Correlation Analysis Deep Network for Multispectral Face Recognition. IEEE Transactions on Cybernetics 2020; 50:1009-1022. PMID: 30418895. DOI: 10.1109/tcyb.2018.2876591
Abstract: Multispectral images contain rich recognition information, since a multispectral camera can reveal information that is not visible to the human eye or to a conventional RGB camera. Because of this, multispectral face recognition has attracted considerable research interest. Although several multispectral face recognition methods have been presented in the last decade, how to fully and effectively exploit the intraspectrum discriminant information and the useful interspectrum correlation information in multispectral face images has not been well studied. To boost performance, we propose an intraspectrum discrimination and interspectrum correlation analysis deep network (IDICN). Multiple spectra are divided into several spectrum-sets, each containing a group of spectra within a small spectral range. The IDICN network contains a set of spectrum-set-specific deep convolutional neural networks that extract spectrum-set-specific features, followed by a spectrum pooling layer that adaptively selects a group of spectra with favorable discriminative abilities. IDICN jointly learns the nonlinear representations of the selected spectra such that the intraspectrum Fisher loss and the interspectrum discriminant correlation are minimized. Experiments on the well-known Hong Kong Polytechnic University, Carnegie Mellon University, and University of Western Australia multispectral face datasets demonstrate the superior performance of the proposed approach over several state-of-the-art methods.
8. Li H, Zhou W, Yu Z, Yang B, Jin H. Person re-identification with dictionary learning regularized by stretching regularization and label consistency constraint. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.11.001
9. Xie D, Deng C, Li C, Liu X, Tao D. Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval. IEEE Transactions on Image Processing 2020; 29:3626-3637. PMID: 31940536. DOI: 10.1109/tip.2020.2963957
Abstract: Owing to its low storage cost and high query efficiency, cross-modal hashing has received increasing attention recently. Because they fail to bridge the inherent modality gap, most existing cross-modal hashing methods have limited capability to explore the semantic consistency between data of different modalities, leading to unsatisfactory search performance. To address this problem, we propose a novel deep hashing method named Multi-Task Consistency-Preserving Adversarial Hashing (CPAH), which fully explores the semantic consistency and correlation between modalities for efficient cross-modal retrieval. First, we design a consistency refinement module (CR) to divide the representations of each modality into two independent parts: modality-common and modality-private representations. Then, a multi-task adversarial learning module (MA) is presented to bring the modality-common representations of different modalities close to each other in both feature distribution and semantic consistency. Finally, compact and powerful hash codes are generated from the modality-common representation. Comprehensive evaluations on three representative cross-modal benchmark datasets show that our method is superior to state-of-the-art cross-modal hashing methods.
10. Li H, Xu J, Zhu J, Tao D, Yu Z. Top distance regularized projection and dictionary learning for person re-identification. Information Sciences 2019. DOI: 10.1016/j.ins.2019.06.046
11. Song J, Guo Y, Gao L, Li X, Hanjalic A, Shen HT. From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:3047-3058. PMID: 30130235. DOI: 10.1109/tnnls.2018.2851077
Abstract: Video captioning is, in essence, a complex natural process affected by various uncertainties stemming from the video content, subjective judgment, and so on. In this paper, we build on recent progress in encoder-decoder frameworks for video captioning and address what we find to be a critical deficiency of existing methods: most decoders propagate deterministic hidden states, and such complex uncertainty cannot be modeled efficiently by deterministic models. We propose a generative approach, referred to as the multimodal stochastic recurrent neural network (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables. MS-RNN can thereby improve captioning performance and generate multiple sentences describing a video under different random factors. Specifically, a multimodal long short-term memory (LSTM) network is first proposed to interact with both visual and textual features to capture a high-level representation. Then, a backward stochastic LSTM is proposed to support uncertainty propagation by introducing latent variables. Experimental results on the challenging Microsoft Video Description and Microsoft Research Video-to-Text datasets show that the proposed MS-RNN approach outperforms state-of-the-art video captioning methods.
12. ILRA: Novelty Detection in Face-Based Intervener Re-Identification. Symmetry (Basel) 2019. DOI: 10.3390/sym11091154. Open access.
Abstract: Transparency laws make it easier for citizens to monitor the activities of political representatives. In this sense, automatic or manual diarization of parliamentary sessions is required, the latter being time-consuming. In the present work, this problem is addressed as a person re-identification problem. Re-identification is defined as the process of matching individuals across different camera views. This paper, in particular, deals with open-world person re-identification scenarios, where the probe captured by one camera is not always present in the gallery collected by another; that is, it must be determined whether the probe belongs to a novel identity. This procedure is mandatory before matching the identity. In most cases, novelty detection is tackled by applying a threshold based on a linear separation of the identities. We propose a threshold-less approach to the novelty detection problem based on a one-class classifier, which therefore needs no user-defined threshold. Unlike other approaches that combine audio-visual features, an isometric log-ratio transformation of a posteriori probabilities (ILRA) is applied to local and deep descriptors computed from the face, which exhibits symmetry that can be exploited in the re-identification process, unlike audio streams. These features are used to train the one-class classifier to detect the novelty of the individual. The proposal is evaluated on real parliamentary session recordings that exhibit challenging variations in the pose and location of the interveners. The experimental evaluation explores different configurations, and our system achieves significant improvement on the given scenario, obtaining an average F-measure of 71.29% for online analyzed videos. In addition, ILRA performs better than the face descriptors used in recent face-based closed-world recognition approaches, achieving an average improvement of 1.6% over a deep descriptor.
13.
14.
15.
16. QRKISS: A Two-Stage Metric Learning via QR-Decomposition and KISS for Person Re-Identification. Neural Processing Letters 2019. DOI: 10.1007/s11063-018-9820-x
17. Hong Y, Yang H, Li L, Chen L, Liu C. A cascaded multitask network with deformable spatial transform on person search. International Journal of Advanced Robotic Systems 2019. DOI: 10.1177/1729881419858162. Open access.
Abstract: This article introduces a cascaded multitask framework that improves person search performance by fully exploiting the combination of pedestrian detection and person re-identification. Inspired by Faster R-CNN, a Pre-extracting Net at the front of the framework produces low-level feature maps for a query or gallery image. Then, a well-designed pedestrian proposal network, called the Deformable Pedestrian Space Transformer, applies an affine transformation combined with a parameterized sampler, along with deformable pooling, to deal with the spatial-variance challenge of person re-identification. Finally, a Feature Sharing Net, consisting of a convolutional network and a fully connected layer, produces output for both detection and re-identification. Moreover, we compare several loss functions supervising the training process, including a specially designed Online Instance Matching loss and the triplet loss. Experiments on three datasets (CUHK-SYSU, PRW, and SJTU318) show that our work outperforms existing frameworks.
Affiliations: Yuan Hong, Hua Yang, Liangqi Li, Lin Chen, and Chuang Liu: Institute of Image Communication and Network Engineering, and Shanghai Key Labs of Digital Media Processing and Communication, Shanghai Jiao Tong University, Shanghai, China.
18. Re-KISSME: A robust resampling scheme for distance metric learning in the presence of label noise. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.11.009
19. Nguyen B, De Baets B. Kernel Distance Metric Learning Using Pairwise Constraints for Person Re-Identification. IEEE Transactions on Image Processing 2019; 28:589-600. PMID: 30235128. DOI: 10.1109/tip.2018.2870941
Abstract: Person re-identification is a fundamental task in many computer vision and image understanding systems. Owing to appearance variations across camera views, it still poses an important challenge. In the literature, KISSME has been introduced as an effective distance metric learning method that uses pairwise constraints to improve re-identification performance; computationally, it requires only two inverse covariance matrix estimations. However, the linear transformation induced by KISSME is not powerful enough for more complex problems. We show that KISSME can be kernelized, resulting in a nonlinear transformation suitable for many real-world applications. Moreover, the proposed kernel method can learn distance metrics from structured objects that lack a vectorial representation. The effectiveness of our method is validated on five publicly available datasets. To apply the proposed kernel method efficiently when data are collected sequentially, we further introduce a fast incremental version that learns a dissimilarity function in the feature space without estimating the inverse covariance matrices. Experiments show that this variant obtains competitive results in a computationally efficient manner.
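The linear KISSME baseline that entry 19 kernelizes has a simple closed form: the metric matrix is the difference between the inverse covariances of similar-pair and dissimilar-pair feature differences. The sketch below shows that original linear form (not the kernelized variant); the function names and the small ridge term `eps` are our own additions for numerical stability.

```python
import numpy as np

def kissme(pairs_sim, pairs_dis, eps=1e-6):
    # KISSME metric: M = inv(Sigma_sim) - inv(Sigma_dis), where each
    # Sigma is the covariance of pairwise feature differences.
    def cov_of_diffs(pairs):
        D = np.array([a - b for a, b in pairs])
        return D.T @ D / len(D)
    dim = pairs_sim[0][0].shape[0]
    S_sim = cov_of_diffs(pairs_sim) + eps * np.eye(dim)
    S_dis = cov_of_diffs(pairs_dis) + eps * np.eye(dim)
    return np.linalg.inv(S_sim) - np.linalg.inv(S_dis)

def kiss_distance(M, x, y):
    # Score under the learned (possibly indefinite) metric.
    d = x - y
    return float(d @ M @ d)
```

On synthetic data where similar pairs differ only by small noise, similar pairs score lower on average than dissimilar ones.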
20. Visual saliency based on extended manifold ranking and third-order optimization refinement. Pattern Recognition Letters 2018. DOI: 10.1016/j.patrec.2018.09.002
21. Yu Z, Yu J, Xiang C, Fan J, Tao D. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5947-5959. PMID: 29993847. DOI: 10.1109/tnnls.2018.2817340
Abstract: Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. Supporting the VQA task requires good solutions to three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that captures the complex interactions between multimodal features; and 3) automatic answer prediction that considers the complex correlations between the multiple diverse answers to the same question. For fine-grained image and question representations, a "co-attention" mechanism is developed using a deep neural network (DNN) architecture to jointly learn attention for both the image and the question, which effectively reduces irrelevant features and yields more discriminative representations. For multimodal feature fusion, a generalized multimodal factorized high-order pooling approach (MFH) is developed to fuse multimodal features more effectively by sufficiently exploiting their correlations, further improving VQA performance over state-of-the-art approaches. For answer prediction, the Kullback-Leibler divergence is used as the loss function to precisely characterize the complex correlations between multiple diverse answers with the same or similar meaning, yielding a faster convergence rate and slightly better answer prediction accuracy. A DNN architecture integrates all these modules into a unified model. With an ensemble of MFH models, we achieve state-of-the-art performance on large-scale VQA datasets and the runner-up position in the VQA Challenge 2017.
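The factorized bilinear pooling at the core of approaches like MFH can be sketched compactly: project both modalities, take an elementwise product, then sum-pool every k dimensions and normalize. This is a minimal single-block sketch under our own naming; the projection matrices `U`, `V` and factor size `k` are illustrative, not the paper's learned parameters.

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    # Multimodal factorized bilinear pooling: project both modalities,
    # take the elementwise product, then sum-pool every k dimensions.
    joint = (U @ x) * (V @ y)             # shape (o * k,)
    z = joint.reshape(-1, k).sum(axis=1)  # sum pooling -> shape (o,)
    z = np.sign(z) * np.sqrt(np.abs(z))   # signed square-root normalization
    return z / (np.linalg.norm(z) + 1e-12)  # l2 normalization
```

With an 8-dim "image" vector and a 6-dim "question" vector, projections of height o*k = 20 with k = 5 yield a unit-norm fused vector of length 4.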
22. He K, Zhou D, Zhang X, Nie R. Multi-focus: Focused region finding and multi-scale transform for image fusion. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.09.018
23. Du B, Wang S, Xu C, Wang N, Zhang L, Tao D. Multi-Task Learning for Blind Source Separation. IEEE Transactions on Image Processing 2018; 27:4219-4231. PMID: 29870343. DOI: 10.1109/tip.2018.2836324
Abstract: Blind source separation (BSS) aims to recover the underlying source signals from a set of linear mixtures without any prior information about the mixing system, a fundamental problem in signal and image processing. Most state-of-the-art algorithms handle the decompositions of the mixture signals independently. In this paper, we propose a new algorithm, a multi-task sparse model, to solve the BSS problem. Source signals are characterized via sparse techniques. Meanwhile, we regard the decomposition of each mixture signal as a task and employ multi-task learning to discover connections between tasks, improving the accuracy of source signal separation. Theoretical analyses of the optimization convergence and sample complexity of the proposed algorithm are provided. Experimental results on extensive synthetic and real-world data demonstrate the necessity of exploiting connections between mixture signals and the effectiveness of the proposed algorithm.
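The sparse characterization of sources mentioned in entry 23 typically reduces, per task, to an l1-regularized least-squares subproblem. ISTA (iterative soft-thresholding) is a standard solver for that subproblem; the sketch below is this generic single-task solver under our own naming, not the paper's multi-task algorithm.

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    # ISTA for min_a (1/2) * ||x - D a||^2 + lam * ||a||_1:
    # gradient step on the quadratic term, then soft-thresholding.
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a + D.T @ (x - D @ a) / L
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return a
```

With an identity dictionary the solution is the closed-form soft-threshold of the input, which makes the behavior easy to verify.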
24. Deng C, Chen Z, Liu X, Gao X, Tao D. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval. IEEE Transactions on Image Processing 2018; 27:3893-3903. PMID: 29993656. DOI: 10.1109/tip.2018.2821921
Abstract: Given its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention; in particular, cross-modal hashing has been widely and successfully used in multimedia similarity search. However, almost all existing cross-modal hashing methods ignore the relative similarity between heterogeneous data, which carries richer semantic information, and therefore cannot obtain powerful hash codes, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we use triplet labels, which describe the relative relationships among three instances, as supervision to capture more general semantic correlations between cross-modal instances. We then establish a loss function from both the inter-modal and intra-modal views to boost the discriminative ability of the hash codes. Finally, graph regularization is introduced to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets.
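The triplet supervision used by methods like the one in entry 24 rests on a simple hinge objective: the anchor should be closer to the positive than to the negative by at least a margin. A minimal single-triplet sketch (generic form, not the paper's full inter/intra-modal loss):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge on the gap between anchor-positive and anchor-negative
    # squared Euclidean distances: zero once the margin is satisfied.
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)
```

When the negative is already more than the margin farther away than the positive, the loss vanishes; when positive and negative are equidistant, the full margin is incurred.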
25. Wang D, Tan X. Robust Distance Metric Learning via Bayesian Inference. IEEE Transactions on Image Processing 2018; 27:1542-1553. PMID: 29990222. DOI: 10.1109/tip.2017.2782366
Abstract: Distance metric learning (DML) has achieved great success in many computer vision tasks. However, most existing DML algorithms are based on point estimation, which makes them sensitive to the choice of training examples and prone to overfitting in the presence of label noise. In this paper, we present a robust DML algorithm based on Bayesian inference. Our method is essentially a Bayesian extension of a classic DML method, large margin nearest neighbor classification, and we use stochastic variational inference to estimate the posterior distribution of the transformation matrix. We theoretically show that the proposed algorithm is robust against label noise in the sense that an arbitrary point with label noise has bounded influence on the learned model. Under reasonable assumptions, we derive a generalization error bound for this method in the presence of label noise, show that the DML hypothesis class in which our model lies is probably approximately correct learnable, and give the sample complexity. The effectiveness of the proposed method is demonstrated by state-of-the-art performance on three popular datasets with different types of label noise. A MATLAB implementation is available at http://parnec.nuaa.edu.cn/xtan/Publication.htm.
26. Zhao C, Chen Y, Wang X, Wong WK, Miao D, Lei J. Kernelized random KISS metric learning for person re-identification. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.08.064
27.
28
|
Yu L, Huang Z, Shen F, Song J, Shen HT, Zhou X. Bilinear Optimized Product Quantization for Scalable Visual Content Analysis. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2017; 26:5057-5069. [PMID: 28682253 DOI: 10.1109/tip.2017.2722224] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Product quantization (PQ) has been recognized as a useful technique to encode visual feature vectors into compact codes to reduce both the storage and computation cost. Recent advances in retrieval and vision tasks indicate that high-dimensional descriptors are critical to ensuring high accuracy on large-scale data sets. However, optimizing PQ codes with high-dimensional data is extremely time-consuming and memory-consuming. To solve this problem, in this paper, we present a novel PQ method based on bilinear projection, which can well exploit the natural data structure and reduce the computational complexity. Specifically, we learn a global bilinear projection for PQ, where we provide both non-parametric and parametric solutions. The non-parametric solution does not need any data distribution assumption. The parametric solution can avoid the problem of local optima caused by random initialization, and enjoys a theoretical error bound. Besides, we further extend this approach by learning locally bilinear projections to fit underlying data distributions. We show by extensive experiments that our proposed method, dubbed bilinear optimization product quantization, achieves competitive retrieval and classification accuracies while having significant lower time and space complexities.
Collapse
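The compact-code idea underlying the abstract above (split a vector into subvectors, quantize each with its own small codebook, store only centroid indices) can be sketched as plain product quantization. This is a minimal illustration, not the paper's bilinear-projection method; the codebook sizes and dimensions are arbitrary choices for the example.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x with product quantization.

    codebooks: array of shape (M, K, d) -- M sub-codebooks of K
    centroids each, where x has dimension M * d.
    Returns M centroid indices (the compact code).
    """
    M, K, d = codebooks.shape
    subvectors = x.reshape(M, d)
    # For each subvector, pick the nearest centroid in its sub-codebook.
    return np.array([
        np.argmin(np.linalg.norm(codebooks[m] - subvectors[m], axis=1))
        for m in range(M)
    ])

def pq_decode(code, codebooks):
    """Reconstruct an approximation of x from its PQ code."""
    return np.concatenate([codebooks[m][code[m]] for m in range(len(code))])

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 8, 2))   # M=4 subspaces, K=8 centroids, d=2
x = rng.normal(size=8)
code = pq_encode(x, codebooks)           # 4 small indices instead of 8 floats
x_hat = pq_decode(code, codebooks)       # lossy reconstruction of x
```

Storage drops from 8 floats to 4 indices of 3 bits each; the paper's contribution is making the rotation applied before this step a bilinear projection so that learning it stays tractable in high dimensions.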
|
29
|
Yang Y, Li Z, Wang W, Tao D. An adaptive semi-supervised clustering approach via multiple density-based information. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.061] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
30
|
Vanrell SR, Milone DH, Rufiner HL. Assessment of Homomorphic Analysis for Human Activity Recognition From Acceleration Signals. IEEE J Biomed Health Inform 2017; 22:1001-1010. [PMID: 28682268 DOI: 10.1109/jbhi.2017.2722870] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Unobtrusive activity monitoring can provide valuable information for medical and sports applications. In recent years, human activity recognition has moved to wearable sensors to deal with unconstrained scenarios. Accelerometers are the preferred sensors due to their simplicity and availability. Previous studies have examined several classic techniques for extracting features from acceleration signals, including time-domain, time-frequency, frequency-domain, and other heuristic features. Spectral and temporal features are the preferred ones, and they are generally computed from acceleration components, leaving the potential of the acceleration magnitude unexplored. In this study, a new type of feature extraction stage, based on homomorphic analysis, is proposed to exploit the discriminative activity information present in acceleration signals. Homomorphic analysis can isolate the information about whole-body dynamics and translate it into a compact representation, called cepstral coefficients. Experiments have explored several configurations of the proposed features, including the size of the representation, the signals to be used, and fusion with other features. Cepstral features computed from the acceleration magnitude achieved one of the highest recognition rates. In addition, a beneficial contribution was found when time-domain and moving-pace information was included in the feature vector. Overall, the proposed system achieved a recognition rate of 91.21% on the publicly available SCUT-NAA dataset. To the best of our knowledge, this is the highest recognition rate on this dataset.
Collapse
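The cepstral coefficients described above can be sketched as the real cepstrum of an acceleration-magnitude window: the inverse FFT of the log magnitude spectrum, truncated to its low-order coefficients. The window length, sampling rate, and coefficient count below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def cepstral_coefficients(signal, n_coeffs=12):
    """Real cepstrum of a 1-D signal, truncated to n_coeffs.

    The cepstrum is the inverse FFT of the log magnitude spectrum;
    its low-order coefficients compactly summarize the spectral
    envelope (here: whole-body movement dynamics).
    """
    spectrum = np.abs(np.fft.fft(signal))
    log_spectrum = np.log(spectrum + 1e-10)   # avoid log(0)
    cepstrum = np.real(np.fft.ifft(log_spectrum))
    return cepstrum[:n_coeffs]

# Simulated tri-axial accelerometer window: a periodic gait-like
# component plus gravity and sensor noise (~2.5 s at 50 Hz).
rng = np.random.default_rng(1)
t = np.linspace(0, 2.56, 128)
acc = np.stack([np.sin(2 * np.pi * 2 * t),
                0.3 * np.cos(2 * np.pi * 2 * t),
                9.8 + 0.1 * rng.normal(size=t.size)])

# Acceleration magnitude is orientation-invariant, which is why the
# paper computes cepstral features from it rather than per-axis signals.
magnitude = np.linalg.norm(acc, axis=0)
features = cepstral_coefficients(magnitude)   # compact 12-D feature vector
```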
|
31
|
Ji X, Cheng J, Tao D, Wu X, Feng W. The spatial Laplacian and temporal energy pyramid representation for human action recognition using depth sequences. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.01.035] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
32
|
Shen F, Zhou X, Yang Y, Song J, Shen HT, Tao D. A Fast Optimization Method for General Binary Code Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:5610-5621. [PMID: 28113975 DOI: 10.1109/tip.2016.2612883] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Hashing, or binary code learning, has been recognized as a means of efficient near neighbor search and has thus attracted broad interest in recent retrieval, vision, and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this paper, we propose a novel binary code optimization method, dubbed discrete proximal linearized minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth and nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The resulting problem is then efficiently solved by an iterative procedure in which each iteration admits an analytical discrete solution, and the procedure is shown to converge very quickly. In addition, the proposed method supports a large family of empirical loss functions, instantiated in this paper by both supervised and unsupervised hashing losses, together with bit-uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 s on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale data sets, and the generated binary codes achieve very promising results on both retrieval and classification tasks.
Collapse
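The "iterative procedure with an analytical discrete solution per iteration" can be illustrated with a toy proximal-linearized update: take a gradient step on the smooth loss, then project back onto {-1, +1}, which has the closed-form solution sign(·). The quadratic loss and step size below are stand-ins for illustration only, not the paper's supervised/unsupervised hashing losses.

```python
import numpy as np

def dplm_step(B, grad_f, mu):
    """One discrete proximal linearized minimization step.

    Gradient step on the smooth loss, then projection onto the
    discrete set {-1, +1}^n -- the projection is simply sign(.),
    which is what makes each iteration cheap.
    """
    return np.sign(B - grad_f(B) / mu)

# Toy smooth loss f(B) = 0.5 * ||B - Y||^2, so grad_f(B) = B - Y.
rng = np.random.default_rng(2)
Y = rng.normal(size=(5, 8))                   # real-valued target codes
B = np.sign(rng.normal(size=(5, 8)))          # random initial binary codes
for _ in range(3):
    B = dplm_step(B, lambda B: B - Y, mu=1.0)
# With this quadratic loss and mu=1 the gradient step lands exactly on Y,
# so the first iteration already reaches the fixed point B = sign(Y).
```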
|
33
|
Hong R, Hu Z, Wang R, Wang M, Tao D. Multi-View Object Retrieval via Multi-Scale Topic Models. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:5814-5827. [PMID: 28114066 DOI: 10.1109/tip.2016.2614132] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
The increasing number of 3D objects in various applications has raised the demand for effective and efficient 3D object retrieval methods, which have attracted extensive research efforts in recent years. Existing works mainly focus on how to extract features and conduct object matching. As applications multiply, 3D objects increasingly come from different domains, which makes retrieval across these domains more important. To address this issue, we propose a multi-view object retrieval method using multi-scale topic models in this paper. In our method, multiple views are first extracted from each object, and dense visual features are then extracted to represent each view. To represent a 3D object, multi-scale topic models are employed to extract the hidden relationships among these features with respect to varied numbers of topics in the topic model. In this way, each object can be represented by a set of bag-of-topics vectors. To compare objects, we first cluster the basic topics from the two data sets and then generate a common topic dictionary for the new representation. The two objects can then be aligned to the same common feature space for comparison. To evaluate the performance of the proposed method, experiments are conducted on two data sets. The 3D object retrieval results and comparisons with existing methods demonstrate the effectiveness of the proposed method.
Collapse
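The alignment step above (cluster the topics learned on two data sets into a common dictionary, then re-express each object as a histogram over common topics) can be sketched with a small k-means. All names and sizes here are illustrative assumptions; the paper's topic models and clustering setup are more elaborate.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns centroids and assignment labels."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recenter.
        d = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Topic-word distributions learned separately on two data sets (simulated).
rng = np.random.default_rng(3)
topics_a = rng.dirichlet(np.ones(30), size=10)   # 10 topics over 30 words
topics_b = rng.dirichlet(np.ones(30), size=10)

# Cluster the pooled topics into a common dictionary of 6 "common topics".
common, labels = kmeans(np.vstack([topics_a, topics_b]), k=6)

# An object described by topics {0, 3, 7} of data set A becomes a
# normalized histogram over the common dictionary -- a shared feature
# space where objects from both data sets can be compared directly.
hist = np.bincount(labels[:10][[0, 3, 7]], minlength=6).astype(float)
hist /= hist.sum()
```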
|