1. Shen F, Xie Y, Zhu J, Zhu X, Zeng H. GiT: Graph Interactive Transformer for Vehicle Re-Identification. IEEE Transactions on Image Processing 2023; 32:1039-1051. PMID: 37022078. DOI: 10.1109/tip.2023.3238642.
Abstract
Transformers are increasingly popular in computer vision: they treat an image as a sequence of patches and learn robust global features from that sequence. However, pure transformers are not entirely suitable for vehicle re-identification, which requires both robust global features and discriminative local features. To this end, this paper proposes a graph interactive transformer (GiT). At the macro level, a stack of GiT blocks builds the vehicle re-identification model, in which graphs extract discriminative local features within patches and transformers extract robust global features among patches. At the micro level, graphs and transformers interact, enabling effective cooperation between local and global features. Specifically, the current graph is embedded after the previous level's graph and transformer, while the current transformer is embedded after the current graph and the previous level's transformer. Beyond this interaction, the graph is a newly designed local correction graph, which learns discriminative local features within a patch by exploring the relationships among nodes. Extensive experiments on three large-scale vehicle re-identification datasets demonstrate that GiT is superior to state-of-the-art vehicle re-identification approaches.
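The interleaved wiring described above (each graph fed by the previous level's graph and transformer, each transformer fed by the current graph and the previous level's transformer) can be sketched as follows. This is a minimal illustration under our own assumptions: the placeholder graph module and the additive combination of the two streams are not taken from the paper.

import torch
import torch.nn as nn

class GiTBlock(nn.Module):
    """One level of the graph/transformer interaction pattern.

    g_prev and t_prev are (batch, patches, dim) outputs of the previous
    level. The graph module is a stand-in; the paper's local correction
    graph models node relationships within each patch.
    """
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.graph = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, g_prev, t_prev):
        g_cur = self.graph(g_prev + t_prev)       # graph sees previous graph and transformer
        t_cur = self.transformer(g_cur + t_prev)  # transformer sees current graph and previous transformer
        return g_cur, t_cur

blocks = nn.ModuleList([GiTBlock(dim=256) for _ in range(4)])
g = t = torch.randn(2, 196, 256)  # (batch, patches, dim)
for blk in blocks:
    g, t = blk(g, t)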

2. Han X, Yu X, Li G, Zhao J, Pan G, Ye Q, Jiao J, Han Z. Rethinking Sampling Strategies for Unsupervised Person Re-Identification. IEEE Transactions on Image Processing 2022; 32:29-42. PMID: 36459604. DOI: 10.1109/tip.2022.3224325.
Abstract
Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on framework design and loss functions, this paper shows that the sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important factor causing poor performance, and that enhancing statistical stability can rectify this problem. Inspired by that, a simple yet effective approach is proposed, termed group sampling, which gathers samples from the same class into groups. The model is thereby trained using normalized group samples, which helps alleviate the negative impact of individual samples. Group sampling updates the pipeline of pseudo-label generation by guaranteeing that samples are more efficiently classified into the correct classes. It regulates the representation learning process, enhancing statistical stability for feature representation in a progressive fashion. Extensive experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods and outperforms the current techniques under purely camera-agnostic settings. Code is available at https://github.com/ucas-vg/GroupSampling.
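A minimal sketch of the group-sampling idea as described: samples sharing a pseudo-label are gathered into fixed-size groups, and whole groups are drawn into each batch. The group size and leftover handling are our assumptions, not the authors' exact sampler.

import random
from collections import defaultdict

def group_sampling_batches(pseudo_labels, batch_size, group_size=4, seed=0):
    """Yield batches built from whole groups of same-pseudo-label samples.

    pseudo_labels: list where pseudo_labels[i] is the cluster id of sample i.
    Same-cluster samples stay together inside a batch, so per-class
    statistics are estimated from several samples at once.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(pseudo_labels):
        by_label[lab].append(idx)

    groups = []
    for indices in by_label.values():
        rng.shuffle(indices)
        groups.extend(indices[i:i + group_size]
                      for i in range(0, len(indices), group_size))
    rng.shuffle(groups)

    batch = []
    for g in groups:
        batch.extend(g)
        if len(batch) >= batch_size:
            yield batch[:batch_size]
            batch = batch[batch_size:]
    if batch:
        yield batch

# Example: 3 pseudo-label clusters, batches of 8 indices.
labels = [0] * 10 + [1] * 6 + [2] * 8
for b in group_sampling_batches(labels, batch_size=8):
    print(b)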

3. Meng J, Zheng WS, Lai JH, Wang L. Deep Graph Metric Learning for Weakly Supervised Person Re-Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:6074-6093. PMID: 34048336. DOI: 10.1109/tpami.2021.3084613.
Abstract
In conventional person re-identification (re-id), the images used for model training in the training probe set and training gallery set are all assumed to be instance-level samples that are manually labeled from raw surveillance video (likely with the assistance of detection) in a frame-by-frame manner. This labeling across multiple non-overlapping camera views from raw surveillance video is expensive and time-consuming. To overcome these issues, we consider weakly supervised person re-id modeling, which aims to find the raw video clips where a given target person appears. In our weakly supervised setting, given a sample of a person captured in one camera view, our approach aims to train a re-id model without further instance-level labeling for this person in another camera view. The weak setting refers to matching a target person with an untrimmed gallery video where we only know that the identity appears in the video, without requiring the identity to be annotated in any frame during training. Weakly supervised person re-id is challenging: it not only suffers from the difficulties of conventional person re-id (e.g., visual ambiguity and appearance variations caused by occlusions, pose variations, background clutter, etc.) but, more importantly, is also challenged by the weak supervision itself, because both the instance-level labels and the ground-truth locations of person instances (i.e., their bounding boxes) are absent. To solve the weakly supervised person re-id problem, we develop deep graph metric learning (DGML). On the one hand, DGML measures the consistency between intra-video spatial graphs of consecutive frames, where the spatial graph captures the neighborhood relationships among detected person instances in each frame. On the other hand, DGML distinguishes the inter-video spatial graphs captured from different camera views at different sites simultaneously. To further embed weak supervision explicitly into DGML, we introduce weakly supervised regularization (WSR), which utilizes multiple weak video-level labels to learn discriminative features by means of a weak identity loss and a cross-video alignment loss. We conduct extensive experiments to demonstrate the feasibility of the weakly supervised person re-id approach and its special cases (e.g., its bag-to-bag extension) and show that the proposed DGML is effective.

4. Zhou S, Wang J, Shu J, Meng D, Wang L, Zheng N. Multinetwork Collaborative Feature Learning for Semisupervised Person Reidentification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4826-4839. PMID: 33729954. DOI: 10.1109/tnnls.2021.3061164.
Abstract
Person reidentification (Re-ID) aims at matching images of the same identity captured from disjoint camera views, which remains a very challenging problem due to the large cross-view appearance variations. In practice, mainstream methods usually learn a discriminative feature representation using a deep neural network, which needs a large number of labeled samples during training. In this article, we design a simple yet effective multinetwork collaborative feature learning (MCFL) framework to alleviate the data annotation requirement for person Re-ID, which can confidently estimate the pseudolabels of unlabeled sample pairs and consistently learn the discriminative features of input images. To keep the pseudolabels precise, we further build a novel self-paced collaborative regularizer that extensively exchanges the weight information of unlabeled sample pairs between different networks. Once the pseudolabels are correctly estimated, we take the corresponding sample pairs into the training process, which is beneficial for learning more discriminative features for person Re-ID. Extensive experimental results on the Market-1501, DukeMTMC, and CUHK03 datasets show that our method outperforms most of the state-of-the-art approaches.

5. Wang W, Dang Z, Hu Y, Fua P, Salzmann M. Robust Differentiable SVD. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5472-5487. PMID: 33844626. DOI: 10.1109/tpami.2021.3072422.
Abstract
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether the SVD is used to compute them analytically or the Power Iteration (PI) method is used to approximate them. This instability arises in the presence of eigenvalues that are close to each other. It makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this by using the SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying on an iterative process in practice, and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
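For context, the eigendecomposition gradient that becomes unstable can be written as follows; this is a commonly stated form (up to sign and transpose conventions, which vary across references), not copied from the paper. For a symmetric matrix A = V diag(lambda) V^T and a loss L:

\frac{\partial L}{\partial A}
  = V \left( K \odot \left( V^\top \frac{\partial L}{\partial V} \right)
    + \operatorname{diag}\left( \frac{\partial L}{\partial \lambda} \right) \right) V^\top,
\qquad
K_{ij} = \begin{cases} 1/(\lambda_i - \lambda_j), & i \neq j, \\ 0, & i = j. \end{cases}

The K factor blows up when two eigenvalues nearly coincide. The Taylor-expansion view rests on the geometric series

\frac{1}{\lambda_i - \lambda_j}
  = \frac{1}{\lambda_i} \sum_{k=0}^{\infty} \left( \frac{\lambda_j}{\lambda_i} \right)^{k},
\quad |\lambda_j| < |\lambda_i|,

so truncating the series keeps every term bounded, which is the source of the more accurate gradients reported above.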

6. Ye M, Shen J, Lin G, Xiang T, Shao L, Hoi SCH. Deep Learning for Person Re-Identification: A Survey and Outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:2872-2893. PMID: 33497329. DOI: 10.1109/tpami.2021.3054775.
Abstract
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis of closed-world person Re-ID from three different perspectives: deep feature representation learning, deep metric learning and ranking optimization. With performance saturation under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues and is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
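The mINP metric mentioned above measures how deep into the ranking one must search before the hardest correct match is found. The sketch below follows the published definition (INP = number of correct matches divided by the 1-based rank of the last correct match); the variable names are ours.

def inp(ranked_match_flags):
    """Inverse negative penalty for one query.

    ranked_match_flags: 0/1 flags over the ranked gallery, 1 marking a
    correct match. Returns (#correct matches) / (rank of the last one).
    """
    positions = [i + 1 for i, flag in enumerate(ranked_match_flags) if flag]
    if not positions:
        return 0.0
    return len(positions) / positions[-1]

def mean_inp(all_queries):
    """mINP: mean INP over all queries (higher is better)."""
    return sum(inp(q) for q in all_queries) / len(all_queries)

# Two toy queries: matches at the top vs. one match buried at rank 6.
print(mean_inp([[1, 1, 0, 0, 0, 0],    # INP = 2/2 = 1.0
                [1, 0, 0, 0, 0, 1]]))  # INP = 2/6, so mINP = 2/3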

7. An Adaptively Attention-Driven Cascade Part-Based Graph Embedding Framework for UAV Object Re-Identification. Remote Sensing 2022. DOI: 10.3390/rs14061436.
Abstract
With the rapid development of unmanned aerial vehicles (UAVs), object re-identification (Re-ID) on UAV platforms has attracted increasing attention, and several excellent results have been reported in traditional scenarios. However, object Re-ID in aerial imagery acquired from UAVs remains challenging, mainly because the variable locations and diverse viewpoints of UAV platforms introduce greater appearance ambiguity both within and between objects. To address these issues, this paper proposes an adaptively attention-driven cascade part-based graph embedding framework (AAD-CPGE) for UAV object Re-ID. The AAD-CPGE aims to optimally fuse node features and their topological characteristics on multi-scale structured graphs of object parts, and then adaptively learn the most correlated information to improve Re-ID performance. Specifically, we first run GCNs on the part-based cascade node-feature graphs and topological-feature graphs to acquire multi-scale structured graph feature representations. We then design a self-attention-based module for adaptive fusion of node and topological features on the constructed hierarchical part-based graphs. Finally, the learned hybrid graph-structured features with the most discriminative capability are applied to object Re-ID. Experiments on three widely used UAV-based benchmark datasets, together with comparisons against state-of-the-art object Re-ID approaches, validate the effectiveness and benefits of the proposed AAD-CPGE framework.

8. Liu J, Liu K, Jin F, Gong L. Toward robust and adaptive pedestrian monitoring using CSI: design, implementation, and evaluation. Neural Computing and Applications 2022. DOI: 10.1007/s00521-022-07094-8.

9. Ye M, Shen J, Zhang X, Yuen PC, Chang SF. Augmentation Invariant and Instance Spreading Feature for Softmax Embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:924-939. PMID: 32750841. DOI: 10.1109/tpami.2020.3013379.
Abstract
Deep embedding learning plays a key role in learning discriminative feature representations, where visually similar samples are pulled closer and dissimilar samples are pushed away in the low-dimensional embedding space. This paper studies the unsupervised embedding learning problem of learning such a representation without using any category labels. This task faces two primary challenges: mining reliable positive supervision from highly similar fine-grained classes, and generalizing to unseen testing categories. To approximate the positive concentration and negative separation properties of category-wise supervised learning, we introduce a data augmentation invariant and instance spreading feature using instance-wise supervision. We also design two novel domain-agnostic augmentation strategies to further extend the supervision in feature space, which simulates large-batch training using a small batch size and the augmented features. To learn such a representation, we propose a novel instance-wise softmax embedding, which directly performs the optimization over the augmented instance features with a binary discrimination softmax encoding. It significantly accelerates the learning speed with much higher accuracy than existing methods, under both seen and unseen testing categories. The unsupervised embedding performs well on samples from fine-grained categories even without a pre-trained network. We also develop a variant using category-wise supervision, namely category-wise softmax embedding, which achieves competitive performance with the state of the art without using any auxiliary information or restrictive sample mining.
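One common way to write such an instance-wise softmax objective treats every training instance as its own class; the notation here is ours and not necessarily the paper's exact formulation. With f_i the L2-normalized embedding of instance i, \tilde{f}_i the embedding of an augmented view, and temperature \tau:

P(i \mid \tilde{f}_i)
  = \frac{\exp(\tilde{f}_i^{\top} f_i / \tau)}
         {\sum_{k=1}^{n} \exp(\tilde{f}_i^{\top} f_k / \tau)},
\qquad
L = -\sum_{i=1}^{n} \log P(i \mid \tilde{f}_i).

Maximizing P pulls augmented views of an instance together (augmentation invariance) while pushing all other instances away (instance spreading).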

10. Ye M, Li H, Du B, Shen J, Shao L, Hoi SCH. Collaborative Refining for Person Re-Identification With Label Noise. IEEE Transactions on Image Processing 2021; 31:379-391. PMID: 34874857. DOI: 10.1109/tip.2021.3131937.
Abstract
Existing person re-identification (Re-ID) methods usually rely heavily on large-scale, thoroughly annotated training data. However, label noise is unavoidable due to inaccurate person detection results or annotation errors in real scenes. It is extremely challenging to learn a robust Re-ID model with label noise since each identity has very limited annotated training samples. To avoid fitting to the noisy labels, we propose to learn a prefatory model using a large learning rate at the early stage with a self-label refining strategy, in which the labels and the network are jointly optimized. To further enhance the robustness, we introduce an online co-refining (CORE) framework with dynamic mutual learning, where networks and label predictions are optimized online and collaboratively by distilling the knowledge from peer networks. It also reduces the negative impact of noisy labels using a favorable selective consistency strategy. CORE has two primary advantages: it is robust to different noise types and unknown noise ratios, and it can be easily trained without much additional effort on the architecture design. Extensive experiments on Re-ID and image classification demonstrate that CORE outperforms its counterparts by a large margin under both practical and simulated noise settings. Notably, it also improves the state-of-the-art unsupervised Re-ID performance under standard settings. Code is available at https://github.com/mangye16/ReID-Label-Noise.
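One simple way to realize the online label refining described above is to blend the noisy one-hot label with a peer network's prediction; this is a sketch under our own assumptions, not the exact CORE update.

import torch
import torch.nn.functional as F

def corefine_labels(noisy_onehot, peer_logits, alpha=0.5):
    """Mix the noisy annotation with a peer network's prediction.

    noisy_onehot: (B, C) one-hot labels that may be wrong.
    peer_logits:  (B, C) logits from the peer network, detached so the
                  peer acts as a teacher for this step.
    alpha:        trust placed in the peer; alpha=0 keeps the noisy labels.
    """
    peer_prob = F.softmax(peer_logits.detach(), dim=1)
    return (1.0 - alpha) * noisy_onehot + alpha * peer_prob

def refined_ce_loss(logits, refined_targets):
    """Cross-entropy against soft (refined) targets."""
    log_prob = F.log_softmax(logits, dim=1)
    return -(refined_targets * log_prob).sum(dim=1).mean()

# Toy usage: two networks refine each other's supervision each step.
B, C = 4, 10
labels = F.one_hot(torch.randint(0, C, (B,)), C).float()
logits_a, logits_b = torch.randn(B, C), torch.randn(B, C)
loss_a = refined_ce_loss(logits_a, corefine_labels(labels, logits_b))
loss_b = refined_ce_loss(logits_b, corefine_labels(labels, logits_a))
print(loss_a.item(), loss_b.item())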

11. Chen S, Zhong X, Wu S, Sun Z, Liu W, Jia X, Xia H. Memory-attended semantic context-aware network for video captioning. Soft Computing 2021. DOI: 10.1007/s00500-021-06360-6.

12. Ning X, Gong K, Li W, Zhang L. JWSAA: Joint weak saliency and attention aware for person re-identification. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.05.106.

13. Stable Median Centre Clustering for Unsupervised Domain Adaptation Person Re-Identification. Computational Intelligence and Neuroscience 2021; 2021:2883559. PMID: 34335711. PMCID: PMC8321743. DOI: 10.1155/2021/2883559.
Abstract
Current unsupervised domain adaptation person re-identification (re-ID) methods aim to solve the domain shift problem by applying prior knowledge learned from labelled data in the source domain to unlabelled data in the target domain. At present, pseudo-label-based unsupervised domain adaptation methods have obtained state-of-the-art performance: they obtain pseudolabels via a clustering algorithm and use these pseudolabels to optimize a CNN model. Although this achieves strong performance, the model cannot be further optimized because of the noisy labels produced in the clustering process. In this paper, we propose stable median centre clustering (SMCC) for unsupervised domain adaptation person re-ID. SMCC adaptively mines credible samples for optimization purposes and reduces the impact of label noise and outliers on training to improve the performance of the resulting model. In particular, we use the intracluster distance confidence measure of each sample and its K-reciprocal nearest neighbour cluster proportion in the clustering process to select credible samples, and we assign different weights according to the intracluster sample distance confidence when measuring the distances between different clusters, thereby making the clustering results more robust. The experiments show that our SMCC method can select credible and stable samples for training and improve the performance of the unsupervised domain adaptation model. Our code is available at https://github.com/sunburst792/SMCC-method/tree/master.

14. Liang W, Wang G, Lai J, Xie X. Homogeneous-to-Heterogeneous: Unsupervised Learning for RGB-Infrared Person Re-Identification. IEEE Transactions on Image Processing 2021; 30:6392-6407. PMID: 34197322. DOI: 10.1109/tip.2021.3092578.
Abstract
RGB-Infrared (RGB-IR) cross-modality person re-identification (re-ID) is attracting more and more attention due to the requirements of 24-hour scene surveillance. However, the high cost of labeling person identities in an RGB-IR dataset largely limits the scalability of supervised models in real-world scenarios. In this paper, we study the unsupervised RGB-IR person re-ID problem (or briefly uRGB-IR re-ID), in which no identity annotations are available in RGB-IR cross-modality datasets. Considering that intra-modality (i.e., RGB-RGB or IR-IR) re-ID is much easier than cross-modality re-ID and can provide shared knowledge for RGB-IR re-ID, we propose a two-stage method to solve uRGB-IR re-ID, namely homogeneous-to-heterogeneous learning. In the first stage, unsupervised self-learning is conducted to learn the intra-modality feature representation and to generate the pseudo-labeled identities of person images separately for each modality. In the second stage, heterogeneous learning is used to learn a shared discriminative feature representation by distilling the knowledge from intra-modality pseudo-labels, to align the two modalities via a modality-based consistent learning module, and finally to achieve modality-invariant learning via a pseudo-labeled positive instance selection module. With homogeneous-to-heterogeneous learning, the proposed unsupervised framework greatly reduces the modality gap and thus learns a feature representation robust across the RGB and infrared modalities, leading to promising accuracy. We also propose a novel cross-modality re-ranking approach that includes a self-modality search and a cycle-modality search tailored to uRGB-IR re-ID. Unlike conventional re-ranking, the proposed method takes a modality-based constraint into re-ranking and thus can select more reliable nearest neighbors, which greatly improves uRGB-IR re-ID. The experimental results demonstrate the superiority of our approach on the SYSU-MM01 and RegDB datasets.

15. Hsu HM, Cai J, Wang Y, Hwang JN, Kim KJ. Multi-Target Multi-Camera Tracking of Vehicles Using Metadata-Aided Re-ID and Trajectory-Based Camera Link Model. IEEE Transactions on Image Processing 2021; 30:5198-5210. PMID: 33999821. DOI: 10.1109/tip.2021.3078124.
Abstract
In this paper, we propose a novel framework for multi-target multi-camera tracking (MTMCT) of vehicles based on metadata-aided re-identification (MA-ReID) and the trajectory-based camera link model (TCLM). Given a video sequence and the corresponding frame-by-frame vehicle detections, we first address the isolated-tracklets issue in single-camera tracking (SCT) with the proposed traffic-aware single-camera tracking (TSCT). Then, after automatically constructing the TCLM, we solve MTMCT with MA-ReID. The TCLM is generated from the camera topological configuration to obtain spatial and temporal information that improves MTMCT by reducing the candidate search space for ReID. We also use a temporal attention model to create more discriminative embeddings of trajectories from each camera, achieving robust distance measures for vehicle ReID. Moreover, we train a metadata classifier for MTMCT to obtain a metadata feature, which is concatenated with the temporal-attention-based embeddings. Finally, the TCLM and hierarchical clustering are jointly applied for global ID assignment. The proposed method is evaluated on the CityFlow dataset, achieving IDF1 of 76.77%, which outperforms state-of-the-art MTMCT methods.

16. Wang X, Liu M, Raychaudhuri DS, Paul S, Wang Y, Roy-Chowdhury AK. Learning Person Re-Identification Models From Videos With Weak Supervision. IEEE Transactions on Image Processing 2021; 30:3017-3028. PMID: 33571092. DOI: 10.1109/tip.2021.3056223.
Abstract
Most person re-identification methods, being supervised techniques, suffer from the burden of a massive annotation requirement. Unsupervised methods overcome this need for labeled data but perform poorly compared to the supervised alternatives. To cope with this issue, we introduce the problem of learning person re-identification models from videos with weak supervision. The weak nature of the supervision arises from the requirement of video-level labels, i.e., the person identities who appear in the video, in contrast to the more precise frame-level annotations. Towards this goal, we propose a multiple instance attention learning framework for person re-identification using such video-level labels. Specifically, we first cast the video person re-identification task into a multiple instance learning setting, in which the person images in a video are collected into a bag. The relations between videos with similar labels can be utilized to identify persons. On top of that, we introduce a co-person attention mechanism which mines the similarity correlations between videos that have person identities in common. The attention weights are obtained based on all person images instead of person tracklets in a video, making our learned model less affected by noisy annotations. Extensive experiments demonstrate the superiority of the proposed method over related methods on two weakly labeled person re-identification datasets.

17. Feng H, Chen M, Hu J, Shen D, Liu H, Cai D. Complementary Pseudo Labels for Unsupervised Domain Adaptation On Person Re-Identification. IEEE Transactions on Image Processing 2021; 30:2898-2907. PMID: 33556009. DOI: 10.1109/tip.2021.3056212.
Abstract
In recent years, supervised person re-identification (re-ID) models have received increasing attention. However, models trained on a source domain always suffer a dramatic performance drop when tested on an unseen domain. Existing methods primarily use pseudo labels to alleviate this problem. One of the most successful approaches predicts neighbors of each unlabeled image and then uses them to train the model. Although the predicted neighbors are credible, they always miss some hard positive samples, which may hinder the model from discovering important discriminative information in the unlabeled domain. In this paper, to complement these low-recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high-precision neighbor pseudo labels and high-recall group pseudo labels. The group pseudo labels are generated by transitively merging neighbors of different samples into a group to achieve higher recall. However, the merging operation may cause subgroups within a group due to imperfect neighbor predictions. To utilize these group pseudo labels properly, we propose a similarity-aggregating loss that mitigates the influence of these subgroups by pulling the input sample towards the most similar embeddings. Extensive experiments on three large-scale datasets demonstrate that our method achieves state-of-the-art performance under the unsupervised domain adaptation re-ID setting.

18. Wang L, Ding R, Zhai Y, Zhang Q, Tang W, Zheng N, Hua G. Giant Panda Identification. IEEE Transactions on Image Processing 2021; 30:2837-2849. PMID: 33539294. DOI: 10.1109/tip.2021.3055627.
Abstract
The lack of automatic tools to identify giant pandas makes it hard to keep track of and manage giant pandas in wildlife conservation missions. In this paper, we introduce a new Giant Panda Identification (GPID) task, which aims to identify each individual panda based on an image. Though related to the human re-identification and animal classification problems, GPID is extraordinarily challenging due to subtle visual differences between pandas and cluttered global information. We propose a new benchmark dataset, iPanda-50, for GPID. The iPanda-50 dataset consists of 6,874 images from 50 giant panda individuals, collected from panda streaming videos. We also introduce a new Feature-Fusion Network with Patch Detector (FFN-PD) for GPID. The proposed FFN-PD exploits the patch detector to detect discriminative local patches without using any part annotations or extra location sub-networks, and builds a hierarchical representation by fusing both global and local features to enhance inter-layer patch feature interactions. Specifically, an attentional cross-channel pooling is embedded in the proposed FFN-PD to improve the identity-specific patch detectors. Experiments performed on the iPanda-50 dataset demonstrate that the proposed FFN-PD significantly outperforms competing methods. Besides, experiments on other fine-grained recognition datasets (i.e., CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that the proposed FFN-PD outperforms existing state-of-the-art methods.

19. Shen D, Zhao S, Hu J, Feng H, Cai D, He X. ES-Net: Erasing Salient Parts to Learn More in Re-Identification. IEEE Transactions on Image Processing 2021; 30:1676-1686. PMID: 33382657. DOI: 10.1109/tip.2020.3046904.
Abstract
As an instance-level recognition problem, re-identification (re-ID) requires models to capture diverse features. However, with continuous training, re-ID models pay more and more attention to the salient areas. As a result, a model may focus on only a few small regions with salient representations and ignore other important information. This phenomenon leads to inferior performance, especially when models are evaluated on data with small inter-identity variation. In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image. ES-Net locates the salient areas by the confidence of objects and erases them efficiently in a training batch. Meanwhile, to mitigate the over-erasing problem, we use a trainable pooling layer, P-pooling, that generalizes global max and global average pooling. Experiments are conducted on two specific re-identification tasks (i.e., person re-ID and vehicle re-ID). Our ES-Net outperforms state-of-the-art methods on three person re-ID benchmarks and two vehicle re-ID benchmarks. Specifically, mAP / Rank-1 rate: 88.6% / 95.7% on Market-1501, 78.8% / 89.2% on DukeMTMC-reID, 57.3% / 80.9% on MSMT17, and 81.9% / 97.0% on VeRi-776. Rank-1 / Rank-5 rate: 83.6% / 96.9% on VehicleID (Small), 79.9% / 93.5% on VehicleID (Medium), and 76.9% / 90.7% on VehicleID (Large). Moreover, the visualized salient areas provide human-interpretable visual explanations for the ranking results.
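The trainable pooling mentioned above belongs to the same family as generalized-mean (GeM) pooling; the parameterization below is our assumption of that family, not necessarily ES-Net's exact layer. Setting p = 1 recovers global average pooling, and large p approaches global max pooling.

import torch
import torch.nn as nn

class PPooling(nn.Module):
    """Generalized-mean style pooling with a learnable exponent p.

    Input: (B, C, H, W) feature maps. Output: (B, C) descriptors.
    """
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(float(p)))
        self.eps = eps

    def forward(self, x):
        x = x.clamp(min=self.eps).pow(self.p)  # element-wise x^p
        x = x.mean(dim=(2, 3))                 # spatial average
        return x.pow(1.0 / self.p)             # (mean of x^p)^(1/p)

pool = PPooling(p=3.0)
feats = torch.rand(2, 512, 7, 7)
print(pool(feats).shape)  # torch.Size([2, 512])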

20. Kim D, Pathak S, Moro A, Yamashita A, Asama H. Self-supervised optical flow derotation network for rotation estimation of a spherical camera. Advanced Robotics 2020. DOI: 10.1080/01691864.2020.1857305.
Affiliation(s)
- Dabae Kim, Sarthak Pathak, Alessandro Moro, Atsushi Yamashita, Hajime Asama: Department of Precision Engineering, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan

21. Wang W, Pei W, Cao Q, Liu S, Lu G, Tai YW. Push for Center Learning via Orthogonalization and Subspace Masking for Person Re-Identification. IEEE Transactions on Image Processing 2020; 30:907-920. PMID: 33259297. DOI: 10.1109/tip.2020.3036720.
Abstract
Person re-identification aims to identify whether pairs of images belong to the same person or not. This problem is challenging due to large differences in camera views, lighting and background. One mainstream approach to learning CNN features is to design loss functions that reinforce both class separation and intra-class compactness. In this paper, we propose a novel Orthogonal Center Learning method with Subspace Masking for person re-identification. We make the following contributions: 1) we develop a center learning module that learns class centers by simultaneously reducing intra-class differences and inter-class correlations through orthogonalization; 2) we introduce a subspace masking mechanism to enhance the generalization of the learned class centers; and 3) we propose to integrate average pooling and max pooling in a regularizing manner that fully exploits their strengths. Extensive experiments show that our proposed method consistently outperforms state-of-the-art methods on large-scale Re-ID datasets including Market-1501, DukeMTMC-ReID, CUHK03 and MSMT17.
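A minimal sketch of the orthogonalization idea in contribution 1: penalize pairwise correlations between L2-normalized class centers so that the centers decorrelate. This is our construction for illustration, not the paper's exact loss.

import torch
import torch.nn.functional as F

def center_orthogonality_loss(centers):
    """Penalize inter-class correlation among learned class centers.

    centers: (num_classes, dim) learnable class centers. Normalizing and
    taking the Gram matrix gives pairwise cosine similarities; driving the
    off-diagonal entries to zero orthogonalizes the centers.
    """
    c = F.normalize(centers, dim=1)
    gram = c @ c.t()
    off_diag = gram - torch.eye(len(c), device=c.device)
    return off_diag.pow(2).sum() / (len(c) * (len(c) - 1))

centers = torch.randn(751, 256, requires_grad=True)  # e.g. 751 identities
loss = center_orthogonality_loss(centers)
loss.backward()
print(loss.item())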

22. Diallo B, Urruty T, Bourdon P, Fernandez-Maloigne C. Robust forgery detection for compressed images using CNN supervision. Forensic Science International: Reports 2020. DOI: 10.1016/j.fsir.2020.100112.

23. Zhang J, Niu L, Zhang L. Person Re-Identification With Reinforced Attribute Attention Selection. IEEE Transactions on Image Processing 2020; 30:603-616. PMID: 33186114. DOI: 10.1109/tip.2020.3036762.
Abstract
Person re-identification (Re-ID) aims to match pedestrian images across various scenes in video surveillance. A few works leverage attribute information to boost Re-ID performance by introducing auxiliary tasks such as verifying the image-level attributes of two pedestrian images or recognizing identity-level attributes. Identity-level attribute annotations cost less manpower and are better suited to the person re-identification task than image-level attribute annotations. However, identity attribute information may be very noisy due to incorrect attribute annotation or a lack of discriminativeness for distinguishing different persons, which is probably unhelpful for the Re-ID task. In this paper, we propose a novel Attribute Attentional Block (AAB), which can be integrated into any backbone network or framework. Our AAB adopts reinforcement learning to drop noisy attributes based on our designed reward, and then utilizes the aggregated attribute attention of the remaining attributes to facilitate the Re-ID task. Experimental results demonstrate that our proposed method achieves state-of-the-art results on three benchmark datasets.

24. Pang Y, Cao J, Li Y, Xie J, Sun H, Gong J. TJU-DHD: A Diverse High-Resolution Dataset for Object Detection. IEEE Transactions on Image Processing 2020; 30:207-219. PMID: 33141669. DOI: 10.1109/tip.2020.3034487.
Abstract
Vehicles, pedestrians, and riders are the most important and interesting objects for the perception modules of self-driving vehicles and video surveillance. However, the state-of-the-art performance in detecting such important objects (especially small objects) is far from satisfying the demands of practical systems. Large-scale, rich-diversity, and high-resolution datasets play an important role in developing better object detection methods. Existing public large-scale datasets such as MS COCO, collected from websites, do not focus on these specific scenarios. Moreover, popular datasets collected from specific scenarios (e.g., KITTI and Citypersons) are limited in the number of images and instances, in resolution, and in diversity. To address this problem, we build a diverse high-resolution dataset called TJU-DHD. The dataset contains 115,354 high-resolution images (52% have a resolution of 1,624×1,200 pixels and 48% have a resolution of at least 2,560×1,440 pixels) and 709,330 labeled objects in total, with a large variance in scale and appearance. Meanwhile, the dataset has rich diversity in season, illumination, and weather. In addition, a new diverse pedestrian dataset is further built. With four different detectors (the one-stage RetinaNet, anchor-free FCOS, two-stage FPN, and Cascade R-CNN), experiments on object detection and pedestrian detection are conducted. We hope that the newly built dataset can help promote research on object detection and pedestrian detection in these two scenes. The dataset is available at https://github.com/tjubiit/TJU-DHD.

25. Liu M, Qu L, Nie L, Liu M, Duan L, Chen B. Iterative Local-Global Collaboration Learning towards One-Shot Video Person Re-Identification. IEEE Transactions on Image Processing 2020; PP:9360-9372. PMID: 33006929. DOI: 10.1109/tip.2020.3026625.
Abstract
Video person re-identification (video Re-ID) plays an important role in surveillance video analysis and has gained increasing attention recently. However, existing supervised methods require vast labeled identities across cameras, resulting in poor scalability in practical applications. Although some unsupervised approaches have been exploited for video Re-ID, they are still in their infancy due to the complex nature of learning discriminative features on unlabelled data. In this paper, we focus on one-shot video Re-ID and present an iterative local-global collaboration learning approach to learning robust and discriminative person representations. Specifically, it jointly considers the global video information and local frame sequence information to better capture the diverse appearance of the person for feature learning and pseudo-label estimation. Moreover, as the cross-entropy loss may induce the model to focus on identity-irrelevant factors, we introduce the variational information bottleneck as a regularization term to train the model together. It can help filter undesirable information and characterize subtle differences among persons. Since accuracy cannot always be guaranteed for pseudo-labels, we adopt a dynamic selection strategy to select part of pseudo-labeled data with higher confidence to update the training set and re-train the learning model. During training, our method iteratively executes the feature learning, pseudo-label estimation, and dynamic sample selection until all the unlabeled data have been seen. Extensive experiments on two public datasets, i.e., DukeMTMC-VideoReID and MARS, have verified the superiority of our model to several cutting-edge competitors.
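The variational information bottleneck regularizer mentioned above is commonly implemented as a KL term between a learned Gaussian posterior over the representation and a standard normal prior; the head below is a generic sketch under that assumption, not the authors' exact module.

import torch
import torch.nn as nn

class VIBHead(nn.Module):
    """Variational information bottleneck over a feature vector.

    Maps features to a Gaussian posterior q(z|x) = N(mu, sigma^2) and
    returns a sampled z plus KL(q(z|x) || N(0, I)), which discourages z
    from carrying identity-irrelevant information.
    """
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_var = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, log_var = self.mu(x), self.log_var(x)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=1).mean()
        return z, kl

head = VIBHead(2048, 256)
z, kl = head(torch.randn(8, 2048))
# total loss = identity loss on z + beta * kl, for a small beta
print(z.shape, kl.item())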

26. Ye M, Lan X, Leng Q, Shen J. Cross-Modality Person Re-Identification via Modality-aware Collaborative Ensemble Learning. IEEE Transactions on Image Processing 2020; PP:9387-9399. PMID: 32746238. DOI: 10.1109/tip.2020.2998275.
Abstract
Visible thermal person re-identification (VT-ReID) is a challenging cross-modality pedestrian retrieval problem due to the large intra-class variations and the modality discrepancy across different cameras. Existing VT-ReID methods mainly focus on learning cross-modality sharable feature representations by handling the modality discrepancy at the feature level. However, the modality difference at the classifier level has received much less attention, resulting in limited discriminability. In this paper, we propose a novel modality-aware collaborative ensemble (MACE) learning method with a middle-level sharable two-stream network (MSTN) for VT-ReID, which handles the modality discrepancy at both the feature level and the classifier level. At the feature level, MSTN achieves much better performance than existing methods by capturing sharable discriminative middle-level features in convolutional layers. At the classifier level, we introduce both modality-specific and modality-sharable identity classifiers for the two modalities to handle the modality discrepancy. To utilize the complementary information among different classifiers, we propose an ensemble learning scheme that incorporates the modality-sharable classifier and the modality-specific classifiers. In addition, we introduce a collaborative learning strategy, which regularizes the modality-specific identity predictions and the ensemble outputs. Extensive experiments on two cross-modality datasets demonstrate that the proposed method outperforms the current state of the art by a large margin, achieving rank-1/mAP accuracy of 51.64%/50.11% on the SYSU-MM01 dataset and 72.37%/69.09% on the RegDB dataset.

27. Tang Y, Yang X, Wang N, Song B, Gao X. CGAN-TM: A novel domain-to-domain transferring method for person re-identification. IEEE Transactions on Image Processing 2020; 29:5641-5651. PMID: 32286985. DOI: 10.1109/tip.2020.2985545.
Abstract
Person re-identification (re-ID) is a technique that aims to recognize a person across different cameras. Although some supervised methods have achieved favorable performance, they are far from practical application owing to the lack of labeled data. Thus, unsupervised person re-ID methods are in urgent need. Generally, the common approach in existing unsupervised methods is to first use the source image dataset to generate a model in a supervised manner, and then transfer the source image domain to the target image domain. However, images may lose their identity information after translation, and the distributions of the different domains are far apart. To solve these problems, we propose an image domain-to-domain translation method that preserves pedestrians' identity information and pulls the domain distributions closer for unsupervised person re-ID tasks. Our work exploits CycleGAN to transfer the existing labeled image domain to the unlabeled image domain. Specifically, a Self-labeled Triplet Net is proposed to maintain pedestrian identity information, and maximum mean discrepancy is introduced to pull the domain distributions closer. Extensive experiments have been conducted, and the results demonstrate that the proposed method outperforms state-of-the-art unsupervised methods on DukeMTMC-reID and Market-1501.
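Maximum mean discrepancy, used above to pull the two domain distributions together, has a standard kernel estimator; the sketch below is a generic biased RBF-kernel version, not the paper's exact configuration.

import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased MMD^2 estimate between samples x (n, d) and y (m, d).

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)] with an RBF kernel;
    minimizing it pulls the two feature distributions together.
    """
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)  # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

source = torch.randn(64, 256)            # source-domain features
target = torch.randn(64, 256) + 0.5      # shifted target-domain features
print(rbf_mmd2(source, target).item())   # positive; shrinks as domains align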

28. Lin Y, Wu Y, Yan C, Xu M, Yang Y. Unsupervised Person Re-identification via Cross-camera Similarity Exploration. IEEE Transactions on Image Processing 2020; 29:5481-5490. PMID: 32248102. DOI: 10.1109/tip.2020.2982826.
Abstract
Most person re-identification (re-ID) approaches are based on supervised learning, which requires manually annotated data. However, acquiring identity annotations is not only resource-intensive but also impractical for large-scale data. To relieve this problem, we propose a cross-camera unsupervised approach that makes use of unsupervised style-transferred images to jointly optimize a convolutional neural network (CNN) and the relationships among individual samples for person re-ID. Our algorithm considers two fundamental facts in the re-ID task: variance across diverse cameras and similarity within the same identity. In this paper, we propose an iterative framework that overcomes the camera variance and achieves cross-camera similarity exploration. Specifically, we apply an unsupervised style transfer model to generate style-transferred training images with different camera styles. Then we iteratively exploit the similarity within the same identity from both the original and the style-transferred data. We start by considering each training image as a distinct class to initialize the CNN model. Then we measure the similarity and gradually group similar samples into one class, which increases similarity within each identity. We also introduce a diversity regularization term in the clustering to balance the cluster distribution. The experimental results demonstrate that our algorithm is not only superior to state-of-the-art unsupervised re-ID approaches, but also performs favorably compared with competing unsupervised domain adaptation (UDA) and semi-supervised learning methods.

29. Ruan W, Liang C, Yu Y, Chen J, Hu R. SIST: Online Scale-Adaptive Object Tracking with Stepwise Insight. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.11.102.

30. Instance Hard Triplet Loss for In-video Person Re-identification. Applied Sciences 2020. DOI: 10.3390/app10062198.
Abstract
Traditional Person Re-identification (ReID) methods mainly focus on cross-camera scenarios, but identifying a person in the same video/camera across adjacent subsequent frames is also an important problem, for example in human tracking and pose tracking. We address this unexplored in-video ReID problem with a new large-scale video-based ReID dataset called PoseTrack-ReID, with full images available, and a new network structure called ReID-Head, which can extract multi-person features efficiently in real time and can be integrated with both one-stage and two-stage human or pose detectors. A new loss function is also required to solve this new in-video problem. Hence, we propose a triplet-based loss function with online hard example mining, called instance hard triplet loss, designed to distinguish persons in the same video/group; it can be applied in both cross-camera ReID and in-video ReID. Compared with the widely used batch hard triplet loss, our proposed loss achieves competitive performance and saves more than 30% of the training time. We also propose an automatic reciprocal identity association method, so our model can be trained in an unsupervised way, which further extends the potential applications of in-video ReID. The PoseTrack-ReID dataset and code will be publicly released.
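For reference, the widely used batch hard triplet loss that the proposed loss is compared against can be sketched as follows; this is the standard formulation, not the paper's instance hard variant.

import torch

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Batch-hard triplet loss over a batch of embeddings.

    emb: (B, D) embeddings; labels: (B,) identity labels. For each anchor,
    take its farthest positive and closest negative in the batch, then
    apply a hinge with the given margin.
    """
    dist = torch.cdist(emb, emb)                       # (B, B) distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # positive-pair mask
    eye = torch.eye(len(emb), dtype=torch.bool, device=emb.device)

    hardest_pos = dist.masked_fill(~same | eye, float('-inf')).amax(dim=1)
    hardest_neg = dist.masked_fill(same, float('inf')).amin(dim=1)
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(8, 128, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
loss.backward()
print(loss.item())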

31. Gao C, Wang J, Liu L, Yu JG, Sang N. Superpixel-Based Temporally Aligned Representation for Video-Based Person Re-Identification. Sensors 2019; 19:3861. PMID: 31500196. PMCID: PMC6766808. DOI: 10.3390/s19183861.
Abstract
Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and changes of pose, viewpoint or lighting. Video-based re-identification is a natural way to overcome these problems by exploiting space-time information from videos. One of the most challenging problems in video-based person re-identification is temporal alignment, in addition to spatial alignment. To address this problem, we propose an effective superpixel-based temporally aligned representation for video-based person re-identification, which represents a video sequence using only one walking cycle. In particular, we first build a candidate set of walking cycles by extracting motion information at the superpixel level, which is more robust than at the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle that best matches the intrinsic periodicity of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
Affiliation(s)
- Changxin Gao, Jin Wang, Nong Sang: Key Laboratory of Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Leyuan Liu: National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
- Jin-Gang Yu: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China