1
Tan H, Xu K, Tao P, Liu X. Adversarial perturbation and defense for generalizable person re-identification. Neural Netw 2025; 186:107287. [PMID: 40010295 DOI: 10.1016/j.neunet.2025.107287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 12/10/2024] [Accepted: 02/13/2025] [Indexed: 02/28/2025]
Abstract
In the Domain Generalizable Person Re-Identification (DG Re-ID) task, the quality of identity-relevant descriptors is crucial for domain generalization performance. However, for hard-matching samples, it is difficult to separate high-quality identity-relevant features from identity-irrelevant features, which inevitably degrades domain generalization performance. In this paper, we therefore aim to enhance the model's ability to separate identity-relevant features from identity-irrelevant features of hard-matching samples, to achieve high-performance domain generalization. To this end, we propose an Adversarial Perturbation and Defense (APD) Re-identification Method. In APD, to synthesize hard-matching samples, we introduce a Metric-Perturbation Generation Network (MPG-Net) grounded in the concept of metric adversariality. MPG-Net perturbs the metric relationships among samples in the latent space while preserving the essential visual details of the original samples. Then, to capture high-quality identity-relevant features, we propose a Semantic Purification Network (SP-Net). The hard-matching samples synthesized by MPG-Net are used to train SP-Net. In SP-Net, we further design the Semantic Self-perturbation and Defense (SSD) Scheme to better disentangle and purify identity-relevant features from these hard-matching samples. Finally, extensive experiments validate the effectiveness of the APD method in the DG Re-ID task.
Affiliation(s)
- Hongchen Tan, Institute of Future Technology, Dalian University of Technology, Dalian 116024, China.
- Kaiqiang Xu, School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China.
- Xiuping Liu, School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China.
2
Li S, Li F, Li J, Li H, Zhang B, Tao D, Gao X. Logical Relation Inference and Multiview Information Interaction for Domain Adaptation Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14770-14782. [PMID: 37307174 DOI: 10.1109/tnnls.2023.3281504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Domain adaptation person re-identification (Re-ID) is a challenging task, which aims to transfer the knowledge learned from the labeled source domain to the unlabeled target domain. Recently, some clustering-based domain adaptation Re-ID methods have achieved great success. However, these methods ignore the adverse effect of different camera styles on pseudo-label prediction. The reliability of the pseudo-label plays a key role in domain adaptation Re-ID, while the different camera styles pose great challenges for pseudo-label prediction. To this end, a novel method is proposed, which bridges the gap between different cameras and extracts more discriminative features from an image. Specifically, an intra-to-inter mechanism is introduced, in which samples from their own cameras are first grouped and then aligned at the class level across different cameras, followed by our logical relation inference (LRI). Thanks to these strategies, the logical relationship between simple classes and hard classes is justified, preventing sample loss caused by discarding the hard samples. Furthermore, we also present a multiview information interaction (MvII) module that takes features of different images from the same pedestrian as patch tokens, obtaining the global consistency of a pedestrian that contributes to the discriminative feature extraction. Unlike the existing clustering-based methods, our method employs a two-stage framework that generates reliable pseudo-labels from the intracamera and intercamera views, respectively, to differentiate the camera styles, subsequently increasing its robustness. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods. The source code has been released at https://github.com/lhf12278/LRIMV.
3
Li Y, Liu Y, Zhang H, Zhao C, Wei Z, Miao D. Occlusion-Aware Transformer With Second-Order Attention for Person Re-Identification. IEEE Transactions on Image Processing 2024; 33:3200-3211. [PMID: 38687652 DOI: 10.1109/tip.2024.3393360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
Abstract
Person re-identification (ReID) typically encounters varying degrees of occlusion in real-world scenarios. While previous methods have addressed this using handcrafted partitions or external cues, they often compromise semantic information or increase network complexity. In this paper, we propose a new method from a novel perspective, termed OAT. Specifically, we first use a Transformer backbone with multiple class tokens for diverse pedestrian feature learning. Given that the self-attention mechanism in the Transformer focuses solely on low-level feature correlations and neglects higher-order relations among different body parts or regions, we propose the Second-Order Attention (SOA) module to capture more comprehensive features. To address computational efficiency, we further derive approximation formulations for implementing second-order attention. Observing that the importance of semantics associated with different class tokens varies due to the uncertainty of the location and size of occlusion, we propose the Entropy Guided Fusion (EGF) module for multiple class tokens. By conducting uncertainty analysis on each class token, higher weights are assigned to those with lower information entropy, while lower weights are assigned to class tokens with higher entropy. The dynamic weight adjustment can mitigate the impact of occlusion-induced uncertainty on feature learning, thereby facilitating the acquisition of discriminative class token representations. Extensive experiments have been conducted on occluded and holistic person re-identification datasets, which demonstrate the effectiveness of our proposed method.
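The entropy-based weighting described in this abstract can be illustrated with a minimal sketch. The abstract does not give the EGF formula, so everything specific here is an assumption: the function names, the use of per-token classifier logits to measure uncertainty, and the softmax over negative entropies as the weighting rule are all hypothetical choices, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)

def entropy_guided_fusion(token_features, token_logits):
    """Fuse class-token features so that low-entropy tokens weigh more.

    token_features: list of K feature vectors (one per class token).
    token_logits:   list of K logit vectors from a hypothetical per-token
                    identity classifier, used only to measure uncertainty.
    The softmax over negative entropies is an assumed weighting rule.
    """
    entropies = [entropy(softmax(logits)) for logits in token_logits]
    weights = softmax([-h for h in entropies])
    dim = len(token_features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, token_features))
             for d in range(dim)]
    return fused, weights
```

With a confident token and a near-uniform token, the confident one receives the larger weight, which matches the behaviour the abstract describes for occlusion-induced uncertainty.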
4
Han X, Yu X, Li G, Zhao J, Pan G, Ye Q, Jiao J, Han Z. Rethinking Sampling Strategies for Unsupervised Person Re-Identification. IEEE Transactions on Image Processing 2022; 32:29-42. [PMID: 36459604 DOI: 10.1109/tip.2022.3224325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Unsupervised person re-identification (re-ID) remains a challenging task. While extensive research has focused on the framework design and loss function, this paper shows that sampling strategy plays an equally important role. We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function. We suggest that deteriorated over-fitting is an important factor causing poor performance, and enhancing statistical stability can rectify this problem. Inspired by that, a simple yet effective approach is proposed, termed group sampling, which gathers samples from the same class into groups. The model is thereby trained using normalized group samples, which helps alleviate the negative impact of individual samples. Group sampling updates the pipeline of pseudo-label generation by guaranteeing that samples are more efficiently classified into the correct classes. It regulates the representation learning process, enhancing statistical stability for feature representation in a progressive fashion. Extensive experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods and outperforms the current techniques under purely camera-agnostic settings. Code is available at https://github.com/ucas-vg/GroupSampling.
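As a rough illustration of gathering samples of the same class into groups, the sketch below builds a sampling order in which indices sharing a pseudo label are emitted contiguously. It covers only the grouping step; the normalization of group samples mentioned in the abstract is not modelled. The function name, the `group_size` parameter, and the shuffling scheme are assumptions and are not taken from the released code.

```python
import random
from collections import defaultdict

def group_sampling_order(pseudo_labels, group_size, seed=0):
    """Return dataset indices ordered so same-label samples form groups.

    pseudo_labels: list of cluster ids, one per sample.
    group_size:    hypothetical maximum number of samples per group.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_class[label].append(idx)

    groups = []
    for indices in by_class.values():
        rng.shuffle(indices)  # vary which samples end up grouped together
        for start in range(0, len(indices), group_size):
            groups.append(indices[start:start + group_size])

    rng.shuffle(groups)  # randomize group order, keep each group intact
    return [i for group in groups for i in group]
```

Feeding this order to a sequential data loader would place each group inside consecutive batch positions, so that statistics are computed over several samples of one class rather than over isolated instances.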
5
Miao J, Wu Y, Yang Y. Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4624-4634. [PMID: 33651698 DOI: 10.1109/tnnls.2021.3059515] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/12/2023]
Abstract
We focus on the occlusion problem in person re-identification (re-id), which is one of the main challenges in real-world person retrieval scenarios. Previous methods on the occluded re-id problem usually assume that only the probes are occluded, thereby removing occlusions by manual cropping. However, this may not always hold in practice. This article relaxes this assumption and investigates a more general occlusion problem, where both the probe and gallery images could be occluded. The key to this challenging problem is suppressing noisy information by identifying bodies and occlusions. We propose to incorporate the pose information into the re-id framework, which benefits the model in three aspects. First, it provides the location of the body. We then design a Pose-Masked Feature Branch to make our model focus on the body region only and filter those noise features brought by occlusions. Second, the estimated pose reveals which body parts are visible, giving us a hint to construct more informative person features. We propose a Pose-Embedded Feature Branch to adaptively re-calibrate channel-wise feature responses based on the visible body parts. Third, in testing, the estimated pose indicates which regions are informative and reliable for both probe and gallery images. Then we explicitly split the extracted spatial feature into parts. Only part features from those commonly visible parts are utilized in the retrieval. To better evaluate the performance of occluded re-id, we also propose a large-scale data set for occluded re-id with more than 35,000 images, namely Occluded-DukeMTMC. Extensive experiments show our approach surpasses previous methods on the occluded, partial, and non-occluded re-id data sets.
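The retrieval step that uses only commonly visible parts can be sketched as follows. This is an illustrative example under stated assumptions: the visibility flags are supplied directly (in the paper they would be derived from the estimated pose), and the choice of Euclidean distance averaged over shared parts is hypothetical, since the abstract does not specify the metric.

```python
import math

def shared_part_distance(probe_parts, probe_visible,
                         gallery_parts, gallery_visible):
    """Mean Euclidean distance over parts visible in BOTH images.

    probe_parts / gallery_parts:     lists of per-part feature vectors.
    probe_visible / gallery_visible: lists of booleans, one per part.
    Returns None when the two images share no visible part.
    """
    distances = []
    for p, p_vis, g, g_vis in zip(probe_parts, probe_visible,
                                  gallery_parts, gallery_visible):
        if p_vis and g_vis:  # skip parts occluded in either image
            distances.append(math.dist(p, g))
    if not distances:
        return None
    return sum(distances) / len(distances)
```

Averaging over the shared parts, rather than summing, keeps distances comparable between pairs that share many parts and pairs that share few.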
6
Pan X, Luo H, Jiang W, Zhang J, Gu J, Li P. SFGN: Representing the sequence with one super frame for video person re-identification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
7
Wu L, Liu D, Zhang W, Chen D, Ge Z, Boussaid F, Bennamoun M, Shen J. Pseudo-Pair Based Self-Similarity Learning for Unsupervised Person Re-Identification. IEEE Transactions on Image Processing 2022; 31:4803-4816. [PMID: 35830405 DOI: 10.1109/tip.2022.3186746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision, and propose to assign pseudo labels to images through pairwise gradient-guided similarity separation. This can cluster images in pseudo pairs, and the pseudo labels can be updated during training. Based on pseudo pairs, we propose to improve the generalization of the similarity function via a novel self-similarity learning: it learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity. The intra-similarity learning is based on channel attention to detect diverse local features from an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over the state of the art.
8
Domain adaptive person re-identification with memory-based circular ranking. Appl Intell 2022. [DOI: 10.1007/s10489-022-03602-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
9
Jia X, Zhong X, Ye M, Liu W, Huang W. Complementary Data Augmentation for Cloth-Changing Person Re-Identification. IEEE Transactions on Image Processing 2022; 31:4227-4239. [PMID: 35727784 DOI: 10.1109/tip.2022.3183469] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/15/2023]
Abstract
This paper studies the challenging person re-identification (Re-ID) task under the cloth-changing scenario, where the same identity (ID) suffers from uncertain cloth changes. To learn cloth- and ID-invariant features, it is crucial to collect abundant training data with varying clothes, which is difficult in practice. To alleviate the reliance on rich data collection, we reinforce the feature learning process by designing powerful complementary data augmentation strategies, including positive and negative data augmentation. Specifically, the positive augmentation fulfills the ID space by randomly patching the person images with different clothes, simulating rich appearance to enhance the robustness against clothes variations. The basic idea of negative augmentation is to randomly generate out-of-distribution synthetic samples by combining various appearance and posture factors from real samples. The designed strategies seamlessly reinforce the feature learning without introducing additional information. Extensive experiments conducted on both cloth-changing and -unchanging tasks demonstrate the superiority of our proposed method, consistently improving the accuracy over various baselines.
10
Liu T, Lin Y, Du B. Unsupervised Person Re-Identification With Stochastic Training Strategy. IEEE Transactions on Image Processing 2022; 31:4240-4250. [PMID: 35724288 DOI: 10.1109/tip.2022.3181811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Unsupervised person re-identification (re-ID) has attracted increasing research interest because of its scalability and potential for real-world applications. State-of-the-art unsupervised re-ID methods usually follow a clustering-based strategy, which generates pseudo labels by clustering and maintains a memory to store instance features and represent the centroid of the clusters for contrastive learning. This approach suffers from two problems. First, the centroid generated by unsupervised learning may not be a perfect prototype. Forcing images to get closer to the centroid emphasizes the result of clustering, which could accumulate clustering errors during iterations. Second, previous instance-memory-based methods utilize features updated at different training iterations to represent one centroid; these features are inconsistent due to the change of encoder. To this end, we propose an unsupervised re-ID approach with a stochastic learning strategy. Specifically, we adopt a stochastically updated memory, where a random instance from a cluster is used to update the cluster-level memory for contrastive learning. In this way, the relationships between randomly selected pairs of images are learned to avoid the training bias caused by unreliable pseudo labels. By picking a sole last-seen sample to directly update each cluster center, the stochastic memory is also always up to date for classification, which preserves consistency. Besides, to relieve the issue of camera variance, a unified distance matrix is proposed during clustering, where the distance bias from different camera domains is reduced and the variances of identities are emphasized. Our proposed method outperforms the state of the art in all the common unsupervised and UDA re-ID tasks. The code will be available at https://github.com/lithium770/Unsupervised-Person-re-ID-with-Stochastic-Training-Strategy.
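The stochastic memory update described in this abstract can be sketched in a few lines: instead of averaging features into a centroid, each cluster's memory slot is overwritten by a single randomly chosen instance of that cluster. The dictionary-based memory, the function names, and the L2 normalization are assumptions made for illustration and are not taken from the authors' code.

```python
import random

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)

def stochastic_memory_update(memory, batch_features, batch_labels, rng=random):
    """Overwrite each cluster slot with one random batch instance of that cluster.

    memory:         dict mapping cluster id -> stored feature vector.
    batch_features: feature vectors produced by the current encoder.
    batch_labels:   pseudo labels (cluster ids) for those features.
    rng:            any object with a choice() method (hypothetical hook for seeding).
    """
    by_cluster = {}
    for feat, label in zip(batch_features, batch_labels):
        by_cluster.setdefault(label, []).append(feat)
    for label, feats in by_cluster.items():
        # Replace rather than average: the slot always reflects the current
        # encoder, and no stale features from earlier iterations are mixed in.
        memory[label] = l2_normalize(rng.choice(feats))
    return memory
```

Clusters that do not appear in the batch keep their previous entry, so only slots with fresh evidence change at each step.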
11
Ge Y, Zhu F, Chen D, Zhao R, Wang X, Li H. Structured Domain Adaptation With Online Relation Regularization for Unsupervised Person Re-ID. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:258-271. [PMID: 35584071 DOI: 10.1109/tnnls.2022.3173489] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Unsupervised domain adaptation (UDA) aims at adapting the model trained on a labeled source-domain dataset to an unlabeled target-domain dataset. The task of UDA on open-set person reidentification (re-ID) is even more challenging as the identities (classes) do not overlap between the two domains. One major research direction was based on domain translation, which, however, has fallen out of favor in recent years due to inferior performance compared with pseudo-label-based methods. We argue that domain translation has great potential for exploiting valuable source-domain data but the existing methods did not provide proper regularization on the translation process. Specifically, previous methods only focus on maintaining the identities of the translated images while ignoring the intersample relations during translation. To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term. During training, the person feature encoder is optimized to model intersample relations on-the-fly for supervising relation-consistency domain translation, which in turn improves the encoder with informative translated images. The encoder can be further improved with pseudo labels, where the source-to-target translated images with ground-truth identities and target-domain images with pseudo identities are jointly used for training. In the experiments, our proposed framework is shown to achieve state-of-the-art performance on multiple UDA tasks of person re-ID. With the synthetic→real translated images from our structured domain-translation network, we achieved second place in the Visual Domain Adaptation Challenge (VisDA) in 2020.
12
Zheng X, Gong T, Lu X, Li X. Human action recognition by multiple spatial clues network. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
13
Guo Y, Feng F, Hao X, Chen X. JAC-Net: Joint learning with adaptive exploration and concise attention for unsupervised domain adaptive person re-identification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
14
Yu J, Oh H. Graph-structure based multi-label prediction and classification for unsupervised person re-identification. Appl Intell 2022. [DOI: 10.1007/s10489-022-03163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
15
Khan SU, Haq IU, Khan N, Muhammad K, Hijji M, Baik SW. Learning to rank: An intelligent system for person reidentification. Int J Intell Syst 2022. [DOI: 10.1002/int.22820] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Affiliation(s)
- Khan Muhammad, Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul, South Korea
- Mohammad Hijji, Faculty of Computers & Information Technology, University of Tabuk, Tabuk, Saudi Arabia
16
Zhang Q, Lai J, Feng Z, Xie X. Seeing Like a Human: Asynchronous Learning With Dynamic Progressive Refinement for Person Re-Identification. IEEE Transactions on Image Processing 2021; 31:352-365. [PMID: 34807829 DOI: 10.1109/tip.2021.3128330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Learning discriminative and rich features is an important research task for person re-identification. Previous studies have attempted to capture global and local features at the same time and layer of the model in a non-interactive manner, which is called synchronous learning. However, synchronous learning leads to high feature similarity and, in turn, degrades model performance. To this end, we propose asynchronous learning based on the human visual perception mechanism. Asynchronous learning emphasizes the time asynchrony and space asynchrony of feature learning and achieves mutual promotion and cyclical interaction for feature learning. Furthermore, we design a dynamic progressive refinement module to improve local features with the guidance of global features. The dynamic property allows this module to adaptively adjust the network parameters according to the input image, in both the training and testing stages. The progressive property, which stems from the guidance of global features, narrows the semantic gap between the global and local features. Finally, we have conducted several experiments on four datasets, including Market1501, CUHK03, DukeMTMC-ReID, and MSMT17. The experimental results show that asynchronous learning can effectively improve feature discrimination and achieve strong performance.
17
LABNet: Local graph aggregation network with class balanced loss for vehicle re-identification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.07.082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
18
Xie K, Wu Y, Xiao J, Li J, Xiao G, Cao Y. Unsupervised person re-identification via K-reciprocal encoding and style transfer. Int J Mach Learn Cyb 2021. [DOI: 10.1007/s13042-021-01376-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
19
Yaghoubi E, Borza D, Aruna Kumar S, Proença H. Person re-identification: Implicitly defining the receptive fields of deep learning classification frameworks. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.01.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
20
Learning domain invariant and specific representation for cross-domain person re-identification. Appl Intell 2021. [DOI: 10.1007/s10489-020-02107-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
21
Ainam JP, Qin K, Owusu JW, Lu G. Unsupervised domain adaptation for person re-identification with iterative soft clustering. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106644] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]