1. Zhu K, Guo H, Zhang S, Wang Y, Liu J, Wang J, Tang M. AAformer: Auto-Aligned Transformer for Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17307-17317. [PMID: 37624720] [DOI: 10.1109/tnnls.2023.3301856]
Abstract
In person re-identification (re-ID), extracting part-level features from person images has been verified to be crucial for providing fine-grained information. Most existing CNN-based methods either locate human parts only coarsely, or rely on pretrained human parsing models and fail to locate identifiable nonhuman parts (e.g., a knapsack). In this article, we introduce an alignment scheme into the transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both human and nonhuman parts at the patch level. We introduce "part tokens ([PART]s)," which are learnable vectors, to extract part features in the transformer. A [PART] interacts with only a local subset of patches in self-attention and learns to be the part representation. To adaptively group the image patches into different subsets, we design auto-alignment, which employs a fast variant of the optimal transport (OT) algorithm to cluster the patch embeddings online into several groups with the [PART]s as their prototypes. AAformer integrates part alignment into self-attention, and the output [PART]s can be used directly as part features for retrieval. Extensive experiments validate the effectiveness of [PART]s and the superiority of AAformer over various state-of-the-art methods.
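The auto-alignment step is, at its core, a balanced soft clustering of patches around the [PART] prototypes. Below is a minimal sketch of that idea, assuming Sinkhorn-Knopp as the fast OT variant (as in SwAV-style online clustering); all names, shapes, and hyperparameters are illustrative, not taken from the paper.

```python
# Illustrative sketch: Sinkhorn-style OT assignment of patch embeddings
# to learnable part prototypes. Hyperparameters are assumptions.
import torch

def sinkhorn_assign(patches, prototypes, eps=0.05, n_iters=3):
    """patches: (N, D) patch embeddings; prototypes: (K, D) [PART] tokens.
    Returns a soft assignment Q of shape (N, K) with approximately
    balanced columns, so every part claims some patches."""
    patches = torch.nn.functional.normalize(patches, dim=1)
    prototypes = torch.nn.functional.normalize(prototypes, dim=1)
    Q = torch.exp(patches @ prototypes.T / eps)   # Gibbs kernel of cosine cost
    Q = Q / Q.sum()
    N, K = Q.shape
    for _ in range(n_iters):                      # Sinkhorn-Knopp iterations
        Q = Q / Q.sum(dim=0, keepdim=True) / K    # balance columns (parts)
        Q = Q / Q.sum(dim=1, keepdim=True) / N    # balance rows (patches)
    return Q * N                                  # each row sums to 1

patches = torch.randn(196, 768)     # e.g. 14x14 ViT patch embeddings
part_tokens = torch.randn(4, 768)   # 4 learnable [PART] prototypes
Q = sinkhorn_assign(patches, part_tokens)
parts = Q.T @ patches               # (4, 768) OT-weighted part features
```

The column balancing is what keeps any single [PART] from absorbing all patches, which is the practical motivation for an OT formulation over a plain softmax assignment.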
2. Lian Y, Huang W, Liu S, Guo P, Zhang Z, Durrani TS. Person Re-Identification Using Local Relation-Aware Graph Convolutional Network. Sensors (Basel) 2023; 23:8138. [PMID: 37836968] [PMCID: PMC10575217] [DOI: 10.3390/s23198138]
Abstract
Local feature extraction has been verified to be effective for person re-identification (re-ID) in the recent literature. However, existing methods usually extract local features from a single part of a pedestrian while neglecting the relationships among local features across different pedestrian images. As a result, the local features carry limited information from one pedestrian image and cannot benefit from other pedestrian images. In this paper, we propose a novel approach named Local Relation-Aware Graph Convolutional Network (LRGCN) to learn the relationships among local features across pedestrian images. To describe these relationships completely, we propose an overlap graph and a similarity graph. The overlap graph formulates the edge weight as the number of overlapping nodes in two nodes' neighborhoods so as to learn robust local features, while the similarity graph defines the edge weight as the similarity between nodes to learn discriminative local features. To propagate information effectively across the different kinds of nodes, we propose the Structural Graph Convolution (SGConv) operation. Unlike traditional graph convolutions, in which all nodes share the same parameter matrix, SGConv learns separate parameter matrices for the node itself and for its neighbor nodes to improve expressive power. We conduct comprehensive experiments on four large-scale person re-ID databases, and the results show that LRGCN exceeds the state-of-the-art methods.
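The distinguishing detail of SGConv is the split parameterization. A minimal sketch, assuming a degree-normalized neighbor aggregation (the normalization choice, activation, and dimensions are assumptions, not from the paper):

```python
# Illustrative sketch: a graph convolution with separate weights for
# the node itself and for its neighbors, as the SGConv abstract describes.
import torch
import torch.nn as nn

class SGConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)   # parameters for the node itself
        self.w_neigh = nn.Linear(in_dim, out_dim)  # parameters for its neighbors

    def forward(self, x, adj):
        """x: (N, in_dim) node features; adj: (N, N) weighted adjacency
        (e.g. overlap- or similarity-based edge weights, zero diagonal)."""
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        neigh = (adj @ x) / deg                    # degree-normalized aggregation
        return torch.relu(self.w_self(x) + self.w_neigh(neigh))

x = torch.randn(8, 256)                      # 8 local-feature nodes
adj = torch.rand(8, 8) * (1 - torch.eye(8))  # random weighted graph, no self-loops
out = SGConv(256, 128)(x, adj)               # (8, 128)
```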
Affiliation(s)
- Yu Lian, Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin Normal University, Tianjin 300387, China
- Wenmin Huang, Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin Normal University, Tianjin 300387, China
- Shuang Liu, Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin Normal University, Tianjin 300387, China
- Peng Guo, CATARC (Tianjin) Automotive Engineering Research Institute Co., Ltd., Tianjin 300300, China
- Zhong Zhang, Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin Normal University, Tianjin 300387, China
- Tariq S. Durrani, Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G1 1QE, UK
3. Huang M, Hou C, Yang Q, Wang Z. Reasoning and Tuning: Graph Attention Network for Occluded Person Re-Identification. IEEE Transactions on Image Processing 2023; 32:1568-1582. [PMID: 37027759] [DOI: 10.1109/tip.2023.3247159]
Abstract
Occluded person re-identification (re-ID) aims to match occluded person images to holistic ones. Most existing works match only the mutually visible body parts and discard the occluded parts. However, preserving only the mutually visible parts causes a great semantic loss for occluded images and decreases the confidence of feature matching. On the other hand, we observe that holistic images can provide the missing semantic information for occluded images of the same identity. Thus, compensating an occluded image with its holistic counterpart has the potential to alleviate this limitation. In this paper, we propose a novel Reasoning and Tuning Graph Attention Network (RTGAT), which learns complete person representations of occluded images by jointly reasoning about the visibility of body parts and compensating the occluded parts for the semantic loss. Specifically, we self-mine the semantic correlation between part features and the global feature to reason the visibility scores of body parts. We then introduce the visibility scores as graph attention, which guides a Graph Convolutional Network (GCN) to fuzzily suppress the noise of occluded part features and to propagate the missing semantic information from the holistic image to the occluded image. We thereby learn complete person representations of occluded images for effective feature matching. Experimental results on occluded benchmarks demonstrate the superiority of our method.
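The visibility-as-attention idea can be illustrated compactly. The sketch below is one possible reading of the abstract, assuming cosine similarity to the global feature as the visibility proxy and a simple convex blend as the propagation step; the paper's actual graph attention and GCN are more involved.

```python
# Illustrative sketch: estimate part visibility from part-global
# correlation, then use it to blend in holistic part features.
import torch

def visibility_scores(part_feats, global_feat):
    """part_feats: (P, D); global_feat: (D,). Cosine similarity with the
    global feature serves as a proxy for each part's visibility."""
    p = torch.nn.functional.normalize(part_feats, dim=1)
    g = torch.nn.functional.normalize(global_feat, dim=0)
    return (p @ g).clamp(min=0)                      # (P,) scores in [0, 1]

def propagate(occ_parts, hol_parts, occ_global):
    """Compensate occluded part features with holistic ones of the same
    identity, weighted by (1 - visibility)."""
    vis = visibility_scores(occ_parts, occ_global)   # attention weights
    return vis[:, None] * occ_parts + (1 - vis)[:, None] * hol_parts

occ = torch.randn(6, 512); hol = torch.randn(6, 512)
completed = propagate(occ, hol, occ.mean(dim=0))     # (6, 512)
```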
4. Xu F, Ma B, Chang H, Shan S. PRDP: Person Reidentification With Dirty and Poor Data. IEEE Transactions on Cybernetics 2022; 52:11014-11026. [PMID: 34473639] [DOI: 10.1109/tcyb.2021.3105970]
Abstract
In this article, we propose a novel method that simultaneously addresses the problems of dirty data quality and poor data quantity for person reidentification (ReID). Dirty quality refers to wrong labels in the image annotations. Poor quantity means that some identities have very few images (FewIDs). Training on such mislabeled data, or on FewIDs with the triplet loss, leads to low generalization performance. To solve the label error problem, we propose a weighted label correction strategy based on cross-entropy (wLCCE). Specifically, according to the influence range of the wrong labels, we first classify the mislabeled images into point label errors and set label errors, and we then propose a weighted triplet loss (WTL) to handle each of the two error types. To alleviate the poor quantity issue, we propose a feature simulation method based on an autoencoder (FSAE) to generate virtual samples for FewIDs. To keep the simulated features realistic, we transfer the difference pattern of identities with multiple images (MultIDs) to FewIDs by training an autoencoder (AE)-based simulator. In this way, the FewIDs obtain richer expressions that distinguish them from other identities. By dealing with the dirty and poor data problems, we can learn more robust ReID models with the triplet loss. We conduct extensive experiments on two public person ReID datasets, Market-1501 and DukeMTMC-reID, to verify the effectiveness of our approach.
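The WTL component reduces to a per-triplet confidence weighting of the standard hinge. A minimal sketch, assuming the weights come from some upstream label-confidence estimate (the paper derives them from its cross-entropy-based correction, which is not reproduced here):

```python
# Illustrative sketch: a triplet loss with per-sample confidence weights
# that down-weight triplets built from suspected mislabeled images.
import torch

def weighted_triplet_loss(anchor, pos, neg, w, margin=0.3):
    """anchor/pos/neg: (B, D) embeddings; w: (B,) confidence weights in
    [0, 1], small for samples whose labels look dirty."""
    d_ap = (anchor - pos).pow(2).sum(dim=1).sqrt()
    d_an = (anchor - neg).pow(2).sum(dim=1).sqrt()
    loss = torch.relu(d_ap - d_an + margin)          # standard triplet hinge
    return (w * loss).sum() / w.sum().clamp(min=1e-6)

a, p, n = (torch.randn(32, 256) for _ in range(3))
w = torch.rand(32)                                   # e.g. label confidence
loss = weighted_triplet_loss(a, p, n, w)
```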
5. Yu F, Jiang X, Gong Y, Zheng WS, Zheng F, Sun X. Conditional Feature Embedding by Visual Clue Correspondence Graph for Person Re-Identification. IEEE Transactions on Image Processing 2022; 31:6188-6199. [PMID: 36126030] [DOI: 10.1109/tip.2022.3206617]
Abstract
Although person re-identification (ReID) has made impressive progress, difficult cases such as occlusion, viewpoint changes, and similar clothing still pose great challenges. Extracting discriminative feature representations is crucial to tackling these challenges. Most existing methods extract ReID features from individual images separately. However, when matching two images, we argue that the ReID features of a query image should be dynamically adjusted based on the contextual information of the gallery image it is matched against. We call this type of ReID feature a conditional feature embedding. In this paper, we propose a novel ReID framework that extracts conditional feature embeddings based on the aligned visual clues between image pairs, called Clue Alignment based Conditional Embedding (CACE-Net). CACE-Net applies an attention module to build a detailed correspondence graph between crucial visual clues in image pairs and uses a discrepancy-based GCN to embed the resulting correspondence information into the conditional features. Experiments show that CACE-Net achieves state-of-the-art performance on three public datasets.
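The pairwise conditioning can be sketched with plain cross-attention, used here only as a stand-in for CACE-Net's correspondence graph and discrepancy-based GCN; the temperature, residual form, and shapes are all assumptions.

```python
# Illustrative sketch: a query feature set attends to the gallery image
# it is matched against, so the final embedding depends on the pair.
import torch

def conditional_embedding(query, gallery, tau=0.1):
    """query: (Nq, D) query part/patch features; gallery: (Ng, D).
    Returns query features refined by their aligned gallery clues."""
    q = torch.nn.functional.normalize(query, dim=1)
    g = torch.nn.functional.normalize(gallery, dim=1)
    attn = torch.softmax(q @ g.T / tau, dim=1)       # (Nq, Ng) clue alignment
    return query + attn @ gallery                    # condition on the match

q_feats = torch.randn(6, 512); g_feats = torch.randn(6, 512)
q_cond = conditional_embedding(q_feats, g_feats)     # (6, 512)
```

Because the refined query embedding changes with each gallery candidate, matching scores must be computed per pair rather than from a single precomputed feature, which is the cost this family of methods accepts for harder cases.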
6. Ding C, Wang K, Wang P, Tao D. Multi-Task Learning With Coarse Priors for Robust Part-Aware Person Re-Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:1474-1488. [PMID: 32946381] [DOI: 10.1109/tpami.2020.3024900]
Abstract
Part-level representations are important for robust person re-identification (ReID), but in practice feature quality suffers from the body-part misalignment problem. In this paper, we present a robust, compact, and easy-to-use method called the Multi-task Part-aware Network (MPN), which is designed to extract semantically aligned part-level features from pedestrian images. MPN solves the body-part misalignment problem via multi-task learning (MTL) in the training stage. More specifically, it builds one main task (MT) and one auxiliary task (AT) for each body part on top of the same backbone model. The ATs are equipped with a coarse prior of the body-part locations for the training images. The ATs then transfer the concept of the body parts to the MTs by optimizing the MT parameters to identify part-relevant channels in the backbone model. Concept transfer is accomplished by means of two novel alignment strategies: parameter-space alignment via hard parameter sharing, and class-wise feature-space alignment. With the aid of the learned high-quality parameters, the MTs can independently extract semantically aligned part-level features from the relevant channels in the testing stage. MPN has three key advantages: 1) it does not need to conduct body-part detection in the inference stage; 2) its model is very compact and efficient for both training and testing; and 3) in the training stage, it requires only coarse priors of the body-part locations, which are easy to obtain. Systematic experiments on four large-scale ReID databases demonstrate that MPN consistently outperforms state-of-the-art approaches by significant margins.
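The per-part MT/AT structure with hard parameter sharing can be sketched as follows; the shared 1x1 convolution, the pooling choices, and the layer sizes are assumptions made for illustration, not the paper's exact architecture.

```python
# Illustrative sketch: for one body part, a main-task (MT) head and an
# auxiliary-task (AT) head share a channel-selection layer on top of a
# common backbone. The AT sees a coarse part prior at training time.
import torch
import torch.nn as nn

class PartHeads(nn.Module):
    def __init__(self, in_ch, num_ids):
        super().__init__()
        self.select = nn.Conv2d(in_ch, in_ch, 1)   # shared: picks part-relevant channels
        self.mt_cls = nn.Linear(in_ch, num_ids)    # MT: identity classification
        self.at_cls = nn.Linear(in_ch, num_ids)    # AT: trained on the coarse part region

    def forward(self, feat, part_mask=None):
        """feat: (B, C, H, W) backbone output; part_mask: (B, 1, H, W)
        coarse prior of the part location (training only)."""
        f = self.select(feat)
        mt = self.mt_cls(f.mean(dim=(2, 3)))       # MT pools the full map
        at = None
        if part_mask is not None:                  # AT pools inside the prior
            pooled = (f * part_mask).sum(dim=(2, 3)) \
                     / part_mask.sum(dim=(2, 3)).clamp(min=1e-6)
            at = self.at_cls(pooled)
        return mt, at

feat = torch.randn(4, 256, 24, 8)
mask = torch.zeros(4, 1, 24, 8); mask[:, :, :8] = 1  # coarse upper-body region
mt_logits, at_logits = PartHeads(256, 751)(feat, mask)
```

At test time only the MT path runs, which is why no part detection is needed during inference.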
7. Zhang C, Chen P, Lei T, Wu Y, Meng H. What-Where-When Attention Network for video-based person re-identification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.018]
8. Zhang G, Ge Y, Dong Z, Wang H, Zheng Y, Chen S. Deep High-Resolution Representation Learning for Cross-Resolution Person Re-Identification. IEEE Transactions on Image Processing 2021; 30:8913-8925. [PMID: 34705643] [DOI: 10.1109/tip.2021.3120054]
Abstract
Person re-identification (re-ID) tackles the problem of matching person images with the same identity across different cameras. In practical applications, because of differences in camera performance and in the distance between cameras and persons of interest, captured person images usually have various resolutions. This problem, named cross-resolution person re-identification, presents a great challenge for accurate person matching. In this paper, we propose a Deep High-Resolution Pseudo-Siamese Framework (PS-HRNet) to solve this problem. Specifically, we first improve VDSR by introducing an existing channel attention (CA) mechanism, obtaining a new module, VDSR-CA, that restores the resolution of low-resolution images and makes full use of the channel information of the feature maps. We then reform HRNet by designing a novel representation head, HRNet-ReID, to extract discriminative features. In addition, a pseudo-siamese framework is developed to reduce the difference between the feature distributions of low-resolution and high-resolution images. Experimental results on five cross-resolution person datasets verify the effectiveness of our approach. Compared with state-of-the-art methods, PS-HRNet improves the Rank-1 accuracy by 3.4%, 6.2%, 2.5%, 1.1%, and 4.2% on the MLR-Market-1501, MLR-CUHK03, MLR-VIPeR, MLR-DukeMTMC-reID, and CAVIAR datasets, respectively, which demonstrates the superiority of our method on the cross-resolution person re-ID task. Our code is available at https://github.com/zhguoqing.
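The CA mechanism grafted onto VDSR is, in spirit, squeeze-and-excitation channel reweighting. A minimal sketch, with the reduction ratio as an assumption:

```python
# Illustrative sketch: squeeze-and-excitation-style channel attention,
# the kind of CA block the abstract describes adding to VDSR.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                     # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))       # squeeze: global average pool
        return x * w[:, :, None, None]        # excite: per-channel rescale

x = torch.randn(2, 64, 48, 16)
y = ChannelAttention(64)(x)                   # same shape, channels reweighted
```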
9. Wang K, Wang P, Ding C, Tao D. Batch Coherence-Driven Network for Part-Aware Person Re-Identification. IEEE Transactions on Image Processing 2021; 30:3405-3418. [PMID: 33651691] [DOI: 10.1109/tip.2021.3060909]
Abstract
Existing part-aware person re-identification methods typically employ two separate steps: body-part detection and part-level feature extraction. However, part detection introduces an additional computational cost and is inherently challenging for low-quality images. Accordingly, in this work we propose a simple framework named the Batch Coherence-Driven Network (BCD-Net), which bypasses body-part detection during both training and testing while still learning semantically aligned part features. Our key observation is that the statistics of a batch of images are stable, and therefore that batch-level constraints are robust. First, we introduce a batch coherence-guided channel attention (BCCA) module that highlights the relevant channels for each respective part in the output of a deep backbone model. We investigate channel-part correspondence using a batch of training images and impose a novel batch-level supervision signal that helps BCCA identify part-relevant channels. Second, the mean position of a body part is robust and consequently coherent between batches throughout training. Accordingly, we introduce a pair of regularization terms based on the semantic consistency between batches: the first regularizes the high responses of BCD-Net for each part in a batch so that they stay within a predefined area, while the second encourages the aggregate of BCD-Net's responses for all parts to cover the entire human body. These constraints guide BCD-Net to learn diverse, complementary, and semantically aligned part-level features. Extensive experimental results demonstrate that BCD-Net consistently achieves state-of-the-art performance on four large-scale ReID benchmarks.
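The first batch-level regularizer can be sketched as a soft-argmax constraint on mean part position; the target positions and the exact formulation below are assumptions for illustration, not the paper's loss.

```python
# Illustrative sketch: the batch-averaged vertical center of each part's
# response map is pulled toward a predefined region, exploiting the
# stability of batch statistics.
import torch

def part_position_loss(resp, targets):
    """resp: (B, P, H, W) per-part response maps; targets: (P,) expected
    normalized vertical centers (e.g. head near 0, feet near 1)."""
    B, P, H, W = resp.shape
    prob = torch.softmax(resp.flatten(2), dim=2).view(B, P, H, W)
    rows = torch.linspace(0, 1, H)
    centers = (prob.sum(dim=3) * rows).sum(dim=2)    # (B, P) soft-argmax rows
    batch_centers = centers.mean(dim=0)              # batch statistics are stable
    return (batch_centers - targets).pow(2).mean()

resp = torch.randn(16, 4, 24, 8)
targets = torch.tensor([0.125, 0.375, 0.625, 0.875])  # 4 vertical stripes
loss = part_position_loss(resp, targets)
```

Averaging over the batch before penalizing is the point: a single occluded image may have an unreliable part position, but the batch mean rarely does.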
10. Sun J, Li Y, Chen H, Peng Y, Zhu J. Unsupervised Cross Domain Person Re-Identification by Multi-Loss Optimization Learning. IEEE Transactions on Image Processing 2021; 30:2935-2946. [PMID: 33560987] [DOI: 10.1109/tip.2021.3056889]
Abstract
Unsupervised cross-domain (UCD) person re-identification (re-ID) aims to apply a model trained on a labeled source domain to an unlabeled target domain. It faces huge challenges because the identities in the two domains do not overlap. At present, most UCD person re-ID methods perform "supervised learning" by assigning pseudo labels to the target domain, which leads to poor re-ID performance due to pseudo-label noise. To address this problem, a multi-loss optimization learning (MLOL) model is proposed for UCD person re-ID. In addition to using the information in the clustering pseudo labels from the perspective of supervised learning, two losses are designed from the viewpoints of similarity exploration and adversarial learning to optimize the model. Specifically, to alleviate the erroneous guidance that clustering errors impose on the model, a ranking-average-based triplet loss and a neighbor-consistency-based loss are developed. Combining these losses to optimize the model leads to a deep exploration of the intra-domain relations within the target domain. The proposed model is evaluated on three popular person re-ID datasets: Market-1501, DukeMTMC-reID, and MSMT17. Experimental results show that our model outperforms state-of-the-art UCD re-ID methods by a clear margin.
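A neighbor-consistency loss of the kind described can be sketched as agreement between a sample's prediction and the predictions of its feature-space neighbors; the k-NN construction and the cross-entropy form are assumptions, not MLOL's exact formulation.

```python
# Illustrative sketch: encourage each sample's classifier output to agree
# with its nearest neighbors' mean prediction, softening the damage from
# noisy cluster pseudo labels.
import torch

def neighbor_consistency_loss(feats, logits, k=5):
    """feats: (B, D) embeddings; logits: (B, C) classifier outputs."""
    f = torch.nn.functional.normalize(feats, dim=1)
    sim = f @ f.T
    sim.fill_diagonal_(-1)                            # exclude self-matches
    idx = sim.topk(k, dim=1).indices                  # (B, k) nearest neighbors
    log_p = torch.log_softmax(logits, dim=1)
    q = torch.softmax(logits[idx], dim=2).mean(dim=1) # neighbors' mean prediction
    return -(q * log_p).sum(dim=1).mean()             # cross-entropy to neighbors

feats = torch.randn(32, 256); logits = torch.randn(32, 500)
loss = neighbor_consistency_loss(feats, logits)
```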