1. Tang S, Zhou Y, Li J, Liu C, Shi J. Attention-Guided Sample-Based Feature Enhancement Network for Crowded Pedestrian Detection Using Vision Sensors. Sensors (Basel). 2024;24:6350. PMID: 39409392; PMCID: PMC11478508; DOI: 10.3390/s24196350.
Abstract
Occlusion presents a major obstacle in the development of pedestrian detection technologies utilizing computer vision. This challenge includes both inter-class occlusion, caused by environmental objects obscuring pedestrians, and intra-class occlusion, resulting from interactions between pedestrians. In complex and variable urban settings, these compounded occlusion patterns critically limit the efficacy of both one-stage and two-stage pedestrian detectors, leading to suboptimal detection performance. To address this, we introduce a novel architecture termed the Attention-Guided Feature Enhancement Network (AGFEN), designed within the deep convolutional neural network framework. AGFEN improves the semantic information of high-level features by mapping it onto low-level feature details through sampling, creating an effect comparable to mask modulation. This technique enhances channel-level and spatial-level features concurrently without incurring additional annotation costs. Furthermore, we transition from the traditional one-to-one correspondence between proposals and predictions to a one-to-multiple paradigm, performing non-maximum suppression with the prediction set as the fundamental unit. Additionally, we integrate these methodologies by aggregating local features between regions of interest (RoIs) through the reuse of classification weights, effectively mitigating false positives. Experimental evaluations on three widely used datasets demonstrate that AGFEN achieves a 2.38% improvement over the baseline detector on the CrowdHuman dataset, underscoring its effectiveness and potential for advancing pedestrian detection.
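The core mechanism described here, sampling high-level semantics onto low-level detail to obtain a mask-like modulation, can be illustrated with a short sketch. The following PyTorch module is a minimal, hypothetical rendering of that idea (module and parameter names are ours, not taken from the paper): the high-level map is upsampled and used to produce channel and spatial gates that are applied concurrently to the low-level map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancement(nn.Module):
    """Hypothetical sketch: modulate low-level features with semantics
    sampled from a high-level map, producing a mask-like effect."""
    def __init__(self, c_low, c_high):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_high, c_low, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(c_high, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_low, f_high):
        # Upsample (sample) high-level semantics to the low-level resolution.
        f_high = F.interpolate(f_high, size=f_low.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Channel-level and spatial-level modulation, applied concurrently.
        return f_low * self.channel_gate(f_high) * self.spatial_gate(f_high)
```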
Affiliation(s)
- Shuyuan Tang, Yiqing Zhou, Jintao Li, and Jinglin Shi: State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China; Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China
- Chang Liu: State Key Laboratory of Processors, Institute of Computing Technology, CAS, Beijing 100190, China; Institute of Computing Technology, CAS, Beijing 100190, China; Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China
2. Lin Z, Pei W, Chen F, Zhang D, Lu G. Pedestrian Detection by Exemplar-Guided Contrastive Learning. IEEE Trans Image Process. 2023;32:2003-2016. PMID: 35839180; DOI: 10.1109/TIP.2022.3189803.
Abstract
Typical pedestrian detection methods focus either on tackling mutual occlusion between crowded pedestrians or on handling the various scales of pedestrians. Detecting pedestrians with substantial appearance diversity, such as different silhouettes, viewpoints, or clothing, remains a crucial challenge. Instead of learning each of these diverse appearance features individually, as most existing methods do, we propose contrastive learning to guide feature learning so that the semantic distance between pedestrians with different appearances in the learned feature space is minimized, eliminating the appearance diversity, while the distance between pedestrians and background is maximized. To make contrastive learning efficient and effective, we construct an exemplar dictionary of representative pedestrian appearances as prior knowledge to build effective contrastive training pairs and thus guide contrastive learning. The constructed exemplar dictionary is further leveraged during inference to evaluate the quality of pedestrian proposals by measuring the semantic distance between each proposal and the dictionary. Extensive experiments on both daytime and nighttime pedestrian detection validate the effectiveness of the proposed method.
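To make the exemplar-dictionary idea concrete, here is a toy InfoNCE-style loss under our own assumptions (the paper's actual formulation and pair construction are richer): pedestrian embeddings are attracted to their best-matching exemplar, while background embeddings are repelled from every exemplar.

```python
import torch
import torch.nn.functional as F

def exemplar_contrastive_loss(feats, labels, exemplars, tau=0.07):
    """Illustrative InfoNCE-style loss, not the paper's exact formulation.

    feats:     (N, D) proposal embeddings
    labels:    (N,) 1 = pedestrian, 0 = background
    exemplars: (K, D) representative pedestrian appearance embeddings
    """
    feats = F.normalize(feats, dim=1)
    exemplars = F.normalize(exemplars, dim=1)
    sim = feats @ exemplars.t() / tau          # (N, K) similarities
    # For a pedestrian, its best-matching exemplar acts as the positive;
    # background proposals should be dissimilar to every exemplar.
    pos = sim.max(dim=1).values
    logsumexp = torch.logsumexp(sim, dim=1)
    loss_ped = -(pos - logsumexp)              # attract pedestrians
    loss_bg = logsumexp                        # repel background
    labels = labels.float()
    return (labels * loss_ped + (1 - labels) * loss_bg).mean()
```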
3. Cao J, Pang Y, Xie J, Khan FS, Shao L. From Handcrafted to Deep Features for Pedestrian Detection: A Survey. IEEE Trans Pattern Anal Mach Intell. 2022;44:4913-4934. PMID: 33929956; DOI: 10.1109/TPAMI.2021.3076733.
Abstract
Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been achieved with the help of handcrafted features and deep features. Here we present a comprehensive survey of recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection, covering both handcrafted-feature-based methods and deep-feature-based approaches. For handcrafted-feature-based methods, we present an extensive review and find that handcrafted features with large degrees of freedom in shape and space perform better. We split deep-feature-based approaches into pure CNN-based methods and those employing both handcrafted and CNN-based features. We provide a statistical analysis of these methods and their trends, showing that feature-enhancement, part-aware, and post-processing methods have attracted the most attention. In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides features that are more robust to illumination variation. Furthermore, we introduce related datasets and evaluation metrics, and give an in-depth experimental analysis. We conclude by highlighting open problems that need to be addressed and various future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.
4. Wang T, Wan L, Tang L, Liu M. MGA-YOLOv4: a multi-scale pedestrian detection method based on mask-guided attention. Appl Intell. 2022. DOI: 10.1007/s10489-021-03061-3.
5. Selective region enlargement network for fast object detection in high resolution images. Neurocomputing. 2021. DOI: 10.1016/j.neucom.2021.08.015.
6. Que L, Zhang T, Guo H, Jia C, Gong Y, Chang L, Zhou J. A Lightweight Pedestrian Detection Engine with Two-Stage Low-Complexity Detection Network and Adaptive Region Focusing Technique. Sensors (Basel). 2021;21(17):5851. PMID: 34502741; PMCID: PMC8434331; DOI: 10.3390/s21175851.
Abstract
Pedestrian detection has been widely used in applications such as video surveillance and intelligent robots. Recently, deep-learning-based pedestrian detection engines have attracted considerable attention. However, the computational complexity of these engines is high, which makes them unsuitable for hardware- and power-constrained mobile applications, such as surveillance drones. In this paper, we propose a lightweight pedestrian detection engine with a two-stage low-complexity detection network and an adaptive region focusing technique, which reduces the computational complexity of pedestrian detection while maintaining sufficient detection accuracy. The proposed engine significantly reduces the number of parameters (0.73 M) and operations (1.04 B) while achieving precision (85.18%) and miss rate (25.16%) comparable to many existing designs. Moreover, the proposed engine, together with YOLOv3 and YOLOv3-Tiny, has been implemented on a Xilinx Zynq-7020 FPGA for comparison. It achieves 16.3 fps while consuming 0.59 W, outperforming YOLOv3 (5.3 fps, 2.43 W) and YOLOv3-Tiny (12.8 fps, 0.95 W).
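The region-focusing idea, re-examining only promising regions with a finer pass, might look roughly like the following sketch (the function names and padding heuristic are ours; the engine's actual hardware pipeline differs):

```python
import torch

def adaptive_region_focus(image, coarse_boxes, fine_detector, pad=0.2):
    """Illustrative sketch, not the paper's exact pipeline: re-run a
    detector only on padded crops around coarse first-stage hits."""
    H, W = image.shape[-2:]
    results = []
    for (x1, y1, x2, y2) in coarse_boxes.tolist():
        dw, dh = (x2 - x1) * pad, (y2 - y1) * pad
        cx1, cy1 = max(int(x1 - dw), 0), max(int(y1 - dh), 0)
        cx2, cy2 = min(int(x2 + dw), W), min(int(y2 + dh), H)
        crop = image[..., cy1:cy2, cx1:cx2]
        dets = fine_detector(crop)             # boxes in crop coordinates
        dets[:, [0, 2]] += cx1                 # map back to image coordinates
        dets[:, [1, 3]] += cy1
        results.append(dets)
    return torch.cat(results) if results else torch.empty(0, 4)
```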
7. Xie J, Pang Y, Khan MH, Anwer RM, Khan FS, Shao L. Mask-Guided Attention Network and Occlusion-Sensitive Hard Example Mining for Occluded Pedestrian Detection. IEEE Trans Image Process. 2021;30:3872-3884. PMID: 33275581; DOI: 10.1109/TIP.2020.3040854.
Abstract
Pedestrian detection relying on deep convolutional neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusion involving other pedestrians and inter-class occlusion caused by other objects, such as cars and bicycles, which together produce a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes visible pedestrian regions while suppressing occluded ones by modulating full-body features. Second, we propose an occlusion-sensitive hard example mining method and an occlusion-sensitive loss that mine hard samples according to the occlusion level and assign higher weights to detection errors occurring at highly occluded pedestrians. Third, we empirically demonstrate that weak box-based segmentation annotations provide a reasonable approximation of their dense pixel-wise counterparts. Experiments are performed on the CityPersons, Caltech, and ETH datasets. Our approach sets a new state of the art on all three datasets, obtaining an absolute gain of 10.3% in log-average miss rate over the best previously reported results on the heavily occluded (HO) set of the CityPersons test set. Code and models are available at https://github.com/Leotju/MGAN.
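As a rough illustration of occlusion-sensitive weighting, the sketch below scales the per-sample classification loss by occlusion level (the exact weighting and mining schedule in the paper may differ; the gamma exponent is our assumption):

```python
import torch
import torch.nn.functional as F

def occlusion_sensitive_loss(logits, targets, occlusion_level, gamma=2.0):
    """Illustrative sketch: errors on heavily occluded pedestrians
    receive larger weights.

    occlusion_level: (N,) in [0, 1], e.g. 1 - visible_area / full_area
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weight = (1.0 + occlusion_level) ** gamma   # grows with occlusion
    return (weight * per_sample).mean()
```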
8. Shao X, Wang Q, Yang W, Chen Y, Xie Y, Shen Y, Wang Z. Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet. Sensors (Basel). 2021;21(5):1820. PMID: 33807795; PMCID: PMC7961544; DOI: 10.3390/s21051820.
Abstract
Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To handle heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) that enhances the features of occluded targets and improves detection accuracy. MFPN includes two modules: a double feature pyramid network (FPN) integrated with ResNet (DFR) and a repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, providing a new way to extract features of occluded targets. The features extracted by our network are more separable and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function, keeping predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method boosts the performance of the pedestrian detection system.
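The repulsion idea can be sketched following the RepGT term of Wang et al.'s repulsion loss, on which this work builds; the min-based variant (RLM) used here may differ in detail. A minimal sketch, assuming matched pairs of predictions and their nearest non-assigned ground truths:

```python
import math
import torch

def iog(boxes, gts):
    """Intersection over ground-truth area for matched box pairs (N, 4)."""
    x1 = torch.max(boxes[:, 0], gts[:, 0])
    y1 = torch.max(boxes[:, 1], gts[:, 1])
    x2 = torch.min(boxes[:, 2], gts[:, 2])
    y2 = torch.min(boxes[:, 3], gts[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    gt_area = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / gt_area.clamp(min=1e-6)

def repulsion_term(pred_boxes, repulsion_gts, sigma=0.5):
    """Penalize overlap between each prediction and the ground truth of
    an unrelated (non-assigned) target, with a smooth-ln penalty that
    grows sharply as the overlap approaches 1."""
    overlap = iog(pred_boxes, repulsion_gts)
    smooth = torch.where(
        overlap <= sigma,
        -torch.log((1 - overlap).clamp(min=1e-6)),
        (overlap - sigma) / (1 - sigma) - math.log(1 - sigma),
    )
    return smooth.mean()
```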
Affiliation(s)
- Xiaotao Shao, Qing Wang, Wei Yang, Yan Shen (corresponding author), and Zhongli Wang: School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
- Yun Chen: Shanghai Aerospace Control Technology Institute, Shanghai 201109, China
- Yi Xie: Beijing Xinghang Mechanical-Electrical Equipment Co., Ltd., Beijing 100074, China
9. Jiao Y, Yao H, Xu C. SAN: Selective Alignment Network for Cross-Domain Pedestrian Detection. IEEE Trans Image Process. 2021;30:2155-2167. PMID: 33471752; DOI: 10.1109/TIP.2021.3049948.
Abstract
Cross-domain pedestrian detection, which has attracted much attention, assumes that the training and test images are drawn from different data distributions. Existing methods focus on aligning the descriptions of whole candidate instances between source and target domains. Since there is a large visual difference among candidate instances, aligning whole candidate instances between two domains cannot overcome the inter-instance difference. We consider aligning each type of instance separately a more reasonable approach. Therefore, we propose a novel Selective Alignment Network for cross-domain pedestrian detection, which consists of three components: a Base Detector, an Image-Level Adaptation Network, and an Instance-Level Adaptation Network. The Image-Level and Instance-Level Adaptation Networks can be regarded as global-level and local-level alignments, respectively. Similar to Faster R-CNN, the Base Detector, composed of a Feature module, an RPN module, and a Detection module, is used to learn a robust pedestrian detector from the annotated source data. Given the image description extracted by the Feature module, the Image-Level Adaptation Network aligns it with an adversarial domain classifier. Given the candidate proposals generated by the RPN module, the Instance-Level Adaptation Network first clusters the source candidate proposals into several groups according to their visual features, generating a pseudo label for each candidate proposal. After generating the pseudo labels, we align the source and target domains by iteratively maximizing and minimizing the discrepancy between the predictions of two classifiers. Extensive evaluations on several benchmarks demonstrate the effectiveness of the proposed approach for cross-domain pedestrian detection.
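Image-level adversarial alignment of this kind is commonly implemented with a gradient reversal layer feeding a small domain classifier; the following is a generic sketch under that assumption, not the authors' code:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, a standard building block for
    adversarial domain alignment."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class ImageLevelAligner(nn.Module):
    """Sketch of an image-level domain classifier: the detector backbone
    is trained to fool it, aligning source and target feature maps."""
    def __init__(self, channels, lam=1.0):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Sequential(
            nn.Conv2d(channels, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),   # per-location domain logit
        )

    def forward(self, feat):
        return self.classifier(GradReverse.apply(feat, self.lam))
```

During training, the classifier would be optimized with binary cross-entropy against domain labels (e.g., 0 for source, 1 for target), while the reversed gradient pushes the backbone toward domain-invariant features.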
10. Ji Z, Liu X, Pang Y, Ouyang W, Li X. Few-Shot Human-Object Interaction Recognition With Semantic-Guided Attentive Prototypes Network. IEEE Trans Image Process. 2021;30:1648-1661. PMID: 33382652; DOI: 10.1109/TIP.2020.3046861.
Abstract
Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning, which learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate the above challenges. Because HOI is intrinsically diverse and interactive, we propose a Semantic-guided Attentive Prototypes Network (SAPNet) framework to learn a semantic-guided metric space where HOI recognition can be performed by computing distances to attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same HOI class. In addition, we design two alternative prototype calculation methods, the Prototypes Shift (PS) approach and the Hallucinatory Graph Prototypes (HGP) approach, which explore suitable category-prototype representations for HOI. Finally, to realize the few-shot HOI task, we reorganize two HOI benchmark datasets with two split strategies, yielding HICO-NN, TUHOI-NN, HICO-NF, and TUHOI-NF. Extensive experimental results on these datasets demonstrate the effectiveness of our proposed SAPNet approach.
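At its base, classification by distance to prototypes follows the standard prototypical-network recipe; SAPNet adds semantic-guided attention on top. A minimal sketch of the base scheme (names ours, and each class is assumed to have at least one support example):

```python
import torch

def prototype_logits(query_feats, support_feats, support_labels, n_classes):
    """Minimal prototypical-network sketch: classify queries by distance
    to per-class prototypes averaged from support embeddings."""
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0)
        for c in range(n_classes)
    ])                                          # (C, D) class prototypes
    # Negative squared Euclidean distance serves as the logit.
    return -torch.cdist(query_feats, protos) ** 2
```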
11. Tang Y, Li B, Liu M, Chen B, Wang Y, Ouyang W. AutoPedestrian: An Automatic Data Augmentation and Loss Function Search Scheme for Pedestrian Detection. IEEE Trans Image Process. 2021;30:8483-8496. PMID: 34618670; DOI: 10.1109/TIP.2021.3115672.
Abstract
Pedestrian detection is a challenging and active research topic in computer vision, especially for crowded scenes where occlusion happens frequently. In this paper, we propose a novel AutoPedestrian scheme that automatically augments pedestrian data and searches for suitable loss functions, aiming for better pedestrian detection performance, especially in crowded scenes. To the best of our knowledge, this is the first work to jointly and automatically search for an optimal data augmentation policy and loss function for pedestrian detection. To achieve this joint search, we first formulate the data augmentation policy and loss function as probability distributions over their hyper-parameters. Then, we apply a double-loop scheme with importance sampling to solve the optimization over data augmentation and loss function types efficiently. Comprehensive experiments on the two popular benchmarks CrowdHuman and CityPersons show the effectiveness of our proposed method. In particular, we achieve 40.58% MR on CrowdHuman and 11.3% MR on the CityPersons reasonable subset, yielding new state-of-the-art results on these two datasets.
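The search formulation, sampling augmentation and loss hyper-parameters from parameterized distributions and scoring candidates, can be caricatured as follows. All names are hypothetical, and the paper's search space and double-loop importance-sampling update are considerably more involved:

```python
import random

def sample_policy(params):
    """Toy sketch: treat the augmentation policy and loss settings as
    probability distributions over hyper-parameters."""
    return {
        "flip_prob":  random.betavariate(params["flip_a"], params["flip_b"]),
        "crop_scale": random.uniform(*params["crop_range"]),
        "loss_type":  random.choices(["smooth_l1", "iou", "giou"],
                                     weights=params["loss_weights"])[0],
    }

def search_step(params, train_and_eval, n_samples=8):
    """Outer-loop sketch: draw candidate policies, score them by
    validation miss rate, and keep the best for the distribution update."""
    candidates = [sample_policy(params) for _ in range(n_samples)]
    scores = [train_and_eval(c) for c in candidates]   # lower MR is better
    best = min(zip(scores, candidates), key=lambda t: t[0])
    return best[1]
```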