1. Manzoor S, An YC, In GG, Zhang Y, Kim S, Kuc TY. SPT: Single Pedestrian Tracking Framework with Re-Identification-Based Learning Using the Siamese Model. Sensors (Basel) 2023; 23:4906. [PMID: 37430819] [DOI: 10.3390/s23104906]
Abstract
Pedestrian tracking is a challenging task in visual object tracking research and a vital component of vision-based applications such as surveillance systems, human-following robots, and autonomous vehicles. In this paper, we propose a single pedestrian tracking (SPT) framework for identifying each instance of a person across all video frames through a tracking-by-detection paradigm that combines deep learning and metric learning-based approaches. The SPT framework comprises three main modules: detection, re-identification, and tracking. Our contribution is a significant improvement in results, achieved by designing two compact metric learning-based models using Siamese architecture in the pedestrian re-identification module and by combining one of the most robust re-identification models for data association with the pedestrian detector in the tracking module. We carried out several analyses to evaluate the performance of the SPT framework for single pedestrian tracking in videos. The results of the re-identification module show that our two proposed re-identification models surpass existing state-of-the-art models, with accuracies of 79.2% and 83.9% on the large dataset and 92% and 96% on the small dataset. Moreover, the proposed SPT tracker, along with six state-of-the-art (SOTA) tracking models, was tested on various indoor and outdoor video sequences. A qualitative analysis considering six major environmental factors verifies the effectiveness of the SPT tracker under illumination changes, appearance variations due to pose changes, changes in target position, and partial occlusions. Quantitative analysis of the experimental results further demonstrates that the proposed SPT tracker outperforms the GOTURN, CSRT, KCF, and SiamFC trackers with a success rate of 79.7%, while beating the DaSiamRPN, SiamFC, CSRT, GOTURN, and SiamMask trackers with an average of 18 tracking frames per second.
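As a rough sketch of how a Siamese re-identification embedding can drive the data-association step described above (the toy 4-D embeddings, the cosine metric, and the 0.7 threshold are illustrative assumptions, not the paper's actual models or values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def associate_target(template_emb, detection_embs, threshold=0.7):
    """Pick the detection whose embedding best matches the tracked
    pedestrian's template; return None if no detection clears the threshold."""
    scores = [cosine_similarity(template_emb, d) for d in detection_embs]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None

# Toy 4-D embeddings standing in for Siamese-network outputs.
template = np.array([1.0, 0.0, 1.0, 0.0])
dets = [np.array([0.0, 1.0, 0.0, 1.0]),   # a different person
        np.array([0.9, 0.1, 1.1, 0.0])]   # same person, slight appearance change
```

In a full tracking-by-detection loop, the template would be updated from confidently matched detections so the tracker survives gradual appearance changes.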
Affiliation(s)
- Sumaira Manzoor
- Creative Algorithms and Sensor Evolution Laboratory, Suwon 16419, Republic of Korea
- Ye-Chan An
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Gun-Gyo In
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Yueyuan Zhang
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Sangmin Kim
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Tae-Yong Kuc
- Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
2. Lin Z, Pei W, Chen F, Zhang D, Lu G. Pedestrian Detection by Exemplar-Guided Contrastive Learning. IEEE Transactions on Image Processing 2023; 32:2003-2016. [PMID: 35839180] [DOI: 10.1109/tip.2022.3189803]
Abstract
Typical methods for pedestrian detection focus on either tackling mutual occlusions between crowded pedestrians or dealing with the various scales of pedestrians. Detecting pedestrians with substantial appearance diversity, such as different silhouettes, viewpoints, or clothing, remains a crucial challenge. Instead of learning each of these diverse pedestrian appearance features individually, as most existing methods do, we propose to perform contrastive learning to guide the feature learning such that the semantic distance between pedestrians with different appearances in the learned feature space is minimized to eliminate the appearance diversity, whilst the distance between pedestrians and background is maximized. To facilitate the efficiency and effectiveness of contrastive learning, we construct an exemplar dictionary with representative pedestrian appearances as prior knowledge to build effective contrastive training pairs and thus guide contrastive learning. Besides, the constructed exemplar dictionary is further leveraged to evaluate the quality of pedestrian proposals during inference by measuring the semantic distance between a proposal and the exemplar dictionary. Extensive experiments on both daytime and nighttime pedestrian detection validate the effectiveness of the proposed method.
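The exemplar-guided contrastive objective can be illustrated with a minimal InfoNCE-style loss, sketched here in plain NumPy (the toy 2-D features, the temperature, and the pair construction are hypothetical; the paper's actual loss and dictionary design may differ):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: pull the anchor toward a pedestrian
    exemplar and push it away from background features."""
    def sim(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    logits = np.array([sim(anchor, positive)]
                      + [sim(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # index 0 is the positive pair

anchor_ped = np.array([1.0, 0.0])                # feature close to the exemplar
anchor_bg = np.array([0.0, 1.0])                 # background-like feature
exemplar = np.array([1.0, 0.1])                  # one entry of the exemplar dictionary
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
```

A pedestrian-like anchor yields a lower loss than a background-like one, which is exactly the gradient signal that pulls same-class features together.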
3. Small-Scale and Occluded Pedestrian Detection Using Multi Mapping Feature Extraction Function and Modified Soft-NMS. Computational Intelligence and Neuroscience 2022; 2022:9325803. [PMID: 36268150] [PMCID: PMC9578842] [DOI: 10.1155/2022/9325803]
Abstract
In autonomous driving and intelligent transportation systems, pedestrian detection is vital for reducing traffic accidents. However, detecting small-scale and occluded pedestrians is challenging due to ineffective utilization of the low feature content of small-scale objects. The main reasons behind this are the stochastic nature of weight initialization and the greedy nature of non-maximum suppression. To overcome these issues, this work proposes a multifocus feature extractor module that fuses feature maps extracted with Gaussian and Xavier mapping functions to enhance the effective receptive field. We also employ focused attention feature selection on a higher-layer feature map of the single shot detector (SSD) region proposal module, blending it with low-layer features to counter the loss of feature detail caused by convolution and pooling operations. In addition, this work proposes a decaying non-maximum suppression function, considering both score and intersection over union (IoU), to tackle the high miss rates caused by the greedy non-maximum suppression used by SSD. Extensive experiments have been conducted on the Caltech pedestrian dataset with both the original and the improved annotations. Experimental results demonstrate the effectiveness of the proposed method, particularly for small and occluded pedestrians.
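The decaying suppression idea is in the spirit of Gaussian soft-NMS (Bodla et al.); the sketch below assumes a Gaussian decay exp(-IoU^2/sigma) rather than the paper's exact score-and-IoU function, so overlapping (occluded) pedestrians are down-weighted instead of discarded outright:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.1):
    """Gaussian soft-NMS: decay overlapping boxes' scores by
    exp(-iou^2 / sigma) so occluded pedestrians can survive."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        m = int(np.argmax(scores))
        box, score = boxes.pop(m), scores.pop(m)
        if score < score_thresh:
            break
        keep.append((box, score))
        scores = [s * np.exp(-iou(box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep

boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 0.0, 11.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

With greedy hard-NMS at the usual 0.5 IoU threshold, the second box (an overlapping, partially occluded pedestrian) would be deleted; here it survives with a decayed score.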
4. Cao J, Pang Y, Xie J, Khan FS, Shao L. From Handcrafted to Deep Features for Pedestrian Detection: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4913-4934. [PMID: 33929956] [DOI: 10.1109/tpami.2021.3076733]
Abstract
Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been achieved with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that covers both handcrafted-feature-based methods and deep-feature-based approaches. For handcrafted-feature-based methods, we present an extensive review and find that handcrafted features with large degrees of freedom in shape and space perform better. For deep-feature-based approaches, we split them into pure CNN-based methods and those employing both handcrafted and CNN-based features. We give a statistical analysis of the tendencies of these methods, where feature-enhancement, part-aware, and post-processing methods have attracted the most attention. In addition to single-spectral pedestrian detection, we also review multispectral pedestrian detection, which provides features that are more robust to illumination variation. Furthermore, we introduce related datasets and evaluation metrics, and provide a deep experimental analysis. We conclude this survey by emphasizing open problems that need to be addressed and highlighting various future directions. Researchers can track an up-to-date list at https://github.com/JialeCao001/PedSurvey.
5. Song X, Li G, Yang L, Zhu L, Hou C, Xiong Z. Real and Pseudo Pedestrian Detection Method with CA-YOLOv5s Based on Stereo Image Fusion. Entropy (Basel) 2022; 24:1091. [PMID: 36010755] [PMCID: PMC9407357] [DOI: 10.3390/e24081091]
Abstract
With the development of convolutional neural networks, pedestrian detection has been greatly improved by deep learning models. However, the presence of pseudo pedestrians reduces detection accuracy. To solve the problem that existing pedestrian detection algorithms cannot distinguish pseudo pedestrians from real ones, a real and pseudo pedestrian detection method with CA-YOLOv5s based on stereo image fusion is proposed in this paper. First, two-view images of the pedestrian are captured by a binocular stereo camera. Then, the proposed CA-YOLOv5s pedestrian detection algorithm is applied to the left-view and right-view images to detect the respective pedestrian regions. Afterwards, the detected left-view and right-view pedestrian regions are matched to obtain a feature point set, and the 3D spatial coordinates of the feature points are calculated with Zhang's calibration method. Finally, the RANSAC plane-fitting algorithm is adopted to extract 3D features of the point set, and real versus pseudo pedestrian classification is achieved by a trained SVM. The proposed method effectively solves the pseudo pedestrian detection problem and improves accuracy. Experimental results show that, on a dataset with real and pseudo pedestrians, the proposed method significantly outperforms existing pedestrian detection algorithms in both accuracy and precision.
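For a rectified stereo rig, the 3D coordinate computation for a matched feature pair reduces to pinhole triangulation; a minimal sketch, with assumed intrinsics (f, cx, cy) and baseline that stand in for the values Zhang's calibration would produce:

```python
import numpy as np

def triangulate(u_left, v_left, u_right, f, baseline, cx, cy):
    """Recover a 3D point from a matched feature pair in a rectified
    stereo rig: depth Z = f * B / disparity (pinhole camera model)."""
    disparity = u_left - u_right          # pixels; positive for a valid match
    Z = f * baseline / disparity          # metres, if baseline is in metres
    X = (u_left - cx) * Z / f
    Y = (v_left - cy) * Z / f
    return np.array([X, Y, Z])

# Hypothetical calibration: f = 700 px, baseline = 0.12 m, principal point (640, 360).
p = triangulate(635.0, 360.0, 600.0, f=700.0, baseline=0.12, cx=640.0, cy=360.0)
```

Triangulating many such matches inside a detected region gives the point set whose RANSAC plane-fitting residual then separates flat pseudo pedestrians (near-zero residual) from real, non-planar ones.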
Affiliation(s)
- Xiaowei Song
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
- Dongjing Avenue Campus, Kaifeng University, Kaifeng 475004, China
- Gaoyang Li
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
- Lei Yang
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
- Luxiao Zhu
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
- Chunping Hou
- School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
- Zixiang Xiong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
6. PRNet++: Learning towards generalized occluded pedestrian detection via progressive refinement network. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.056]
7. Towards the design of vision-based intelligent vehicle system: methodologies and challenges. Evolutionary Intelligence 2022. [DOI: 10.1007/s12065-022-00713-2]
8. Sun B, Ren Y, Lu X. Semisupervised Consistent Projection Metric Learning for Person Reidentification. IEEE Transactions on Cybernetics 2022; 52:738-747. [PMID: 32310811] [DOI: 10.1109/tcyb.2020.2979262]
Abstract
Person reidentification is a hot topic in the computer vision field. Much effort has been devoted to modeling a discriminative distance metric. However, existing metric-learning-based methods lack generalization. In this article, the poor generalization of the metric model is attributed to a biased-estimation problem: the independent and identically distributed hypothesis does not hold. A verification experiment shows that there is a sharp difference between the training and test samples in the metric subspace. A semisupervised consistent projection metric-learning method is proposed to ease the biased-estimation problem by learning a consistency-constrained metric subspace in which identified pairs are forced to follow the distribution of the positive training pairs. First, a semisupervised method is proposed to generate potential matching pairs from the k-nearest neighbors of test samples. The potential matching pairs are used to estimate the distribution center of the distances of positive test pairs. Second, the metric subspace is improved by forcing this estimate to be close to the center of the positive training pairs. Finally, extensive experiments are conducted on five datasets, and the results demonstrate that the proposed method achieves the best performance, especially on the rank-1 identification rate.
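The potential-pair generation step can be sketched as follows; the toy 2-D features and Euclidean distances are stand-ins for the learned metric subspace the paper actually operates in:

```python
import numpy as np

def potential_pairs(test_feats, k=2):
    """Generate potential matching pairs from each test sample's k nearest
    neighbours (excluding itself), as a proxy for unavailable test labels."""
    d = np.linalg.norm(test_feats[:, None, :] - test_feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample never pairs with itself
    pairs = []
    for i in range(len(test_feats)):
        for j in np.argsort(d[i])[:k]:
            pairs.append((i, int(j), d[i, j]))
    return pairs

def distance_center(pairs):
    """Estimated centre of the positive test pairs' distance distribution."""
    return float(np.mean([dist for _, _, dist in pairs]))

# Two tight clusters: each sample's nearest neighbour is its true match.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
pairs = potential_pairs(feats, k=1)
```

The estimated centre would then be pulled toward the centre of the positive training-pair distances when refining the metric subspace.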
9. Saponara S, Elhanashi A, Zheng Q. Developing a real-time social distancing detection system based on YOLOv4-tiny and bird-eye view for COVID-19. Journal of Real-Time Image Processing 2022; 19:551-563. [PMID: 35222727] [PMCID: PMC8863101] [DOI: 10.1007/s11554-022-01203-5]
Abstract
COVID-19 is a virus transmitted through small droplets during speech, sneezing, and coughing, and mostly by inhalation between individuals in close contact. The pandemic is still ongoing and causes acute respiratory infections that have resulted in many deaths. The risk of COVID-19 spread can be reduced by avoiding physical contact among people. This research proposes a real-time AI platform for people detection and social-distancing classification of individuals based on thermal cameras. YOLOv4-tiny is proposed for object detection: its simple neural network architecture makes it suitable for low-cost embedded devices and a better option than other approaches for real-time detection. An algorithm is also implemented to monitor social distancing from a bird's-eye perspective. The proposed approach is applied to videos acquired through thermal cameras for people detection and social-distancing classification, while simultaneously measuring individuals' skin temperature. To tune the proposed model for individual detection, training is carried out on thermal images from various indoor and outdoor environments. The final prototype has been deployed on low-cost Nvidia Jetson devices (Xavier and Jetson Nano) paired with a fixed camera. The proposed approach is suitable for a surveillance system within sustainable smart cities for people detection, social-distancing classification, and body temperature measurement, helping authorities monitor individuals' compliance with social distancing while tracking their skin temperature.
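The bird's-eye distance check amounts to projecting each detection's foot point to the ground plane with a calibration homography and thresholding pairwise distances; a minimal sketch, where the homography H and the 2 m threshold are illustrative assumptions:

```python
import numpy as np

def to_bird_eye(points, H):
    """Project image-plane foot points to ground-plane coordinates with a
    precomputed 3x3 homography H (obtained from camera calibration)."""
    pts = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coords
    proj = pts @ H.T
    return proj[:, :2] / proj[:, 2:3]                      # perspective divide

def violations(foot_points, H, min_dist=2.0):
    """Return index pairs of people closer than min_dist metres apart."""
    ground = to_bird_eye(np.asarray(foot_points, float), H)
    out = []
    for i in range(len(ground)):
        for j in range(i + 1, len(ground)):
            if np.linalg.norm(ground[i] - ground[j]) < min_dist:
                out.append((i, j))
    return out

# Hypothetical calibration: a pure scaling homography, 1 pixel = 1 cm on the ground.
H = np.diag([0.01, 0.01, 1.0])
feet = [(100.0, 100.0), (150.0, 100.0), (900.0, 100.0)]
```

In the deployed system, H would come from marking four known ground points in the fixed camera's view; the flagged pairs are then drawn on the frame as violations.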
Affiliation(s)
- Sergio Saponara
- Dip. Ingegneria Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
- Abdussalam Elhanashi
- Dip. Ingegneria Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
- Qinghe Zheng
- School of Information Science and Engineering, Shandong University, Jinan, China
10. Chu F, Cao J, Shao Z, Pang Y. Illumination-Guided Transformer-Based Network for Multispectral Pedestrian Detection. Artif Intell 2022. [DOI: 10.1007/978-3-031-20497-5_28]
11. Region NMS-based deep network for gigapixel level pedestrian detection with two-step cropping. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.10.006]
12. Galvao LG, Abbod M, Kalganova T, Palade V, Huda MN. Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems-A Review. Sensors (Basel) 2021; 21:7267. [PMID: 34770575] [PMCID: PMC8587128] [DOI: 10.3390/s21217267]
Abstract
Autonomous Vehicles (AVs) have the potential to solve many traffic problems, such as accidents, congestion and pollution. However, there are still challenges to overcome; for instance, AVs need to accurately perceive their environment to safely navigate busy urban scenarios. The aim of this paper is to review recent articles on computer vision techniques that can be used to build an AV perception system. AV perception systems need to accurately detect non-static objects and predict their behaviour, as well as to detect static objects and recognise the information they provide. This paper, in particular, focuses on the computer vision techniques used to detect pedestrians and vehicles. There have been many papers and reviews on pedestrian and vehicle detection so far; however, most past papers reviewed pedestrian or vehicle detection separately. This review presents an overview of AV systems in general, and then reviews and investigates several computer vision detection techniques for pedestrians and vehicles. The review concludes that both traditional and Deep Learning (DL) techniques have been used for pedestrian and vehicle detection, with DL techniques showing the best results. Although good detection results have been achieved for pedestrians and vehicles, current algorithms still struggle to detect small, occluded, and truncated objects. In addition, there is limited research on how to improve detection performance in difficult light and weather conditions. Most algorithms have been tested on well-recognised datasets such as Caltech and KITTI; however, these datasets have their own limitations. Therefore, this paper recommends that future work be evaluated on newer, more challenging datasets, such as PIE and BDD100K.
Affiliation(s)
- Luiz G. Galvao
- Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
- Maysam Abbod
- Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
- Tatiana Kalganova
- Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
- Vasile Palade
- Centre for Data Science, Coventry University, Priory Road, Coventry CV1 5FB, UK
- Md Nazmul Huda
- Department of Electronic and Electrical Engineering, Brunel University London, Kingston Ln, Uxbridge UB8 3PH, UK
13.
Abstract
Object detection in uncrewed aerial vehicle (UAV) images has been a longstanding challenge in the field of computer vision. Specifically, object detection in drone images is a complex task due to objects of various scales such as humans, buildings, water bodies, and hills. In this paper, we present an implementation of ensemble transfer learning to enhance the performance of the base models for multiscale object detection in drone imagery. Combined with a test-time augmentation pipeline, the algorithm combines different models and applies voting strategies to detect objects of various scales in UAV images. The data augmentation also mitigates the scarcity of drone image datasets. We experimented with two open-domain datasets: the VisDrone dataset and the AU-AIR dataset. Our approach is more practical and efficient due to the use of transfer learning and a two-level voting-strategy ensemble instead of training custom models on entire datasets. The experiments show significant improvement in mAP for both VisDrone and AU-AIR by employing ensemble transfer learning. Furthermore, the voting strategies increase the reliability of the ensemble, as the end user can select and trace the effect of the mechanism on bounding box predictions.
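One simple form of the voting idea is a consensus rule over the models' boxes; the sketch below (the 0.5 IoU threshold and the at-least-two-votes rule are illustrative assumptions, not the paper's exact strategy) keeps a box only when another model agrees:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-12)

def consensus_vote(per_model_boxes, min_votes=2, iou_thresh=0.5):
    """Keep a box only if at least min_votes models (including the proposer)
    predicted an overlapping box; deduplicate against already-kept boxes."""
    kept = []
    for mi, boxes in enumerate(per_model_boxes):
        for box in boxes:
            votes = 1 + sum(                      # the proposing model's own vote
                any(iou(box, other) > iou_thresh for other in per_model_boxes[mj])
                for mj in range(len(per_model_boxes)) if mj != mi
            )
            if votes >= min_votes and not any(iou(box, k) > iou_thresh for k in kept):
                kept.append(box)
    return kept

model_a = [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)]  # second box unsupported
model_b = [(1.0, 0.0, 11.0, 10.0)]
kept = consensus_vote([model_a, model_b])
```

A single-model false positive (the box at (50, 50)) collects only its own vote and is rejected, which is how the ensemble improves reliability.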
14. Tian D, Han Y, Wang B, Guan T, Wei W. A Review of Intelligent Driving Pedestrian Detection Based on Deep Learning. Computational Intelligence and Neuroscience 2021; 2021:5410049. [PMID: 34335717] [PMCID: PMC8318761] [DOI: 10.1155/2021/5410049]
Abstract
Pedestrian detection is a specific application of object detection. Compared with general object detection, it shows both similarities and unique characteristics, and it has important application value in the fields of intelligent driving and security monitoring. In recent years, with the rapid development of deep learning, pedestrian detection technology has made great progress; however, a huge gap to human perception remains, many problems are unsolved, and there is still much room for research. For the application of pedestrian detection in intelligent driving, real-time performance must be ensured, and the model must be made lightweight while maintaining detection accuracy. This paper first briefly describes the development of pedestrian detection and then summarizes the research results of pedestrian detection technology in the deep learning stage. Subsequently, by surveying pedestrian detection datasets and evaluation criteria, the core issues in the current development of pedestrian detection are analyzed. Finally, possible future directions for pedestrian detection technology are discussed at the end of the paper.
Affiliation(s)
- Di Tian
- School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Yi Han
- School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Biyao Wang
- School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Tian Guan
- School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Wei Wei
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
15. Khan MA, Mittal M, Goyal LM, Roy S. A deep survey on supervised learning based human detection and activity classification methods. Multimedia Tools and Applications 2021; 80:27867-27923. [DOI: 10.1007/s11042-021-10811-5]
16. Low-Altitude Remote Sensing Opium Poppy Image Detection Based on Modified YOLOv3. Remote Sensing 2021. [DOI: 10.3390/rs13112130]
Abstract
Poppy is a special medicinal plant. Its cultivation requires legal approval and strict supervision, and unauthorized cultivation of opium poppy is forbidden. Low-altitude inspection of illegal poppy cultivation by unmanned aerial vehicle is time-saving and highly efficient. However, the large volume of collected inspection images must be manually screened and analyzed, a process that not only consumes considerable manpower and material resources but is also prone to omissions and errors. In response, this paper proposes an inspection method that adds a larger-scale detection box to the original YOLOv3 algorithm to improve the accuracy of small-target detection. Specifically, ResNeXt group convolution was utilized to reduce the number of model parameters, and an ASPP module was added before the small-scale detection box to improve the model's ability to extract local features and obtain contextual information. Test results on a self-created dataset showed that the mAP (mean average precision) of the Global Multiscale-YOLOv3 model was 0.44% higher than that of the YOLOv3 (MobileNet) algorithm, while its total number of parameters was only 13.75% of that of the original YOLOv3 model and 35.04% of that of the lightweight YOLOv3 (MobileNet). Overall, the Global Multiscale-YOLOv3 model has fewer parameters and higher recognition accuracy, providing technical support for rapid and accurate image processing in low-altitude remote sensing poppy inspection.
17. Xie J, Pang Y, Khan MH, Anwer RM, Khan FS, Shao L. Mask-Guided Attention Network and Occlusion-Sensitive Hard Example Mining for Occluded Pedestrian Detection. IEEE Transactions on Image Processing 2021; 30:3872-3884. [PMID: 33275581] [DOI: 10.1109/tip.2020.3040854]
Abstract
Pedestrian detection relying on deep convolutional neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles, which result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes visible pedestrian regions while suppressing occluded ones by modulating full-body features. Second, we propose an occlusion-sensitive hard example mining method and an occlusion-sensitive loss that mine hard samples according to occlusion level and assign higher weights to detection errors occurring at highly occluded pedestrians. Third, we empirically demonstrate that weak box-based segmentation annotations provide a reasonable approximation to their dense pixel-wise counterparts. Experiments are performed on the CityPersons, Caltech, and ETH datasets. Our approach sets a new state of the art on all three datasets, obtaining an absolute gain of 10.3% in log-average miss rate over the best reported results on the heavily occluded (HO) pedestrian set of the CityPersons test set. Code and models are available at: https://github.com/Leotju/MGAN.
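The modulation idea, suppressing occluded cells of the full-body feature map with a visibility-derived attention map, can be sketched as follows; the sharp sigmoid is an illustrative stand-in for the paper's learned attention branch:

```python
import numpy as np

def mask_guided_attention(features, visible_mask):
    """Modulate full-body conv features (C, H, W) with a spatial attention
    map derived from the visible-region mask: occluded cells are suppressed,
    visible cells pass through almost unchanged."""
    attention = 1.0 / (1.0 + np.exp(-(visible_mask - 0.5) * 8.0))  # sharp sigmoid
    return features * attention[None, :, :]   # broadcast over the channel axis

feats = np.ones((2, 2, 2))                    # toy (C, H, W) feature map
mask = np.array([[1.0, 0.0],                  # left half of the body visible,
                 [1.0, 0.0]])                 # right half occluded
out = mask_guided_attention(feats, mask)
```

The downstream classifier then scores mostly the visible evidence, which is what lets the detector keep heavily occluded pedestrians.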
18.
Abstract
Feature-based methods are currently the mainstream approach to pedestrian detection. In such methods, extracting appropriate features is key to the overall performance of the detection system. It is believed that a pedestrian's appearance is better captured by combining edge/local shape features with texture features. The current practice in this field is simply to concatenate HOG (histogram of oriented gradients) features and LBP (local binary pattern) features extracted from an image, producing a new feature of large dimension: better performance at the cost of more features. In this paper, the Choquet integral with respect to a signed fuzzy measure is introduced to fuse HOG and LBP descriptors in parallel, which is expected to improve accuracy without increasing feature dimensionality. The parameters needed in the fusion process are optimized by a training algorithm based on a genetic algorithm. This architecture has three advantages. First, because HOG and LBP features are fused in parallel, the dimensionality of the new features is not increased. Second, feature fusion is fast, reducing pedestrian detection time. Third, the fused features retain the advantages of both HOG and LBP features, which helps improve detection accuracy. Experiments with the proposed architecture yield promising and satisfactory results.
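For two inputs the Choquet integral has a simple closed form; a sketch of element-wise HOG/LBP fusion, where the measure values g_hog and g_lbp are hypothetical stand-ins for the genetically optimized fuzzy measure (and the sketch uses an ordinary, non-signed measure for clarity):

```python
import numpy as np

def choquet_2(x_hog, x_lbp, g_hog, g_lbp):
    """Choquet integral of two feature responses w.r.t. a fuzzy measure with
    mu({HOG}) = g_hog, mu({LBP}) = g_lbp, and mu({HOG, LBP}) = 1:
    C = min(x) * mu(both) + (max(x) - min(x)) * mu({argmax})."""
    lo, hi, g_hi = ((x_hog, x_lbp, g_lbp) if x_hog <= x_lbp
                    else (x_lbp, x_hog, g_hog))
    return lo * 1.0 + (hi - lo) * g_hi

def fuse(hog_vec, lbp_vec, g_hog, g_lbp):
    """Fuse HOG and LBP descriptors element-wise: the fused vector keeps the
    original dimensionality instead of doubling it via concatenation."""
    return np.array([choquet_2(h, l, g_hog, g_lbp)
                     for h, l in zip(hog_vec, lbp_vec)])

fused = fuse([0.2, 0.9], [0.8, 0.1], g_hog=0.6, g_lbp=0.5)
```

Note the non-additivity: which measure value weights the gap depends on which descriptor responds more strongly, so the fusion is not a fixed linear combination.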
19. Hsu WY, Lin WY. Ratio-and-Scale-Aware YOLO for Pedestrian Detection. IEEE Transactions on Image Processing 2020; 30:934-947. [PMID: 33242306] [DOI: 10.1109/tip.2020.3039574]
Abstract
Current deep learning methods seldom consider the effects of small pedestrian ratios and considerable differences in the aspect ratio of input images, which results in low pedestrian detection performance. This study proposes the ratio-and-scale-aware YOLO (RSA-YOLO) method to solve these problems. First, ratio-aware mechanisms are introduced to dynamically adjust the input-layer length and width hyperparameters of YOLOv3, thereby handling large differences in aspect ratio. Second, intelligent splits automatically and appropriately divide the original image into two local images, and ratio-aware YOLO (RA-YOLO) is performed iteratively on each. Because the original and local images produce low- and high-resolution pedestrian detection information after RA-YOLO, respectively, this study proposes new scale-aware mechanisms in which multiresolution fusion is used to solve the misdetection of very small pedestrians. The experimental results indicate that the proposed method produces favorable results for images with extremely small objects and those with considerable differences in aspect ratio. Compared with the original YOLOs (YOLOv2 and YOLOv3) and several state-of-the-art approaches, the proposed method demonstrated superior performance on the VOC 2012 comp4, INRIA, and ETH databases in terms of average precision, intersection over union, and lowest log-average miss rate.
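The ratio-aware adjustment of the input-layer dimensions can be sketched as choosing a stride-aligned input size that matches the image's aspect ratio instead of forcing a square resize (the base size of 608 and the rounding rule are illustrative assumptions, not the paper's exact mechanism):

```python
def ratio_aware_input(img_w, img_h, base=608, stride=32):
    """Choose network input width/height that preserve the image's aspect
    ratio, rounded to multiples of the network stride (YOLO requires
    input dimensions divisible by 32)."""
    if img_w >= img_h:
        w = base
        h = max(stride, round(base * img_h / img_w / stride) * stride)
    else:
        h = base
        w = max(stride, round(base * img_w / img_h / stride) * stride)
    return w, h
```

A 1920x1080 frame thus gets a wide 608x352 input rather than being squashed into a square, so pedestrians keep their natural height-to-width proportions.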
Collapse
|
20
|
Liu T, Luo W, Ma L, Huang JJ, Stathaki T, Dai T. Coupled Network for Robust Pedestrian Detection With Gated Multi-Layer Feature Extraction and Deformable Occlusion Handling. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:754-766. [PMID: 33237856 DOI: 10.1109/tip.2020.3038371] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Pedestrian detection methods have been significantly improved with the development of deep convolutional neural networks. Nevertheless, detecting small-scale pedestrians and occluded pedestrians remains a challenging problem. In this paper, we propose a pedestrian detection method with a coupled network to simultaneously address these two issues. One of the sub-networks, the gated multi-layer feature extraction sub-network, aims to adaptively generate discriminative features for pedestrian candidates in order to robustly detect pedestrians with large variations in scale. The second sub-network targets the occlusion problem of pedestrian detection by using deformable region of interest (RoI) pooling. We investigate two different gate units for the gated sub-network, namely the channel-wise gate unit and the spatial-wise gate unit, which enhance the representation ability of the regional convolutional features along the channel dimensions or across the spatial domain, respectively. Ablation studies have validated the effectiveness of both the proposed gated multi-layer feature extraction sub-network and the deformable occlusion handling sub-network. With the coupled framework, our proposed pedestrian detector achieves promising results on two pedestrian datasets, especially in detecting small or occluded pedestrians. On the CityPersons dataset, the proposed detector achieves the lowest miss rates (i.e., 40.78% and 34.60%) for detecting small and occluded pedestrians, surpassing the second-best comparison method by 6.0% and 5.87%, respectively.
Collapse
|
21
|
Pang Y, Cao J, Li Y, Xie J, Sun H, Gong J. TJU-DHD: A Diverse High-Resolution Dataset for Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:207-219. [PMID: 33141669 DOI: 10.1109/tip.2020.3034487] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Vehicles, pedestrians, and riders are the most important and interesting objects for the perception modules of self-driving vehicles and video surveillance. However, the state-of-the-art performance in detecting such important objects (especially small objects) is far from satisfying the demands of practical systems. Large-scale, high-diversity, and high-resolution datasets play an important role in developing better object detection methods to satisfy this demand. Existing public large-scale datasets such as MS COCO, collected from websites, do not focus on specific scenarios, while popular datasets collected from specific scenarios (e.g., KITTI and CityPersons) are limited in the number of images and instances, resolution, and diversity. To address this problem, we build a diverse high-resolution dataset called TJU-DHD. The dataset contains 115,354 high-resolution images (52% have a resolution of 1624×1200 pixels and 48% have a resolution of at least 2560×1440 pixels) and 709,330 labeled objects in total, with large variance in scale and appearance. Meanwhile, the dataset has rich diversity in season, illumination, and weather. In addition, a new diverse pedestrian dataset is also built. Experiments on object detection and pedestrian detection are conducted with four different detectors (the one-stage RetinaNet, anchor-free FCOS, two-stage FPN, and Cascade R-CNN). We hope that the newly built dataset can help promote research on object detection and pedestrian detection in these two scenes. The dataset is available at https://github.com/tjubiit/TJU-DHD.
Collapse
|
22
|
Guo Q, Liu Q, Wang W, Zhang Y, Kang Q. A fast occluded passenger detector based on MetroNet and Tiny MetroNet. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.05.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|
23
|
Jiang X, Zhang L, Lv P, Guo Y, Zhu R, Li Y, Pang Y, Li X, Zhou B, Xu M. Learning Multi-Level Density Maps for Crowd Counting. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2705-2715. [PMID: 31562106 DOI: 10.1109/tnnls.2019.2933920] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
People in crowd scenes often exhibit an imbalanced distribution. On the one hand, people's size varies greatly due to the camera perspective: people far from the camera look smaller and are likely to occlude each other, whereas people near the camera look larger and are relatively sparse. On the other hand, the number of people also varies greatly within the same scene or across different scenes. This article aims to develop a novel model that can accurately estimate the crowd count in a given scene with an imbalanced people distribution. To this end, we have proposed an effective multi-level convolutional neural network (MLCNN) architecture that first adaptively learns multi-level density maps and then fuses them to predict the final output. The density map of each level focuses on people of certain sizes, so the fusion of multi-level density maps is able to handle the large variation in people's size. In addition, we introduce a new loss function named balanced loss (BL) to impose relatively balanced feedback during training, which helps further improve the performance of the proposed network. Furthermore, we introduce a new dataset including 1111 images with a total of 49,061 head annotations. MLCNN is easy to train with only one end-to-end training stage. Experimental results demonstrate that our MLCNN achieves state-of-the-art performance. In particular, our MLCNN reaches a mean absolute error (MAE) of 242.4 on the UCF_CC_50 dataset, which is 37.2 lower than the second-best result.
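For reference, the MAE figure quoted in the abstract is the standard crowd-counting metric: the mean absolute difference between the predicted count (the integral of the predicted density map) and the ground-truth head count, averaged over test images. A minimal sketch, with toy counts that are not from the paper:

```python
import numpy as np

def mae(pred_counts, gt_counts):
    """Mean absolute error over a set of test images, the crowd-counting
    metric reported above (e.g. MAE = 242.4 on UCF_CC_50)."""
    pred = np.asarray(pred_counts, float)
    gt = np.asarray(gt_counts, float)
    return float(np.mean(np.abs(pred - gt)))

# toy example: predicted vs. ground-truth head counts for four images
print(mae([120, 310, 55, 980], [100, 300, 60, 1000]))  # 13.75
```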
Collapse
|
24
|
Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques and Embedded Platforms—A Comprehensive Review. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10093280] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
In recent years there has been remarkable progress in one computer vision application area: object detection. One of the most challenging and fundamental problems in object detection is locating a specific object among the multiple objects present in a scene. Traditional detection methods were used at first; with the introduction of convolutional neural networks, deep learning-based techniques were adopted for feature extraction from 2012 onward, leading to remarkable breakthroughs in this area. This paper presents a detailed survey of recent advancements and achievements in object detection using various deep learning techniques. Several topics are included, such as Viola–Jones (VJ), histogram of oriented gradients (HOG), one-shot and two-shot detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art object detectors. Detailed discussions of some important applications of object detection, including pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded systems, are presented. Finally, we conclude by identifying promising future directions.
Collapse
|
26
|
Cao J, Pang Y, Han J, Gao B, Li X. Taking a Look at Small-Scale Pedestrians and Occluded Pedestrians. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:3143-3152. [PMID: 31831419 DOI: 10.1109/tip.2019.2957927] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Small-scale pedestrian detection and occluded pedestrian detection are two challenging tasks. However, most state-of-the-art methods merely handle one single task at a time, giving rise to relatively poor performance when the two tasks are, in practice, required simultaneously. In this paper, it is found that small-scale pedestrian detection and occluded pedestrian detection actually share a common problem: inaccurate localization. Therefore, solving this problem improves the performance of both tasks. To this end, we pay more attention to predicted bounding boxes with worse location precision and extract more contextual information around objects, for which two modules (location bootstrap and semantic transition) are proposed. The location bootstrap is used to reweight the regression loss: the loss of a predicted bounding box far from its ground truth is upweighted, while the loss of a predicted bounding box near its ground truth is downweighted. Additionally, the semantic transition adds more contextual information and relieves the semantic inconsistency of skip-layer fusion. Since the location bootstrap is not used at the test stage and the semantic transition is lightweight, the proposed method does not add much extra computational cost during inference. Experiments on the challenging CityPersons and Caltech datasets show that the proposed method outperforms state-of-the-art methods on small-scale and occluded pedestrians (e.g., 5.20% and 4.73% improvements on Caltech).
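The reweighting idea behind the location bootstrap can be illustrated with a small sketch: boxes far from their ground truth (low IoU) receive a larger regression-loss weight, well-located boxes a smaller one. This is a hypothetical illustration of the general idea only; the weighting function, loss, and hyperparameters in the paper may differ.

```python
import numpy as np

def bootstrap_weights(ious, gamma=1.0):
    """Illustrative location-bootstrap-style weights: larger for
    poorly located boxes (low IoU with ground truth), smaller for
    well-located ones. gamma and the functional form are assumptions."""
    ious = np.asarray(ious, float)
    return (1.0 - ious) ** gamma + 1e-6

def reweighted_smooth_l1(deltas, ious):
    """Smooth-L1 box-regression loss with per-box bootstrap weights,
    normalized by the total weight."""
    d = np.abs(np.asarray(deltas, float))
    per_box = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1)
    w = bootstrap_weights(ious)
    return float((w * per_box).sum() / w.sum())

# two boxes: one well located (IoU 0.9), one poorly located (IoU 0.3);
# the poorly located box dominates the weighted loss
loss = reweighted_smooth_l1(deltas=[[0.1, 0.1, 0.0, 0.0],
                                    [0.8, 0.6, 0.5, 0.4]],
                            ious=[0.9, 0.3])
print(loss > 0)  # True
```

Because the weights are only applied during training, inference cost is unchanged, which matches the abstract's claim of little extra computation at test time.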
Collapse
|
27
|
Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection. SENSORS 2019; 19:s19143111. [PMID: 31337121 PMCID: PMC6679249 DOI: 10.3390/s19143111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/17/2022]
Abstract
Downsampling input images is a simple trick to speed up visual object-detection algorithms, especially in robotic vision and applied mobile vision systems. However, this trick comes with a significant decline in accuracy. In this paper, dual-resolution dual-path Convolutional Neural Networks (CNNs), named DualNets, are proposed to improve the accuracy of such detection applications. In contrast to previous methods that simply downsample the input images, DualNets explicitly take dual inputs at different resolutions and extract complementary visual features from them using dual CNN paths. The two paths in a DualNet are a backbone path and an auxiliary path that accepts larger inputs and then rapidly downsamples them to relatively small feature maps. With the help of the carefully designed auxiliary CNN paths in DualNets, auxiliary features are extracted from the larger input with controllable computation. The auxiliary features are then fused with the backbone features using a proposed progressive residual fusion strategy to enrich the feature representation. This architecture, as the feature extractor, is further integrated with the Single Shot Detector (SSD) to accomplish latency-sensitive visual object-detection tasks. We evaluate the resulting detection pipeline on the Pascal VOC and MS COCO benchmarks. Results show that the proposed DualNets can raise the accuracy of CNN detection applications that are sensitive to computation payloads.
Collapse
|
28
|
A Building Extraction Approach Based on the Fusion of LiDAR Point Cloud and Elevation Map Texture Features. REMOTE SENSING 2019. [DOI: 10.3390/rs11141636] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Building extraction is an important way to obtain information in urban planning, land management, and other fields. As remote sensing has various advantages, such as large coverage and real-time capability, it has become an essential approach for building extraction. Among various remote sensing technologies, the capability of providing 3D features makes the LiDAR point cloud a crucial means of building extraction. However, the LiDAR point cloud has difficulty distinguishing objects of similar height, whereas texture features are able to distinguish different objects in a 2D image. In this paper, a building extraction method based on the fusion of point cloud and texture features is proposed, in which the texture features are extracted from an elevation map that expresses the height of each point. The experimental results show that the proposed method obtains better extraction results than other texture feature extraction methods and the ENVI software in all experimental areas, with an extraction accuracy always higher than 87%, which is satisfactory for practical work.
Collapse
|
29
|
Zhang B, Zhao Q, Feng W, Lyu S. AlphaMEX: A smarter global pooling method for convolutional neural networks. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.079] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
31
|
Hou L, Liu Q, Chen Z, Xu J. Human Detection in Intelligent Video Surveillance: A Review. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2018. [DOI: 10.20965/jaciii.2018.p1056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
With the rapid development of networked video surveillance systems, human detection has become essential. The task is not only inherently challenging due to changing human appearance, but also has enormous potential for a wide range of practical applications, such as security and surveillance. This review paper extensively surveys the current progress made toward human detection in intelligent video surveillance. The algorithms presented in this paper are classified as either human detection without classifier training or human detection with classifier training. For the core techniques of human detection without classifier training, three critical processing stages are discussed: background subtraction, the Gaussian mixture model (GMM), and the skin color model. For the core techniques of human detection with classifier training, two main types are covered: holistic human detectors and part-based human detectors. Our survey aims to address existing problems, challenges, and future research directions based on analyses of the current progress in human detection techniques in computer vision.
Collapse
|
32
|
Delving Deep into Multiscale Pedestrian Detection via Single Scale Feature Maps. SENSORS 2018; 18:s18041063. [PMID: 29614807 PMCID: PMC5948919 DOI: 10.3390/s18041063] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2018] [Revised: 03/25/2018] [Accepted: 03/26/2018] [Indexed: 11/17/2022]
Abstract
The standard pipeline in pedestrian detection is to slide a pedestrian model over an image feature pyramid to detect pedestrians of different scales. In this pipeline, feature pyramid construction is time consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed that uses only single-scale feature maps to achieve fast detection. However, MRFC has two shortcomings that limit its accuracy: the receptive field correspondence across scales is weak, and the features used are not scale invariant. In this paper, two solutions are proposed to address these two shortcomings. Specifically, scale-aware pooling is proposed to establish a better receptive field correspondence, and a soft decision tree is proposed to relieve the scale variance problem. Coupled with an efficient sliding window classification strategy, our detector achieves a fast detection speed while maintaining state-of-the-art accuracy.
Collapse
|
33
|
Cong R, Lei J, Fu H, Huang Q, Cao X, Hou C. Co-Saliency Detection for RGBD Images Based on Multi-Constraint Feature Matching and Cross Label Propagation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 27:568-579. [PMID: 29053455 DOI: 10.1109/tip.2017.2763819] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Co-saliency detection aims at extracting the common salient regions from an image group containing two or more relevant images. It is a newly emerging topic in the computer vision community. Unlike most existing co-saliency methods, which focus on RGB images, this paper proposes a novel co-saliency detection model for RGBD images that utilizes depth information to enhance the identification of co-saliency. First, the intra saliency map for each image is generated by a single-image saliency model, while the inter saliency map is calculated based on multi-constraint feature matching, which represents the constraint relationships among multiple images. Then, an optimization scheme, namely cross label propagation, is used to refine the intra and inter saliency maps in a cross-wise manner. Finally, all the original and optimized saliency maps are integrated to generate the final co-saliency result. The proposed method introduces depth information and multi-constraint feature matching to improve the performance of co-saliency detection. Moreover, the proposed method can effectively exploit any existing single-image saliency model to work well in co-saliency scenarios. Experiments on two RGBD co-saliency datasets demonstrate the effectiveness of our proposed model.
Collapse
|
34
|
|