1. Lin H, Hong X, Ma Z, Wang Y, Meng D. Multidimensional Measure Matching for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:9112-9126. PMID: 39190524. DOI: 10.1109/tnnls.2024.3435854.
Abstract
This article addresses the challenge of scale variation in crowd counting from a multidimensional measure-theoretic perspective. We start by formulating crowd counting as a measure-matching problem, based on the assumption that discrete measures can express both the scattered ground truth and the predicted density map. In this context, we introduce the Sinkhorn counting loss and extend it to a semi-balanced form, which alleviates problems including entropic bias, distance destruction, and amount constraints. We then model the measure matching in a multidimensional space in order to learn counting from both location and scale. To achieve this, we extend the traditional 2-D coordinate support to 3-D, incorporating an additional axis to represent scale information, where a pyramid-based structure is leveraged to learn the scale value for the predicted density. Extensive experiments on four challenging crowd-counting datasets, namely ShanghaiTech A, UCF-QNRF, JHU++, and NWPU, validate the proposed method. Code is released at https://github.com/LoraLinH/Multidimensional-Measure-Matching-for-Crowd-Counting.
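The measure-matching idea above can be illustrated with a toy example: the predicted density and the point annotations are treated as discrete measures, and an entropically regularized transport plan between them is computed by Sinkhorn iterations. This is a minimal pure-Python sketch of generic balanced Sinkhorn scaling, not the authors' semi-balanced Sinkhorn counting loss; the support points, cost matrix, and `reg` value are illustrative assumptions.

```python
import math

def sinkhorn(a, b, cost, reg=0.1, n_iter=200):
    """Entropically regularized OT between discrete measures a and b.

    a, b  : weight lists (assumed to carry equal total mass)
    cost  : cost[i][j] between support point i of a and point j of b
    Returns the transport plan as a nested list.
    """
    K = [[math.exp(-c / reg) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        # alternately rescale rows and columns to match the two marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

# Two predicted mass locations vs. two annotated points (toy 1-D support):
pred = [0.5, 0.5]                  # predicted density mass
gt = [0.5, 0.5]                    # point-annotation mass
cost = [[0.0, 1.0], [1.0, 0.0]]    # matching cost between support points
plan = sinkhorn(pred, gt, cost)
```

Most of the mass ends up on the cheap diagonal pairings, which is the behavior a Sinkhorn-style counting loss exploits to match predicted density to annotations.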
2. Chen Z, Zhang S, Zheng X, Zhao X, Kong Y. Crowd Counting Based on Multiscale Spatial Guided Perception Aggregation Network. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17465-17478. PMID: 37610898. DOI: 10.1109/tnnls.2023.3304348.
Abstract
Crowd counting has received extensive attention in the field of computer vision, and methods based on deep convolutional neural networks (CNNs) have made great progress in this task. However, challenges such as scale variation, nonuniform distribution, complex backgrounds, and occlusion in crowded scenes hinder the performance of these networks. To overcome these challenges, this article proposes a multiscale spatial guided perception aggregation network (MGANet) to achieve efficient and accurate crowd counting. MGANet consists of three parts: a multiscale feature extraction network (MFEN), a spatial guidance network (SGN), and an attention fusion network (AFN). Specifically, to alleviate the scale variation problem in crowded scenes, the MFEN is introduced to enhance scale adaptability and effectively capture multiscale features in scenes with drastic scale variation. To address the challenges of nonuniform distribution and complex backgrounds, an SGN is proposed. The SGN comprises two parts: a spatial context network (SCN) and a guidance perception network (GPN). The SCN captures detailed semantic information between the positions of the multiscale features extracted by the MFEN and improves the exploration of deep structured information; at the same time, dependencies among remote spatial contexts are established to enlarge the receptive field. The GPN enhances information exchange between channels and guides the network to select appropriate multiscale features and spatial context semantic features. The AFN adaptively measures the importance of these different features and obtains accurate and effective feature representations from them. In addition, this article proposes a novel region-adaptive loss function, which optimizes regions of the image with large recognition errors and alleviates the inconsistency between the training target and the evaluation metric.
To evaluate the performance of the proposed method, extensive experiments were carried out on challenging benchmarks including ShanghaiTech Part A and Part B, UCF-CC-50, UCF-QNRF, and JHU-CROWD++. Experimental results show that the proposed method performs well on all four datasets; in particular, on the ShanghaiTech Part A and Part B, UCF-QNRF, and JHU-CROWD++ datasets it achieves superior recognition performance and better robustness than state-of-the-art methods.
3. Zang H, Peng Y, Zhou M, Li G, Zheng G, Shen H. Automatic detection and counting of wheat spike based on DMseg-Count. Scientific Reports 2024; 14:29676. PMID: 39613805. PMCID: PMC11607314. DOI: 10.1038/s41598-024-80244-1. Received: 07/08/2024; accepted: 11/18/2024.
Abstract
The automatic detection and counting of wheat spikes in images are of great significance for yield prediction and variety evaluation, so accurate and timely estimation of spike numbers is crucial for wheat production. However, in actual production, wheat spike images are susceptible to factors such as lighting conditions, shooting angle, occlusion, and overlap, leaving spike contours and features unclear and degrading the accuracy of automatic detection and counting. To solve these problems and further improve counting accuracy, an improved wheat spike counting model, DMseg-Count, was proposed by enhancing local contextual supervision information in the existing object counting model DM-Count. Firstly, a wheat spike local segmentation branch was introduced into the DM-Count network architecture to extract local contextual supervision information. Secondly, an element-wise multiplication mechanism was designed to fuse the global and local contextual supervision information. Finally, a total loss function was constructed to optimize the model. The test results showed that the mean absolute error (MAE) and root mean square error (RMSE) of the proposed DMseg-Count model were 5.79 and 7.54, respectively, improvements of 9.76 and 10.91 over the standard distribution matching for crowd counting (DM-Count) model. Compared with other deep learning models, DMseg-Count can detect wheat spikes in challenging situations and shows better computer vision processing capability and detection performance. In summary, the proposed DMseg-Count model can effectively detect wheat spikes with good counting performance, providing a new method for automatic spike counting and yield prediction in complex field environments.
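The MAE and RMSE figures quoted above are the standard counting metrics, computed over per-image predicted and ground-truth counts. A short sketch of how they are typically evaluated; the example counts are made up for illustration.

```python
import math

def mae_rmse(pred_counts, gt_counts):
    """Mean absolute error and root mean square error over a set of images."""
    errs = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse

# Hypothetical per-image spike counts (not from the paper):
pred = [98, 110, 87, 102]
gt = [100, 105, 90, 100]
mae, rmse = mae_rmse(pred, gt)
```

Because RMSE squares the per-image errors before averaging, it penalizes occasional large miscounts more heavily than MAE does, which is why both are usually reported together.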
Affiliation(s)
- Hecang Zang: Institute of Agricultural Information Technology, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China; Huanghuaihai Key Laboratory of Intelligent Agricultural Technology, Ministry of Agriculture and Rural Areas, Zhengzhou, 450002, China
- Yilong Peng: Institute of Agricultural Information Technology, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China; College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
- Meng Zhou: Institute of Agricultural Information Technology, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China; Huanghuaihai Key Laboratory of Intelligent Agricultural Technology, Ministry of Agriculture and Rural Areas, Zhengzhou, 450002, China
- Guoqiang Li: Institute of Agricultural Information Technology, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China; Huanghuaihai Key Laboratory of Intelligent Agricultural Technology, Ministry of Agriculture and Rural Areas, Zhengzhou, 450002, China
- Guoqing Zheng: Institute of Agricultural Information Technology, Henan Academy of Agricultural Sciences, Zhengzhou, 450002, China; Huanghuaihai Key Laboratory of Intelligent Agricultural Technology, Ministry of Agriculture and Rural Areas, Zhengzhou, 450002, China
- Hualei Shen: College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
4. Chen Y, Wang Q, Yang J, Chen B, Xiong H, Du S. Learning Discriminative Features for Crowd Counting. IEEE Transactions on Image Processing 2024; 33:3749-3764. PMID: 38848225. DOI: 10.1109/tip.2024.3408609.
Abstract
Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating foreground from background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small, and the high-level features extracted by convolutional neural networks are not discriminative enough to represent small objects. To address these problems, we propose a discriminative-feature learning framework for crowd counting, composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn what is present in the masked regions and improving its ability to localize objects in high-density regions. The CLM pulls targets close to each other and pushes them away from the background in the feature space, enabling the model to discriminate foreground objects from the background. The proposed modules can benefit various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. Both modules are plug-and-play; incorporating them into existing models can potentially boost their performance in these scenarios.
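The pull-close/push-away behavior of the CLM can be illustrated with a generic supervised contrastive (InfoNCE-style) loss on pixel features. This sketch uses a common textbook formulation with made-up toy vectors and temperature; it is not necessarily the exact loss used in the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE: pull the anchor toward its positive, push it from negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy pixel features: anchor and positive are both "foreground" pixels,
# the negatives point the opposite way, standing in for background pixels.
anchor = [1.0, 0.1]
positive = [0.9, 0.2]
background = [[-1.0, 0.0], [-0.8, 0.3]]
loss = contrastive_loss(anchor, positive, background)
```

The loss is near zero when foreground features cluster away from background features, and grows large when the anchor sits closer to a background vector than to its positive, which is exactly the separation pressure the CLM applies.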
5. Chen Y, Yang J, Chen B, Du S, Hua G. Tolerating Annotation Displacement in Dense Object Counting via Point Annotation Probability Map. IEEE Transactions on Image Processing 2023; 32:6359-6372. PMID: 37971907. DOI: 10.1109/tip.2023.3331908.
Abstract
Counting objects in crowded scenes remains a challenge for computer vision. Current deep learning based approaches often formulate it as a Gaussian density regression problem. Such brute-force regression, though effective, may not properly account for the annotation displacement that arises from the human annotation process and can lead to different distributions. We conjecture that it is beneficial to consider annotation displacement in the dense object counting task. To obtain strong robustness against annotation displacement, a generalized Gaussian distribution (GGD) function with a tunable bandwidth and shape parameter is exploited to form the learning target, the point annotation probability map (PAPM). Specifically, we first present a hand-designed PAPM method (HD-PAPM), in which we design a GGD-based function to tolerate annotation displacement. Because a hand-designed PAPM may not be optimal for a particular network and dataset under end-to-end training, we further propose an adaptively learned PAPM method (AL-PAPM), in which we design an effective transport cost function based on the GGD to improve robustness to annotation displacement. The proposed PAPM can also be integrated with other methods: we combine PAPM with P2PNet by modifying its matching cost matrix, forming P2P-PAPM, which likewise improves the robustness of P2PNet to annotation displacement. Extensive experiments show the superiority of our proposed methods.
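The generalized Gaussian kernel behind the PAPM has a shape parameter beta in addition to the bandwidth alpha: beta = 2 recovers the ordinary Gaussian, while smaller beta gives heavier tails and therefore more tolerance to a displaced annotation. A 1-D sketch of the unnormalized kernel follows; the alpha and beta values are illustrative, not the paper's tuned parameters.

```python
import math

def ggd_kernel(d, alpha=4.0, beta=1.5):
    """Unnormalized generalized Gaussian kernel at distance d from a point
    annotation; alpha is the bandwidth, beta the shape parameter."""
    return math.exp(-((abs(d) / alpha) ** beta))

# Probability-map values at pixel offsets around one annotated head center:
offsets = range(-8, 9)
papm_row = [ggd_kernel(o) for o in offsets]
```

At a fixed distance, lowering beta raises the kernel value, so a slightly misplaced annotation is penalized less than under a pure Gaussian target.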
6. Vecchi JT, Mullan S, Lopez JA, Rhomberg M, Yamamoto A, Hallam A, Lee A, Sonka M, Hansen MR. Sensitivity of CNN image analysis to multifaceted measurements of neurite growth. BMC Bioinformatics 2023; 24:320. PMID: 37620759. PMCID: PMC10464248. DOI: 10.1186/s12859-023-05444-4. Received: 12/14/2022; accepted: 08/11/2023.
Abstract
Quantitative analysis of neurite growth and morphology is essential for understanding the determinants of neural development and regeneration; however, it is complicated by the labor-intensive process of measuring diverse parameters of neurite outgrowth. Consequently, automated approaches have been developed to study neurite morphology in a high-throughput and comprehensive manner. These approaches include convolutional neural networks (CNNs), powerful models capable of learning complex tasks without the biases of hand-crafted models. Nevertheless, their complexity often relegates them to functioning as 'black boxes.' Research in explainable AI is therefore imperative to relate CNN image analysis output to predefined morphological parameters of neurite growth and to assess the applicability of these machine learning approaches. In this study, drawing inspiration from the field of automated feature selection, we investigate the correlation between quantified metrics of neurite morphology and the image analysis results from NeuriteNet, a CNN developed to analyze neurite growth. NeuriteNet accurately distinguishes images of neurite growth from different treatment groups within two separate experimental systems: neurons cultured on different substrate conditions and neurons subjected to a drug treatment that inhibits neurite outgrowth. By examining the model's function and the patterns of activation underlying its classification decisions, we find that NeuriteNet focuses on aspects of neuron morphology that represent quantifiable metrics distinguishing these groups, and that it also incorporates factors not encompassed by neuron morphology tracing analyses. NeuriteNet thus presents a novel tool ideally suited for screening morphological differences in heterogeneous neuron groups while also providing impetus for targeted follow-up studies.
Affiliation(s)
- Joseph T Vecchi: Department of Molecular Physiology and Biophysics, Carver College of Medicine, University of Iowa, Iowa City, IA, USA; Department of Otolaryngology Head-Neck Surgery, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Sean Mullan: Iowa Institute for Biomedical Imaging, Electrical and Computer Engineering, University of Iowa, Iowa City, IA, USA
- Josue A Lopez: Department of Neuroscience, University of Texas-Austin, Austin, TX, USA
- Madeline Rhomberg: Department of Otolaryngology Head-Neck Surgery, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Annabelle Hallam: Department of Otolaryngology Head-Neck Surgery, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Amy Lee: Department of Neuroscience, University of Texas-Austin, Austin, TX, USA
- Milan Sonka: Iowa Institute for Biomedical Imaging, Electrical and Computer Engineering, University of Iowa, Iowa City, IA, USA
- Marlan R Hansen: Department of Molecular Physiology and Biophysics, Carver College of Medicine, University of Iowa, Iowa City, IA, USA; Department of Otolaryngology Head-Neck Surgery, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
7. Wang J, Gao J, Yuan Y, Wang Q. Crowd Localization From Gaussian Mixture Scoped Knowledge and Scoped Teacher. IEEE Transactions on Image Processing 2023; 32:1802-1814. PMID: 37028355. DOI: 10.1109/tip.2023.3251727.
Abstract
Crowd localization aims to predict the head position of each instance in a crowd scene. Because pedestrians stand at varying distances from the camera, instance scales vary enormously within a single image, a phenomenon called intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic. This paper therefore focuses on taming the chaotic scale distribution incurred by intrinsic scale shift. We propose the Gaussian Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely, GMS fits a Gaussian mixture to the scale distribution and decouples the mixture into sub-normal distributions to regularize the chaos within each sub-distribution; an alignment is then introduced to regularize the chaos among sub-distributions. Although GMS is effective in regularizing the data distribution, it effectively dislodges hard samples from the training set, which causes overfitting. We attribute this to a failure to transfer the latent knowledge exploited by GMS from the data to the model. We therefore propose a Scoped Teacher that acts as a bridge in this knowledge transfer, and introduce consistency regularization to implement it: constraints on the Scoped Teacher enforce feature consistency between the teacher and student ends. With GMS and the Scoped Teacher implemented on four mainstream crowd localization datasets, extensive experiments demonstrate the superiority of our work; compared with existing crowd locators, it achieves state-of-the-art F1-measure on all four datasets.
8. Zheng Z, Ni N, Xie G, Zhu A, Wu Y, Yang T. HARNet: Hierarchical adaptive regression with location recovery for crowd counting. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.091.
9. Li H, Liu L, Yang K, Liu S, Gao J, Zhao B, Zhang R, Hou J. Video Crowd Localization With Multifocus Gaussian Neighborhood Attention and a Large-Scale Benchmark. IEEE Transactions on Image Processing 2022; 31:6032-6047. PMID: 36103439. DOI: 10.1109/tip.2022.3205210.
Abstract
Video crowd localization is a crucial yet challenging task that aims to estimate the exact locations of human heads in a given crowded video. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighborhood attention (GNA), which can effectively exploit long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, the equipped multi-focus mechanism allows our GNA to capture the scale variation of human heads well. Based on the multi-focus GNA, we develop a unified neural network called GNANet that accurately locates head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowd video benchmark named VSCrowd (https://github.com/HopLee6/VSCrowd), which consists of 60K+ frames captured in various surveillance scenes and 2M+ head annotations. Finally, we conduct extensive experiments on three datasets, including our VSCrowd, and the results show that the proposed method achieves state-of-the-art performance for both video crowd localization and counting.
10. Liang L, Zhao H, Zhou F, Zhang Q, Song Z, Shi Q. SC2Net: Scale-aware Crowd Counting Network with Pyramid Dilated Convolution. Applied Intelligence 2022. DOI: 10.1007/s10489-022-03648-4.
11. Zhou L, Wang P, Li W, Leng J, Lei B. Semantic-refined spatial pyramid network for crowd counting. Pattern Recognition Letters 2022. DOI: 10.1016/j.patrec.2022.04.029.
12. Scale and Background Aware Asymmetric Bilateral Network for Unconstrained Image Crowd Counting. Mathematics 2022. DOI: 10.3390/math10071053.
Abstract
This paper attacks two challenging problems in image-based crowd counting: scale variation and complex backgrounds. To that end, we present a novel crowd counting method, the Scale and Background aware Asymmetric Bilateral Network (SBAB-Net), which handles scale variation and background noise in a unified framework. Specifically, the proposed SBAB-Net contains three main components: a pre-trained backbone convolutional neural network (CNN) as the feature extractor and two asymmetric branches that generate a density map. The two asymmetric branches have different structures and use features from different semantic layers. One branch is a densely connected stacked dilated convolution (DCSDC) sub-network with different dilation rates, which relies on a deep feature layer and handles scale variation. The other branch is a parameter-free densely connected stacked pooling (DCSP) sub-network with various pooling kernels and strides, which relies on shallow features and fuses features with several receptive fields to reduce the impact of background noise. The two sub-networks are fused by an attention mechanism to generate the final density map. Extensive experimental results on three widely used benchmark datasets demonstrate the effectiveness and superiority of the proposed method: (1) we achieve competitive counting performance compared to state-of-the-art methods; and (2) compared with the baseline, MAE and MSE are decreased by at least 6.3% and 11.3%, respectively.
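The reason stacked dilated convolutions, as in the DCSDC branch, help with scale variation is that they grow the receptive field rapidly at constant parameter count. A sketch of the standard receptive-field arithmetic for a stride-1 stack of 3x3 convolutions; the dilation rates below are illustrative, not the ones used by SBAB-Net.

```python
def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):
    """Receptive field of a stack of dilated convolutions, stride 1 throughout.
    Each layer widens the field by (kernel - 1) * dilation pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

plain = receptive_field(dilations=(1, 1, 1, 1))    # undilated 4-layer stack
dilated = receptive_field(dilations=(1, 2, 4, 8))  # same depth, same parameters
```

Four undilated 3x3 layers see a 9-pixel-wide field, while the same four layers with dilation rates 1, 2, 4, 8 see a 31-pixel-wide field, letting one branch cover both small and large heads without extra weights.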
13. Zhu P, Peng T, Du D, Yu H, Zhang L, Hu Q. Graph Regularized Flow Attention Network for Video Animal Counting From Drones. IEEE Transactions on Image Processing 2021; 30:5339-5351. PMID: 34048343. DOI: 10.1109/tip.2021.3082297.
Abstract
In this paper, we propose a large-scale video-based animal counting dataset collected by drones (AnimalDrone) for agriculture and wildlife protection. The dataset consists of two subsets, PartA, captured on site by drones, and PartB, collected from the Internet, with rich annotations of more than 4 million objects in 53,644 frames and corresponding attributes in terms of density, altitude, and view. Moreover, we develop a new graph regularized flow attention network (GFAN) to perform density map estimation in video clips with arbitrary crowd density, perspective, and flight altitude. Specifically, our GFAN leverages optical flow to warp the multi-scale feature maps of sequential frames to exploit temporal relations, and then combines the enhanced features to predict the density maps. We further introduce a multi-granularity loss function, comprising a pixel-wise density loss and a region-wise count loss, to force the network to concentrate on discriminative features for objects of different scales, while a graph regularizer imposed on the density maps of multiple consecutive frames maintains temporal coherency. Extensive experiments demonstrate the effectiveness of the proposed method compared with several state-of-the-art counting algorithms. The AnimalDrone dataset is available at https://github.com/VisDrone/AnimalDrone.