1. Veluri RC, Khan S, Sankareswaran SP, Shabaz M, Farouk A, Innab N. Modified M-RCNN approach for abandoned object detection in public places. Expert Systems 2025; 42. DOI: 10.1111/exsy.13648
Abstract
Detection of abandoned and stationary objects, such as luggage, boxes, and machinery, in public places is a challenging and critical task for video surveillance systems. These objects may contain weapons, bombs, or other explosive materials that threaten the public. Although various applications have been developed to detect stationary objects, challenges such as occlusion and changes in the geometric features of objects remain to be addressed. Considering the complexity of public scenes and the variety of objects, a context-aware model based on a mask region-based convolutional network (M-RCNN) is developed for detecting abandoned objects. A modified convolution operation is implemented in the backbone network to learn features from geometric variations near objects; these modified layers adapt to geometric interpretations to extract the required features. Finally, a bounding-box operation locates the abandoned object and a mask is produced for it. Experiments on the benchmark ABODA dataset and our own dataset show mAP scores of 0.699 for model 1, 0.675 for model 2, and 0.734 for model 3. An ablation analysis has also been performed and the models compared with other state-of-the-art methods. Based on the results, the proposed model detects abandoned objects better than existing state-of-the-art methods.
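The "modified convolution operation" adapted to geometric variation reads naturally as a deformable convolution, in which each kernel tap is shifted by a learned offset. A minimal PyTorch sketch under that assumption; the block name and layer sizes are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class GeometryAwareBlock(nn.Module):
    """Hypothetical backbone block: a 3x3 deformable convolution whose
    sampling locations adapt to geometric variation around objects."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # One (dx, dy) offset per tap of the 3x3 kernel.
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset.weight)  # start as a regular convolution
        nn.init.zeros_(self.offset.bias)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))

feat = torch.randn(1, 64, 56, 56)          # a backbone feature map
out = GeometryAwareBlock(64, 128)(feat)    # -> torch.Size([1, 128, 56, 56])
```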
Affiliation(s)
- Shakir Khan: College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia; University Centre for Research and Development, Chandigarh University, Mohali, India
- Mohammad Shabaz: Computer Science Engineering, Model Institute of Engineering and Technology, Jammu, India
- Ahmed Farouk: Department of Computer Science, Faculty of Computers and Artificial Intelligence, South Valley University, Hurghada, Egypt
- Nisreen Innab: Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Riyadh, Saudi Arabia
2. Cao C, Zhang H, Lu Y, Wang P, Zhang Y. Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:224-239. PMID: 39283792. DOI: 10.1109/tpami.2024.3461718
Abstract
Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly, the scene-dependent anomaly, has been overlooked, and the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network solves VAD on its own and solves VAA jointly with the backward network. In particular, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features into a more compact, task-specific form and to generate scene-dependent features with a scene-information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss over key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method handles both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, and the proposed NWPU Campus datasets.
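One concrete piece of this pipeline, the temporal loss over key frames, can be read as a smoothness constraint on consecutive latent features. A minimal sketch of that idea; the exact loss form is an assumption, not the authors' formula:

```python
import torch
import torch.nn.functional as F

def keyframe_temporal_loss(z):
    """Penalize jumps between latent features of consecutive key frames,
    encouraging motion consistency. z: (T, D), one row per key frame."""
    return F.mse_loss(z[1:], z[:-1])

z = torch.randn(8, 256)            # latents of 8 key frames
loss = keyframe_temporal_loss(z)
```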
3. Pu Y, Wu X, Yang L, Wang S. Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection. IEEE Transactions on Image Processing 2024; 33:4923-4936. PMID: 39236124. DOI: 10.1109/tip.2024.3451935
Abstract
Weakly supervised video anomaly detection (WS-VAD) aims to locate abnormal activities in untrimmed videos without frame-level supervision. Prior work has utilized graph convolution networks or self-attention mechanisms alongside multiple-instance-learning (MIL)-based classification loss to model temporal relations and learn discriminative features. However, these approaches are limited in two aspects: 1) multi-branch parallel architectures, while capturing multi-scale temporal dependencies, inevitably increase parameter and computational costs; 2) the binarized MIL constraint only ensures inter-class separability while neglecting fine-grained discriminability within anomalous classes. To this end, we introduce a novel WS-VAD framework that focuses on efficient temporal modeling and intra-class discriminability of anomalies. We first construct a Temporal Context Aggregation (TCA) module that simultaneously captures local and global dependencies by reusing an attention matrix along with adaptive context fusion. In addition, we propose a Prompt-Enhanced Learning (PEL) module that incorporates semantic priors via knowledge-based prompts to boost the discrimination of visual features while ensuring separability across anomaly subclasses. The proposed components are validated through extensive experiments, which demonstrate superior performance on three challenging datasets, UCF-Crime, XD-Violence, and ShanghaiTech, with fewer parameters and reduced computational effort. Notably, our method significantly improves detection accuracy for certain anomaly subclasses and reduces the false-alarm rate. Our code is available at: https://github.com/yujiangpu20/PEL4VAD.
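The attention-matrix reuse in TCA can be illustrated as one similarity matrix consumed twice, once under a banded local mask and once globally, before fusion. A sketch under these assumptions; the window size and fixed fusion weight are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

class TemporalContextAggregation(nn.Module):
    """Sketch of local-global temporal modeling that reuses one attention
    matrix; not the paper's exact TCA module."""
    def __init__(self, dim, window=5):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)
        self.window = window

    def forward(self, x):                     # x: (B, T, D) snippet features
        q, k = self.qk(x).chunk(2, dim=-1)
        attn = q @ k.transpose(1, 2) / x.size(-1) ** 0.5   # computed once
        t = torch.arange(x.size(1), device=x.device)
        band = (t[None, :] - t[:, None]).abs() <= self.window
        local = attn.masked_fill(~band, float('-inf')).softmax(-1) @ x
        glob = attn.softmax(-1) @ x                        # same matrix reused
        return x + 0.5 * (local + glob)                    # fixed fusion stub
```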
4. Liu T, Lam KM, Kong J. Distilling Privileged Knowledge for Anomalous Event Detection From Weakly Labeled Videos. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12627-12641. PMID: 37037244. DOI: 10.1109/tnnls.2023.3263966
Abstract
Weakly supervised video anomaly detection (WS-VAD) aims to identify the snippets involving anomalous events in long untrimmed videos, with only video-level binary labels. A typical paradigm among existing WS-VAD methods is to employ multiple modalities as inputs, e.g., RGB, optical flow, and audio, as they provide sufficient discriminative clues that are robust to diverse, complicated real-world scenes. However, such a pipeline relies heavily on the availability of multiple modalities and is computationally expensive and storage-demanding when processing long sequences, which limits its use in some applications. To address this dilemma, we propose a privileged knowledge distillation (KD) framework dedicated to the WS-VAD task, which maintains the benefits of exploiting additional modalities while avoiding the need for multimodal data in the inference phase. We argue that the performance of the privileged KD framework mainly depends on two factors: 1) the effectiveness of the multimodal teacher network and 2) the completeness of the useful information transfer. To obtain a reliable teacher network, we propose a cross-modal interactive learning strategy and an anomaly-normal discrimination loss, which respectively target learning task-specific cross-modal features and encouraging the separability of anomalous and normal representations. Furthermore, we design both representation-level and logit-level distillation loss functions, which force the unimodal student network to distill abundant privileged knowledge from the well-trained multimodal teacher network in a snippet-to-video fashion. Extensive experimental results on three public benchmarks demonstrate that the proposed privileged KD framework can train a lightweight yet effective detector for localizing anomalous events under the supervision of video-level annotations.
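The two distillation terms can be sketched directly: an L2 match on snippet representations plus a soft-target match on per-snippet anomaly logits. The temperature, weighting, and binary soft-target form below are assumptions, not the paper's exact losses:

```python
import torch
import torch.nn.functional as F

def privileged_kd_loss(s_feat, t_feat, s_logit, t_logit, tau=2.0, alpha=1.0):
    """Representation-level + logit-level distillation sketch for WS-VAD.
    s_feat/t_feat: (B, T, D) snippet features; s_logit/t_logit: (B, T)."""
    rep = F.mse_loss(s_feat, t_feat.detach())
    soft_target = torch.sigmoid(t_logit.detach() / tau)   # teacher's soft scores
    logit = F.binary_cross_entropy_with_logits(s_logit / tau, soft_target)
    return rep + alpha * logit
```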
5. Yu H, Zhang X, Wang Y, Huang Q, Yin B. Fine-Grained Accident Detection: Database and Algorithm. IEEE Transactions on Image Processing 2024; 33:1059-1069. PMID: 38265894. DOI: 10.1109/tip.2024.3355812
Abstract
This paper presents a novel fine-grained task for traffic accident analysis. Accident detection in surveillance or dashcam videos is a common task in video-based traffic accident analysis. However, common accident detection only identifies whether and when an accident occurs in a video, without analyzing its specific particulars. In this paper, we define the novel fine-grained accident detection task, which comprises fine-grained accident classification, spatio-temporal localization of the occurrence region, and accident severity estimation. A transformer-based framework combining the RGB and optical-flow information of videos is proposed for fine-grained accident detection. Additionally, we introduce a challenging Fine-grained Accident Detection (FAD) database that covers multiple tasks in surveillance videos and emphasizes the overall scene perspective. Experimental results demonstrate that our model can effectively extract video features for multiple tasks, while also indicating that current traffic accident analysis methods have limitations on the FAD task and that further research is needed.
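A minimal sketch of the multi-task setup on fused RGB and optical-flow features; the dimensions, head layout, and output encodings are assumptions rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class FineGrainedAccidentHeads(nn.Module):
    """Classification, temporal localization, and severity heads on a fused
    RGB + optical-flow video representation (illustrative only)."""
    def __init__(self, dim=512, num_classes=6):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)
        self.cls = nn.Linear(dim, num_classes)  # fine-grained accident type
        self.loc = nn.Linear(dim, 2)            # normalized start/end time
        self.sev = nn.Linear(dim, 1)            # severity score

    def forward(self, rgb, flow):               # each: (B, dim) pooled features
        h = torch.relu(self.fuse(torch.cat([rgb, flow], dim=-1)))
        return self.cls(h), self.loc(h).sigmoid(), self.sev(h)
```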
6. Sun L, Wang Z, Zhang Y, Wang G. A Feature-Trajectory-Smoothed High-Speed Model for Video Anomaly Detection. Sensors (Basel) 2023; 23:1612. PMID: 36772652. PMCID: PMC9921103. DOI: 10.3390/s23031612
Abstract
High-speed detection of abnormal frames in surveillance videos is essential for security. This paper proposes a new video anomaly-detection model, feature-trajectory-smoothed long short-term memory (FTS-LSTM). The model trains an LSTM autoencoder network to generate future frames on normal video streams and uses an FTS detector and a generation-error (GE) detector to detect anomalies in test streams. In the training stage, the model applies a feature-trajectory-smoothness (FTS) loss, a new indicator in the anomaly-detection area, to constrain the LSTM layer; this loss enables the LSTM layer to learn the temporal regularity of video streams more precisely. In the detection stage, the model uses the FTS loss and the GE loss as two detectors. By cascading the FTS detector with the GE detector, the model achieves high speed and competitive anomaly-detection performance on multiple datasets.
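One plausible reading of the FTS loss is a curvature penalty on the LSTM hidden-state trajectory: smooth trajectories on normal data make abrupt test-time deviations stand out. A sketch under that assumption, not the paper's exact formula:

```python
import torch

def fts_loss(h):
    """Feature-trajectory-smoothness penalty (illustrative sketch).
    h: (T, B, D) LSTM hidden states over T time steps."""
    vel = h[1:] - h[:-1]          # first-order difference (velocity)
    acc = vel[1:] - vel[:-1]      # second-order difference (curvature)
    return acc.pow(2).mean()
```

At test time, the cheap trajectory score can be thresholded first and the heavier GE detector invoked only on frames the first stage does not confidently clear, which is where the cascade's speed advantage comes from.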
Affiliation(s)
- Li Sun: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Zhiguo Wang: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Yujin Zhang: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Guijin Wang: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Shanghai AI Laboratory, Shanghai 200232, China
7. Zhang P, Lu Y. Research on Anomaly Detection of Surveillance Video Based on Branch-Fusion Net and CSAM. Sensors (Basel) 2023; 23:1385. PMID: 36772423. PMCID: PMC9919792. DOI: 10.3390/s23031385
Abstract
As surveillance cameras are deployed ever more widely, detecting abnormal behavior in surveillance videos has gained widespread attention. A model's generalization ability and parameter overhead both affect how accurate its detections are. To address the poor generalization and high parameter overhead of existing anomaly-detection methods, we propose a three-dimensional multi-branch convolutional fusion network, named "Branch-Fusion Net". The multi-branch structure not only significantly reduces parameter overhead but also improves generalization by viewing the input feature map from different perspectives. To suppress useless features during training, we propose a simple yet effective Channel Spatial Attention Module (CSAM), which sequentially focuses attention on key channels and spatial feature regions to suppress useless features and enhance important ones. We combine the Branch-Fusion Net and the CSAM as a local feature extraction network and use a Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract global feature information. Experiments are validated on a self-built Crimes-mini dataset, where anomaly-detection accuracy reaches 93.55% on the test set. The results show that the proposed model significantly improves the accuracy of anomaly detection in surveillance videos with low parameter overhead.
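The channel-then-spatial ordering described for CSAM resembles CBAM-style attention; a 3D sketch under that assumption, with the reduction ratio and kernel size as illustrative choices:

```python
import torch
import torch.nn as nn

class CSAM(nn.Module):
    """Channel Spatial Attention Module sketch for (B, C, T, H, W) features:
    reweight channels first, then spatial positions."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv3d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv3d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                       # suppress useless channels
        stats = torch.cat([x.mean(1, keepdim=True),   # channel-wise average
                           x.amax(1, keepdim=True)],  # channel-wise maximum
                          dim=1)
        return x * self.spatial(stats)                # highlight key regions
```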
Affiliation(s)
- Yuanyao Lu: School of Information Science and Technology, North China University of Technology, Beijing 100144, China
8. Wang X, Che Z, Jiang B, Xiao N, Yang K, Tang J, Ye J, Wang J, Qi Q. Robust Unsupervised Video Anomaly Detection by Multipath Frame Prediction. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:2301-2312. PMID: 34086581. DOI: 10.1109/tnnls.2021.3083152
Abstract
Video anomaly detection is used in many applications, such as security surveillance, and is very challenging. A majority of recent approaches utilize deep reconstruction models, but their performance is often suboptimal because, in practice, the reconstruction-error gap between normal and abnormal video frames is too small. Meanwhile, frame-prediction-based anomaly detection methods have shown promising performance. In this article, we propose a novel and robust unsupervised video anomaly detection method based on frame prediction, with a design that matches the characteristics of surveillance videos. The proposed method is equipped with a multipath ConvGRU-based frame prediction network that can better handle semantically informative objects and areas of different scales and capture spatial-temporal dependencies in normal videos. A noise tolerance loss is introduced during training to mitigate the interference caused by background noise. Extensive experiments on the CUHK Avenue, ShanghaiTech Campus, and UCSD Pedestrian datasets show that our method outperforms existing state-of-the-art approaches. Remarkably, it obtains a frame-level AUROC of 88.3% on the CUHK Avenue dataset.
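The noise-tolerance idea can be sketched as a hinge on the per-pixel prediction error, so small background fluctuations contribute no gradient; the tolerance value and the hinge form are assumptions, not the authors' exact loss:

```python
import torch

def noise_tolerant_loss(pred, target, tol=0.05):
    """Frame-prediction loss that ignores per-pixel errors below `tol`,
    so background noise does not dominate training (illustrative sketch)."""
    err = (pred - target).abs()
    return torch.clamp(err - tol, min=0.0).mean()
```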
9. Bielak P, Kajdanowicz T, Chawla NV. AttrE2vec: Unsupervised attributed edge representation learning. Information Sciences 2022. DOI: 10.1016/j.ins.2022.01.048
10. Mu H, Sun R, Yuan G, Shi G. Positive unlabeled learning-based anomaly detection in videos. International Journal of Intelligent Systems 2021. DOI: 10.1002/int.22437
Affiliation(s)
- Huiyu Mu: College of Information and Electrical Engineering, China Agricultural University, Beijing, China
- Ruizhi Sun: College of Information and Electrical Engineering, China Agricultural University, Beijing, China; Scientific Research Base for Integrated Technologies of Precision Agriculture (Animal Husbandry), Ministry of Agriculture, Beijing, China
- Gang Yuan: College of Information and Electrical Engineering, China Agricultural University, Beijing, China
- Guoqing Shi: College of Information and Electrical Engineering, China Agricultural University, Beijing, China
11. Rituraj, Tiwari A, Chaudhury S, Singh S, Saurav S. Video Classification using SlowFast Network via Fuzzy rule. 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) 2021. DOI: 10.1109/fuzz45933.2021.9494542
12. A One-Dimensional Non-Intrusive and Privacy-Preserving Identification System for Households. Electronics 2021. DOI: 10.3390/electronics10050559
Abstract
In many ambient-intelligence applications, including intelligent homes and cities, awareness of an inhabitant's presence and identity is of great importance. Such an identification system should be non-intrusive and therefore seamless for the user, especially if the goal is ubiquitous and pervasive surveillance. However, due to privacy concerns and regulatory restrictions, such a system should also strive to preserve the user's privacy as much as possible. In this paper, a novel identification system is presented based on a network of laser sensors, each attached above a room entry. Its sensor modality, a one-dimensional depth sensor, was chosen with privacy in mind. Each sensor is mounted on top of a doorway, facing the entrance at an angle. This position captures the user's body shape while the user crosses the doorway, and classification is performed by classical machine learning methods. The system is non-intrusive and preserves privacy: it omits user-sensitive information such as activity, facial expression, or clothing, and no video or audio data are required. The feasibility of such a system was tested on a publicly available database of anthropometric measurements of nearly 4000 people to analyze the relationships among accuracy, measured data, and number of residents, while the evaluation was conducted in a real-world scenario on 18 subjects. The evaluation, performed on a closed dataset with 10-fold cross validation, showed 98.4% accuracy across all subjects; accuracy for groups of five subjects averaged 99.1%. These results indicate that a network of one-dimensional depth sensors is suitable for identification tasks in applications such as surveillance and intelligent ambience.
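The classical-ML pipeline described above, body-shape features from doorway crossings plus 10-fold cross validation, can be sketched as follows; the classifier choice, feature layout, and stand-in data are assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: 18 subjects x 10 doorway crossings, each crossing reduced
# to a 64-sample 1-D depth profile of the body shape (random here, so the
# printed accuracy is meaningless; a real system uses measured profiles).
rng = np.random.default_rng(0)
X = rng.random((180, 64))
y = np.repeat(np.arange(18), 10)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross validation
print(f"mean accuracy: {scores.mean():.3f}")
```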
13. Du C, Yuan J, Dong J, Li L, Chen M, Li T. GPU based parallel optimization for real time panoramic video stitching. Pattern Recognition Letters 2020. DOI: 10.1016/j.patrec.2019.06.018
14. Mu N, Xu X, Zhang X. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm. Pattern Recognition Letters 2019. DOI: 10.1016/j.patrec.2019.04.011
15.
16. Zhang C, Lin Y, Zhu L, Liu A, Zhang Z, Huang F. CNN-VWII: An efficient approach for large-scale video retrieval by image queries. Pattern Recognition Letters 2019. DOI: 10.1016/j.patrec.2019.03.015