1. Fan J, Chen M, Gu Z, Yang J, Wu H, Wu J. SSIM over MSE: A new perspective for video anomaly detection. Neural Netw 2025; 185:107115. [PMID: 39855001] [DOI: 10.1016/j.neunet.2024.107115]
Abstract
Video anomaly detection plays a crucial role in ensuring public safety. Its goal is to detect abnormal patterns contained in video frames. Most existing models distinguish anomalies based on the Mean Squared Error (MSE), which is hard to align with human perception, resulting in discrepancies between model-detected anomalies and those recognized by humans. Unlike the Human Visual System (HVS), such models are trained to prioritize texture over shape, which leads to poor interpretability and limited performance. To address these limitations, we propose to optimize video anomaly detection models from the perspective of human visual relevance. The optimization infrastructure includes a novel Structural Similarity Index (SSIM)-based loss, a novel SSIM-based anomaly score calculation method, and a spatial-temporal enhancement block for 3D convolution (STE-3D). The SSIM loss helps the model emphasize shape information in videos rather than texture. The SSIM-based anomaly score evaluates video frames in a way that aligns more closely with human visual perception. STE-3D improves the model's capacity to capture spatial-temporal features and compensates for the deficiency of the SSIM loss in capturing temporal features. STE-3D is lightweight in design and can be seamlessly integrated into existing video anomaly detection models based on 3D convolution. Extensive experiments and ablation studies were conducted on four challenging video anomaly detection benchmarks, i.e., UCSD Ped1, UCSD Ped2, CUHK Avenue, and ShanghaiTech. The experimental results validate the efficacy of the proposed approaches in improving video anomaly detection performance.
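The SSIM-based loss this abstract builds on can be illustrated with a minimal sketch. Nothing below is taken from the paper itself: `ssim_global` computes a simplified single-window SSIM (the full index averages the same statistic over local sliding windows), and `ssim_loss` is the usual `1 - SSIM` objective; the stabilizing constants follow the standard SSIM formulation.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Simplified single-window SSIM between two frames.

    The full index averages this statistic over local sliding windows;
    here a single window spans the whole frame to keep the sketch short.
    """
    c1 = (0.01 * data_range) ** 2   # stabilizing constants from the
    c2 = (0.03 * data_range) ** 2   # standard SSIM formulation
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_loss(pred, target):
    # SSIM is 1 for identical frames, so 1 - SSIM is a minimizable loss
    # that rewards structural (shape-level) agreement.
    return 1.0 - ssim_global(pred, target)
```

For identical frames the loss is zero; unlike MSE, it penalizes disagreement in means, variances, and covariance rather than raw per-pixel error, which is why it correlates better with perceived structure.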
Affiliation(s)
- Jin Fan
- Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China; Zhejiang Provincial Key Laboratory of Internet in Discrete Industries, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China; Research and Development Center of Transport Industry of New Generation of Artificial Intelligence Technology, Hangzhou, 310018, Zhejiang, China
- Miao Chen
- Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
- Zhangyu Gu
- Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
- Jiajun Yang
- Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
- Huifeng Wu
- Department of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China; Zhejiang Provincial Key Laboratory of Internet in Discrete Industries, Hangzhou Dianzi University, Hangzhou, 310018, Zhejiang, China
- Jia Wu
- Department of Computing, Macquarie University, Sydney, 4627345, New South Wales, Australia
2. Cao C, Zhang H, Lu Y, Wang P, Zhang Y. Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation. IEEE Trans Pattern Anal Mach Intell 2025; 47:224-239. [PMID: 39283792] [DOI: 10.1109/tpami.2024.3461718]
Abstract
Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly, the scene-dependent anomaly, has been overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. Meanwhile, we introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network solves VAD on its own and jointly solves VAA with the backward network. In particular, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space to refine these features into a more compact, task-oriented form, and generate scene-dependent features with a scene information auto-encoder, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method handles both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on the ShanghaiTech, CUHK Avenue, and proposed NWPU Campus datasets.
3. Wu H, Yang J, Yuan MD, Li X. Heuristic Optimal Scheduling for Road Traffic Incident Detection Under Computational Constraints. Sensors (Basel) 2024; 24:7221. [PMID: 39598998] [PMCID: PMC11598636] [DOI: 10.3390/s24227221]
Abstract
The intelligent monitoring of road surveillance videos is a crucial tool for detecting and predicting traffic anomalies, swiftly identifying road safety risks, rapidly addressing potential hazards, and preventing accidents or secondary incidents. With the vast number of surveillance cameras in operation, conducting traditional real-time video analysis across all cameras at once requires substantial computational resources. Alternatively, methods that employ periodic camera patrol analysis frequently overlook a significant number of anomalous traffic events, thereby hindering the effectiveness of traffic event detection. To overcome these challenges, this paper introduces a heuristic optimal scheduling approach designed to enhance traffic event detection efficiency while operating within limited computational resources. This method leverages historical data and prior knowledge to compute a weighted event feature value for each camera, providing a quantitative measure of its detection efficiency. To optimize resource allocation, a cyclic elimination mechanism is implemented to exclude low-performing cameras, enabling the dynamic reallocation of resources to higher-performing cameras, thereby enhancing overall detection performance. Finally, the effectiveness of the proposed method is validated through a case study conducted in a representative region of a major metropolitan city in China. The results revealed a substantial improvement in traffic event detection efficiency, with increases of 40%, 28%, 17%, and 28% across different time periods when compared to the pre-optimized state. Furthermore, the proposed method outperformed existing resource scheduling algorithms in terms of average load degree, load balance degree, and higher computational resource utilization. By avoiding the common issues of resource wastage and insufficiency often found in static allocation models, this approach offers greater flexibility and adaptability in computational resource scheduling, thereby effectively addressing the practical demands of traffic anomaly detection and early warning systems.
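A hypothetical sketch of the scheduling idea in this abstract: each camera receives a weighted event feature value from historical detection counts, and a cyclic elimination loop drops the weakest cameras until the active set fits the computational budget. The function name, scoring weights, and data layout are all illustrative, not taken from the paper.

```python
def schedule_cameras(event_counts, weights, budget):
    """Pick which cameras to analyze under a fixed compute budget.

    event_counts: {camera_id: [historical count per event type]}
    weights: importance weight per event type (prior knowledge)
    budget: maximum number of cameras analyzed concurrently
    """
    # Weighted event feature value: a scalar measure of how productive
    # each camera has historically been for event detection.
    scores = {cam: sum(w * c for w, c in zip(weights, counts))
              for cam, counts in event_counts.items()}
    active = set(scores)
    # Cyclic elimination: repeatedly drop the lowest-scoring camera
    # until the remaining set fits the budget.
    while len(active) > budget:
        active.remove(min(active, key=scores.get))
    return sorted(active, key=scores.get, reverse=True)
```

For example, with three cameras, per-event weights `[1.0, 2.0]`, and a budget of two streams, `schedule_cameras({"a": [5, 1], "b": [0, 0], "c": [2, 3]}, [1.0, 2.0], 2)` keeps the two highest-scoring cameras, `["c", "a"]`.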
Affiliation(s)
- Hao Wu
- The Smart City Research Institute of China Electronics Technology Group Corporation, Shenzhen 518038, China
- Guangdong Provincial Key Laboratory of Intelligent Urban Security Monitoring and Smart City Planning, Guangzhou 510200, China
- Jiahao Yang
- The Smart City Research Institute of China Electronics Technology Group Corporation, Shenzhen 518038, China
- Guangdong Provincial Key Laboratory of Intelligent Urban Security Monitoring and Smart City Planning, Guangzhou 510200, China
- Ming-Dong Yuan
- The Smart City Research Institute of China Electronics Technology Group Corporation, Shenzhen 518038, China
- Guangdong Provincial Key Laboratory of Intelligent Urban Security Monitoring and Smart City Planning, Guangzhou 510200, China
- Xin Li
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
4. Zaheer MZ, Mahmood A, Astrid M, Lee SI. Clustering Aided Weakly Supervised Training to Detect Anomalous Events in Surveillance Videos. IEEE Trans Neural Netw Learn Syst 2024; 35:14085-14098. [PMID: 37235464] [DOI: 10.1109/tnnls.2023.3274611]
Abstract
Formulating learning systems for the detection of real-world anomalous events using only video-level labels is a challenging task mainly due to the presence of noisy labels as well as the rare occurrence of anomalous events in the training data. We propose a weakly supervised anomaly detection system that has multiple contributions including a random batch selection mechanism to reduce interbatch correlation and a normalcy suppression block (NSB) which learns to minimize anomaly scores over normal regions of a video by utilizing the overall information available in a training batch. In addition, a clustering loss block (CLB) is proposed to mitigate the label noise and to improve the representation learning for the anomalous and normal regions. This block encourages the backbone network to produce two distinct feature clusters representing normal and anomalous events. An extensive analysis of the proposed approach is provided using three popular anomaly detection datasets including UCF-Crime, ShanghaiTech, and UCSD Ped2. The experiments demonstrate the superior anomaly detection capability of our approach.
5. Liu T, Lam KM, Kong J. Distilling Privileged Knowledge for Anomalous Event Detection From Weakly Labeled Videos. IEEE Trans Neural Netw Learn Syst 2024; 35:12627-12641. [PMID: 37037244] [DOI: 10.1109/tnnls.2023.3263966]
Abstract
Weakly supervised video anomaly detection (WS-VAD) aims to identify the snippets involving anomalous events in long untrimmed videos, with solely video-level binary labels. A typical paradigm among existing WS-VAD methods is to employ multiple modalities as inputs, e.g., RGB, optical flow, and audio, as they can provide sufficient discriminative clues that are robust to the diverse, complicated real-world scenes. However, such a pipeline relies heavily on the availability of multiple modalities and is computationally expensive and storage-demanding when processing long sequences, which limits its use in some applications. To address this dilemma, we propose a privileged knowledge distillation (KD) framework dedicated to the WS-VAD task, which maintains the benefits of exploiting additional modalities while avoiding the need for multimodal data in the inference phase. We argue that the performance of the privileged KD framework mainly depends on two factors: 1) the effectiveness of the multimodal teacher network and 2) the completeness of the useful information transfer. To obtain a reliable teacher network, we propose a cross-modal interactive learning strategy and an anomaly-normal discrimination loss, which target learning task-specific cross-modal features and encourage the separability of anomalous and normal representations, respectively. Furthermore, we design both representation- and logits-level distillation loss functions, which force the unimodal student network to distill abundant privileged knowledge from the well-trained multimodal teacher network, in a snippet-to-video fashion. Extensive experimental results on three public benchmarks demonstrate that the proposed privileged KD framework can train a lightweight yet effective detector for localizing anomalous events under the supervision of video-level annotations.
6. Li C, Li H, Zhang G. Cross-modality integration framework with prediction, perception and discrimination for video anomaly detection. Neural Netw 2024; 172:106138. [PMID: 38266473] [DOI: 10.1016/j.neunet.2024.106138]
Abstract
Video anomaly detection is an important task for public security in the multimedia field. It aims to distinguish events that deviate from normal patterns. As an important semantic representation, textual information can effectively characterize different contents for anomaly detection. However, most existing methods rely primarily on the visual modality, with limited incorporation of the textual modality. In this paper, a cross-modality integration framework (CIForAD) is proposed for anomaly detection, which combines the textual and visual modalities for prediction, perception and discrimination. Firstly, a feature fusion prediction (FUP) module is designed to predict the target regions by fusing visual features with textual features for prompting, which can amplify the discriminative distance. Then an image-text semantic perception (ISP) module is developed to judge semantic consistency by associating fine-grained visual features with textual features, where a strategy of local training and global inference is introduced to perceive local details and global semantic correlation. Finally, a self-supervised time attention discrimination (TAD) module is built to explore inter-frame relations and further distinguish abnormal sequences from normal ones. Extensive experiments on three challenging benchmarks indicate that our CIForAD obtains state-of-the-art anomaly detection performance.
Affiliation(s)
- Chaobo Li
- School of Information Science and Technology, Nantong University, Nantong 226019, China
- Hongjun Li
- School of Information Science and Technology, Nantong University, Nantong 226019, China
- Guoan Zhang
- School of Information Science and Technology, Nantong University, Nantong 226019, China
7. Grcić M, Bevandić P, Kalafatić Z, Šegvić S. Dense Out-of-Distribution Detection by Robust Learning on Synthetic Negative Data. Sensors (Basel) 2024; 24:1248. [PMID: 38400405] [PMCID: PMC10892056] [DOI: 10.3390/s24041248]
Abstract
Standard machine learning is unable to accommodate inputs which do not belong to the training distribution. The resulting models often give rise to confident incorrect predictions which may lead to devastating consequences. This problem is especially demanding in the context of dense prediction since input images may be only partially anomalous. Previous work has addressed dense out-of-distribution detection by discriminative training with respect to off-the-shelf negative datasets. However, real negative data may lead to over-optimistic evaluation due to possible overlap with test anomalies. We therefore extend this approach by generating synthetic negative patches along the border of the inlier manifold. We leverage a jointly trained normalizing flow due to a coverage-oriented learning objective and the capability to generate samples at different resolutions. We detect anomalies according to a principled information-theoretic criterion which can be consistently applied through training and inference. The resulting models set the new state of the art on benchmarks for out-of-distribution detection in road-driving scenes and remote sensing imagery despite minimal computational overhead.
Affiliation(s)
- Siniša Šegvić
- Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia
8. Lee J, Koo H, Kim S, Ko H. Cognitive Refined Augmentation for Video Anomaly Detection in Weak Supervision. Sensors (Basel) 2023; 24:58. [PMID: 38202920] [PMCID: PMC10781148] [DOI: 10.3390/s24010058]
Abstract
Weakly supervised video anomaly detection is a methodology that assesses anomaly levels in individual frames based on video-level labels. Anomaly scores are computed by evaluating the deviation of distances derived from frames in an unbiased state. Weakly supervised video anomaly detection faces the formidable challenge of false alarms, stemming from various sources, with a major contributor being the inadequate reflection of frame labels during the learning process. Multiple instance learning (MIL) has been a pivotal solution to this issue in previous studies, necessitating the identification of discernible features between abnormal and normal segments. Simultaneously, it is imperative to identify shared biases within the feature space and cultivate a representative model. In this study, we introduce a novel MIL framework anchored on a memory unit, which augments features based on memory and effectively bridges the gap between normal and abnormal instances. This augmentation is facilitated through the integration of a multi-head attention feature augmentation module and a loss function combining KL divergence with a Gaussian distribution estimation-based approach. The method identifies distinguishable features and secures the inter-instance distance, thus fortifying the distance metrics between abnormal and normal instances approximated by distribution. The contribution of this research involves proposing a novel MIL-based framework for weakly supervised video anomaly detection and presenting an efficient integration strategy for the augmentation process. Extensive experiments were conducted on the benchmark datasets XD-Violence and UCF-Crime to substantiate the effectiveness of the proposed model.
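As background for the framework above, here is a minimal sketch of the standard multiple instance learning ranking objective for weakly supervised anomaly detection — the common baseline this line of work builds on, not the paper's memory-augmented, KL-regularized loss.

```python
import numpy as np

def mil_ranking_loss(normal_scores, abnormal_scores, margin=1.0):
    """Hinge loss between the top-scoring segment of an abnormal video
    and the top-scoring segment of a normal video: under video-level
    labels, only the most anomalous segment of each bag is compared.
    """
    return max(0.0,
               margin - float(np.max(abnormal_scores))
                      + float(np.max(normal_scores)))
```

With segment scores `[0.1, 0.2]` for a normal video and `[0.9, 0.3]` for an abnormal one and the default margin, the loss is `1.0 - 0.9 + 0.2 = 0.3`; once the abnormal video's peak score exceeds the normal one's by the margin, the loss vanishes.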
Affiliation(s)
- Junyeop Lee
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
- Hyunbon Koo
- Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
- Seongjun Kim
- Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
- Hanseok Ko
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
9. Kim J, Yoon S, Choi T, Sull S. Unsupervised Video Anomaly Detection Based on Similarity with Predefined Text Descriptions. Sensors (Basel) 2023; 23:6256. [PMID: 37514551] [PMCID: PMC10385872] [DOI: 10.3390/s23146256]
Abstract
Research on video anomaly detection has mainly been based on video data. However, many real-world cases involve users who can conceive potential normal and abnormal situations within the anomaly detection domain. This domain knowledge can be conveniently expressed as text descriptions, such as "walking" or "people fighting", which can be easily obtained, customized for specific applications, and applied to unseen abnormal videos not included in the training dataset. We explore the potential of using these text descriptions with unlabeled video datasets. We use large language models to obtain text descriptions and leverage them to detect abnormal frames by calculating the cosine similarity between the input frame and text descriptions using the CLIP visual language model. To enhance the performance, we refined the CLIP-derived cosine similarity using an unlabeled dataset and the proposed text-conditional similarity, which is a similarity measure between two vectors based on additional learnable parameters and a triplet loss. The proposed method has a simple training and inference process that avoids the computationally intensive analyses of optical flow or multiple frames. The experimental results demonstrate that the proposed method outperforms unsupervised methods, with 8% and 13% better AUC scores on the ShanghaiTech and UCF-Crime datasets, respectively. Although the proposed method scores 6% and 5% lower than weakly supervised methods on those datasets, on abnormal videos it achieves 17% and 5% better AUC scores, showing that it is comparable with weakly supervised methods that require resource-intensive dataset labeling. These outcomes validate the potential of using text descriptions in unsupervised video anomaly detection.
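The scoring step can be sketched as follows. This is only an illustration of the cosine-similarity comparison the abstract describes: plain vectors stand in for CLIP frame and text embeddings, and the learned text-conditional similarity and triplet loss are omitted.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def anomaly_score(frame_emb, normal_texts, abnormal_texts):
    """Score a frame by how much closer it sits to the best-matching
    abnormal description than to the best-matching normal one.
    Positive scores suggest an anomaly."""
    best_normal = max(cosine(frame_emb, t) for t in normal_texts)
    best_abnormal = max(cosine(frame_emb, t) for t in abnormal_texts)
    return best_abnormal - best_normal
```

In practice the embeddings would come from CLIP's image and text encoders; the arithmetic above is what turns those similarities into a frame-level anomaly score.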
Affiliation(s)
- Jaehyun Kim
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
- Seongwook Yoon
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
- Taehyeon Choi
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
- Sanghoon Sull
- School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
10. Nazir A, Mitra R, Sulieman H, Kamalov F. Suspicious Behavior Detection with Temporal Feature Extraction and Time-Series Classification for Shoplifting Crime Prevention. Sensors (Basel) 2023; 23:5811. [PMID: 37447661] [PMCID: PMC10347130] [DOI: 10.3390/s23135811]
Abstract
The rise in crime rates in many parts of the world, coupled with advancements in computer vision, has increased the need for automated crime detection services. To address this issue, we propose a new approach for detecting suspicious behavior as a means of preventing shoplifting. Existing methods are based on convolutional neural networks that extract spatial features from pixel values. In contrast, our proposed method employs object detection based on YOLOv5 with Deep Sort to track people through a video, using the resulting bounding box coordinates as temporal features. The extracted temporal features are then modeled as a time-series classification problem. The proposed method was tested on the popular UCF Crime dataset, and benchmarked against the current state-of-the-art robust temporal feature magnitude (RTFM) method, which relies on Inflated 3D ConvNet (I3D) preprocessing. Our results demonstrate an impressive 8.45-fold increase in detection inference speed compared to the state-of-the-art RTFM, along with an F1 score of 92%, outperforming RTFM by 3%. Furthermore, our method achieved these results without requiring expensive data augmentation or image feature extraction.
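A hypothetical sketch of the temporal feature extraction: per-frame bounding boxes from a tracker are turned into simple motion features for a downstream time-series classifier. The `(x1, y1, x2, y2)` box format and the specific features chosen here (center displacement and area change) are assumptions for illustration, not the paper's exact feature set.

```python
def track_to_series(track):
    """Convert per-frame boxes for one tracked person into temporal
    features.

    track: list of (x1, y1, x2, y2) boxes, one per frame.
    Returns one (dx, dy, darea) tuple per consecutive frame pair:
    center displacement and area change, ready for a time-series
    classifier.
    """
    feats = []
    for (x1, y1, x2, y2), (nx1, ny1, nx2, ny2) in zip(track, track[1:]):
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        ncx, ncy = (nx1 + nx2) / 2, (ny1 + ny2) / 2
        area = (x2 - x1) * (y2 - y1)
        narea = (nx2 - nx1) * (ny2 - ny1)
        feats.append((ncx - cx, ncy - cy, narea - area))
    return feats
```

Because the classifier sees only these low-dimensional box trajectories rather than pixel volumes, inference is far cheaper than 3D-convolutional pipelines, which is the speedup the abstract reports.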
Affiliation(s)
- Amril Nazir
- College of Technological Innovation, Zayed University, Abu Dhabi, United Arab Emirates
- Rohan Mitra
- Department of Computer Science and Engineering, American University of Sharjah, Sharjah, United Arab Emirates
- Hana Sulieman
- Department of Mathematics and Statistics, American University of Sharjah, Sharjah, United Arab Emirates
- Firuz Kamalov
- Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
11. Rathee M, Bačić B, Doborjeh M. Automated Road Defect and Anomaly Detection for Traffic Safety: A Systematic Review. Sensors (Basel) 2023; 23:5656. [PMID: 37420822] [PMCID: PMC10305190] [DOI: 10.3390/s23125656]
Abstract
Recently, there has been a substantial increase in the development of sensor technology. As an enabling factor, computer vision (CV) combined with sensor technology has made progress in applications intended to mitigate high rates of fatalities and the costs of traffic-related injuries. Although past surveys and applications of CV have focused on subareas of road hazards, there has yet to be one comprehensive, evidence-based systematic review that investigates CV applications for Automated Road Defect and Anomaly Detection (ARDAD). To present ARDAD's state of the art, this systematic review determines the research gaps, challenges, and future implications from selected papers (N = 116) published between 2000 and 2023, relying primarily on the Scopus and Litmaps services. The survey presents a selection of artefacts, including the most popular open-access datasets (D = 18) and research and technology trends with reported performance, that can help accelerate the application of rapidly advancing sensor technology in ARDAD and CV. The produced survey artefacts can assist the scientific community in further improving traffic conditions and safety.
Affiliation(s)
- Munish Rathee
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1142, New Zealand
- Boris Bačić
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1142, New Zealand
- Maryam Doborjeh
- School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland 1142, New Zealand
- Knowledge Engineering and Discovery Research Innovation, Auckland University of Technology, Auckland 1142, New Zealand
12. Dilek E, Dener M. Computer Vision Applications in Intelligent Transportation Systems: A Survey. Sensors (Basel) 2023; 23:2938. [PMID: 36991649] [PMCID: PMC10051529] [DOI: 10.3390/s23062938]
Abstract
As technology continues to develop, computer vision (CV) applications are becoming increasingly widespread in the intelligent transportation systems (ITS) context. These applications are developed to improve the efficiency of transportation systems, increase their level of intelligence, and enhance traffic safety. Advances in CV play an important role in solving problems in the fields of traffic monitoring and control, incident detection and management, road usage pricing, and road condition monitoring, among many others, by providing more effective methods. This survey examines CV applications in the literature, the machine learning and deep learning methods used in ITS applications, the applicability of computer vision applications in ITS contexts, the advantages these technologies offer and the difficulties they present, and future research areas and trends, with the goal of increasing the effectiveness, efficiency, and safety level of ITS. The present review, which brings together research from various sources, aims to show how computer vision techniques can help transportation systems to become smarter by presenting a holistic picture of the literature on different CV applications in the ITS context.
13. Wang Y, Liu T, Zhou J, Guan J. Video Anomaly Detection Based on Spatio-Temporal Relationships among Objects. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.027]
14. Zhang P, Lu Y. Research on Anomaly Detection of Surveillance Video Based on Branch-Fusion Net and CSAM. Sensors (Basel) 2023; 23:1385. [PMID: 36772423] [PMCID: PMC9919792] [DOI: 10.3390/s23031385]
Abstract
As surveillance cameras are deployed more and more widely, the task of detecting abnormal behaviors in surveillance videos has gained widespread attention. The generalization ability and parameter overhead of a model determine how accurate its detection results are. To address the poor generalization ability and high parameter overhead of existing anomaly detection methods, we propose a three-dimensional multi-branch convolutional fusion network, named "Branch-Fusion Net". The network is designed with a multi-branch structure not only to significantly reduce parameter overhead but also to improve generalization by understanding the input feature map from different perspectives. To suppress useless features during training, we propose a simple yet effective Channel Spatial Attention Module (CSAM), which sequentially focuses attention on key channels and spatial feature regions to suppress useless features and enhance important ones. We combine the Branch-Fusion Net and the CSAM as a local feature extraction network and use the Bi-Directional Gated Recurrent Unit (Bi-GRU) to extract global feature information. The experiments are validated on a self-built Crimes-mini dataset, and the accuracy of anomaly detection in surveillance videos reaches 93.55% on the test set. The result shows that the proposed model significantly improves the accuracy of anomaly detection in surveillance videos with low parameter overhead.
Affiliation(s)
- Yuanyao Lu
- School of Information Science and Technology, North China University of Technology, Beijing 100144, China
15. Anomaly detection in surveillance videos: a thematic taxonomy of deep models, review and performance analysis. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10258-6]
16. Spectrum Anomaly Detection Based on Spatio-Temporal Network Prediction. Electronics 2022. [DOI: 10.3390/electronics11111770]
Abstract
With the miniaturization of communication devices, the number of distributed electromagnetic devices is increasing. To achieve effective management of the electromagnetic spectrum, spectrum prediction and anomaly detection have become increasingly critical. This paper proposes an algorithmic framework for detecting spectrum anomalies using deep learning techniques. More specifically, the framework comprises spectrum prediction and anomaly detection. We use the sliding window method to divide the time series, construct multi-timescale historical data, and train the model on normal data so that it achieves high-accuracy spectrum prediction. We then analyze and determine a discriminant function that distinguishes spectral anomalies by calculating the differences between predicted and real data. Experimental results on real-world spectrum measurement data and simulated anomaly data show that the proposed method outperforms existing baseline algorithms.
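The windowing and discriminant steps described above can be sketched as follows; the prediction model itself is omitted, and the fixed threshold here stands in for the paper's derived discriminant function.

```python
import numpy as np

def sliding_windows(series, width):
    """Split a series into (history window, next value) training pairs,
    the sliding-window division described in the abstract."""
    xs = [series[i:i + width] for i in range(len(series) - width)]
    ys = [series[i + width] for i in range(len(series) - width)]
    return np.array(xs), np.array(ys)

def flag_anomalies(actual, predicted, threshold):
    """Flag time steps where the gap between measured and predicted
    spectrum values exceeds a threshold (a stand-in discriminant)."""
    return np.abs(np.asarray(actual) - np.asarray(predicted)) > threshold
```

A model trained on the `(xs, ys)` pairs from normal data predicts the next spectrum value from each history window; at inference time, `flag_anomalies` marks the points where the prediction error is too large to be normal.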
17. Future frame prediction based on generative assistant discriminative network for anomaly detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03488-2]