1
Liu S, Bi Y, Li Q, Ren Y, Ji H, Wang L. A deep learning based detection algorithm for anomalous behavior and anomalous item on buses. Sci Rep 2025;15:2163. PMID: 39820021; PMCID: PMC11739372; DOI: 10.1038/s41598-025-85962-8.
Abstract
This paper proposes a new strategy for analysing and detecting abnormal passenger behavior and abnormal objects on buses. First, a library of abnormal passenger behaviors and objects on buses is established. Then, a new mask detection and abnormal object detection and analysis (MD-AODA) algorithm is proposed. The algorithm is based on the deep learning YOLOv5 (You Only Look Once) algorithm with improvements. For onboard face mask detection, a strategy combining onboard face detection with target tracking is used. To detect abnormal objects in the vehicle, a geometric scale conversion approach for recognizing large abnormal objects is adopted. To apply the algorithm effectively to real bus data, an embedded video analysis system incorporating the proposed method is designed, which improves the accuracy and timeliness of anomaly detection compared to existing approaches. The algorithm's effectiveness and applicability are verified through comprehensive experiments on real bus video data, and the experimental results affirm the validity and practicality of the proposed algorithm.
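The abstract does not spell out the geometric scale conversion, so the following is only a minimal sketch of the general idea under an assumed calibration: a reference object of known real-world size (here, hypothetically, a seat back) fixes a pixels-to-meters factor, which converts a detection box into physical dimensions for an oversized-item check. All names and thresholds are illustrative, not the paper's method.

```python
# Illustrative sketch only: assumes a reference object of known height
# (e.g., a ~0.9 m seat back) is visible to calibrate the pixel scale.

def estimate_object_size(bbox_px, ref_bbox_px, ref_height_m=0.9):
    """Convert a detection's pixel extent into meters via a reference object."""
    meters_per_px = ref_height_m / (ref_bbox_px[3] - ref_bbox_px[1])
    x1, y1, x2, y2 = bbox_px
    return (x2 - x1) * meters_per_px, (y2 - y1) * meters_per_px

def is_oversized(bbox_px, ref_bbox_px, limit_m=0.8):
    """Flag an item whose longer side exceeds a size limit (hypothetical threshold)."""
    w, h = estimate_object_size(bbox_px, ref_bbox_px)
    return max(w, h) > limit_m

# Example: a detected suitcase box vs. a reference seat-back box.
print(is_oversized((100, 200, 380, 520), (500, 100, 560, 280)))  # True
```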
Affiliation(s)
- Shida Liu, Yu Bi, Qingyi Li, Ye Ren, Honghai Ji, Li Wang: School of Electrical and Control Engineering, North China University of Technology, Beijing, China
2
Zhu Y, Fu X, Zhang Z, Liu A, Xiong Z, Zha ZJ. Hue Guidance Network for Single Image Reflection Removal. IEEE Trans Neural Netw Learn Syst 2024;35:13701-13712. PMID: 37220051; DOI: 10.1109/TNNLS.2023.3270938.
Abstract
Reflections from glass are ubiquitous in daily life, but they are usually undesirable in photographs. To remove this unwanted interference, existing methods rely on either correlative auxiliary information or handcrafted priors to constrain this ill-posed problem. However, because these methods have limited capacity to describe the properties of reflections, they cannot handle strong and complex reflection scenes. In this article, we propose a two-branch hue guidance network (HGNet) for single image reflection removal (SIRR) that integrates image information with corresponding hue information. The complementarity between these two sources of information had not previously been exploited; the key observation is that hue information describes reflections well and can therefore serve as a superior constraint for the SIRR task. Accordingly, the first branch extracts salient reflection features by directly estimating the hue map. The second branch leverages these features to locate salient reflection regions and produce a high-quality restored image. Furthermore, we design a new cyclic hue loss that provides a more accurate optimization direction for network training. Experiments substantiate the superiority of our network, especially its excellent generalization to various reflection scenes, compared with state-of-the-art methods both qualitatively and quantitatively. Source code is available at https://github.com/zhuyr97/HGRR.
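Because hue is an angle, a loss on hue maps must respect wrap-around (hue 0.99 and hue 0.01 are nearly identical colors). The paper defines its own cyclic hue loss; the snippet below is only a generic PyTorch sketch of that idea, combining a standard RGB-to-hue conversion with a wrap-aware distance.

```python
import torch

def rgb_to_hue(img):
    """Hue channel in [0, 1) from an RGB tensor of shape (B, 3, H, W) in [0, 1]."""
    r, g, b = img[:, 0], img[:, 1], img[:, 2]
    maxc, _ = img.max(dim=1)
    minc, _ = img.min(dim=1)
    delta = (maxc - minc).clamp(min=1e-8)
    h = torch.where(maxc == r, (g - b) / delta % 6,
        torch.where(maxc == g, (b - r) / delta + 2, (r - g) / delta + 4))
    return h / 6.0

def cyclic_hue_loss(pred, target):
    """Distance on the hue circle: wraps around so hues near 0 and 1 are close."""
    d = (rgb_to_hue(pred) - rgb_to_hue(target)).abs()
    return torch.minimum(d, 1.0 - d).mean()
```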
3
Wang H, Xie Q, Zhao Q, Li Y, Liang Y, Zheng Y, Meng D. RCDNet: An Interpretable Rain Convolutional Dictionary Network for Single Image Deraining. IEEE Trans Neural Netw Learn Syst 2024;35:8668-8682. PMID: 37018568; DOI: 10.1109/TNNLS.2022.3231453.
Abstract
Rain streaks are a common weather artifact that degrades image quality and tends to harm the performance of outdoor computer vision systems; removing rain from an image has therefore become an important problem in the field. To handle this ill-posed single image deraining task, in this article we build a novel deep architecture, the rain convolutional dictionary network (RCDNet), which embeds the intrinsic priors of rain streaks and has clear interpretability. Specifically, we first establish a rain convolutional dictionary (RCD) model for representing rain streaks and use the proximal gradient descent technique to design an iterative algorithm containing only simple operators for solving the model. By unfolding this algorithm, we then build the RCDNet, in which every network module has a clear physical meaning corresponding to an operation of the algorithm. This interpretability makes it easy to visualize and analyze what happens inside the network and why it works well at inference. Moreover, to account for the domain gap in real scenarios, we further design a dynamic RCDNet, in which the rain kernels are inferred dynamically from the input rainy image and then help shrink the estimation space for the rain layer using only a few rain maps, ensuring good generalization when rain types differ between training and testing data. By training such an interpretable network end to end, all involved rain kernels and proximal operators are extracted automatically, faithfully characterizing the features of both the rain and clean background layers and naturally leading to better deraining performance. Comprehensive experiments on representative synthetic and real datasets substantiate the superiority of our method, especially its generality to diverse testing scenarios and the interpretability of all its modules, compared with state-of-the-art single image derainers both visually and quantitatively. Code is available at https://github.com/hongwang01/DRCDNet.
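To make the unfolding idea concrete, here is a hedged PyTorch sketch of what one proximal-gradient stage of a rain convolutional dictionary model might look like for a grayscale image: the rain layer is a sum of learned kernels convolved with rain maps, updated by a gradient step on the data fidelity followed by a small learned proximal network. This illustrates the technique in general, not the authors' exact RCDNet module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCDStage(nn.Module):
    """One unfolded proximal-gradient stage of a rain convolutional dictionary
    model for a grayscale image: rain layer R = sum_k d_k (conv) M_k.
    A sketch of the technique, not the authors' exact module layout."""
    def __init__(self, n_maps=32, k=9):
        super().__init__()
        # learnable rain kernels d_k, realized as a conv from rain maps to rain layer
        self.dict_conv = nn.Conv2d(n_maps, 1, k, padding=k // 2, bias=False)
        self.step = nn.Parameter(torch.tensor(0.1))   # learned step size
        self.prox = nn.Sequential(                    # learned proximal operator
            nn.Conv2d(n_maps, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, n_maps, 3, padding=1))

    def forward(self, maps, rainy, background):
        residual = rainy - background - self.dict_conv(maps)  # data-fidelity residual
        # gradient w.r.t. the rain maps: transposed convolution with the same kernels
        grad = F.conv_transpose2d(residual, self.dict_conv.weight,
                                  padding=self.dict_conv.padding[0])
        maps = self.prox(maps + self.step * grad)             # prior (proximal) step
        return maps, self.dict_conv(maps)                     # updated maps and rain layer
```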
4
Huang Z, Zhang J. Contrastive Unfolding Deraining Network. IEEE Trans Neural Netw Learn Syst 2024;35:5155-5169. PMID: 36112550; DOI: 10.1109/TNNLS.2022.3202724.
Abstract
Because rain-induced degradation impairs outdoor vision tasks, image deraining has become increasingly important. Focusing on the single image deraining (SID) task, in this article we propose a novel Contrastive Unfolding DEraining Network (CUDEN), which combines a traditional iterative algorithm with a deep network, exhibiting excellent performance and good interpretability. CUDEN transforms the challenge of locating rain streaks into that of discovering rain features, and defines the relationship between the image and feature domains in terms of mapping pairs. To obtain these mapping pairs efficiently, we propose a dynamic multidomain translation (DMT) module that decomposes the original mapping into sub-mappings. To enhance the network's feature extraction capability, we also propose a new serial multireceptive field fusion (SMF) block, which extracts complex and variable rain features with convolution kernels of different receptive fields. Moreover, we are the first to introduce contrastive learning to the SID task; combining it with perceptual loss, we propose a new contrastive perceptual loss (CPL), which generalizes well and greatly helps identify an appropriate gradient descent direction during training. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed CUDEN outperforms state-of-the-art (SOTA) deraining networks.
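A common way to realize such a contrastive perceptual loss is a triplet in a fixed feature space: the restored image (anchor) is pulled toward the clean image (positive) and pushed away from the rainy input (negative). The sketch below uses torchvision's pretrained VGG-16 features and an L1 ratio; it illustrates the general CPL idea, not the paper's exact formulation.

```python
import torch.nn as nn
from torchvision.models import vgg16

class ContrastivePerceptualLoss(nn.Module):
    """Generic contrastive perceptual loss: minimize the ratio of the
    anchor-positive distance to the anchor-negative distance in a frozen
    VGG-16 feature space. A sketch of the idea, not the paper's CPL."""
    def __init__(self, layer=8):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)               # fixed feature extractor
        self.l1 = nn.L1Loss()

    def forward(self, derained, clean, rainy):
        fa, fp, fn = map(self.features, (derained, clean, rainy))
        return self.l1(fa, fp) / (self.l1(fa, fn) + 1e-7)
```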
5
Jiang TX, Zhao XL, Zhang H, Ng MK. Dictionary Learning With Low-Rank Coding Coefficients for Tensor Completion. IEEE Trans Neural Netw Learn Syst 2023;34:932-946. PMID: 34464263; DOI: 10.1109/TNNLS.2021.3104837.
Abstract
In this article, we propose a novel tensor learning and coding model for third-order data completion. The aim of our model is to learn a data-adaptive dictionary from the given observations and to determine the coding coefficients of third-order tensor tubes. In the completion process, we minimize the low-rankness of each tensor slice containing the coding coefficients. Compared with a traditional predefined transform basis, the advantages of the proposed model are that: 1) the dictionary is learned from the given data observations, so the basis can be constructed more adaptively and accurately, and 2) the low-rankness of the coding coefficients makes the linear combination of dictionary features more effective. We also develop a multiblock proximal alternating minimization algorithm for solving this tensor learning and coding model and show that the sequence generated by the algorithm globally converges to a critical point. Extensive experimental results on real datasets, such as videos, hyperspectral images, and traffic data, demonstrate these advantages and show that the proposed tensor learning and coding method performs significantly better than other tensor completion methods in terms of several evaluation metrics.
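A toy matrix analogue may clarify the two ingredients, dictionary adaptation and low-rank coding. The hedged sketch below alternates a gradient step on the dictionary with a singular-value-thresholding (proximal) step on the coefficients; the actual model operates on third-order tensor tubes and carries the convergence guarantee described above, neither of which this simplification captures.

```python
import numpy as np

def svt(x, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0)) @ vt

def dictionary_completion(y, mask, n_atoms=20, tau=0.5, lr=1e-2, iters=500):
    """Toy matrix analogue: learn dictionary D and low-rank coefficients C
    so that D @ C matches Y on the observed entries (mask)."""
    rng = np.random.default_rng(0)
    d = rng.standard_normal((y.shape[0], n_atoms)) * 0.1
    c = rng.standard_normal((n_atoms, y.shape[1])) * 0.1
    for _ in range(iters):
        r = mask * (d @ c - y)                 # residual on observed entries
        d -= lr * (r @ c.T)                    # gradient step on the dictionary
        c = svt(c - lr * (d.T @ r), lr * tau)  # proximal step: low-rank coefficients
    return d @ c
```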
6
Yan T, Li M, Li B, Yang Y, Lau RWH. Rain Removal From Light Field Images With 4D Convolution and Multi-Scale Gaussian Process. IEEE Trans Image Process 2023;32:921-936. PMID: 37018668; DOI: 10.1109/TIP.2023.3234692.
Abstract
Existing deraining methods focus mainly on a single input image. However, with just a single input image, it is extremely difficult to accurately detect and remove rain streaks in order to restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plenoptic camera, and LFIs are becoming popular in the computer vision and graphics communities. However, making full use of the abundant information available in LFIs, such as the 2D array of sub-views and the disparity map of each sub-view, for effective rain removal remains a challenging problem. In this paper, we propose a novel method, 4D-MGP-SRRNet, for rain streak removal from LFIs. Our method takes as input all sub-views of a rainy LFI and adopts 4D convolutional layers to process them simultaneously. In the pipeline, a rain detection network, MGPDNet, with a novel Multi-scale Self-guided Gaussian Process (MSGP) module is proposed to detect high-resolution rain streaks from all sub-views of the input LFI at multiple scales. Semi-supervised learning is introduced so that MSGP can accurately detect rain streaks by training on both virtual-world and real-world rainy LFIs at multiple scales, with pseudo ground truths computed for real-world rain streaks. We then subtract the predicted rain streaks from all sub-views and feed the results into a 4D convolution-based Depth Estimation Residual Network (DERNet) to estimate depth maps, which are later converted into fog maps. Finally, all sub-views, concatenated with the corresponding rain streaks and fog maps, are fed into a powerful rainy-LFI restoration model based on an adversarial recurrent neural network to progressively eliminate rain streaks and recover the rain-free LFI. Extensive quantitative and qualitative evaluations on both synthetic and real-world LFIs demonstrate the effectiveness of our proposed method.
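PyTorch has no built-in Conv4d, but a 4D convolution over the two angular and two spatial dimensions of an LFI can be assembled from 3D convolutions, one per slice of the kernel along the first angular axis. The sketch below shows this construction on a (B, C, U, V, H, W) tensor; it illustrates the operation itself, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    """Minimal 4D convolution: each slice of the kernel along the U axis gets
    its own Conv3d over (V, H, W), and the shifted results are summed.
    A sketch of the idea, not the layer used in 4D-MGP-SRRNet."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.c_out, self.pad = c_out, k // 2
        self.convs = nn.ModuleList(
            nn.Conv3d(c_in, c_out, k, padding=k // 2) for _ in range(k))

    def forward(self, x):                       # x: (B, C_in, U, V, H, W)
        b, _, u, v, h, w = x.shape
        zero = x.new_zeros(b, self.c_out, v, h, w)
        out = 0
        for i, conv in enumerate(self.convs):
            shift = i - self.pad                # displacement along the U axis
            slices = []
            for uu in range(u):
                src = uu + shift                # zero padding at the U borders
                slices.append(conv(x[:, :, src]) if 0 <= src < u else zero)
            out = out + torch.stack(slices, dim=2)
        return out                              # (B, C_out, U, V, H, W)

lfi = torch.randn(1, 3, 5, 5, 32, 32)           # 5x5 angular sub-views
print(Conv4d(3, 8)(lfi).shape)                  # torch.Size([1, 8, 5, 5, 32, 32])
```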
7
Li P, Jin J, Jin G, Fan L. Scale-Space Feature Recalibration Network for Single Image Deraining. Sensors (Basel) 2022;22:6823. PMID: 36146173; PMCID: PMC9503391; DOI: 10.3390/s22186823.
Abstract
Computer vision technology is increasingly used in areas such as intelligent security and autonomous driving. Users need accurate and reliable visual information, but images captured in severe weather are often disturbed by rain, leaving scenes blurred. Many current single image deraining algorithms achieve good performance but are limited in their ability to retain detailed image information. In this paper, we design a Scale-space Feature Recalibration Network (SFR-Net) for single image deraining. The proposed network improves feature extraction and representation through a Multi-scale Extraction Recalibration Block (MERB) that uses dilated convolutions with different kernel sizes, yielding rich multi-scale rain streak features. In addition, we develop a Subspace Coordinated Attention Mechanism (SCAM) and embed it into MERB; it combines coordinated attention recalibration with a subspace attention mechanism to recalibrate the rain streak features learned in the extraction phase and to eliminate redundant features, enhancing the transfer of important information. Meanwhile, the overall SFR-Net structure uses dense connections and cross-layer feature fusion to reuse feature maps repeatedly, improving information flow through the network and mitigating vanishing gradients. Extensive experiments on synthetic and real datasets show that the proposed method outperforms recent state-of-the-art deraining algorithms in both rain removal and the preservation of image detail.
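As a rough illustration of the multi-scale extraction-plus-recalibration pattern, the hedged sketch below runs parallel dilated convolutions at several receptive fields, fuses them with a 1x1 convolution, and recalibrates channels with a squeeze-and-excitation-style gate. Layer sizes, dilation rates, and the attention form are assumptions, not the exact MERB/SCAM design.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel dilated convolutions gather rain-streak context at several
    receptive fields; an SE-style gate recalibrates the fused channels.
    Illustrative sizes, not the paper's configuration."""
    def __init__(self, c, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(c * len(dilations), c, 1)
        self.attn = nn.Sequential(  # channel recalibration
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // 4, 1),
            nn.ReLU(inplace=True), nn.Conv2d(c // 4, c, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return x + y * self.attn(y)  # residual connection preserves detail
```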
8
Abstract
Pansharpening is an important yet challenging remote sensing image processing task that aims to reconstruct a high-resolution (HR) multispectral (MS) image by fusing an HR panchromatic (PAN) image with a low-resolution (LR) MS image. Although deep learning (DL)-based pansharpening methods have achieved encouraging performance, they fail to fully utilize the deep semantic features and shallow contextual features when fusing an HR-PAN image with an LR-MS image. In this paper, we propose an efficient full-depth feature fusion network (FDFNet) for remote sensing pansharpening. Specifically, we design three distinctive branches: a PAN branch, an MS branch, and a fusion branch. The features extracted from the PAN and MS branches are progressively injected into the fusion branch at every depth, making the information fusion broader and more comprehensive. With this structure, low-level contextual features and high-level semantic features can be characterized and integrated adequately. Extensive experiments on reduced- and full-resolution datasets acquired by the WorldView-3, QuickBird, and GaoFen-2 sensors demonstrate that the proposed FDFNet, with fewer than 100,000 parameters, performs better than other detail-injection-based proposals and several state-of-the-art approaches, both visually and quantitatively.
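The sketch below illustrates the full-depth injection pattern in PyTorch: at every level, the current PAN and MS branch features are concatenated into the fusion branch, and the network ends with a detail-injection-style residual onto the upsampled MS image. Channel widths, depth, and block design are placeholders, not FDFNet's configuration.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class FullDepthFusion(nn.Module):
    """Three branches; PAN and MS features are injected into the fusion
    branch at every depth. An illustrative sketch, not FDFNet itself."""
    def __init__(self, ms_bands=4, c=32, depth=3):
        super().__init__()
        self.pan = nn.ModuleList([block(1, c)] + [block(c, c) for _ in range(depth - 1)])
        self.ms = nn.ModuleList([block(ms_bands, c)] + [block(c, c) for _ in range(depth - 1)])
        self.fusion = nn.ModuleList([block(3 * c if i else 2 * c, c) for i in range(depth)])
        self.head = nn.Conv2d(c, ms_bands, 3, padding=1)

    def forward(self, pan, ms_up):  # ms_up: LR-MS upsampled to PAN resolution
        p, m, f = pan, ms_up, None
        for pb, mb, fb in zip(self.pan, self.ms, self.fusion):
            p, m = pb(p), mb(m)
            f = fb(torch.cat([p, m] if f is None else [p, m, f], dim=1))
        return ms_up + self.head(f)  # detail-injection style residual output
```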
9
Zhao XL, Yang JH, Ma TH, Jiang TX, Ng MK, Huang TZ. Tensor Completion via Complementary Global, Local, and Nonlocal Priors. IEEE Trans Image Process 2022;31:984-999. PMID: 34971534; DOI: 10.1109/TIP.2021.3138325.
Abstract
Completing missing entries in multidimensional visual data is a typical ill-posed problem that requires appropriate exploitation of prior information of the underlying data. Commonly used priors can be roughly categorized into three classes: global tensor low-rankness, local properties, and nonlocal self-similarity (NSS); most existing works utilize one or two of them to implement completion. Naturally, there arises an interesting question: can one concurrently make use of multiple priors in a unified way, such that they can collaborate with each other to achieve better performance? This work gives a positive answer by formulating a novel tensor completion framework which can simultaneously take advantage of the global-local-nonlocal priors. In the proposed framework, the tensor train (TT) rank is adopted to characterize the global correlation; meanwhile, two Plug-and-Play (PnP) denoisers, including a convolutional neural network (CNN) denoiser and the color block-matching and 3D filtering (CBM3D) denoiser, are incorporated to preserve local details and exploit NSS, respectively. Then, we design a proximal alternating minimization algorithm to efficiently solve this model under the PnP framework. Under mild conditions, we establish the convergence guarantee of the proposed algorithm. Extensive experiments show that these priors organically benefit from each other to achieve state-of-the-art performance both quantitatively and qualitatively.
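A toy iteration can show how a global low-rank step and plug-and-play denoisers might be combined. In the hedged sketch below, singular value thresholding on a mode-1 unfolding stands in for the TT-rank term and a Gaussian filter stands in for the CNN/CBM3D denoisers; the paper's actual algorithm is a proximal alternating minimization with convergence guarantees, which this toy does not reproduce.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def svt(mat, tau):
    """Singular value thresholding on a matrix unfolding."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0)) @ vt

def pnp_complete(obs, mask, tau=5.0, sigma=1.0, rho=0.5, iters=50):
    """Toy PnP completion for a third-order tensor: combine a global
    low-rank step with a denoiser step, then re-impose observed entries."""
    x = obs.copy()
    for _ in range(iters):
        low_rank = svt(x.reshape(x.shape[0], -1), tau).reshape(x.shape)
        denoised = gaussian_filter(x, sigma)       # PnP prior stand-in
        x = rho * low_rank + (1 - rho) * denoised  # blend the two priors
        x[mask] = obs[mask]                        # enforce observed data
    return x
```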
10
Chen H, He X, Yang H, Qing L, Teng Q. A Feature-Enriched Deep Convolutional Neural Network for JPEG Image Compression Artifacts Reduction and its Applications. IEEE Trans Neural Netw Learn Syst 2022;33:430-444. PMID: 34793307; DOI: 10.1109/TNNLS.2021.3124370.
Abstract
The amount of multimedia data, such as images and videos, has been increasing rapidly with the development of various imaging devices and the Internet, bringing more stress and challenges to information storage and transmission. The redundancy in images can be reduced via lossy compression, such as the widely used Joint Photographic Experts Group (JPEG) standard. However, decompressed images generally suffer from various artifacts (e.g., blocking, banding, ringing, and blurring) due to the loss of information, especially at high compression ratios. This article presents a feature-enriched deep convolutional neural network for compression artifacts reduction (FeCarNet, for short). Taking a dense network as the backbone, FeCarNet enriches features to gain valuable information by introducing multi-scale dilated convolutions, along with efficient 1×1 convolutions that lower both parameter complexity and computation cost. Meanwhile, to make full use of the different levels of features in FeCarNet, a fusion block consisting of attention-based channel recalibration and dimension reduction is developed for local and global feature fusion. Furthermore, short and long residual connections in both the feature and pixel domains are combined to build a multi-level residual structure, benefiting network training and performance. In addition, to further reduce computation complexity, pixel-shuffle-based image downsampling and upsampling layers are arranged at the head and tail of FeCarNet, respectively, which also enlarges the receptive field of the whole network. Experimental results show the superiority of FeCarNet over state-of-the-art compression artifacts reduction approaches in terms of both restoration capacity and model complexity. Applications of FeCarNet to several computer vision tasks, including image deblurring, edge detection, image segmentation, and object detection, further demonstrate its effectiveness.
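The head/tail arrangement is easy to show in isolation. In the sketch below, PixelUnshuffle trades spatial resolution for channels before a placeholder body (reducing computation and enlarging the receptive field), and PixelShuffle restores resolution at the end, with a global residual for artifact removal. The single-channel (luminance) input and the body design are assumptions, not FeCarNet's actual network.

```python
import torch
import torch.nn as nn

class ShuffleWrapper(nn.Module):
    """Pixel-shuffle head/tail around a placeholder body: downsample by
    rearranging pixels into channels, process, then invert the rearrangement."""
    def __init__(self, body_channels=48, scale=2):
        super().__init__()
        self.down = nn.PixelUnshuffle(scale)  # (B,1,H,W) -> (B,scale^2,H/s,W/s)
        self.body = nn.Sequential(
            nn.Conv2d(scale ** 2, body_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(body_channels, scale ** 2, 3, padding=1))
        self.up = nn.PixelShuffle(scale)      # back to (B,1,H,W)

    def forward(self, x):
        return x + self.up(self.body(self.down(x)))  # residual artifact removal

y = ShuffleWrapper()(torch.randn(1, 1, 64, 64))
print(y.shape)  # torch.Size([1, 1, 64, 64])
```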
11
Wang JL, Huang TZ, Zhao XL, Jiang TX, Ng MK. Multi-Dimensional Visual Data Completion via Low-Rank Tensor Representation Under Coupled Transform. IEEE Trans Image Process 2021;30:3581-3596. PMID: 33684037; DOI: 10.1109/TIP.2021.3062995.
Abstract
This paper addresses the tensor completion problem, which aims to recover missing information in multi-dimensional images. How to represent the low-rank structure embedded in the underlying data is the key issue in tensor completion. In this work, we suggest a novel low-rank tensor representation based on a coupled transform, which fully exploits the spatial multi-scale nature and redundancy in the spatial and spectral/temporal dimensions, leading to a better low tensor multi-rank approximation. More precisely, this representation is achieved by applying a two-dimensional framelet transform to the two spatial dimensions, a one/two-dimensional Fourier transform to the temporal/spectral dimension, and then a Karhunen-Loève transform (via singular value decomposition) to the transformed tensor. Based on this low-rank tensor representation, we formulate a novel low-rank tensor completion model for recovering missing information in multi-dimensional visual data, which leads to a convex optimization problem. To tackle the proposed model, we develop an alternating direction method of multipliers (ADMM) algorithm tailored to the structured optimization problem. Numerical examples on color images, multispectral images, and videos illustrate that the proposed method outperforms many state-of-the-art methods in both qualitative and quantitative aspects.
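The transform-then-shrink idea behind the coupled transform can be illustrated on one of its ingredients. The toy function below applies a Fourier transform along the third mode and singular value thresholding on each frontal slice before inverting; the paper's full representation additionally uses a 2D framelet transform spatially and a Karhunen-Loève transform, and solves a completion model via ADMM, none of which appears here.

```python
import numpy as np

def transform_svt(x, tau):
    """Fourier transform along mode 3, singular value thresholding on each
    frontal slice, then the inverse transform. A toy fragment of the
    coupled-transform low-rank representation, not the full method."""
    xf = np.fft.fft(x, axis=2)
    for k in range(x.shape[2]):
        u, s, vt = np.linalg.svd(xf[:, :, k], full_matrices=False)
        xf[:, :, k] = u @ np.diag(np.maximum(s - tau, 0)) @ vt
    return np.real(np.fft.ifft(xf, axis=2))
```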