1. Zhang B, Gao J, Yuan Y. Center-enhanced video captioning model with multimodal semantic alignment. Neural Netw 2024; 180:106744. PMID: 39326191. DOI: 10.1016/j.neunet.2024.106744.
Abstract
Video captioning aims to automatically generate descriptive sentences for a given video, establishing an association between visual content and textual language; it has attracted great attention and plays a significant role in many practical applications. Previous research focuses more on caption generation, ignoring the alignment of multimodal features and simply concatenating them. Besides, video feature extraction is usually done in an offline manner, so the extracted features may not be well adapted to the subsequent caption generation task. To improve the applicability of the extracted features to downstream caption generation and to address the issue of multimodal semantic alignment and fusion, we propose an end-to-end center-enhanced video captioning model with multimodal semantic alignment, which integrates feature extraction and caption generation into a unified framework. To enhance the completeness of semantic features, we design a center enhancement strategy in which deep joint visual-textual semantic features are captured via incremental clustering, and the cluster centers then serve as guidance for better caption generation. Moreover, we promote visual-textual multimodal alignment and fusion by learning the visual and textual representations in a shared latent semantic space, so as to alleviate the multimodal misalignment problem. Experimental results on two popular datasets, MSVD and MSR-VTT, demonstrate that the proposed model outperforms state-of-the-art methods, producing higher-quality captions.
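To make the center-enhancement idea concrete, here is a minimal sketch of an incremental cluster-center update over joint visual-textual features; the function names and the moving-average rule are our assumptions, not the authors' released code.

```python
# Minimal sketch of an incremental cluster-center update for joint
# visual-textual features; the momentum rule and all names are our
# assumptions, not the paper's published implementation.
import torch

def update_centers(features, centers, momentum=0.9):
    """Assign each feature to its nearest center and move that center
    toward its assigned features with an exponential moving average."""
    # features: (N, D) joint semantic features; centers: (K, D)
    dists = torch.cdist(features, centers)   # (N, K) pairwise distances
    assign = dists.argmin(dim=1)             # nearest center per feature
    for k in range(centers.size(0)):
        members = features[assign == k]
        if members.numel() > 0:
            centers[k] = momentum * centers[k] + (1 - momentum) * members.mean(dim=0)
    return centers, assign

features = torch.randn(32, 256)   # a batch of fused visual-textual features
centers = torch.randn(8, 256)     # K = 8 semantic centers
centers, assign = update_centers(features, centers)
```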
Affiliations
- Benhui Zhang: School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China.
- Junyu Gao: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China.
- Yuan Yuan: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China.
2. Zhou X, Zhang P. FF-HPINet: A Flipped Feature and Hierarchical Position Information Extraction Network for Lane Detection. Sensors (Basel) 2024; 24:3502. PMID: 38894293. PMCID: PMC11174791. DOI: 10.3390/s24113502.
Abstract
Effective lane detection technology plays an important role in current autonomous driving systems. Although deep learning models, with their intricate network designs, have proven highly capable of detecting lanes, key areas still require attention. First, the symmetry inherent in visuals captured by forward-facing automotive cameras is an underexploited resource. Second, the vast potential of position information remains untapped, which can undermine detection precision. In response to these challenges, we propose FF-HPINet, a novel approach for lane detection. We introduce the Flipped Feature Extraction module, which models pixel-wise pairwise relationships between the flipped feature and the original feature. This module allows us to capture symmetrical features and obtain high-level semantic feature maps from different receptive fields. Additionally, we design the Hierarchical Position Information Extraction module to meticulously mine the position information of the lanes, vastly improving target identification accuracy. Furthermore, the Deformable Context Extraction module is proposed to distill vital foreground elements and contextual nuances from the surrounding environment, yielding focused and contextually apt feature representations. Our approach achieves excellent performance, with F1 scores of 97.00% on the TuSimple dataset and 76.84% on the CULane dataset.
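A rough sketch of how a flipped-feature module of this kind can relate the original and mirrored feature maps; the attention formulation below is our assumption, not FF-HPINet's exact design.

```python
# Sketch: horizontally flip the feature map and relate each position of
# the original to every position of the flipped copy, so symmetric lane
# cues reinforce each other. The attention form is our assumption.
import torch
import torch.nn.functional as F

def flipped_feature_attention(x):
    """x: (B, C, H, W) feature map from a forward-facing camera."""
    x_flip = torch.flip(x, dims=[3])              # mirror along the width axis
    b, c, h, w = x.shape
    q = x.flatten(2).transpose(1, 2)              # (B, HW, C) queries from original
    k = x_flip.flatten(2)                         # (B, C, HW) keys from flipped copy
    attn = F.softmax(q @ k / c ** 0.5, dim=-1)    # (B, HW, HW) pairwise relations
    v = x_flip.flatten(2).transpose(1, 2)         # values from flipped copy
    out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
    return x + out                                # residual fusion of symmetric cues

feat = torch.randn(2, 64, 16, 32)
sym = flipped_feature_attention(feat)
```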
Affiliations
- Peng Zhang: School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China.
3. Malligere Shivanna V, Guo JI. Object Detection, Recognition, and Tracking Algorithms for ADASs: A Study on Recent Trends. Sensors (Basel) 2023; 24:249. PMID: 38203111. PMCID: PMC10781282. DOI: 10.3390/s24010249.
Abstract
Advanced driver assistance systems (ADASs) are becoming increasingly common in modern vehicles, as they not only improve safety and reduce accidents but also make driving smoother and easier. ADASs rely on a variety of sensors, such as cameras, radars, and lidars, often in combination, to perceive their surroundings and identify and track objects on the road. The key components of ADASs are object detection, recognition, and tracking algorithms that allow vehicles to identify and track other objects on the road, such as other vehicles, pedestrians, cyclists, obstacles, traffic signs, and traffic lights. This information is then used to warn the driver of potential hazards, or by the ADAS itself to take corrective actions to avoid an accident. This paper reviews prominent state-of-the-art object detection, recognition, and tracking algorithms used in different ADAS functionalities. It begins by introducing the history and fundamentals of ADASs, then reviews recent trends in various ADAS algorithms and their functionalities, along with the datasets employed. It concludes by discussing the future of object detection, recognition, and tracking algorithms for ADASs, including the need for more research in challenging environments, such as those with low visibility or high traffic density.
Grants
- 112-2218-E-A49-027-, National Science and Technology Council (NSTC), Taiwan, R.O.C.
- 112-2218-E-002-042-, National Science and Technology Council (NSTC), Taiwan, R.O.C.
- 111-2622-8-A49-023-, National Science and Technology Council (NSTC), Taiwan, R.O.C.
- 111-2221-E-A49-126-MY3, National Science and Technology Council (NSTC), Taiwan, R.O.C.
- 111-2634-F-A49-013-, National Science and Technology Council (NSTC), Taiwan, R.O.C.
- 110-2221-E-A49-145-MY3, National Science and Technology Council (NSTC), Taiwan, R.O.C.
Affiliations
- Vinay Malligere Shivanna: Department of Electrical Engineering, Institute of Electronics, National Yang-Ming Chiao Tung University, Hsinchu City 30010, Taiwan.
- Jiun-In Guo: Department of Electrical Engineering, Institute of Electronics, National Yang-Ming Chiao Tung University, Hsinchu City 30010, Taiwan; Pervasive Artificial Intelligence Research (PAIR) Labs, National Yang Ming Chiao Tung University, Hsinchu City 30010, Taiwan; eNeural Technologies Inc., Hsinchu City 30010, Taiwan.
4. Zhang J, Zhai W. Blind Attention Geometric Restraint Neural Network for Single Image Dynamic/Defocus Deblurring. IEEE Trans Neural Netw Learn Syst 2023; 34:8404-8417. PMID: 35235524. DOI: 10.1109/tnnls.2022.3151099.
Abstract
Based on an information loss analysis of the blur accumulation model, a novel single-image deblurring method is proposed. We apply a recurrent neural network architecture to capture the attention perception map and a generative adversarial network (GAN) architecture to produce the deblurred image. Because the attention mechanism has to make hard decisions about which parts of the input image to focus on when the blurry regions are not given, we propose a new adaptive attention disentanglement model based on variational blind source separation, which provides a global geometric restraint to reduce the large solution space, so that the generator can realistically restore details in blurry regions and the discriminator can accurately assess the content consistency of the restored regions. Since we combine blind source separation and attention geometric restraint with GANs, we name the proposed method BAGdeblur. Extensive quantitative and qualitative evaluations show that the proposed method achieves state-of-the-art performance on both synthetic datasets and real-world blurry images.
5. Weng Q, Chen D, Chen Y, Zhao W, Jiao L. Wave interference network with a wave function for traffic sign recognition. Math Biosci Eng 2023; 20:19254-19269. PMID: 38052599. DOI: 10.3934/mbe.2023851.
Abstract
In this paper, we combine convolution with a wave function to build an effective and efficient classifier for traffic signs, named the wave interference network (WiNet). In WiNet, the feature map extracted by the convolutional filters is refined into many entities from an input image, and each entity is represented as a wave. We utilize Euler's formula to unfold the wave function. Based on this wave-like information representation, the model adaptively modulates the relationship between the entities and the fixed weights of convolution. Experimental results on the Chinese Traffic Sign Recognition Database (CTSRD) and the German Traffic Sign Recognition Benchmark (GTSRB) demonstrate that the presented model outperforms models such as ResMLP, ResNet50, PVT, and ViT in the following aspects: 1) WiNet obtains the best accuracy rate, 99.80%, on the CTSRD and recognizes all images exactly on the GTSRB; 2) WiNet is more robust than the other models on datasets with different noises; 3) WiNet generalizes well across datasets.
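The wave unfolding via Euler's formula, e^{i*theta} = cos(theta) + i*sin(theta), can be sketched as follows; the module layout and layer names are our assumptions, not the paper's exact design.

```python
# Sketch of representing each feature entity as a wave z = |z| * e^{i*theta}
# and unfolding it with Euler's formula into real and imaginary parts.
import torch
import torch.nn as nn

class WaveRepresentation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.amplitude = nn.Linear(dim, dim)   # |z|: amplitude of each entity
        self.phase = nn.Linear(dim, dim)       # theta: phase of each entity
        self.fuse = nn.Linear(2 * dim, dim)    # recombine the two wave components

    def forward(self, x):
        amp = self.amplitude(x)
        theta = self.phase(x)
        real = amp * torch.cos(theta)          # Euler: e^{i*theta} = cos + i*sin
        imag = amp * torch.sin(theta)
        return self.fuse(torch.cat([real, imag], dim=-1))

tokens = torch.randn(4, 49, 128)               # (batch, entities, dim)
out = WaveRepresentation(128)(tokens)
```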
Affiliations
- Qiang Weng, Dewang Chen, Yuandong Chen, Wendi Zhao, Lin Jiao: School of Transportation, Fujian University of Technology, Fuzhou 350118, China.
6. Zhang X, Zhang Z. Research on a Traffic Sign Recognition Method under Small Sample Conditions. Sensors (Basel) 2023; 23:5091. PMID: 37299816. DOI: 10.3390/s23115091.
Abstract
Traffic signs are updated quickly, and their image acquisition and labeling require substantial manpower and material resources, so it is difficult to provide a large number of training samples for high-precision recognition. To address this problem, a traffic sign recognition method based on FSOD (few-shot object detection) is proposed. The method adjusts the backbone network of the original model and introduces dropout, which improves detection accuracy and reduces the risk of overfitting. Second, an RPN (region proposal network) with an improved attention mechanism is proposed to generate more accurate target candidate boxes by selectively enhancing certain features. Finally, an FPN (feature pyramid network) is introduced for multi-scale feature extraction, merging feature maps with higher semantic information but lower resolution with feature maps of higher resolution but weaker semantic information, which further improves detection accuracy. Compared with the baseline model, the improved algorithm improves the 5-way 3-shot and 5-way 5-shot tasks by 4.27% and 1.64%, respectively. We also apply the model structure to the PASCAL VOC dataset; the results show that this method is superior to some current few-shot object detection algorithms.
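The FPN merge described here, combining a semantically strong low-resolution map with a high-resolution map, can be sketched as below; channel sizes are illustrative, not the paper's.

```python
# A minimal FPN-style merge: upsample the semantically strong
# low-resolution map and add it to a 1x1-projected high-resolution map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    def __init__(self, c_high_res=256, c_low_res=512, c_out=256):
        super().__init__()
        self.lateral = nn.Conv2d(c_high_res, c_out, kernel_size=1)  # align channels
        self.reduce = nn.Conv2d(c_low_res, c_out, kernel_size=1)
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, high_res, low_res):
        top = self.reduce(low_res)
        top = F.interpolate(top, size=high_res.shape[-2:], mode="nearest")
        return self.smooth(self.lateral(high_res) + top)  # fused map keeps resolution

p3 = torch.randn(1, 256, 40, 40)   # higher resolution, weaker semantics
p4 = torch.randn(1, 512, 20, 20)   # lower resolution, stronger semantics
fused = FPNMerge()(p3, p4)
```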
Affiliations
- Xiao Zhang: College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China.
- Zhenyu Zhang: Key Laboratory of Multilingual Information Technology in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830017, China.
7. Zhou J, Tian X. MKL-SING: A data-driven approach of sign recognition for managing and improving public services. Inf Process Manag 2023. DOI: 10.1016/j.ipm.2022.103243.
8. Dhawan K, Srinivasa Perumal R, Nadesh RK. Identification of traffic signs for advanced driving assistance systems in smart cities using deep learning. Multimed Tools Appl 2023; 82:1-16. PMID: 37362733. PMCID: PMC9985491. DOI: 10.1007/s11042-023-14823-1.
Abstract
The ability of Advanced Driving Assistance Systems (ADAS) to identify and understand all objects around the vehicle under varying driving conditions and environmental factors is critical. Today's vehicles are equipped with advanced driving assistance systems that make driving safer and more comfortable. A camera mounted on the car helps the system recognise and detect traffic signs and alerts the driver to various road conditions, such as construction work ahead or changed speed limits. The goal is to identify the traffic sign and process the image in minimal processing time. A custom convolutional neural network model is used to classify the traffic signs with higher accuracy than existing models. Image augmentation techniques are used to expand the dataset artificially, allowing the model to learn how an image looks from different perspectives, such as when viewed from different angles or when blurred by poor weather conditions. The algorithms used to detect traffic signs are YOLOv3 and YOLOv4-tiny. The proposed solution for detecting a specific set of traffic signs performed well, with an accuracy rate of 95.85%.
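A hedged sketch of the kind of augmentation pipeline the abstract describes, using torchvision; the specific transforms and parameters are our choices, not the paper's.

```python
# Illustrative augmentation pipeline: viewpoint and weather-blur
# variation for traffic-sign images. Parameters are our assumptions.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=15),                      # different viewing angles
    T.RandomPerspective(distortion_scale=0.3, p=0.5),  # off-axis camera views
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # blur from poor weather
    T.ColorJitter(brightness=0.4, contrast=0.4),       # lighting changes
    T.ToTensor(),
])
# Applying `augment` to each PIL traffic-sign image yields extra
# training samples without new labeling effort.
```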
Affiliations
- Kshitij Dhawan: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
- Srinivasa Perumal R: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
- Nadesh R. K.: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
9. Wu X, Ma D, Qu X, Jiang X, Zeng D. Depth Dynamic Center Difference Convolutions for Monocular 3D Object Detection. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.11.032.
10. Ruan W, Ye M, Wu Y, Liu W, Chen J, Liang C, Li G, Lin CW. TICNet: A Target-Insight Correlation Network for Object Tracking. IEEE Trans Cybern 2022; 52:12150-12162. PMID: 34033563. DOI: 10.1109/tcyb.2021.3070677.
Abstract
Recently, the correlation filter (CF) and the Siamese network have become the two most popular frameworks in object tracking. Existing CF trackers, however, are limited in feature learning and context usage, making them sensitive to boundary effects, while Siamese trackers easily suffer from the interference of semantic distractors. To address these problems, we propose an end-to-end target-insight correlation network (TICNet) for object tracking, which aims at breaking the above limitations within a unified network. TICNet is an asymmetric dual-branch network involving a target-background awareness model (TBAM), a spatial-channel attention network (SCAN), and a distractor-aware filter (DAF) for end-to-end learning. Specifically, TBAM distinguishes the target from the background at the pixel level, yielding a target likelihood map based on color statistics to mine distractors for DAF learning. SCAN consists of a basic convolutional network, a channel-attention network, and a spatial-attention network, and generates attentive weights to enhance the representation learning of the tracker. In particular, we formulate a differentiable DAF and employ it as a learnable layer in the network, helping suppress distracting regions in the background. During testing, DAF, together with TBAM, yields a response map for the final target estimation. Extensive experiments on seven benchmarks demonstrate that TICNet outperforms state-of-the-art methods while running at real-time speed.
11. Yeh CH, Lin CH, Kang LW, Huang CH, Lin MH, Chang CY, Wang CC. Lightweight Deep Neural Network for Joint Learning of Underwater Object Detection and Color Conversion. IEEE Trans Neural Netw Learn Syst 2022; 33:6129-6143. PMID: 33900925. DOI: 10.1109/tnnls.2021.3072414.
Abstract
Underwater image processing has shown significant potential for exploring underwater environments. It has been applied to a wide variety of fields, such as underwater terrain scanning and autonomous underwater vehicle (AUV)-driven applications like image-based underwater object detection. However, underwater images often suffer from degradation due to attenuation, color distortion, and noise from artificial lighting sources, as well as the effects of possibly low-end optical imaging devices, so object detection performance degrades accordingly. To tackle this problem, in this article, a lightweight deep underwater object detection network is proposed. The key is a deep model that jointly learns color conversion and object detection for underwater images. The color conversion module transforms color images into the corresponding grayscale images to mitigate underwater color absorption, enhancing object detection performance at lower computational complexity. Experimental results with our implementation on the Raspberry Pi platform justify the effectiveness of the proposed lightweight joint learning model for underwater object detection compared with state-of-the-art approaches.
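One simple way to realize a learnable color-to-grayscale conversion of the kind described is a 1x1 convolution over the RGB channels, sketched below under our own assumptions.

```python
# Sketch of a learnable color-to-grayscale conversion: a 1x1
# convolution that learns a weighting of the RGB channels.
import torch
import torch.nn as nn

class ColorConversion(nn.Module):
    def __init__(self):
        super().__init__()
        # One 1x1 kernel over 3 input channels: a learned analogue of
        # the fixed luminance weights (0.299, 0.587, 0.114).
        self.to_gray = nn.Conv2d(3, 1, kernel_size=1, bias=False)
        nn.init.constant_(self.to_gray.weight, 1.0 / 3.0)

    def forward(self, rgb):
        gray = self.to_gray(rgb)         # (B, 1, H, W) grayscale image
        return gray.repeat(1, 3, 1, 1)   # replicate if the detector expects 3 channels

x = torch.randn(2, 3, 224, 224)
g = ColorConversion()(x)
```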
12. Zhao Z, Sun B. Hyperspectral anomaly detection via memory-augmented autoencoders. CAAI Trans Intell Technol 2022. DOI: 10.1049/cit2.12116.
Affiliations
- Zhe Zhao: Faculty of Printing, Packaging Engineering and Digital Media Technology, Xi'an University of Technology, Xi'an, China.
- Bangyong Sun: Faculty of Printing, Packaging Engineering and Digital Media Technology, Xi'an University of Technology, Xi'an, China; Norwegian Colour and Visual Computing Laboratory, Norwegian University of Science and Technology, Gjøvik, Norway.
13. Adaptive Attentional Network for Few-Shot Relational Learning of Knowledge Graphs. Appl Sci (Basel) 2022. DOI: 10.3390/app12094284.
Abstract
Few-shot knowledge graph reasoning is a research focus in the field of knowledge graph reasoning. At present, in order to expand the application scope of knowledge graphs, many researchers are studying multi-shot knowledge graph models. However, knowledge graphs contain a large number of missing relations and entities, and few reference examples are available at training time. In this paper, our goal is to infer the correct entity given a few training instances, or even when only one training instance is available. Therefore, we propose an adaptive attentional network for few-shot relational learning of knowledge graphs, extracting knowledge based on traditional embedding methods, using the Transformer mechanism and a hierarchical attention mechanism to obtain hidden attributes of entities, and then using a noise checker to filter out unreasonable candidate entities. Our model produces large performance improvements on the NELL-One dataset.
14. Yang C, Chen M, Xiong Z, Yuan Y, Wang Q. CM-Net: Concentric Mask Based Arbitrary-Shaped Text Detection. IEEE Trans Image Process 2022; 31:2864-2877. PMID: 35349439. DOI: 10.1109/tip.2022.3141844.
Abstract
Fast arbitrary-shaped text detection has recently become an attractive research topic. However, most existing methods are non-real-time, which may fall short in intelligent systems. Although a few real-time text detection methods have been proposed, their accuracy is far behind that of non-real-time methods. To improve detection accuracy and speed simultaneously, we propose a novel fast and accurate text detection framework, namely CM-Net, which is constructed upon a new text representation method and a multi-perspective feature (MPF) module. The former fits arbitrary-shaped text contours with a concentric mask (CM) in an efficient and robust way; the latter encourages the network to learn more CM-related discriminative features from multiple perspectives at no extra computational cost. Benefiting from the advantages of CM and MPF, the proposed CM-Net only needs to predict one CM per text instance to rebuild the text contour, and achieves the best balance between detection accuracy and speed compared with previous works. Moreover, to ensure that multi-perspective features are effectively learned, a multi-factor constraints loss is proposed. Extensive experiments demonstrate that the proposed CM is efficient and robust for fitting arbitrary-shaped text instances, and also validate the effectiveness of MPF and the constraints loss for learning discriminative text features. Furthermore, experimental results show that the proposed CM-Net is superior to existing state-of-the-art (SOTA) real-time text detection methods in both detection speed and accuracy on the MSRA-TD500, CTW1500, Total-Text, and ICDAR2015 datasets.
15. Cui L, Lv P, Jiang X, Gao Z, Zhou B, Zhang L, Shao L, Xu M. Context-Aware Block Net for Small Object Detection. IEEE Trans Cybern 2022; 52:2300-2313. PMID: 32721905. DOI: 10.1109/tcyb.2020.3004636.
Abstract
State-of-the-art object detectors usually progressively downsample the input image until it is represented by small feature maps, which loses spatial information and compromises the representation of small objects. In this article, we propose a context-aware block net (CAB Net) to improve small object detection by building high-resolution, semantically strong feature maps. To internally enhance the representation capacity of feature maps with high spatial resolution, we design the context-aware block (CAB), which exploits pyramidal dilated convolutions to incorporate multilevel contextual information without losing the original resolution of the feature maps. We then attach CAB to the end of a truncated backbone network (e.g., VGG16) with a relatively small downsampling factor (e.g., 8) and cast off all following layers. CAB Net can capture both basic visual patterns and semantic information of small objects, thus improving small object detection. Experiments conducted on the benchmark Tsinghua-Tencent 100K and the Airport dataset show that CAB Net outperforms other top-performing detectors by a large margin while keeping real-time speed, demonstrating its effectiveness for small object detection.
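A minimal sketch of a CAB-style block built from pyramidal dilated convolutions; the branch count and dilation rates are our assumptions, not the paper's exact configuration.

```python
# Parallel 3x3 branches with growing dilation enlarge the receptive
# field while keeping the feature resolution unchanged.
import torch
import torch.nn as nn

class PyramidalDilatedBlock(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d)   # same spatial size for every rate
            for d in dilations
        ])
        self.project = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        ctx = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.project(ctx)           # residual, resolution unchanged

feat = torch.randn(1, 256, 64, 64)             # stride-8 backbone output
out = PyramidalDilatedBlock(256)(feat)
```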
16. Towards the design of vision-based intelligent vehicle system: methodologies and challenges. Evol Intell 2022. DOI: 10.1007/s12065-022-00713-2.
17. de Santana Correia A, Colombini EL. Attention, please! A survey of neural attention models in deep learning. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10148-x.
18. Research on Visual Question Answering Based on GAT Relational Reasoning. Neural Process Lett 2022. DOI: 10.1007/s11063-021-10689-2.
19. Xuan H, Luo L, Zhang Z, Yang J, Yan Y. Discriminative Cross-Modality Attention Network for Temporal Inconsistent Audio-Visual Event Localization. IEEE Trans Image Process 2021; 30:7878-7888. PMID: 34478364. DOI: 10.1109/tip.2021.3106814.
Abstract
It is theoretically insufficient to construct a complete set of semantics in the real world using single-modality data. As a typical application of multi-modality perception, the audio-visual event localization task aims to match audio and visual components to identify the simultaneous events of interest. Although some recent methods have been proposed for this task, they cannot handle the practical situation of temporal inconsistency, which is widespread in audio-visual scenes. Inspired by the human system, which automatically filters out event-unrelated information when performing multi-modality perception, we propose a discriminative cross-modality attention network to simulate such a process. Similar to the human mechanism, our network can adaptively select "where" to attend, "when" to attend, and "which" to attend for audio-visual event localization. In addition, to prevent our network from reaching trivial solutions, a novel eigenvalue-based objective function is proposed to train the whole network to better fuse audio and visual signals, obtaining a discriminative and nonlinear multi-modality representation. In this way, even with large temporal inconsistency between the audio and visual sequences, our network is able to adaptively select event-valuable information for audio-visual event localization. Furthermore, we systematically investigate three subtasks of audio-visual event localization, i.e., temporal localization, weakly-supervised spatial localization, and cross-modality localization. The visualization results also help us better understand how our network works.
20. Liu Y, Gu YC, Zhang XY, Wang W, Cheng MM. Lightweight Salient Object Detection via Hierarchical Visual Perception Learning. IEEE Trans Cybern 2021; 51:4439-4449. PMID: 33284772. DOI: 10.1109/tcyb.2020.3035613.
Abstract
Recently, salient object detection (SOD) has witnessed vast progress with the rapid development of convolutional neural networks (CNNs). However, the improvement in SOD accuracy has come with increased network depth and width, resulting in large network size and heavy computational overhead, which prevents state-of-the-art SOD methods from being deployed on practical platforms, especially mobile devices. To promote the deployment of real-world SOD applications, we develop a lightweight SOD model in this article. Our observation is that the primate visual system processes visual signals hierarchically, with different receptive fields and eccentricities in different visual cortex areas. Inspired by this, we propose a hierarchical visual perception (HVP) module to imitate the primate visual cortex for hierarchical perception learning. With the HVP module incorporated, we design a lightweight SOD network, namely HVPNet. Extensive experiments on popular benchmarks demonstrate that HVPNet achieves highly competitive accuracy compared with state-of-the-art SOD methods while running at 4.3 frames/s on CPU and 333.2 frames/s on GPU with only 1.23M parameters.
21. Shen L, You L, Peng B, Zhang C. Group multi-scale attention pyramid network for traffic sign detection. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.04.083.
22. Detection of Copy-Move Forgery in Digital Image Using Multi-scale, Multi-stage Deep Learning Model. Neural Process Lett 2021. DOI: 10.1007/s11063-021-10620-9.
23. Liu Y, Peng J, Xue JH, Chen Y, Fu ZH. TSingNet: Scale-aware and context-rich feature learning for traffic sign detection and recognition in the wild. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.03.049.
24. Zhou H, Ren D, Xia H, Fan M, Yang X, Huang H. AST-GNN: An attention-based spatio-temporal graph neural network for interaction-aware pedestrian trajectory prediction. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.03.024.
25. Gudžius P, Kurasova O, Darulis V, Filatovas E. Deep learning-based object recognition in multispectral satellite imagery for real-time applications. Mach Vis Appl 2021; 32:98. PMID: 34177121. PMCID: PMC8217787. DOI: 10.1007/s00138-021-01209-2.
Abstract
Satellite imagery is changing the way we understand and predict economic activity in the world. Advancements in satellite hardware and low-cost rocket launches have enabled near-real-time, high-resolution images covering the entire Earth. It is too labour-intensive, time-consuming, and expensive for human annotators to analyse petabytes of satellite imagery manually. Current computer vision research exploring this problem still lacks accuracy and prediction speed, both significantly important metrics for latency-sensitive automated industrial applications. Here we address both of these challenges by proposing a set of improvements to object recognition model design, training, and complexity regularisation, applicable to a range of neural networks. Furthermore, we propose a fully convolutional neural network (FCN) architecture optimised for accurate and accelerated object recognition in multispectral satellite imagery. We show that our FCN exceeds human-level performance with state-of-the-art 97.67% accuracy over multiple sensors, generalizes across dispersed scenery, and outperforms other methods proposed to date. Its computationally light architecture delivers a fivefold improvement in training time and rapid prediction, essential for real-time applications. To illustrate practical model effectiveness, we analyse it in an algorithmic trading environment. Additionally, we publish a proprietary annotated satellite imagery dataset for further development in this research field. Our findings can be readily implemented for other real-time applications too.
Affiliations
- Povilas Gudžius, Olga Kurasova, Vytenis Darulis, Ernestas Filatovas: Institute of Data Science and Digital Technologies, Vilnius University, Akademijos street 4, 08412 Vilnius, Lithuania.
26. A New Real-Time Detection and Tracking Method in Videos for Small Target Traffic Signs. Appl Sci (Basel) 2021. DOI: 10.3390/app11073061.
Abstract
It is challenging for self-driving vehicles in real-world traffic scenarios to find a trade-off between real-time performance and high accuracy of detection, recognition, and tracking in videos. This issue is addressed in this paper with an improved YOLOv3 (You Only Look Once) and a multi-object tracking algorithm (Deep-SORT). First, data augmentation is employed for small-sample traffic signs to address the extremely unbalanced distribution of samples in the dataset. Second, a new YOLOv3 architecture is proposed to make it more suitable for detecting small targets: (1) the output feature map corresponding to 32-times subsampling of the input image in the original YOLOv3 structure is removed, reducing computational cost and improving real-time performance; (2) an output feature map at 4-times subsampling is added, improving the detection of small traffic signs; (3) Deep-SORT is integrated into the detection method, improving the precision and robustness of multi-object detection and the tracking ability in videos. Finally, our method demonstrates better detection capabilities than state-of-the-art approaches, with precision, recall, and mAP of 91%, 90%, and 84.76%, respectively.
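A rough sketch of modification (2), adding a 4-times-subsampling detection scale by upsampling a neck feature and concatenating an early backbone map; shapes and channel counts are illustrative, not the paper's exact network.

```python
# Adding a stride-4 YOLO-style head for tiny signs: upsample the finest
# neck feature and fuse it with an early high-resolution backbone map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallSignHead(nn.Module):
    def __init__(self, c_neck=128, c_early=64, num_outputs=255):
        super().__init__()
        self.fuse = nn.Conv2d(c_neck + c_early, 128, kernel_size=3, padding=1)
        self.pred = nn.Conv2d(128, num_outputs, kernel_size=1)  # YOLO-style output

    def forward(self, neck_s8, backbone_s4):
        up = F.interpolate(neck_s8, scale_factor=2, mode="nearest")  # stride 8 -> 4
        x = torch.cat([up, backbone_s4], dim=1)
        return self.pred(self.fuse(x))          # stride-4 map for tiny signs

neck = torch.randn(1, 128, 52, 52)       # stride-8 neck feature (416px input)
early = torch.randn(1, 64, 104, 104)     # stride-4 backbone feature
small_scale_pred = SmallSignHead()(neck, early)
```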
27. Liu F, Qian Y, Li H, Wang Y, Zhang H. CAFFNet: Channel Attention and Feature Fusion Network for Multi-target Traffic Sign Detection. Int J Pattern Recognit Artif Intell 2021. DOI: 10.1142/s021800142152008x.
Abstract
Existing traffic sign images are easily affected by external factors, and traffic signs are generally small targets appearing at different scales in the images, which makes feature extraction difficult in traffic sign detection. To achieve better detection results, a multi-target traffic sign detection method with channel attention and a feature fusion network (CAFFNet for short) is proposed. This method effectively learns the correlation between feature channels through a lightweight channel attention network, realizing local cross-channel interaction without dimensionality reduction and enhancing the representation ability of the network. A feature pyramid network is used to achieve feature fusion and generate high-resolution multiscale semantic information, and dilated convolution is utilized to capture multiscale context information, narrowing the difference between features and improving the detection performance of the model. Experimental results on the GTSDB and CTSD datasets show that the proposed method outperforms existing detection algorithms on the evaluation criteria.
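The described channel attention, local cross-channel interaction without dimensionality reduction, follows the ECA-style pattern and can be sketched as below; the kernel size is our assumption.

```python
# Lightweight channel attention: a 1D convolution over pooled channel
# descriptors lets each channel interact with its k neighbors, with no
# dimensionality-reducing bottleneck.
import torch
import torch.nn as nn

class LocalChannelAttention(nn.Module):
    def __init__(self, k_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):                          # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                     # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # interact with k neighboring channels
        return x * torch.sigmoid(y)[:, :, None, None]

feat = torch.randn(2, 64, 32, 32)
out = LocalChannelAttention()(feat)
```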
Affiliations
- Feng Liu: Key Laboratory of Software Engineering, School of Software, Xinjiang University, Urumqi 830091, P. R. China; Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, School of Software, Xinjiang University, Urumqi 830091, P. R. China.
- Yurong Qian: Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, School of Software, Xinjiang University, Urumqi 830091, P. R. China.
- Hua Li: Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, School of Software, Xinjiang University, Urumqi 830091, P. R. China.
- Yongqiang Wang: School of Information Science & Engineering, Xinjiang University, Urumqi 830046, P. R. China.
- Hao Zhang: School of Software, Xinjiang University, Urumqi 830091, P. R. China.
28. Xiong Z, Yuan Y, Wang Q. ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition. IEEE Trans Image Process 2021; 30:2722-2733. PMID: 33502980. DOI: 10.1109/tip.2021.3053459.
Abstract
Indoor scene images usually contain scattered objects and various scene layouts, which makes RGB-D scene classification a challenging task. Existing methods still have limitations in classifying scene images with great spatial variability; thus, how to extract local patch-level features effectively using only image labels is still an open problem for RGB-D scene recognition. In this article, we propose an efficient framework for RGB-D scene recognition that adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract an appropriate number of key local scene-related features. Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially-correlated multi-modal RGB-D features. We take advantage of the correlation between the RGB and depth modalities to provide more cues for selecting local features, and to ensure that discriminative local features are selected, a variational mutual information maximization loss is proposed. Additionally, the DLFS module can be easily extended to select local features at different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework achieves state-of-the-art performance on public RGB-D scene recognition datasets.
29. Yuan C, Jiao S, Sun X, Wu QMJ. MFFFLD: A Multi-modal Feature Fusion Based Fingerprint Liveness Detection. IEEE Trans Cogn Dev Syst 2021. DOI: 10.1109/tcds.2021.3062624.
31.
Abstract
Due to changes in illumination, adverse weather conditions, and interference from signs similar to real traffic signs, traffic signs can be falsely detected. To improve the detection of small targets, the baseline SSD (single shot multibox detector) adopts a multi-scale feature detection method, which improves the detection effect to some extent, but the baseline SSD network requires a large number of calculations. To this end, we propose a lightweight SSD network algorithm. This method uses some 1 × 1 convolution kernels to replace some of the 3 × 3 convolution kernels in the baseline network and deletes some convolutional layers to reduce the computational load of the baseline SSD network. Then a color detection algorithm based on the phase difference method and a connected component calculation are used to further filter the detection results, and finally, a data enhancement strategy based on image appearance transformation is used to improve the balance of the dataset. The experimental results show that the proposed method is 3% more accurate than the baseline SSD network and, more importantly, detection speed is increased by a factor of 1.2.
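The parameter saving from the 1 × 1 replacement is easy to verify; the layer sizes below are illustrative, not the paper's.

```python
# Replacing a 3x3 convolution with a 1x1 convolution cuts that layer's
# weights by a factor of 9 (ignoring biases).
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

heavy = nn.Conv2d(256, 256, kernel_size=3, padding=1)   # baseline SSD layer
light = nn.Conv2d(256, 256, kernel_size=1)              # lightweight replacement

print(count_params(heavy))   # 590,080 = 3*3*256*256 weights + 256 biases
print(count_params(light))   # 65,792  = 1*1*256*256 weights + 256 biases
```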
33. An over-regression suppression method to discriminate occluded objects of same category. Pattern Anal Appl 2019. DOI: 10.1007/s10044-019-00853-9.
34. Cao J, Song C, Peng S, Xiao F, Song S. Improved Traffic Sign Detection and Recognition Algorithm for Intelligent Vehicles. Sensors (Basel) 2019; 19:4021. PMID: 31540378. PMCID: PMC6767627. DOI: 10.3390/s19184021.
Abstract
Traffic sign detection and recognition are crucial to the development of intelligent vehicles. An improved traffic sign detection and recognition algorithm for intelligent vehicles is proposed to address the susceptibility of traditional traffic sign detection to the environment and the poor real-time performance of deep learning-based traffic sign recognition. First, the HSV color space is used for spatial threshold segmentation, and traffic signs are effectively detected based on shape features. Second, the classical LeNet-5 convolutional neural network model is considerably improved by using Gabor kernels as the initial convolutional kernels, adding batch normalization after the pooling layers, and selecting Adam as the optimizer. Finally, traffic sign classification and recognition experiments are conducted on the German Traffic Sign Recognition Benchmark. Accurate recognition of traffic signs is achieved through continuous training and testing of the network model. Experimental results show that the recognition rate of traffic signs reaches 99.75%, with an average processing time of 5.4 ms per frame. Compared with other algorithms, the proposed algorithm has remarkable accuracy and real-time performance, strong generalization ability, and high training efficiency. This improvement is of considerable importance for reducing accident rates and enhancing road traffic safety, providing a strong technical guarantee for the steady development of intelligent vehicle driving assistance.
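A hedged sketch of the three modifications on a LeNet-5-style classifier: Gabor-initialized first-layer kernels, batch normalization after pooling, and the Adam optimizer. The Gabor parameter grid and layer sizes are our choices, not the paper's.

```python
# LeNet-5-style classifier with the three described modifications.
import math
import torch
import torch.nn as nn

def gabor_kernel(size=5, theta=0.0, sigma=2.0, lam=4.0):
    """Real part of a Gabor filter with orientation theta."""
    ax = torch.arange(size) - size // 2
    y, x = torch.meshgrid(ax, ax, indexing="ij")
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    return torch.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * torch.cos(2 * math.pi * xr / lam)

model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2), nn.BatchNorm2d(6),        # BN placed after pooling
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2), nn.BatchNorm2d(16),
    nn.Flatten(), nn.Linear(16 * 5 * 5, 43),   # 43 GTSRB classes, 32x32 inputs
)
with torch.no_grad():                          # Gabor kernels as initialization
    for i in range(6):
        k = gabor_kernel(theta=i * math.pi / 6)
        model[0].weight[i] = k.expand(3, 5, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```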
Affiliations
- Jingwei Cao, Chuanxue Song, Silun Peng, Feng Xiao: State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China; College of Automotive Engineering, Jilin University, Changchun 130022, China.
- Shixin Song: School of Mechanical and Aerospace Engineering, Jilin University, Changchun 130022, China.