1
Zhang H, Liang J, Zhang J, Zhang T, Lin Y, Wang Y. Attention-Driven Memory Network for Online Visual Tracking. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:17085-17098. [PMID: 37566501 DOI: 10.1109/tnnls.2023.3299412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2023]
Abstract
Memory mechanisms have attracted growing interest in tracking tasks because they can learn long-term-dependent information. However, it is very challenging for existing memory modules to provide the intrinsic attribute information of the target to the tracker in complex scenes. In this article, drawing on biological visual memory mechanisms, we propose a novel online tracking method based on an attention-driven memory network, which can mine discriminative memory information and enhance the robustness and reliability of the tracker. First, to reinforce the effectiveness of the memory content, we design a novel attention-driven memory network. In the network, the long-term memory module gains property-level memory information by focusing on the state of the target at both the channel and spatial levels. As a complement, we add a short-term memory module to maintain good adaptability when confronting drastic deformation of the target. The attention-driven memory network can adaptively adjust the contribution of short-term and long-term memories to tracking results under the weighted gradient harmonized loss. On this basis, to avoid model performance degradation, an online memory updater (MU) is further proposed. It is designed to mine target information from tracking results through the Mixer layer and the online head network together. By evaluating the confidence of the tracking results, the memory updater can accurately judge when to update the model, which guarantees the effectiveness of online memory updates. Finally, the proposed method performs favorably against several advanced methods and has been extensively validated on several benchmark datasets, including object tracking benchmark-50/100 (OTB-50/100), temple color-128 (TC-128), unmanned aerial vehicles-123 (UAV-123), generic object tracking-10k (GOT-10k), visual object tracking-2016 (VOT-2016), and VOT-2018.
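The two ideas in this abstract that lend themselves to a small illustration are the weighted blending of long-term and short-term memory responses and the confidence-gated memory update. The sketch below is only a toy under stated assumptions: the function names, the scalar weight `w_long`, and the peak-response confidence test are hypothetical and are not taken from the paper, where the contributions are learned under the weighted gradient harmonized loss.

```python
import numpy as np


def fuse_responses(long_resp, short_resp, w_long):
    """Blend two response maps with a scalar weight in [0, 1].

    Assumption: a fixed scalar stands in for the learned, adaptive
    weighting described in the abstract.
    """
    if not 0.0 <= w_long <= 1.0:
        raise ValueError("w_long must lie in [0, 1]")
    return w_long * long_resp + (1.0 - w_long) * short_resp


def should_update_memory(response, conf_threshold=0.6):
    """Allow a memory update only when the peak response is confident.

    Assumption: the peak of the response map is used as the confidence
    score; the threshold value is illustrative.
    """
    return float(np.max(response)) >= conf_threshold
```

A tracker loop built on this sketch would call `fuse_responses` once per frame and write the new sample into memory only when `should_update_memory` returns `True`.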
2
Huang B, Li J, Chen J, Wang G, Zhao J, Xu T. Anti-UAV410: A Thermal Infrared Benchmark and Customized Scheme for Tracking Drones in the Wild. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:2852-2865. [PMID: 37991906 DOI: 10.1109/tpami.2023.3335338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2023]
Abstract
The perception of drones, also known as Unmanned Aerial Vehicles (UAVs), particularly in infrared videos, is crucial for effective anti-UAV tasks. However, existing datasets for UAV tracking have limitations in terms of target size and attribute distribution characteristics, which do not fully represent complex realistic scenes. To address this issue, we introduce a generalized infrared UAV tracking benchmark called Anti-UAV410. The benchmark comprises a total of 410 videos with over 438 K manually annotated bounding boxes. To tackle the challenges of UAV tracking in complex environments, we propose a novel method called Siamese drone tracker (SiamDT). SiamDT incorporates a dual-semantic feature extraction mechanism that explicitly models targets in dynamic background clutter, enabling effective tracking of small UAVs. The SiamDT method consists of three key steps: Dual-Semantic RPN Proposals (DS-RPN), Versatile R-CNN (VR-CNN), and Background Distractors Suppression. These steps are responsible for generating candidate proposals, refining prediction scores based on dual-semantic features, and enhancing the discriminative capacity of the trackers against dynamic background clutter, respectively. Extensive experiments conducted on the Anti-UAV410 dataset and three other large-scale benchmarks demonstrate the superior performance of the proposed SiamDT method compared to recent state-of-the-art trackers.
3
Nkabiti KP, Chen Y. Device-Free Tracking through Self-Attention Mechanism and Unscented Kalman Filter with Commodity Wi-Fi. SENSORS (BASEL, SWITZERLAND) 2023; 23:5527. [PMID: 37420694 PMCID: PMC10304888 DOI: 10.3390/s23125527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 07/09/2023]
Abstract
Recent advancements in target tracking using Wi-Fi signals and channel state information (CSI) have significantly improved the accuracy and efficiency of tracking mobile targets. However, there remains a gap in developing a comprehensive approach that combines CSI, an unscented Kalman filter (UKF), and a single self-attention mechanism to accurately estimate the position, velocity, and acceleration of targets in real time. Furthermore, optimizing the computational efficiency of such approaches is necessary for their applicability in resource-constrained environments. To bridge this gap, this study proposes a novel approach that leverages CSI data collected from commodity Wi-Fi devices and combines the UKF with a single self-attention mechanism. By fusing these elements, the proposed model provides instantaneous and precise estimates of the target's position while considering factors such as acceleration and network information. The effectiveness of the proposed approach is demonstrated through extensive experiments conducted in a controlled test bed environment. The results show a tracking accuracy of 97%, affirming the model's ability to track mobile targets. The achieved accuracy showcases the potential of the proposed approach for applications in human-computer interaction, surveillance, and security.
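To make the UKF component concrete, here is a minimal sketch of the unscented prediction step for a state of position, velocity, and acceleration. It is an illustration under assumptions rather than the authors' model: the constant-acceleration motion model in the usage note, the `kappa` parameterization of the sigma points, and all function names are hypothetical, and the self-attention and CSI processing are not shown.

```python
import numpy as np


def sigma_points(mean, cov, kappa=0.0):
    """Generate 2n+1 sigma points and weights (basic unscented transform)."""
    n = mean.size
    # Columns of the Cholesky factor give the spread directions.
    root = np.linalg.cholesky((n + kappa) * cov)
    points = [mean]
    for i in range(n):
        points.append(mean + root[:, i])
        points.append(mean - root[:, i])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights


def ukf_predict(mean, cov, transition, process_noise, kappa=0.0):
    """Propagate mean and covariance through a (possibly nonlinear) model."""
    points, weights = sigma_points(mean, cov, kappa)
    propagated = np.array([transition(p) for p in points])
    new_mean = weights @ propagated
    diff = propagated - new_mean
    new_cov = (weights[:, None] * diff).T @ diff + process_noise
    return new_mean, new_cov
```

With an assumed constant-acceleration model over a time step `dt`, `transition` can be `lambda x: F @ x` where `F = [[1, dt, dt**2 / 2], [0, 1, dt], [0, 0, 1]]`. For such a linear model the result coincides with the ordinary Kalman prediction, which makes a convenient sanity check.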
Affiliation(s)
- Kabo Poloko Nkabiti
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - School of Computing and Information Systems, Botswana Accountancy College, Private Bag, Gaborone 00319, Botswana
- Yueyun Chen
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
4
Liu F, Liu J, Chen Q, Wang X, Liu C. SiamHAS: Siamese Tracker with Hierarchical Attention Strategy for Aerial Tracking. MICROMACHINES 2023; 14:893. [PMID: 37421126 DOI: 10.3390/mi14040893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 03/26/2023] [Accepted: 03/30/2023] [Indexed: 07/09/2023]
Abstract
Siamese network-based trackers that use modern deep feature extraction networks without taking full advantage of the different feature levels are prone to tracking drift in aerial scenarios involving target occlusion, scale variation, and low-resolution targets. Additionally, their accuracy is low in challenging visual tracking scenarios because features are not fully utilized. To improve the performance of existing Siamese trackers in these challenging scenes, we propose a Siamese tracker based on Transformer multi-level feature enhancement with a hierarchical attention strategy. Transformer Multi-level Enhancement increases the saliency of the extracted features, and the hierarchical attention strategy lets the tracker adaptively attend to target region information, improving tracking performance in challenging aerial scenarios. We conducted extensive experiments with qualitative and quantitative discussions on the UAV123, UAV20L, and OTB100 datasets. The experimental results show that our SiamHAS performs favorably against several state-of-the-art trackers in these challenging scenarios.
Affiliation(s)
- Faxue Liu
  - Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jinghong Liu
  - Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Qiqi Chen
  - Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xuan Wang
  - Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China
- Chenglong Liu
  - Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China
5
Yang K, Zhang H, Zhou D, Dong L, Ma J. IASA: An IoU-aware tracker with adaptive sample assignment. Neural Netw 2023; 161:267-280. [PMID: 36774865 DOI: 10.1016/j.neunet.2023.01.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2022] [Revised: 12/25/2022] [Accepted: 01/24/2023] [Indexed: 02/05/2023]
Abstract
Most existing trackers perform tracking in a head network composed of a classification branch and a regression branch. However, they lack a meaningful exploration of how to define positive and negative samples during training, which can significantly affect tracking performance. Furthermore, they cannot provide a reliable ranking of candidate locations when using classification scores alone or a combination of classification and regression scores. To address these issues, we propose an intersection over union (IoU) aware tracker with adaptive sample assignment (IASA). IASA introduces an IoU-aware classification score to achieve a more accurate ranking of candidate tracking locations. We also propose a new loss function, IoU-focal loss, to train the anchor-free tracker IASA to predict the classification scores, and introduce a star-shaped box feature representation to refine classification features. To explore the actual content of the training samples, we develop an adaptive sample assignment (ASA) strategy that divides positive and negative samples according to the statistical characteristics of the sample IoUs. By combining these two proposed components, the IASA tracker treats the tracking task as a classification and a regression problem. It directly finds the candidate tracking location in the classification branch and then regresses the four distances from the location to the four sides of the tracking box. Experimental results show that the proposed IASA can achieve state-of-the-art performance on seven public datasets.
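The sample assignment and box regression described above can be illustrated with a short sketch. The abstract does not say which IoU statistics the ASA strategy uses, so the mean-plus-standard-deviation threshold below is an assumption (borrowed from ATSS-style assignment in object detection), and the function names are hypothetical. Only the four-distance box decoding follows the abstract directly.

```python
import numpy as np


def box_iou(boxes, gt):
    """IoU between N candidate boxes and one ground-truth box (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0.0, None) * np.clip(y2 - y1, 0.0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)


def assign_samples(ious):
    """Mark samples as positive when IoU >= mean + std (assumed statistic)."""
    threshold = ious.mean() + ious.std()
    return ious >= threshold


def decode_ltrb(point, distances):
    """Turn a location and its (left, top, right, bottom) distances into a box."""
    x, y = point
    left, top, right, bottom = distances
    return np.array([x - left, y - top, x + right, y + bottom])
```

Because the threshold is computed from the candidates of each training sample, it rises when many candidates overlap the target well and falls when few do, which is the general motivation for statistics-based assignment.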
Affiliation(s)
- Kai Yang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Haijun Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Dongliang Zhou
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Li Dong
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
- Jianghong Ma
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
6
Fan B, Zhang H, Cong Y, Tang Y, Fan H, Tian J. Dual Aligned Siamese Dense Regression Tracker. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3630-3643. [PMID: 35576412 DOI: 10.1109/tip.2022.3166638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Anchor-based and anchor-free Siamese trackers have achieved remarkable progress. However, their parallel regression and classification branches lack any link or interaction regarding the tracked target information, and their independent optimization may lead to task misalignment, such as a reliable classification prediction paired with imprecise localization and vice versa. To address this problem, we develop a general Siamese dense regression tracker (SDRT) with both task and feature alignments. It consists of two cooperative, mutually guiding core branches: dense local regression with RepPoint representation, and global and local multi-classifier fusion with aligned features. They complement and boost each other so that well-localized results are also well classified. Specifically, dense local regression with RepPoint representation directly estimates and averages multiple dense local bounding box offsets for accurate localization. The refined bounding boxes can then be used to learn the global and local affine alignment features for reliable multi-classifier fusion. The classification scores in turn guide the assignment of positive bounding boxes for the regression task. The mutual guidance operations substantially strengthen the connection between classification and regression, since the assigned labels of one task depend on the prediction quality of the other task. The proposed tracking module is general, and it can boost both anchor-based and anchor-free Siamese trackers to some extent. Extensive comparisons on six tracking benchmarks verify its favorable and competitive performance against state-of-the-art tracking modules.
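The step of estimating and averaging multiple dense local bounding box offsets can be pictured with the sketch below. It is a simplification under assumptions: each location predicts four side distances (left, top, right, bottom) instead of the RepPoint representation used by the paper, the average is unweighted, and the function name is hypothetical.

```python
import numpy as np


def average_dense_boxes(points, offsets):
    """Decode one box per location and average them.

    points:  (N, 2) array of (x, y) locations inside the target region.
    offsets: (N, 4) array of (left, top, right, bottom) distances.
    Assumption: plain side distances replace the paper's RepPoint encoding.
    """
    points = np.asarray(points, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    top_left = points - offsets[:, :2]
    bottom_right = points + offsets[:, 2:]
    boxes = np.concatenate([top_left, bottom_right], axis=1)
    return boxes.mean(axis=0)
```

Averaging several local estimates reduces the effect of any single noisy prediction, which is one plausible reading of why dense regression helps localization.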
7
Learning Spatio-Temporal Attention Based Siamese Network for Tracking UAVs in the Wild. REMOTE SENSING 2022. [DOI: 10.3390/rs14081797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
Abstract
The popularity of unmanned aerial vehicles (UAVs) has made anti-UAV technology increasingly urgent. Object tracking, especially in thermal infrared videos, offers a promising solution to counter UAV intrusion. However, issues such as fast motion and tiny size make tracking infrared drone targets challenging. This work proposes a simple and effective spatio-temporal attention based Siamese method called SiamSTA, which alternates between reliable local searching and wide-range re-detection to robustly track drones in the wild. Concretely, SiamSTA builds a two-stage re-detection network to predict the target state using the template of the first frame and the prediction results of previous frames. To tackle the challenge of small-scale UAV targets in long-range acquisition, SiamSTA imposes spatial and temporal constraints on generating candidate proposals within local neighborhoods to eliminate interference from background distractors. Complementarily, when the target is lost from local regions due to fast movement, a third-stage re-detection module is introduced, which exploits valuable motion cues through a correlation filter based on change detection to re-capture targets from a global view. Finally, a state-aware switching mechanism is adopted to adaptively integrate local searching and global re-detection and combine their complementary strengths for robust tracking. Extensive experiments on three anti-UAV datasets demonstrate SiamSTA’s advantage over other competitors. Notably, SiamSTA is the foundation of the 1st-place winning entry in the 2nd Anti-UAV Challenge.
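The state-aware switching between local searching and global re-detection can be sketched as a simple per-frame decision. The abstract does not describe how the target state is judged, so the confidence threshold, the callables, and the function name below are all hypothetical placeholders.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, confidence)


def track_frame(
    local_search: Callable[[], Detection],
    global_redetect: Callable[[], Detection],
    lost_threshold: float = 0.3,
) -> Tuple[Box, float, str]:
    """Use the local result when it is confident, else fall back to global.

    Assumption: a single confidence threshold decides whether the target
    counts as lost from the local search region.
    """
    box, confidence = local_search()
    if confidence >= lost_threshold:
        return box, confidence, "local"
    box, confidence = global_redetect()
    return box, confidence, "global"
```

Running the global stage only on demand keeps the common case cheap, since the local search handles most frames.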