1
Arthanari S, Elayaperumal D, Joo YH. Learning temporal regularized spatial-aware deep correlation filter tracking via adaptive channel selection. Neural Netw 2025;186:107210. [PMID: 39987711] [DOI: 10.1016/j.neunet.2025.107210]
Abstract
In recent years, deep correlation filters have demonstrated outstanding performance in robust object tracking. Nevertheless, correlation filters struggle with heavy occlusion, target deviation, and background clutter because they do not make effective use of previous target information. To overcome these issues, we propose a novel temporal regularized spatial-aware deep correlation filter tracker with adaptive channel selection. First, we present an adaptive channel selection approach, which efficiently handles target deviation by adaptively selecting suitable channels during the learning stage. This channel selection also allows the filter to adjust dynamically to the unique characteristics of the target object, enhancing the tracker's flexibility and making it well-suited to diverse tracking scenarios. Second, we propose a spatial-aware correlation filter with dynamic spatial constraints, which effectively reduces the filter response in complex background regions by distinguishing between the foreground and background regions of the response map, so that the target can be easily identified within the foreground region. Third, we design a temporal regularization approach that improves localization accuracy under large appearance variations; it considers both the present and previous frames of the target region, significantly enhancing tracking by exploiting historical information. Finally, we present a comprehensive experimental analysis on the OTB-2013, OTB-2015, TempleColor-128, UAV-123, UAVDT, and DTB-70 benchmark datasets to demonstrate the effectiveness of the proposed approach against state-of-the-art trackers.
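The core update described here, a correlation filter learned with a temporal regularizer plus per-channel selection, can be sketched compactly in the Fourier domain. The sketch below is a minimal illustration under an independent-channel approximation; `learn_filters`, the energy-based selection rule, and all parameter values are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def learn_filters(feats, y, prev_filters=None, lam=1e-2, mu=0.15, keep=0.6):
    """Sketch: per channel c, solve in the Fourier domain
        argmin_w ||w_c * x_c - y||^2 + lam ||w_c||^2 + mu ||w_c - w_c_prev||^2
    where the mu-term is the temporal regularizer pulling the new filter
    toward the previous frame's filter (independent-channel approximation)."""
    Y = np.fft.fft2(y)
    X = np.fft.fft2(feats, axes=(-2, -1))            # feats: (C, H, W)
    Wp = np.zeros_like(X) if prev_filters is None else prev_filters
    W = (np.conj(X) * Y + mu * Wp) / (np.conj(X) * X + lam + mu)
    # Adaptive channel selection (one plausible criterion): keep the
    # channels whose filtered responses carry the most energy.
    energy = np.abs(W * X).sum(axis=(-2, -1))
    W[energy < np.quantile(energy, 1.0 - keep)] = 0
    return W

feats = np.random.randn(8, 64, 64)                    # toy deep features
yy, xx = np.mgrid[0:64, 0:64]
y = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 18.0)  # Gaussian label
W = learn_filters(feats, y)
response = np.real(np.fft.ifft2((W * np.fft.fft2(feats, axes=(-2, -1))).sum(0)))
```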
Affiliation(s)
- Sathiyamoorthi Arthanari
- School of IT Information and Control Engineering, Kunsan National University, 558 Daehak-ro, Gunsan-si, Jeonbuk 54150, Republic of Korea
- Dinesh Elayaperumal
- School of IT Information and Control Engineering, Kunsan National University, 558 Daehak-ro, Gunsan-si, Jeonbuk 54150, Republic of Korea
- Young Hoon Joo
- School of IT Information and Control Engineering, Kunsan National University, 558 Daehak-ro, Gunsan-si, Jeonbuk 54150, Republic of Korea
2
Hu Y, Wang X, Gu Q. PWSNAS: Powering Weight Sharing NAS With General Search Space Shrinking Framework. IEEE Trans Neural Netw Learn Syst 2023;34:9171-9184. [PMID: 35316195] [DOI: 10.1109/tnnls.2022.3156373]
Abstract
Neural architecture search (NAS) depends heavily on an efficient and accurate performance estimator. To speed up the evaluation process, recent advances such as differentiable architecture search (DARTS) and one-shot approaches train a weight-sharing super-network to reuse parameters among different candidates instead of training every model from scratch, so that all child models can be evaluated efficiently. Though these methods significantly boost search efficiency, they inherently suffer from inaccurate and unstable performance estimation. To this end, we propose a general and effective framework for powering weight-sharing NAS, namely PWSNAS, which shrinks the search space automatically, i.e., candidate operators are discarded if they are less important. With this strategy, our approach provides a promising search space of smaller size by progressively simplifying the original search space, which reduces the difficulty for existing NAS methods of finding superior architectures. In particular, we present two strategies to guide the shrinking process: detecting redundant operators with a new angle-based metric, and decreasing the degree of weight sharing in the super-network by increasing its parameters, which differentiates PWSNAS from existing shrinking methods. Comprehensive analysis experiments on NASBench-201 verify the superiority of our proposed metric over existing accuracy-based and magnitude-based metrics. PWSNAS can be easily applied to state-of-the-art NAS methods, e.g., single path one-shot neural architecture search (SPOS), FairNAS, ProxylessNAS, DARTS, and progressive DARTS (PDARTS). We evaluate PWSNAS and demonstrate consistent performance gains over baseline methods.
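The angle-based metric is concrete enough to sketch. Below, each candidate operator's importance is scored by the angle between its super-network weights at initialization and after training, and the lowest-angle (least-trained, hence presumed redundant) operators are discarded; the dictionary API and the drop-k rule are assumptions for illustration, not PWSNAS's exact procedure.

```python
import numpy as np

def operator_angles(init_weights, trained_weights):
    """Angle between each candidate operator's weight vector at super-network
    initialization and after training; a small angle is read as "barely
    trained", flagging the operator as a removal candidate.
    Both arguments: dict op_name -> np.ndarray of weights."""
    angles = {}
    for name in init_weights:
        a, b = init_weights[name].ravel(), trained_weights[name].ravel()
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        angles[name] = float(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles

def shrink(angles, drop_k=1):
    """Drop the drop_k operators with the smallest angle."""
    ranked = sorted(angles, key=angles.get)
    return ranked[:drop_k], ranked[drop_k:]

rng = np.random.default_rng(0)
init = {f"op{i}": rng.normal(size=256) for i in range(5)}
trained = {k: v + rng.normal(scale=0.1 + 0.2 * i, size=256)
           for i, (k, v) in enumerate(init.items())}
dropped, kept = shrink(operator_angles(init, trained), drop_k=2)
```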
3
Huang B, Xu T, Li J, Luo F, Qin Q, Chen J. Learning Context Restrained Correlation Tracking Filters via Adversarial Negative Instance Generation. IEEE Trans Neural Netw Learn Syst 2023;34:6132-6145. [PMID: 34941528] [DOI: 10.1109/tnnls.2021.3133441]
Abstract
The tracking performance of discriminative correlation filters (DCFs) is often subject to unwanted boundary effects. Many attempts have been made in recent years to address this issue by enlarging the search region. However, introducing excessive background information makes the discriminative filter prone to learning from the surrounding context rather than the target. In this article, we propose a novel context restrained correlation tracking filter (CRCTF) that can effectively suppress background interference by incorporating high-quality adversarially generated negative instances. Concretely, we first construct an adversarial context generation network to simulate the central target area with surrounding background information at the initial frame. We then suggest a coarse background estimation network to accelerate background generation in subsequent frames. By introducing a suppression convolution term, we utilize generated background patches to reformulate the original ridge regression objective through the circulant property of correlation and a cropping operator. Finally, our tracking filter is efficiently solved by the alternating direction method of multipliers (ADMM). CRCTF demonstrates accuracy on par with several well-established and highly optimized baselines on multiple challenging tracking datasets, verifying the effectiveness of the proposed approach.
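The suppression idea maps onto the closed form of a context-aware correlation filter, which is a reasonable stand-in for the paper's ADMM-solved, cropped formulation. In the sketch below, generated negative patches enter the denominator so the learned filter's response on them is pushed toward zero; `context_restrained_filter` and the random stand-ins for GAN-generated patches are assumptions.

```python
import numpy as np

def context_restrained_filter(target, neg_patches, y, lam=1e-2, gamma=0.5):
    """Sketch: a single-channel CF whose response on generated negative
    (context) patches is suppressed toward zero. The paper solves a cropped
    ridge regression with ADMM; here the simpler context-aware closed form
    serves as an approximation:
        W = conj(X0) Y / (|X0|^2 + lam + gamma * sum_i |Xi|^2)"""
    X0, Y = np.fft.fft2(target), np.fft.fft2(y)
    suppress = sum(np.abs(np.fft.fft2(p)) ** 2 for p in neg_patches)
    return np.conj(X0) * Y / (np.abs(X0) ** 2 + lam + gamma * suppress)

target = np.random.randn(50, 50)
negs = [np.random.randn(50, 50) for _ in range(4)]  # stand-ins for generated patches
yy, xx = np.mgrid[0:50, 0:50]
y = np.exp(-((yy - 25) ** 2 + (xx - 25) ** 2) / 8.0)
W = context_restrained_filter(target, negs, y)
```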
4
Javed S, Mahmood A, Dias J, Seneviratne L, Werghi N. Hierarchical Spatiotemporal Graph Regularized Discriminative Correlation Filter for Visual Object Tracking. IEEE Trans Cybern 2022;52:12259-12274. [PMID: 34232902] [DOI: 10.1109/tcyb.2021.3086194]
Abstract
Visual object tracking is a fundamental and challenging task in many high-level vision and robotics applications. It is typically formulated as estimating the target appearance model between consecutive frames. Discriminative correlation filters (DCFs) and their variants have achieved promising speed and accuracy for visual tracking in many challenging scenarios. However, because of unwanted boundary effects and a lack of geometric constraints, these methods suffer from performance degradation. In the current work, we propose hierarchical spatiotemporal graph-regularized correlation filters for robust object tracking. The target sample is decomposed into a large number of deep channels, which are then used to construct a spatial graph such that each graph node corresponds to a particular target location across all channels. Such a graph effectively captures the spatial structure of the target object. To capture the temporal structure of the target object, the information in the deep channels obtained from a temporal window is compressed using principal component analysis, and a temporal graph is constructed such that each graph node corresponds to a particular target location in the temporal dimension. Both spatial and temporal graphs span different subspaces such that the target and the background become linearly separable. The learned correlation filter is constrained to act as an eigenvector of the Laplacian of these spatiotemporal graphs. We propose a novel objective function that incorporates these spatiotemporal constraints into the DCF framework. We solve the objective function using the alternating direction method of multipliers such that each subproblem has a closed-form solution. We evaluate our proposed algorithm on six challenging benchmark datasets and compare it with 33 existing state-of-the-art trackers. Our results demonstrate the excellent performance of the proposed algorithm compared with the existing trackers.
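The Laplacian-regularization ingredient can be illustrated with a small spatial-domain surrogate. The sketch builds a graph over target locations from per-location channel features and penalizes filters that vary sharply across connected locations; the paper instead constrains the filter toward a Laplacian eigenvector and solves with ADMM, so treat this convex penalty as a deliberately simplified assumption.

```python
import numpy as np

def laplacian(feat_vectors, sigma=1.0):
    """Graph Laplacian over target locations; node i's feature is the vector
    of deep-channel activations at spatial location i (a simplification of
    the paper's hierarchical spatiotemporal graphs)."""
    d2 = ((feat_vectors[:, None, :] - feat_vectors[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(1)) - A

def graph_regularized_filter(X, y, L, lam=1e-2, mu=1e-1):
    """Spatial-domain ridge regression with a w^T L w smoothness penalty:
        argmin_w ||X w - y||^2 + lam ||w||^2 + mu w^T L w"""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n) + mu * L, X.T @ y)

X = np.random.randn(200, 36)            # 200 samples, 36 target locations
locs = np.random.randn(36, 16)          # per-location channel features
w = graph_regularized_filter(X, np.random.randn(200), laplacian(locs))
```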
5
Wang X, Tang J, Luo B, Wang Y, Tian Y, Wu F. Tracking by Joint Local and Global Search: A Target-Aware Attention-Based Approach. IEEE Trans Neural Netw Learn Syst 2022;33:6931-6945. [PMID: 34379596] [DOI: 10.1109/tnnls.2021.3083933]
Abstract
Tracking-by-detection is a very popular framework for single-object tracking that searches for the target object within a local search window in each frame. Although such a local search mechanism works well on simple videos, it makes trackers sensitive to extremely challenging scenarios, such as heavy occlusion and fast motion. In this article, we propose a novel and general target-aware attention mechanism (termed TANet) and integrate it with a tracking-by-detection framework to conduct a joint local and global search for robust tracking. Specifically, we extract the features of the target object patch and of continuous video frames, then concatenate them and feed them into a decoder network to generate target-aware global attention maps. More importantly, we resort to adversarial training for better attention prediction: appearance and motion discriminator networks are designed to ensure consistency in the spatial and temporal views. In the tracking procedure, we integrate target-aware attention with multiple trackers by exploring candidate search regions for robust tracking. Extensive experiments on both short- and long-term tracking benchmark datasets validate the effectiveness of our algorithm.
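A plain-correlation stand-in conveys what a target-aware global attention map buys: a frame-wide prior that proposes candidate regions beyond the local window. The learned decoder and adversarial discriminators are replaced here by simple FFT cross-correlation plus a softmax; every function name and constant is an illustrative assumption.

```python
import numpy as np

def target_aware_attention(frame_feat, template_feat):
    """Sketch of a target-aware global attention map: correlate the template
    feature with the whole frame (the paper uses a learned decoder plus
    adversarial training; plain correlation stands in for it here)."""
    pad = np.zeros_like(frame_feat)
    th, tw = template_feat.shape
    pad[:th, :tw] = template_feat - template_feat.mean()
    score = np.real(np.fft.ifft2(np.fft.fft2(frame_feat) *
                                 np.conj(np.fft.fft2(pad))))
    e = np.exp(score - score.max())
    return e / e.sum()                    # global attention, sums to 1

def candidate_regions(attn, k=3):
    """Top-k attention peaks as global search candidates, complementing the
    tracker's usual local window."""
    idx = np.argsort(attn, axis=None)[-k:]
    return [np.unravel_index(i, attn.shape) for i in idx]

frame = np.random.randn(96, 96)
template = frame[40:56, 40:56].copy()
cands = candidate_regions(target_aware_attention(frame, template))
```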
6
Bai L, Shao YH, Wang Z, Chen WJ, Deng NY. Multiple Flat Projections for Cross-Manifold Clustering. IEEE Trans Cybern 2022;52:7704-7718. [PMID: 33523821] [DOI: 10.1109/tcyb.2021.3050487]
Abstract
Cross-manifold clustering is an extremely challenging learning problem. Since the low-density hypothesis is not satisfied in cross-manifold problems, many traditional clustering methods fail to discover cross-manifold structures. In this article, we propose multiple flat projections clustering (MFPC) for cross-manifold clustering. In MFPC, the given samples are projected into multiple localized flats to discover the global structures of implicit manifolds, so intersecting clusters can be distinguished in the various projection flats. The resulting series of nonconvex matrix optimization problems is solved by a proposed recursive algorithm. Furthermore, a nonlinear version of MFPC is derived via kernel tricks to deal with more complex cross-manifold learning situations. Synthetic tests show that MFPC handles cross-manifold structures well. Moreover, experimental results on benchmark datasets and object tracking videos show the excellent performance of MFPC compared with some state-of-the-art manifold clustering methods.
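A classical k-flats loop is the closest textbook relative of "multiple flat projections" and shows why flats can separate intersecting clusters where density-based methods cannot. MFPC itself solves a recursive sequence of nonconvex matrix problems (with a kernel variant), so the alternating sketch below is only an analogy, and all names are assumptions.

```python
import numpy as np

def fit_flat(points, dim):
    """Least-squares affine flat: mean + top-`dim` principal directions."""
    mu = points.mean(0)
    _, _, Vt = np.linalg.svd(points - mu, full_matrices=False)
    return mu, Vt[:dim]

def dist_to_flat(X, mu, basis):
    R = X - mu
    return np.linalg.norm(R - (R @ basis.T) @ basis, axis=1)

def k_flats(X, k=2, dim=1, iters=30, seed=0):
    """Alternate: assign each point to its best-reconstructing flat, then
    refit the flats, so intersecting clusters can still be separated."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(X))
    for _ in range(iters):
        flats = []
        for j in range(k):
            pts = X[labels == j]
            if len(pts) <= dim:            # re-seed an emptied cluster
                pts = X[rng.integers(len(X), size=dim + 2)]
            flats.append(fit_flat(pts, dim))
        d = np.stack([dist_to_flat(X, *f) for f in flats], axis=1)
        labels = d.argmin(1)
    return labels

# two crossing lines, a classic cross-manifold toy case
t = np.linspace(-1, 1, 100)
X = np.vstack([np.c_[t, t], np.c_[t, -t]]) + 0.01 * np.random.randn(200, 2)
labels = k_flats(X, k=2, dim=1)
```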
7
Li S, Zhao S, Cheng B, Chen J. Noise-Aware Framework for Robust Visual Tracking. IEEE Trans Cybern 2022;52:1179-1192. [PMID: 32520714] [DOI: 10.1109/tcyb.2020.2996245]
Abstract
Both Siamese-network and correlation filter (CF)-based trackers have exhibited superior performance by formulating tracking as a similarity-measure problem, where a similarity map is learned from the correlation between a target template and a region of interest (ROI) weighted by a cosine window. Nevertheless, this window function is usually fixed for all targets and never changed, even though the ROI undergoes significant noise variations during tracking, which easily causes model drift. In this article, we focus on a noise-aware (NA) framework for robust visual tracking. To this end, the impact of various window functions is first investigated in visual tracking. We identify that the low signal-to-noise ratio (SNR) of windowed ROIs makes the above trackers degenerate. In the prediction phase, a novel NA window customized for visual tracking is introduced to improve the SNR of windowed ROIs by adaptively suppressing the variable noise according to the observed similarity maps. In addition, to further optimize the SNR of windowed pyramid ROIs for scale estimation, we propose using a particle filter to dynamically sample several windowed ROIs with more favorable signals in the temporal domain instead of the pyramid ROIs extracted in the spatial domain. Extensive experiments on the popular OTB-2013, OTB-50, OTB-2015, VOT2017, TC128, UAV123, UAV123@10fps, UAV20L, and LaSOT datasets show that our NA framework can be extended to many Siamese and CF trackers, and that our variants outperform the baseline trackers with only a modest impact on efficiency.
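The control flow of a noise-aware window is easy to sketch even though the paper's exact adaptation rule is not reproduced here: starting from the usual cosine (Hann) window, regions that the previous similarity map marks as spurious energy are attenuated, raising the SNR of the windowed ROI. The suppression formula below is a guess at one plausible instance, and all constants are assumptions.

```python
import numpy as np

def hann2d(h, w):
    return np.outer(np.hanning(h), np.hanning(w))

def noise_aware_window(prev_response, base=None, floor=0.1):
    """Sketch of an NA window: where the previous similarity map shows
    spurious (non-peak) energy, the window is tightened to raise the SNR of
    the windowed ROI; elsewhere the usual cosine window is kept."""
    h, w = prev_response.shape
    base = hann2d(h, w) if base is None else base
    r = np.abs(prev_response)
    noise = r / (r.max() + 1e-12)           # normalized response energy
    peak = np.unravel_index(r.argmax(), r.shape)
    yy, xx = np.mgrid[0:h, 0:w]
    near_peak = np.exp(-((yy - peak[0]) ** 2 + (xx - peak[1]) ** 2)
                       / (2 * (h / 8) ** 2))
    suppress = np.clip(1.0 - noise * (1.0 - near_peak), floor, 1.0)
    return base * suppress

resp = np.random.rand(64, 64); resp[20, 40] = 3.0   # fake similarity map
roi = np.random.randn(64, 64)
windowed = roi * noise_aware_window(resp)
```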
8
Liu R, Chen Q, Yao Y, Fan X, Luo Z. Location-Aware and Regularization-Adaptive Correlation Filters for Robust Visual Tracking. IEEE Trans Neural Netw Learn Syst 2021;32:2430-2442. [PMID: 32749966] [DOI: 10.1109/tnnls.2020.3005447]
Abstract
Correlation filters (CFs) have recently been widely used for visual tracking. Estimating the search window and choosing the filter-learning strategy are the key components of CF trackers. Nevertheless, prevalent CF models address these issues separately and heuristically: they directly set the location estimated in the previous frame as the search center for the current one, and they usually rely on simple, fixed regularization for filter learning, so their performance is compromised by the search window size and optimization heuristics. To break these limits, this article proposes a location-aware and regularization-adaptive CF (LRCF) for robust visual tracking. LRCF establishes a novel bilevel optimization model that addresses the location-estimation and filter-training problems simultaneously. We prove that our bilevel formulation obtains a globally converged CF and the corresponding object location in a collaborative manner. Moreover, based on the LRCF framework, we design two trackers, LRCF-S and LRCF-SA, and a series of comparisons to demonstrate the flexibility and effectiveness of the framework. Extensive experiments on different challenging benchmark datasets demonstrate that our LRCF trackers perform favorably against state-of-the-art methods in practice.
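The collaborative location/filter idea can be shown as an alternating loop, with the caveat that LRCF couples the two problems in a single bilevel program with a convergence proof, while the sketch below only alternates the two heuristically. The `crop` extractor, `lrcf_step`, and all constants are assumptions.

```python
import numpy as np

def cf_train(x, y, lam=1e-2):
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.abs(X) ** 2 + lam)

def cf_respond(W, x):
    return np.real(np.fft.ifft2(W * np.fft.fft2(x)))

def lrcf_step(frame, crop, y, center, iters=3):
    """Alternate (a) locating the target with the current filter and
    (b) retraining the filter on the ROI at the new location, instead of
    fixing the search center up front."""
    W = None
    for _ in range(iters):
        x = crop(frame, center)
        if W is not None:
            r = cf_respond(W, x)
            dy, dx = np.unravel_index(r.argmax(), r.shape)
            h, w = r.shape
            # wrap-around displacement of the response peak
            center = (center[0] + (dy if dy <= h // 2 else dy - h),
                      center[1] + (dx if dx <= w // 2 else dx - w))
            x = crop(frame, center)
        W = cf_train(x, y)
    return center, W

frame = np.random.randn(200, 200)
def crop(f, c, s=64):
    r0, c0 = int(c[0]) - s // 2, int(c[1]) - s // 2
    return f[r0:r0 + s, c0:c0 + s]
yy, xx = np.mgrid[0:64, 0:64]
y = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / 18.0)
center, W = lrcf_step(frame, crop, y, (100, 100))
```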
9
Salamanca JJ. A universal, canonical dispersive ordering in metric spaces. J Stat Plan Inference 2021. [DOI: 10.1016/j.jspi.2020.10.005]
10
Ge S, Zhang C, Li S, Zeng D, Tao D. Cascaded Correlation Refinement for Robust Deep Tracking. IEEE Trans Neural Netw Learn Syst 2021;32:1276-1288. [PMID: 32305944] [DOI: 10.1109/tnnls.2020.2984256]
Abstract
Recent deep trackers have shown superior performance in visual tracking. In this article, we propose a cascaded correlation refinement approach to improve the robustness of deep tracking. The core idea is to address accurate target localization and reliable model update in a collaborative way. To this end, our approach cascades multiple stages of correlation refinement to progressively refine target localization, so the localized object can be used to learn an accurate on-the-fly model that improves the reliability of the model update. Meanwhile, we introduce an explicit measure to identify tracking failure and then leverage a simple yet effective look-back scheme that adaptively combines the initial model and the on-the-fly model when updating the tracking model. As a result, the tracking model can localize the target more accurately. Extensive experiments on OTB2013, OTB2015, VOT2016, VOT2018, UAV123, and GOT-10k demonstrate that the proposed tracker achieves the best robustness against state-of-the-art trackers.
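The look-back update is the most transferable piece: an explicit failure measure gates how strongly the initial (clean) model is mixed back into the on-the-fly model. The sketch uses peak-to-sidelobe ratio as the failure measure, which is an assumption; the paper defines its own measure and cascaded refinement stages, and the blend weights here are illustrative.

```python
import numpy as np

def psr(response):
    """Peak-to-sidelobe ratio, one explicit failure measure."""
    peak = response.max()
    mask = np.ones_like(response, bool)
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask[max(0, py - 5):py + 6, max(0, px - 5):px + 6] = False
    side = response[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)

def look_back_update(init_model, live_model, response, tau=6.0, eta=0.02):
    """Normally the on-the-fly model tracks appearance changes, but when the
    PSR flags a likely failure the update leans back on the initial (clean)
    model to stop drift."""
    if psr(response) < tau:                        # probable failure
        return 0.5 * init_model + 0.5 * live_model
    return (1 - eta) * live_model + eta * init_model

resp = np.random.rand(48, 48); resp[10, 30] = 2.5
m0, mt = np.random.randn(48, 48), np.random.randn(48, 48)
m_next = look_back_update(m0, mt, resp)
```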
13
Walia GS, Ahuja H, Kumar A, Bansal N, Sharma K. Unified Graph-Based Multicue Feature Fusion for Robust Visual Tracking. IEEE Trans Cybern 2020;50:2357-2368. [PMID: 31251204] [DOI: 10.1109/tcyb.2019.2920289]
Abstract
Visual tracking is a complex problem due to unconstrained appearance variations and dynamic environments. Extracting complementary information from the object's environment via multiple features, and adapting to the target's appearance variations, are the key problems addressed in this paper. To this end, we propose a robust object tracking framework based on the unified graph fusion (UGF) of multicue features to adapt to the object's appearance. The proposed cross-diffusion of sparse and dense features not only suppresses individual feature deficiencies but also extracts complementary information from the multiple cues. This iterative process builds robust unified features that are invariant to object deformation, fast motion, and occlusion. The robustness of the unified feature also enables a random forest classifier to precisely distinguish the foreground from the background, adding resilience to background clutter. In addition, we present a novel kernel-based adaptation strategy using outlier detection and a transductive reliability metric. The adaptation strategy updates the appearance model to accommodate variations in scale, illumination, and rotation. Both qualitative and quantitative analyses on benchmark video sequences from OTB-50, OTB-100, VOT2017/18, and UAV123 show that the proposed UGF tracker performs favorably against 18 other state-of-the-art trackers under various object tracking challenges.
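Cross-diffusion of two cues' affinity graphs can be sketched directly: each cue's transition matrix is smoothed through the other's, so unreliable edges in one cue are down-weighted by the other. The row-stochastic full-graph version below omits the kNN sparsification and classifier stages of the paper and is only a structural sketch; the cue names are placeholders.

```python
import numpy as np

def affinity(F, sigma=1.0):
    d2 = ((F[:, None] - F[None]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    return A / A.sum(1, keepdims=True)        # row-stochastic

def cross_diffuse(F1, F2, iters=10):
    """Diffuse each cue's affinity through the other's, suppressing per-cue
    deficiencies before fusing into one unified affinity."""
    S1, S2 = affinity(F1), affinity(F2)
    P1, P2 = S1.copy(), S2.copy()
    for _ in range(iters):
        P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T
        P1 /= P1.sum(1, keepdims=True)        # keep row-stochastic
        P2 /= P2.sum(1, keepdims=True)
    return (P1 + P2) / 2                      # unified affinity

hog = np.random.randn(40, 31)                 # e.g. a HOG-like dense cue
col = np.random.randn(40, 10)                 # e.g. a color-statistics cue
U = cross_diffuse(hog, col)
```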
14
Online Semantic Subspace Learning with Siamese Network for UAV Tracking. Remote Sens 2020. [DOI: 10.3390/rs12020325]
Abstract
In urban environment monitoring, visual tracking on unmanned aerial vehicles (UAVs) enables many applications owing to its inherent advantages, but it also brings new challenges to existing visual tracking approaches, such as complex background clutter, rotation, fast motion, small objects, and real-time constraints arising from camera motion and viewpoint changes. Siamese-network trackers can run efficiently on recent UAV datasets. Unfortunately, the learned convolutional neural network (CNN) features are not discriminative enough to separate the target from background clutter, particularly from distractors, and cannot capture appearance variations over time. Occlusion and disappearance are further causes of tracking failure. In this paper, a semantic subspace module is designed and integrated into a Siamese-network tracker to encode the local fine-grained details of the target for UAV tracking. More specifically, the target's semantic subspace is learned online to adapt to the target in the temporal domain. Additionally, the pixel-wise response of the semantic subspace can be used to detect occlusion and disappearance of the target, which enables reasonable updating to relieve model drift. Substantial experiments conducted on challenging UAV benchmarks illustrate that the proposed method obtains competitive results in both accuracy and efficiency when applied to UAV videos.
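The online subspace plus pixel-wise occlusion test is sketchable with plain PCA: the target's recent patches span a subspace, and a new patch whose pixel-wise reconstruction residual is large over a wide area is treated as occluded, so the update is skipped. The class, window size, and thresholds below are illustrative assumptions, not the paper's learned semantic subspace.

```python
import numpy as np

class SemanticSubspace:
    """Online PCA subspace over vectorized target patches; the pixel-wise
    reconstruction residual stands in for the paper's pixel-wise response."""
    def __init__(self, dim=8):
        self.dim, self.patches = dim, []
        self.mu, self.basis = None, None

    def update(self, patch):
        self.patches.append(patch.ravel())
        P = np.asarray(self.patches[-50:])           # sliding window
        self.mu = P.mean(0)
        _, _, Vt = np.linalg.svd(P - self.mu, full_matrices=False)
        self.basis = Vt[:self.dim]

    def occluded(self, patch, pix_thresh=0.5, area_thresh=0.3):
        """Large residual over a wide area flags occlusion/disappearance,
        so the model update is skipped to limit drift."""
        r = patch.ravel() - self.mu
        resid = np.abs(r - self.basis.T @ (self.basis @ r))
        return (resid > pix_thresh).mean() > area_thresh

ss = SemanticSubspace()
for _ in range(10):
    ss.update(np.random.rand(16, 16))
new = np.random.rand(16, 16)
if not ss.occluded(new):
    ss.update(new)
```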
15
Zhu G, Zhang Z, Wang J, Wu Y, Lu H. Dynamic Collaborative Tracking. IEEE Trans Neural Netw Learn Syst 2019;30:3035-3046. [PMID: 32175852] [DOI: 10.1109/tnnls.2018.2861838]
Abstract
Correlation filters have recently demonstrated remarkable success in visual tracking. However, most existing methods face model drift caused by several factors, such as unlimited boundary effects, heavy occlusion, fast motion, and distracter perturbation. To address this issue, this paper proposes a unified dynamic collaborative tracking framework that enables more flexible and robust position prediction. Specifically, the framework learns the object appearance model by jointly training an objective function with three components: a target regression submodule, a distracter suppression submodule, and a maximum-margin relation submodule. The first submodule mainly takes advantage of the circulant structure of training samples to distinguish the target from its surrounding background. The second submodule drives the label response of possible distracting regions toward zero, reducing the peak values of the confidence map in those regions. Inspired by structured-output support vector machines, the third submodule utilizes the differences between target and distracter appearance representations in the discriminative mapping space to alleviate the disturbance of the most likely hard negative samples. In addition, a CUR filter is embedded as an assistant detector to provide effective object candidates and alleviate the model drift problem. Comprehensive experimental results show that the proposed approach achieves state-of-the-art performance on several public benchmark datasets.
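The distracter-suppression submodule can be conveyed through the label design alone: the desired response keeps a Gaussian peak at the target while being pinned near zero around a known distracter, which flattens the confidence map there. The sketch below folds this into a standard single-channel CF solve and leaves out the max-margin relation term and the CUR detector; the function name and the hard-zero rule are assumptions.

```python
import numpy as np

def label_with_distracter_suppression(shape, target, distracter, sigma=2.0):
    """Desired response: Gaussian peak at the target, pinned near zero
    around the distracter so the learned filter's confidence map is
    flattened there."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    g = lambda c: np.exp(-((yy - c[0]) ** 2 + (xx - c[1]) ** 2)
                         / (2 * sigma ** 2))
    y = g(target)
    y *= 1.0 - np.clip(g(distracter) * 4.0, 0.0, 1.0)  # zero near distracter
    return y

x = np.random.randn(64, 64)
y = label_with_distracter_suppression((64, 64), target=(32, 32),
                                      distracter=(40, 50))
X, Y = np.fft.fft2(x), np.fft.fft2(y)
W = np.conj(X) * Y / (np.abs(X) ** 2 + 1e-2)           # standard CF solve
```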
16
Yang H, Huang Y, Xie Z. Improved Correlation Filter Tracking with Enhanced Features and Adaptive Kalman Filter. Sensors (Basel) 2019;19:1625. [PMID: 30987414] [PMCID: PMC6479297] [DOI: 10.3390/s19071625]
Abstract
In the field of visual tracking, discriminative correlation filter (DCF)-based trackers have made remarkable achievements thanks to their high computational efficiency. The crucial remaining challenges are how to construct qualified samples without boundary effects and how to redetect occluded targets. In this paper, a feature-enhanced discriminative correlation filter (FEDCF) tracker is proposed, which utilizes a color statistical model to strengthen the texture features (such as histograms of oriented gradients, HOG) and uses a spatial-prior function to suppress boundary effects. Improved correlation filters using the enhanced features are then built, whose objective functions can be solved effectively by Gauss-Seidel iteration. In addition, the average peak-response difference (APRD) is proposed to reflect the degree of target occlusion according to the target response, and an adaptive Kalman filter is established to support target redetection. The proposed tracker achieves a success-plot score of 67.8% at 5.1 fps on the standard OTB2013 dataset.
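Both redetection ingredients are standard enough to sketch: a constant-velocity Kalman filter that keeps predicting the position while an APRD-style occlusion flag is raised. The exact form of APRD is not given in the abstract, so the ratio test below is an explicit assumption, as are all tuning constants.

```python
import numpy as np

class KalmanCV:
    """Constant-velocity Kalman filter over (row, col) position; it keeps
    predicting the target while the occlusion flag is on. Matrices follow
    the standard textbook form, not the paper's exact tuning."""
    def __init__(self, q=1e-2, r=1.0):
        self.x = np.zeros(4)                        # [y, x, vy, vx]
        self.P = np.eye(4) * 10.0
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)
        self.Q, self.R = np.eye(4) * q, np.eye(2) * r

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x += K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

def aprd_occluded(peaks, window=5, ratio=0.5):
    """APRD-style check (an assumption about its exact form): the current
    peak falling well below the recent average peak flags occlusion."""
    if len(peaks) <= window:
        return False
    return peaks[-1] < ratio * np.mean(peaks[-window - 1:-1])

kf, peaks = KalmanCV(), []
for t in range(20):
    peak_val, pos = np.random.rand() + 0.5, (t + np.random.rand(), 2 * t)
    peaks.append(peak_val)
    pred = kf.predict()
    if not aprd_occluded(peaks):
        kf.update(pos)             # trust the CF localization
    # else: use pred as the redetection search center
```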
Affiliation(s)
- Hao Yang
- Department of Arms and Control Engineering, Army Academy of Armored Forces, Beijing 100072, China.
- Yingqing Huang
- Department of Arms and Control Engineering, Army Academy of Armored Forces, Beijing 100072, China.
- Zhihong Xie
- Department of Arms and Control Engineering, Army Academy of Armored Forces, Beijing 100072, China.