1
Prasannakumar A, Mishra D. Deep Efficient Data Association for Multi-Object Tracking: Augmented with SSIM-Based Ambiguity Elimination. J Imaging 2024;10:171. PMID: 39057742; PMCID: PMC11277565; DOI: 10.3390/jimaging10070171.
Abstract
To address the multiple object tracking (MOT) problem, we harness the power of deep learning-based methods. The tracking-by-detection approach to MOT involves two primary steps: object detection and data association. In the first step, objects of interest are detected in each frame of a video. The second step establishes the correspondence between these detected objects across different frames to track their trajectories. This paper proposes an efficient and unified data association method that utilizes a deep feature association network (deepFAN) to learn the associations. Additionally, the Structural Similarity Index Metric (SSIM) is employed to resolve uncertainties in the data association, complementing the deep feature association network. These combined association computations effectively link the current detections with the previous tracks, enhancing overall tracking performance. To evaluate the efficiency of the proposed MOT framework, we conducted a comprehensive analysis on popular MOT datasets such as the MOT challenge and UA-DETRAC. The results show that our technique performs substantially better than current state-of-the-art methods in terms of standard MOT metrics.
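The SSIM computation this entry leans on for ambiguity elimination is a standard, well-defined quantity. The sketch below is a minimal single-scale version computed from whole-patch statistics (the full metric uses local Gaussian windows); the function name and the whole-patch simplification are ours, not the paper's.

```python
import numpy as np

def ssim(x: np.ndarray, y: np.ndarray, L: float = 255.0) -> float:
    """Single-scale SSIM over two whole patches (no sliding Gaussian window)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stabilising constants
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()                  # luminance terms
    vx, vy = x.var(), y.var()                    # contrast terms
    cov = ((x - mx) * (y - my)).mean()           # structure term
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```

Identical patches score 1.0 and unrelated patches fall toward 0, which is what lets SSIM arbitrate between ambiguous detection-track pairings.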
Affiliation(s)
- Aswathy Prasannakumar
- Department of Avionics, Indian Institute of Space Science and Technology, Trivandrum 695547, Kerala, India;
2
Diab MS, Elhosseini MA, El-Sayed MS, Ali HA. Brain Strategy Algorithm for Multiple Object Tracking Based on Merging Semantic Attributes and Appearance Features. Sensors 2021;21:7604. PMID: 34833680; PMCID: PMC8625767; DOI: 10.3390/s21227604.
Abstract
The human brain can effortlessly perform vision processes using the visual system, which helps solve multi-object tracking (MOT) problems. However, few algorithms simulate human strategies for solving MOT. Devising a method that simulates human activity in vision has therefore become a good way to improve MOT results, especially under occlusion. Eight brain strategies were studied from a cognitive perspective and imitated to build a novel algorithm. Two of these strategies, rescue saccades and stimulus attributes, gave our algorithm novel and outstanding results. First, rescue saccades were imitated by detecting the occlusion state in each frame, representing the critical situation that the human brain saccades toward. Then, stimulus attributes were mimicked by using semantic attributes to re-identify the person in these occlusion states. Our algorithm performs favourably on the MOT17 dataset compared to state-of-the-art trackers. In addition, we created a new dataset of 40,000 images, 190,000 annotations and 4 classes to train the detection model to detect occlusion and semantic attributes. The experimental results demonstrate that our new dataset achieves outstanding performance on the scaled YOLOv4 detection model, reaching a mAP@0.5 of 0.89.
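In the paper the occlusion states are detected by a trained detection model; purely as an illustration of what an "occlusion state" between tracked people looks like geometrically, here is a hypothetical bounding-box overlap check. The IoU threshold and helper names are our assumptions, not the authors' method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def occluded_pairs(boxes, thresh=0.3):
    """Flag pairs of detections whose boxes overlap enough to suggest occlusion."""
    return [(i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))
            if iou(boxes[i], boxes[j]) > thresh]
```

Frames containing any flagged pair would then be the "critical situations" handed to the re-identification stage.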
Affiliation(s)
- Mai S. Diab
- Faculty of Computer & Artificial Intelligence, Benha University, Benha 13511, Egypt;
- Intoolab Ltd., London WC2H 9JQ, UK
- Correspondence:
- Mostafa A. Elhosseini
- Computers Engineering and Control System, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- College of Computer Science and Engineering in Yanbu, Taibah University, Madinah 46421, Saudi Arabia
- Mohamed S. El-Sayed
- Faculty of Computer & Artificial Intelligence, Benha University, Benha 13511, Egypt
- Hesham A. Ali
- Computers Engineering and Control System, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura 35511, Egypt
3
Pramanik A, Pal SK, Maiti J, Mitra P. Granulated RCNN and Multi-Class Deep SORT for Multi-Object Detection and Tracking. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. DOI: 10.1109/tetci.2020.3041019.
4
Sun S, Akhtar N, Song H, Mian A, Shah M. Deep Affinity Network for Multiple Object Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021;43:104-119. PMID: 31329110; DOI: 10.1109/tpami.2019.2929520.
Abstract
Multiple Object Tracking (MOT) plays an important role in solving many fundamental problems in video analysis and computer vision. Most MOT methods employ two steps: object detection and data association. The first step detects objects of interest in every frame of a video, and the second establishes correspondence between the detected objects in different frames to obtain their tracks. Object detection has made tremendous progress in the last few years due to deep learning. However, data association for tracking still relies on hand-crafted constraints such as appearance, motion, spatial proximity, and grouping to compute affinities between the objects in different frames. In this paper, we harness the power of deep learning for data association in tracking by jointly modeling object appearances and their affinities between different frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN) learns compact yet comprehensive features of pre-detected objects at several levels of abstraction, and performs exhaustive pairing permutations of those features in any two frames to infer object affinities. DAN also accounts for multiple objects appearing and disappearing between video frames. We exploit the resulting efficient affinity computations to associate objects in the current frame deep into the previous frames for reliable online tracking. Our technique is evaluated on the popular multiple object tracking challenges MOT15, MOT17 and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics demonstrates that our approach is among the best-performing techniques on the leaderboards for these challenges. The open-source implementation of our work is available at https://github.com/shijieS/SST.git.
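The affinity-then-association pipeline that DAN plugs into can be sketched independently of the network itself. Below, plain feature vectors stand in for DAN's learned features, cosine similarity stands in for its learned affinity head, and a greedy one-pass matcher stands in for the full assignment step (trackers often use the Hungarian algorithm instead); all of these substitutions are ours, not the paper's.

```python
import numpy as np

def associate(prev_feats: np.ndarray, curr_feats: np.ndarray, min_affinity: float = 0.5):
    """Match M previous objects to N current detections via an affinity matrix.

    Returns sorted (prev_idx, curr_idx) pairs; rows/columns left unmatched model
    objects that disappeared or newly appeared between the two frames.
    """
    p = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    c = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    affinity = p @ c.T  # (M, N): every previous-current pairing scored at once
    matches = []
    while np.isfinite(affinity).any() and affinity.max() >= min_affinity:
        r, k = np.unravel_index(np.argmax(affinity), affinity.shape)
        matches.append((int(r), int(k)))
        affinity[r, :] = -np.inf   # each previous object matched at most once
        affinity[:, k] = -np.inf   # each detection matched at most once
    return sorted(matches)
```

The `min_affinity` gate is what implements "appearing and disappearing" objects: pairs scoring below it are left unmatched rather than forced together.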
5
Dubey SR, Chakraborty S, Roy SK, Mukherjee S, Singh SK, Chaudhuri BB. diffGrad: An Optimization Method for Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2020;31:4500-4511. PMID: 31880565; DOI: 10.1109/tnnls.2019.2955777.
Abstract
Stochastic gradient descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it changes all parameters by equal-sized steps, irrespective of gradient behavior. Hence, an efficient way to optimize a deep network is to give each parameter an adaptive step size. Recently, several attempts have been made to improve gradient descent methods, such as AdaGrad, AdaDelta, RMSProp, and adaptive moment estimation (Adam). These methods rely on the square roots of exponential moving averages of squared past gradients, and thus do not take advantage of local changes in gradients. In this article, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad optimization technique, the step size is adjusted for each parameter in such a way that parameters with faster-changing gradients take larger steps and parameters with slower-changing gradients take smaller steps. The convergence analysis is done using the regret-bound approach of the online learning framework. A thorough analysis is made over three synthetic complex non-convex functions, and image categorization experiments are conducted on the CIFAR10 and CIFAR100 data sets to compare diffGrad with state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual unit (ResNet)-based convolutional neural network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers, and that diffGrad performs uniformly well for training CNNs with different activation functions. The source code is publicly available at https://github.com/shivram1987/diffGrad.
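The update rule described above is concrete enough to sketch: Adam's moment estimates scaled by a "friction" coefficient, the sigmoid of the absolute change between successive gradients. This single-step NumPy sketch is our reading of the rule; treat the hyperparameter defaults and function shape as illustrative rather than as the reference implementation, which the authors publish at the linked repository.

```python
import numpy as np

def diffgrad_step(theta, g, g_prev, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One diffGrad update on parameter vector theta, carrying Adam moments m, v."""
    m = b1 * m + (1 - b1) * g          # first moment (as in Adam)
    v = b2 * v + (1 - b2) * g * g      # second moment (as in Adam)
    m_hat = m / (1 - b1 ** t)          # bias corrections
    v_hat = v / (1 - b2 ** t)
    # diffGrad friction coefficient: sigmoid of the absolute gradient change.
    # Slowly-changing gradients damp the step (xi -> 0.5); large changes pass (xi -> 1).
    xi = 1.0 / (1.0 + np.exp(-np.abs(g_prev - g)))
    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The only difference from a plain Adam step is the elementwise `xi` factor, which is exactly the "local change in gradients" signal the abstract says the earlier optimizers ignore.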
6

7
Kamkar S, Ghezloo F, Moghaddam HA, Borji A, Lashgari R. Multiple-target tracking in human and machine vision. PLoS Computational Biology 2020;16:e1007698. PMID: 32271746; PMCID: PMC7144962; DOI: 10.1371/journal.pcbi.1007698.
Abstract
Humans are able to track multiple objects at any given time in their daily activities: for example, we can drive a car while monitoring obstacles, pedestrians, and other vehicles. Several past studies have examined how humans track targets simultaneously and what underlying behavioral and neural mechanisms they use; neuroscientists are interested in discovering how cognitive resources, e.g., spatial attention or memory, are successfully exploited during such multiple-target tracking (MTT). At the same time, computer-vision researchers have proposed different algorithms to track multiple targets automatically, which are useful for video surveillance, team-sport analysis, video summarization, human-computer interaction, and artificial intelligent systems such as those used for urban traffic control. Although there are several efficient biologically inspired algorithms in artificial intelligence, the human MTT ability is rarely imitated in computer-vision algorithms. In this paper, we review MTT studies in neuroscience and biologically inspired MTT methods in computer vision and discuss the ways in which they can be seen as complementary: findings from cognitive studies can inspire developers to construct higher-performing MTT algorithms, and, conversely, MTT algorithms can raise new questions in the cognitive-science domain whose answers shed light on the neural processes underlying MTT.
Affiliation(s)
- Shiva Kamkar
- Machine Vision and Medical Image Processing Laboratory, Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Fatemeh Ghezloo
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Hamid Abrishami Moghaddam
- Machine Vision and Medical Image Processing Laboratory, Faculty of Electrical and Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
- * E-mail: (RL); (HAM)
- Ali Borji
- HCL America, Manhattan, New York City, United States of America
- Reza Lashgari
- Brain Engineering Research Center, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- * E-mail: (RL); (HAM)
8
Lan X, Ye M, Zhang S, Zhou H, Yuen PC. Modality-correlation-aware sparse representation for RGB-infrared object tracking. Pattern Recognition Letters 2020. DOI: 10.1016/j.patrec.2018.10.002.
9
Li K, Kong Y, Fu Y. Visual Object Tracking via Multi-Stream Deep Similarity Learning Networks. IEEE Transactions on Image Processing 2019;29:3311-3320. PMID: 31869790; DOI: 10.1109/tip.2019.2959249.
Abstract
Visual tracking remains a challenging research problem because of appearance variations of the object over time, changing cluttered backgrounds, and the requirement for real-time speed. In this paper, we investigate real-time accurate tracking in an instance-level tracking-by-verification mechanism. We propose a multi-stream deep similarity learning network to learn a similarity comparison model purely offline. Our loss function encourages the distance between a positive patch and the background patches to be larger than that between the positive patch and the target template. The learned model is then used directly to determine the patch in each frame that is most distinct from the background context and most similar to the target template. Within the learned feature space, even if the distance between positive patches grows because of background clutter, hard distractors from the same class, or appearance changes of the target, our method can still distinguish the target robustly using the relative distance. Besides, we propose a complete framework that handles recovery from failures and template updating to further improve tracking performance without consuming too many computing resources. Experiments on visual tracking benchmarks show the effectiveness of the proposed tracker compared with several recent real-time trackers as well as trackers already included in the benchmarks.
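The distance ordering the loss enforces (positive patch closer to the template than to any background patch, by a margin) can be written down directly. This hinge-style sketch uses Euclidean distances on plain vectors where the paper uses learned deep features, and the margin value is our placeholder.

```python
import numpy as np

def verification_loss(template, positive, negatives, margin=1.0):
    """Hinge-style verification loss: the positive patch should lie closer to the
    target template than to every background patch, by at least `margin`."""
    d_pos = np.linalg.norm(positive - template)            # distance to template
    d_negs = np.linalg.norm(negatives - positive, axis=1)  # distances to background patches
    # One hinge term per background patch; a term is zero once its margin holds.
    return float(np.maximum(0.0, margin + d_pos - d_negs).sum())
```

Minimizing this over many training triplets is what shapes the feature space so that, at test time, the relative distances alone pick out the target.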
10
Zheng P, Zhao H, Zhan J, Yan Y, Ren J, Lv J, Huang Z. Incremental learning-based visual tracking with weighted discriminative dictionaries. International Journal of Advanced Robotic Systems 2019. DOI: 10.1177/1729881419890155.
Abstract
Existing sparse representation-based visual tracking methods detect the target position by minimizing the reconstruction error. However, due to complex backgrounds, illumination changes, and occlusion, these methods have difficulty locating the target properly. In this article, we propose a novel visual tracking method based on weighted discriminative dictionaries and a pyramidal feature selection strategy. First, we utilize color features and texture features of the training samples to obtain multiple discriminative dictionaries. Then, we use the position information of those samples to assign weights to the base vectors in the dictionaries. For robust visual tracking, we propose a pyramidal sparse feature selection strategy in which the weights of base vectors and the reconstruction errors of different features are integrated to obtain the best target regions. At the same time, we measure feature reliability to dynamically adjust the weights of different features. In addition, we introduce a scenario-aware mechanism and an incremental dictionary update method based on noise energy analysis. Comparison experiments show that the proposed algorithm outperforms several state-of-the-art methods, and useful quantitative and qualitative analyses are also carried out.
Affiliation(s)
- Penggen Zheng
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
- Huimin Zhao
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
- Jin Zhan
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
- Yijun Yan
- Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK
- Jinchang Ren
- Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, UK
- Jujian Lv
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
- Zhihui Huang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
11
Abstract
Object tracking has always been an interesting and essential research topic in computer vision. The model update mechanism is an essential part of a tracker, so its robustness has become a crucial factor in the tracking quality of a sequence. This review analyses recent tracking model update strategies: the occasion for updating the target model is discussed first, followed by a detailed discussion of target model update strategies in the mainstream tracking frameworks, and then by background update frameworks. The experimental performance of recently published trackers on specific sequences is listed, their strengths and some failure cases are discussed, and conclusions are drawn from those results. A crucial point is that the design of a proper background model, as well as its update strategy, ought to be taken into consideration. A cascade update of the template corresponding to each deep network layer, based on each layer's contribution to target recognition, can also help with more accurate target localization, where target saliency information can be utilized as a tool for state estimation.
12
Zhu G, Zhang Z, Wang J, Wu Y, Lu H. Dynamic Collaborative Tracking. IEEE Transactions on Neural Networks and Learning Systems 2019;30:3035-3046. PMID: 32175852; DOI: 10.1109/tnnls.2018.2861838.
Abstract
Correlation filters have recently demonstrated remarkable success in visual tracking. However, most existing methods often face model drift caused by factors such as unlimited boundary effects, heavy occlusion, fast motion, and distracter perturbation. To address these issues, this paper proposes a unified dynamic collaborative tracking framework that can perform more flexible and robust position prediction. Specifically, the framework learns the object appearance model by jointly training an objective function with three components: a target regression submodule, a distracter suppression submodule, and a maximum-margin relation submodule. The first submodule mainly takes advantage of the circulant structure of training samples to distinguish the target from its surrounding background. The second submodule optimizes the label response of possible distracting regions toward zero, reducing the peak of the confidence map in those regions. Inspired by structured-output support vector machines, the third submodule utilizes the differences between target and distracter appearance representations in the discriminative mapping space to alleviate the disturbance of the most likely hard negative samples. In addition, a CUR filter is embedded as an assistant detector to provide effective object candidates, alleviating the model drift problem. Comprehensive experimental results show that the proposed approach achieves state-of-the-art performance on several public benchmark data sets.
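The circulant-structure regression the first submodule builds on is the classic correlation-filter closed form, which is compact enough to show. This 1-D MOSSE-style sketch is our simplification (the paper's joint objective adds distracter-suppression and max-margin terms on top): it trains a filter per Fourier bin and localises the target at the response peak.

```python
import numpy as np

def train_filter(f, g, lam=1e-4):
    """Closed-form correlation filter: desired response g over training signal f.

    Solvable independently per Fourier bin thanks to the circulant structure of
    shifted training samples; lam is the ridge regulariser. Returns H*.
    """
    F, G = np.fft.fft(f), np.fft.fft(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(z, h_conj):
    """Correlation response of the filter on a new signal z (peak = target)."""
    return np.real(np.fft.ifft(np.fft.fft(z) * h_conj))
```

Setting the desired response `g` close to zero over known distracting regions, as the second submodule does, reuses exactly this machinery with a reshaped label signal.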
13
Robust Visual Tracking Using Structural Patch Response Map Fusion Based on Complementary Correlation Filter and Color Histogram. Sensors 2019;19:4178. PMID: 31561565; PMCID: PMC6806098; DOI: 10.3390/s19194178.
Abstract
A part-based strategy has been applied to visual tracking with demonstrated success in recent years. Different from most existing part-based methods, which employ only one type of tracking representation model, in this paper we propose an effective complementary tracker based on structural patch response fusion under correlation filter and color histogram models. The proposed method includes two component trackers with complementary merits that adaptively handle illumination variation and deformation. To identify and take full advantage of reliable patches, we present an adaptive hedge algorithm that hedges the responses of patches into a more credible one in each component tracker, and we design different loss metrics for tracked patches in the two components to be applied in this hedge algorithm. Finally, we selectively combine the two component trackers at the response-map level with different merging factors according to the confidence of each component tracker. Extensive experimental evaluations on the OTB2013, OTB2015, and VOT2016 datasets show the outstanding performance of the proposed algorithm compared with several state-of-the-art trackers.
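The final response-level combination is simple to illustrate. In the sketch below a fixed merge factor blends the two component response maps and the target is localised at the fused peak; in the paper the factor is set adaptively from each component tracker's confidence, so the constant here is purely illustrative.

```python
import numpy as np

def fuse_and_locate(cf_resp: np.ndarray, hist_resp: np.ndarray, merge_factor: float = 0.7):
    """Blend correlation-filter and colour-histogram response maps; return the
    (row, col) of the fused peak as the estimated target position."""
    fused = merge_factor * cf_resp + (1.0 - merge_factor) * hist_resp
    return tuple(int(i) for i in np.unravel_index(np.argmax(fused), fused.shape))
```

Because the two maps fail in different conditions (histograms under deformation, filters under illumination change), even this simple convex blend is more robust than either peak alone.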
14
Jiang Z, Crookes D, Green BD, Zhao Y, Ma H, Li L, Zhang S, Tao D, Zhou H. Context-Aware Mouse Behavior Recognition Using Hidden Markov Models. IEEE Transactions on Image Processing 2019;28:1133-1148. PMID: 30307863; DOI: 10.1109/tip.2018.2875335.
Abstract
Automated recognition of mouse behaviors is crucial in studying psychiatric and neurologic diseases. To achieve this objective, it is very important to analyze the temporal dynamics of mouse behaviors; in particular, changes between neighboring mouse actions occur swiftly over short periods. In this paper, we develop and implement a novel hidden Markov model (HMM) algorithm to describe the temporal characteristics of mouse behaviors. Specifically, we propose a hybrid deep learning architecture in which the first, unsupervised layer relies on an advanced spatial-temporal segment Fisher vector encoding both visual and contextual features, and subsequent supervised layers, based on our segment aggregate network, are trained to estimate the state-dependent observation probabilities of the HMM. The proposed architecture can discriminate between visually similar behaviors and achieves high recognition rates, with the strength of handling imbalanced mouse behavior datasets. Finally, we evaluate our approach using JHuang's and our own datasets, and the results show that our method outperforms other state-of-the-art approaches.
15

16
Zhou T, Liu F, Bhaskar H, Yang J. Robust Visual Tracking via Online Discriminative and Low-Rank Dictionary Learning. IEEE Transactions on Cybernetics 2018;48:2643-2655. PMID: 28920914; DOI: 10.1109/tcyb.2017.2747998.
Abstract
In this paper, we propose a novel and robust tracking framework based on online discriminative and low-rank dictionary learning. The primary aim is to obtain compact, low-rank dictionaries that provide good discriminative representations of both target and background. We accomplish this by exploiting the recovery ability of low-rank matrices: if we assume that the data from the same class are linearly correlated, then the corresponding basis vectors learned from the training set of each class render the dictionary approximately low-rank. The proposed dictionary learning technique incorporates a reconstruction error that improves the reliability of classification, and a multi-constraint objective function is designed to enable active learning of a discriminative and robust dictionary. An optimal solution is obtained by iteratively computing the dictionary and coefficients while simultaneously learning the classifier parameters. Finally, a simple yet effective likelihood function is implemented to estimate the optimal state of the target during tracking. Moreover, to make the dictionary adaptive to variations of the target and background during tracking, an online update criterion is employed while learning the new dictionary. Experimental results on a publicly available benchmark dataset demonstrate that the proposed tracking algorithm performs better than other state-of-the-art trackers.
17

18

19
Yuen PC, Chellappa R. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker. IEEE Transactions on Image Processing 2018;27:2022-2037. PMID: 29989985; DOI: 10.1109/tip.2017.2777183.
Abstract
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
20
Liu Q, Yang J, Zhang K, Wu Y. Adaptive Compressive Tracking via Online Vector Boosting Feature Selection. IEEE Transactions on Cybernetics 2017;47:4289-4301. PMID: 27662696; DOI: 10.1109/tcyb.2016.2606512.
Abstract
Recently, the compressive tracking (CT) method has attracted much attention due to its high efficiency, but it cannot deal well with large target appearance variations because its data-independent random projection matrix results in less discriminative features. To address this issue, we propose an adaptive CT approach that selects the most discriminative features to design an effective appearance model. Our method significantly improves CT in three aspects. First, the most discriminative features are selected via an online vector boosting method. Second, the object representation is updated in an effective online manner that preserves stable features while filtering out noisy ones; furthermore, a simple and effective trajectory rectification approach is adopted to make the estimated location more accurate. Finally, a multiple-scale adaptation mechanism is explored to estimate object size, which helps to relieve interference from background information. Extensive experiments on the CVPR2013 tracking benchmark and the VOT2014 challenge demonstrate the superior performance of our method.
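The data-independent random projection that this entry identifies as CT's weakness is easy to make concrete. Below, a very sparse {-1, 0, +1} matrix, a common compressive-sensing choice, projects a high-dimensional appearance vector into a compact feature; the dimensions and sparsity are our illustrative choices, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Very sparse random projection matrix, fixed once and never adapted to data --
# exactly the property that limits discriminativeness as appearance changes.
proj = rng.choice([-1.0, 0.0, 1.0], size=(16, 1024), p=[1 / 6, 2 / 3, 1 / 6])

def compressive_features(patch_vec: np.ndarray) -> np.ndarray:
    """Compress a flattened 1024-D image patch to a 16-D feature."""
    return proj @ patch_vec
```

The adaptive method in the paper replaces this fixed matrix with features chosen online by vector boosting, so the compressed representation tracks the target's current appearance.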