1
Dian R, Li S, Fang L, Lu T, Bioucas-Dias JM. Nonlocal Sparse Tensor Factorization for Semiblind Hyperspectral and Multispectral Image Fusion. IEEE Transactions on Cybernetics 2020; 50:4469-4480. [PMID: 31794410] [DOI: 10.1109/tcyb.2019.2951572]
Abstract
Combining a high-spatial-resolution multispectral image (HR-MSI) with a low-spatial-resolution hyperspectral image (LR-HSI) has become a common way to enhance the spatial resolution of the HSI. The existing state-of-the-art LR-HSI and HR-MSI fusion methods are mostly based on matrix factorization, whose matrix representation cannot fully exploit the inherent structures of the 3-D HSI. We propose a nonlocal sparse tensor factorization approach, called NLSTF_SMBF, for the semiblind fusion of HSI and MSI. The proposed method decomposes the HSI into smaller full-band patches (FBPs), which, in turn, are factored as dictionaries of the three HSI modes and a sparse core tensor. This decomposition allows us to solve the fusion problem as the estimation of a sparse core tensor and three dictionaries for each FBP. Similar FBPs are clustered together and assumed to share the same dictionaries, exploiting the nonlocal self-similarities of the HSI. For each group, we learn the dictionaries from the observed HR-MSI and LR-HSI, and compute the corresponding sparse core tensor of each FBP via tensor sparse coding. Two distinctive features of NLSTF_SMBF are that: 1) it is blind with respect to the point spread function (PSF) of the hyperspectral sensor and 2) it copes with spatially variant PSFs. The experimental results provide evidence of the advantages of the NLSTF_SMBF method over the existing state-of-the-art methods, particularly in semiblind scenarios.
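The patch-level decomposition described above is a Tucker-style factorization: each full-band patch equals a sparse core tensor multiplied along its three modes by one dictionary per mode. A minimal NumPy sketch of the reconstruction; all sizes, the random dictionaries, and the hard-thresholding step are illustrative assumptions, not the paper's learned quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes for one full-band patch (rows x cols x bands)
# and its three mode dictionaries.
I, J, K = 8, 8, 16                  # patch dimensions
r1, r2, r3 = 4, 4, 6                # atoms per mode dictionary
D1 = rng.standard_normal((I, r1))   # spatial mode-1 dictionary
D2 = rng.standard_normal((J, r2))   # spatial mode-2 dictionary
D3 = rng.standard_normal((K, r3))   # spectral mode-3 dictionary

core = rng.standard_normal((r1, r2, r3))
core[np.abs(core) < 1.0] = 0.0      # hard threshold -> sparse core tensor

# Reconstruct the patch: X = core x_1 D1 x_2 D2 x_3 D3 (Tucker form).
X = np.einsum('abc,ia,jb,kc->ijk', core, D1, D2, D3)

sparsity = float(np.mean(core == 0.0))
print(X.shape, round(sparsity, 2))
```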
3
Fayez R, Taha MTAE, Gadallah M. Occluded Object Tracking System (OOTS). International Journal of Service Science, Management, Engineering, and Technology 2020; 11:65-81. [DOI: 10.4018/ijssmet.2020070105]
Abstract
Visual object tracking remains a challenge for intelligent control systems, with a variety of applications such as surveillance. The technology faces several obstacles that must be addressed, including occlusion. In visual tracking, online learning techniques are the most common because of their efficiency on most video sequences. Many object tracking techniques have emerged; however, drifting under noisy updates has been a stumbling block for the majority of them. This problem can be surmounted by updating the classifiers. The proposed system, called the Occluded Object Tracking System (OOTS), is a hybrid of two algorithms: the fast Circulant Structure Kernels with Color Names (CSK-CN) technique and the efficient occlusion-aware Real-time Object Tracking (ROT) algorithm. OOTS is evaluated on standard visual tracking benchmark databases. The experimental results show that the proposed OOTS is more reliable and provides more efficient tracking results than the compared methods.
Affiliation(s)
- Rawan Fayez
- Modern Academy for Computer Science and Management, Egypt
4
Shen J, Tang X, Dong X, Shao L. Visual Object Tracking by Hierarchical Attention Siamese Network. IEEE Transactions on Cybernetics 2020; 50:3068-3080. [PMID: 31536029] [DOI: 10.1109/tcyb.2019.2936503]
Abstract
Visual tracking addresses the problem of localizing an arbitrary target in video according to an annotated bounding box. In this article, we present a novel tracking method that introduces an attention mechanism into the Siamese network to increase its matching discrimination. We propose a new way to compute attention weights via a sub-Siamese network [Attention Net (A-Net)], which locates attentive parts to improve matching performance during search. In addition, features in higher layers preserve more semantic information, while features in lower layers preserve more location information. Thus, to address tracking failures caused by relying only on higher-layer features, we exploit both location and semantic information through multilevel features, and propose a new way to fuse the multiscale response maps from each layer to obtain a more accurate position estimate of the object. We further propose a hierarchical attention Siamese network that combines the attention weights and multilayer integration for tracking. Our method is implemented with a pretrained network and outperforms most well-trained Siamese trackers even without any fine-tuning or online updating. Comparisons with state-of-the-art methods on popular tracking benchmarks show that our method achieves better performance. Our source code and results will be available at https://github.com/shenjianbing/HASN.
5
Deng C, Han Y, Zhao B. High-Performance Visual Tracking With Extreme Learning Machine Framework. IEEE Transactions on Cybernetics 2020; 50:2781-2792. [PMID: 30624237] [DOI: 10.1109/tcyb.2018.2886580]
Abstract
In real-time applications, a fast and robust visual tracker should generally have two important properties: 1) a feature representation of the object that is not only efficient but also discriminative and 2) an appearance model that can quickly adapt to variations of the foreground and background. However, most existing tracking algorithms cannot achieve satisfactory performance in both aspects. To address this issue, we advocate a novel and efficient visual tracker that exploits the excellent feature learning and classification capabilities of an emerging learning technique, the extreme learning machine (ELM). The contributions of this work are as follows: 1) motivated by the simplicity and learning ability of the ELM autoencoder (ELM-AE), an ELM-AE-based feature extraction model is presented, which efficiently provides a compact and discriminative representation of the inputs and 2) owing to the fast learning speed of an ELM classifier, an ELM-based appearance model is developed for feature classification, which can rapidly distinguish the object of interest from its surroundings. In addition, to cope with visual changes of the target and its background, the online sequential ELM is used to incrementally update the appearance model. Extensive experiments on challenging image sequences demonstrate the effectiveness and robustness of the proposed tracker.
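The fast closed-form training that makes the ELM attractive here, a random (untrained) hidden layer followed by a single least-squares solve for the output weights, can be sketched as follows. The two-blob toy data and layer sizes are illustrative, not the tracker's actual features:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy binary classification data: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-1.0, 0.3, (50, 2)),
               rng.normal(+1.0, 0.3, (50, 2))])
y = np.array([0.0] * 50 + [1.0] * 50)

n_hidden = 40
W = np.asarray(rng.standard_normal((2, n_hidden)))  # random input weights (never trained)
b = rng.standard_normal(n_hidden)                   # random biases
H = np.tanh(X @ W + b)                              # hidden-layer activations

beta = np.linalg.pinv(H) @ y                        # one least-squares solve
acc = float(((H @ beta > 0.5) == (y > 0.5)).mean()) # training accuracy
print(acc)
```

The same closed-form solve is what enables the online sequential variant to update the appearance model incrementally at low cost.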
7
Du Y, Yan Y, Chen S, Hua Y. Object-adaptive LSTM network for real-time visual tracking with adversarial data augmentation. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.12.022]
8
Ruan W, Liang C, Yu Y, Chen J, Hu R. SIST: Online Scale-Adaptive Object Tracking with Stepwise Insight. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.102]
10
Du Y, Han G, Quan Y, Yu Z, Wong HS, Chen CLP, Zhang J. Exploiting Global Low-Rank Structure and Local Sparsity Nature for Tensor Completion. IEEE Transactions on Cybernetics 2019; 49:3898-3910. [PMID: 30047919] [DOI: 10.1109/tcyb.2018.2853122]
Abstract
In the era of data science, a huge amount of data has emerged in the form of tensors. In many applications, the collected tensor data are incomplete, with missing entries that hinder the analysis process. In this paper, we investigate a new method for tensor completion, in which a low-rank tensor approximation is used to exploit the global structure of the data and sparse coding is used to elucidate its local patterns. To characterize the low-rank structure, a weighted nuclear norm for the tensor is introduced. Meanwhile, an orthogonal dictionary learning process is incorporated into the sparse coding for more effective discovery of local details. By simultaneously using global patterns and local cues, the proposed method can effectively and efficiently recover the lost information of incomplete tensor data. Its capability is demonstrated in several experiments on recovering MRI and visual data, and the results show excellent performance in comparison with recent related methods.
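A weighted nuclear norm is typically applied through a weighted singular-value shrinkage (proximal) step on a matrix unfolding of the tensor. A minimal sketch, assuming weights inversely proportional to singular-value magnitude; the paper's exact weighting scheme and overall algorithm may differ:

```python
import numpy as np

def weighted_svt(M, tau=0.5, eps=1e-6):
    """One proximal step for a weighted nuclear norm: each singular value
    is shrunk by a weight inversely proportional to its magnitude, so the
    dominant (global low-rank) structure is preserved while small,
    noise-driven singular values are suppressed."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = 1.0 / (s + eps)                   # big sigma -> small shrinkage
    s_hat = np.maximum(s - tau * w, 0.0)  # shrink-and-threshold
    return U @ (s_hat[:, None] * Vt), s_hat

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 3))
B = rng.standard_normal((3, 30))
noisy = A @ B + 0.05 * rng.standard_normal((40, 30))  # e.g. a tensor unfolding

denoised, s_hat = weighted_svt(noisy)
rank_after = int(np.sum(s_hat > 1e-8))   # effective rank after shrinkage
print(denoised.shape, rank_after)
```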
11
Xiao Y, Li J, Du B, Wu J, Li X, Chang J, Zhou Y. Robust correlation filter tracking with multi-scale spatial view. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.05.017]
13
Liu Y, Shen J, Wang W, Sun H, Shao L. Better Dense Trajectories by Motion in Videos. IEEE Transactions on Cybernetics 2019; 49:159-170. [PMID: 29990074] [DOI: 10.1109/tcyb.2017.2769097]
Abstract
Currently, the most widely used point-trajectory generation methods estimate trajectories from dense optical flow, using a consistency-check strategy to detect occluded regions. However, these methods miss some important trajectories, breaking up smooth areas without structure, especially around motion boundaries (MBs). We suggest exploiting MBs in video to generate more accurate dense point trajectories. Estimating MBs from the video improves trajectory accuracy in discontinuous or occluded areas. We then obtain trajectories by tracking the initial feature points through all frames. The experimental results demonstrate that our method outperforms the state-of-the-art methods on a challenging benchmark.
14
Xu G, Khan S, Zhu H, Han L, Ng MK, Yan H. Discriminative tracking via supervised tensor learning. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.05.108]
15
Yun X, Sun Y, Wang S, Shi Y, Lu N. Multi-layer convolutional network-based visual tracking via important region selection. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.010]
16
Ma B, Hu H, Shen J, Zhang Y, Shao L, Porikli F. Robust Object Tracking by Nonlinear Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4769-4781. [PMID: 29990266] [DOI: 10.1109/tnnls.2017.2776124]
Abstract
We propose a method that obtains a discriminative visual dictionary and a nonlinear classifier for visual tracking in a sparse coding manner, based on a globally linear approximation from nonlinear learning theory. Traditional discriminative tracking methods based on sparse representation learn a dictionary in an unsupervised way and then train a classifier; by treating dictionary learning and classifier learning separately, they may fail to generate models that are both descriptive and discriminative. In contrast, the proposed approach constructs a dictionary that fully reflects the intrinsic manifold structure of the visual data and introduces more discriminative ability within a unified learning framework. Finally, an iterative optimization approach is introduced that computes the optimal dictionary, the associated sparse codes, and a classifier. Experiments on two benchmarks show that our tracker achieves better performance than several popular tracking algorithms.
17
Hu H, Ma B, Shen J, Shao L. Manifold Regularized Correlation Object Tracking. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1786-1795. [PMID: 28422697] [DOI: 10.1109/tnnls.2017.2688448]
Abstract
In this paper, we propose a manifold regularized correlation tracking method with augmented samples. To make better use of the unlabeled data and the manifold structure of the sample space, a manifold regularization-based correlation filter is introduced, which assigns similar labels to neighboring samples. Meanwhile, the regression model is learned by exploiting the block-circulant structure of the matrices formed by the augmented translated samples over multiple base samples cropped from both target and non-target regions. Thus, the final classifier is trained with positive, negative, and unlabeled base samples in a semisupervised learning framework. A block optimization strategy is further introduced to learn the manifold regularization-based correlation filter for efficient online tracking. Experiments on two public tracking datasets demonstrate the superior performance of our tracker compared with state-of-the-art approaches.
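The circulant structure the abstract exploits is what lets a correlation filter be trained in closed form in the Fourier domain. A minimal MOSSE/KCF-style sketch of that trick on a 1-D signal; the manifold regularization term and multi-sample augmentation of the paper are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 64, 1e-2
x = rng.standard_normal(n)        # base sample (1-D signal for clarity)
y = np.zeros(n)
y[0] = 1.0                        # desired response: a peak at shift 0

# Ridge regression over ALL cyclic shifts of x, solved at once in the
# Fourier domain thanks to the circulant structure.
X, Y = np.fft.fft(x), np.fft.fft(y)
H_star = (Y * np.conj(X)) / (X * np.conj(X) + lam)

# Correlate the learned filter with a test sample shifted by 5 pixels;
# the response peak recovers the shift.
z = np.roll(x, 5)
response = np.real(np.fft.ifft(np.fft.fft(z) * H_star))
peak = int(np.argmax(response))
print(peak)
```

The element-wise division replaces an O(n^3) linear solve with O(n log n) FFTs, which is the efficiency correlation-filter trackers rely on.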
18
Yuen PC, Chellappa R. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker. IEEE Transactions on Image Processing 2018; 27:2022-2037. [PMID: 29989985] [DOI: 10.1109/tip.2017.2777183]
Abstract
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
19
Shen J, Liang Z, Liu J, Sun H, Shao L, Tao D. Multiobject Tracking by Submodular Optimization. IEEE Transactions on Cybernetics 2018; 49:1990-2001. [PMID: 29994594] [DOI: 10.1109/tcyb.2018.2803217]
Abstract
In this paper, we propose a new multiobject visual tracking algorithm based on submodular optimization. The algorithm is composed of two main stages. In the first stage, a new tracklet selection strategy is proposed to cope with the occlusion problem: we generate low-level tracklets using overlap criteria and min-cost flow, respectively, and then integrate them into a candidate tracklet set. In the second stage, we formulate multiobject tracking as a submodular maximization problem subject to related constraints; the submodular function selects the correct tracklets from the candidate set to form each object trajectory. We then design a connecting process that links the corresponding trajectories to overcome occlusion. Experimental results demonstrate the effectiveness of our tracking algorithm. Our source code is available at https://github.com/shenjianbing/submodulartrack.
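The second-stage selection can be illustrated with the standard greedy routine for monotone submodular maximization under a cardinality constraint, which carries the usual (1 - 1/e) approximation guarantee. The toy coverage objective and "tracklets" below are illustrative stand-ins, not the paper's actual function or constraints:

```python
# Candidate "tracklets", each covering a set of detection indices.
candidates = {
    'A': {1, 2, 3},
    'B': {3, 4},
    'C': {4, 5, 6, 7},
    'D': {1, 7},
}

def coverage(selection):
    """Number of detections covered -- a monotone submodular function."""
    return len(set().union(*(candidates[t] for t in selection))) if selection else 0

selected, budget = [], 2
while len(selected) < budget:
    # Greedy step: pick the tracklet with the largest marginal gain.
    best = max((t for t in candidates if t not in selected),
               key=lambda t: coverage(selected + [t]) - coverage(selected))
    selected.append(best)

print(sorted(selected), coverage(selected))
```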
20
Xue W, Feng Z, Xu C, Liu T, Meng Z, Zhang C. Visual tracking via improving motion model and model updater. International Journal of Advanced Robotic Systems 2018. [DOI: 10.1177/1729881418756238]
Abstract
Motion model and model updater are two necessary components of online visual tracking. On the one hand, an effective motion model needs to strike the right balance between target processing, which accounts for the target appearance, and scene analysis, which describes stable background information. Most conventional trackers focus on only one of the two aspects and hence cannot achieve the correct balance. On the other hand, a good model updater needs to consider both tracking speed and model drift. Most tracking models are updated every frame or at fixed intervals, which does not achieve the best performance. In this article, we address the motion model problem by collaboratively using salient region detection and image segmentation, each serving a different purpose: in the absence of prior knowledge, the former considers image attributes such as color, gradient, edges, and boundaries to form a robust object estimate; the latter aggregates individual pixels into meaningful atomic regions using prior knowledge of the target and background in the video sequence. Taking advantage of their complementary roles, we construct a more reliable confidence map. For the model update problem, we update the model dynamically by analyzing the scene through image similarity, which not only reduces the update frequency but also suppresses model drift. Finally, we use these improved building blocks both for comparative tests and to build a basic tracker, and extensive experiments on OTB50 show that the proposed methods perform favorably against the state-of-the-art methods.
Affiliation(s)
- Wanli Xue
- School of Computer Science and Technology, Tianjin University, Tianjin, China
- Zhiyong Feng
- School of Computer Software, Tianjin University, Tianjin, China
- Chao Xu
- School of Computer Software, Tianjin University, Tianjin, China
- Tong Liu
- School of Computer Software, Tianjin University, Tianjin, China
- Zhaopeng Meng
- School of Computer Software, Tianjin University, Tianjin, China
- Chengwei Zhang
- School of Computer Science and Technology, Tianjin University, Tianjin, China
21
Jenkins MD, Barrie P, Buggy T, Morison G. Selective Sampling Importance Resampling Particle Filter Tracking With Multibag Subspace Restoration. IEEE Transactions on Cybernetics 2018; 48:264-276. [PMID: 27959835] [DOI: 10.1109/tcyb.2016.2631660]
Abstract
The focus of this paper is a novel object tracking algorithm that combines an incrementally updated subspace-based appearance model, a reconstruction-error likelihood function, and a two-stage selective sampling importance resampling particle filter with motion estimation through autoregressive filtering. The primary contribution is the use of multiple bags of subspaces, with which we aim to tackle the issue of appearance model update. The multibag approach allows our algorithm to revert to a previously successful appearance model in the event that the primary model fails. The aim is to eliminate tracker drift by undoing model updates that lead to error accumulation, and to redetect targets after periods of occlusion by removing the subspace updates carried out during the occlusion. We compare our algorithm with several state-of-the-art methods on a range of challenging, publicly available image sequences. Our findings indicate significant robustness to drift and occlusion as a result of the multibag approach, and the results show that our algorithm competes well with current state-of-the-art algorithms.
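The sampling-importance-resampling backbone of such a tracker can be sketched on a 1-D toy problem; the paper's subspace appearance likelihood and autoregressive motion model are replaced here by simple Gaussian models for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal SIR particle filter tracking a 1-D position.
n_particles, n_steps = 500, 30
true_pos = 0.0
particles = rng.normal(0.0, 1.0, n_particles)

for _ in range(n_steps):
    true_pos += 1.0                                        # target drifts right
    particles += 1.0 + rng.normal(0.0, 0.5, n_particles)   # propagate (motion model)
    z = true_pos + rng.normal(0.0, 0.3)                    # noisy observation
    w = np.exp(-0.5 * ((particles - z) / 0.3) ** 2)        # importance weights
    w /= w.sum()
    particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample

estimate = float(particles.mean())
print(round(estimate, 2), true_pos)
```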
22
Porikli F. Visual Tracking by Sampling in Part Space. IEEE Transactions on Image Processing 2017; 26:5800-5810. [PMID: 28858801] [DOI: 10.1109/tip.2017.2745204]
Abstract
In this paper, we present a novel part-based visual tracking method from the perspective of probability sampling. Specifically, we represent the target by a part space with two online learned probabilities to capture the structure of the target. The proposal distribution memorizes the historical performance of different parts, and it is used for the first round of part selection. The acceptance probability validates the specific tracking stability of each part in a frame, and it determines whether to accept its vote or to reject it. By doing this, we transform the complex online part selection problem into a probability learning one, which is easier to tackle. The observation model of each part is constructed by an improved supervised descent method and is learned in an incremental manner. Experimental results on two benchmarks demonstrate the competitive performance of our tracker against state-of-the-art methods.
23
Zhang J, Han Y, Tang J, Hu Q, Jiang J. Semi-Supervised Image-to-Video Adaptation for Video Action Recognition. IEEE Transactions on Cybernetics 2017; 47:960-973. [PMID: 26992186] [DOI: 10.1109/tcyb.2016.2535122]
Abstract
Human action recognition has been well explored in computer vision applications. Many successful action recognition methods have shown that action knowledge can be effectively learned from motion videos or still images. For the same action, the knowledge learned from different types of media, e.g., videos or images, may be related. However, less effort has been made to improve the performance of action recognition in videos by adapting the action knowledge conveyed by images to videos. Most existing video action recognition methods suffer from a lack of sufficient labeled training videos; in such cases, over-fitting becomes a potential problem and recognition performance is limited. In this paper, we propose an adaptation method that enhances action recognition in videos by adapting knowledge from images. The adapted knowledge is used to learn the correlated action semantics by exploring the common components of both labeled videos and images. Meanwhile, we extend the adaptation method to a semi-supervised framework that can leverage both labeled and unlabeled videos, so that over-fitting is alleviated and recognition performance improves. Experiments on public benchmark datasets and real-world datasets show that our method outperforms several other state-of-the-art action recognition methods.
24
Gao J, Zhang T, Yang X, Xu C. Deep Relative Tracking. IEEE Transactions on Image Processing 2017; 26:1845-1858. [PMID: 28113343] [DOI: 10.1109/tip.2017.2656628]
Abstract
Most existing tracking methods are direct trackers, which directly exploit foreground and/or background information for object appearance modeling and decide whether an image patch is the target object or not. As a result, these trackers cannot perform well when the target appearance changes heavily and diverges from its model. To deal with this issue, we propose a novel relative tracker, which effectively exploits the relative relationships among image patches from both foreground and background for appearance modeling. Unlike direct trackers, the proposed relative tracker robustly localizes the target by selecting the image patch with the highest relative score with respect to the target appearance model. To model the relative relationships among large-scale image patch pairs, we propose a novel and effective deep relative learning algorithm based on a convolutional neural network. We test the proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that our method consistently outperforms state-of-the-art trackers, owing to the powerful capacity of the proposed deep relative model.
25
Shen J, Hao X, Liang Z, Liu Y, Wang W, Shao L. Real-Time Superpixel Segmentation by DBSCAN Clustering Algorithm. IEEE Transactions on Image Processing 2016; 25:5933-5942. [PMID: 27740485] [DOI: 10.1109/tip.2016.2616302]
Abstract
In this paper, we propose a real-time image superpixel segmentation method running at 50 frames/s, based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. To decrease the computational cost of superpixel algorithms, we adopt a fast two-step framework. In the first, clustering stage, the DBSCAN algorithm with color-similarity and geometric restrictions rapidly clusters the pixels; in the second, merging stage, small clusters are merged into superpixels with their neighbors through a distance measure defined by color and spatial features. A robust and simple distance function is defined to obtain better superpixels in these two steps. The experimental results demonstrate that our real-time DBSCAN-based superpixel algorithm (50 frames/s) outperforms the state-of-the-art superpixel segmentation methods in terms of both accuracy and efficiency.
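The first, density-based clustering stage can be illustrated with a DBSCAN-style region growth over 4-connected neighbours, combining the color-similarity and geometric restrictions the abstract mentions. This toy version omits the minimum-density check and the second merging stage:

```python
from collections import deque
import numpy as np

def cluster_pixels(img, color_tol=0.1):
    """Grow clusters over 4-connected neighbours (geometric restriction)
    whose intensities differ by less than color_tol (color similarity).
    Returns a label map and the number of clusters found."""
    h, w = img.shape
    labels = -np.ones((h, w), dtype=int)
    n_clusters = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            labels[sy, sx] = n_clusters
            queue = deque([(sy, sx)])
            while queue:                       # breadth-first region growth
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and abs(img[ny, nx] - img[y, x]) < color_tol):
                        labels[ny, nx] = n_clusters
                        queue.append((ny, nx))
            n_clusters += 1
    return labels, n_clusters

# Two flat regions: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
labels, n = cluster_pixels(img)
print(n)
```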
26
Porikli F. Visual Tracking Under Motion Blur. IEEE Transactions on Image Processing 2016; 25:5867-5876. [PMID: 27723595] [DOI: 10.1109/tip.2016.2615812]
Abstract
Most existing tracking algorithms do not explicitly consider the motion blur contained in video sequences, which degrades their performance in real-world applications where motion blur often occurs. In this paper, we propose to solve the motion blur problem in visual tracking in a unified framework. Specifically, a joint blur-state estimation and multitask reverse sparse learning framework is presented, in which the closed-form solutions for the blur kernel and the sparse code matrix are obtained simultaneously. The reverse process treats the blurry candidates as dictionary elements and sparsely represents the blurred templates with these candidates. By utilizing the information contained in the sparse code matrix, an efficient likelihood model is further developed, which quickly excludes irrelevant candidates and narrows down the particle scale. Experimental results on challenging benchmarks show that our method performs well against state-of-the-art trackers.
27
Ma B, Hu H, Shen J, Liu Y, Shao L. Generalized Pooling for Robust Object Tracking. IEEE Transactions on Image Processing 2016; 25:4199-4208. [PMID: 27392358] [DOI: 10.1109/tip.2016.2588329]
Abstract
In a majority of sparse coding-based tracking algorithms, feature pooling computes the final feature vectors only from low-order statistics or extreme responses of the sparse codes; high-order statistics and the correlations between responses to different dictionary items are neglected. We present a more generalized feature pooling method for visual tracking that uses a probabilistic function to model the statistical distribution of the sparse codes. Since direct matching between two distributions usually incurs high computational cost, we introduce the Fisher vector to derive a more compact and discriminative representation of the sparse codes of the visual target. We encode target patches by local coordinate coding, use a Gaussian mixture model to compute the Fisher vectors, and finally train semisupervised linear kernel classifiers for visual tracking. To handle drifting during the tracking process, these classifiers are updated online with the current tracking results. The experimental results on two challenging tracking benchmarks demonstrate that the proposed approach achieves better performance than the state-of-the-art tracking algorithms.
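The Fisher-vector pooling step can be sketched for a fixed diagonal GMM. Below, a hand-set 1-D two-component model stands in for one trained on sparse codes, and only the gradient with respect to the component means is computed; a full Fisher vector also carries weight and variance terms:

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed diagonal GMM (illustrative stand-in for a trained model).
mu = np.array([-2.0, 2.0])
sigma = np.array([1.0, 1.0])
prior = np.array([0.5, 0.5])
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])  # descriptors

# Posterior responsibilities gamma_{ik} of each component for each sample.
dens = prior * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
       / (sigma * np.sqrt(2 * np.pi))
gamma = dens / dens.sum(axis=1, keepdims=True)

# Normalized gradient of the log-likelihood w.r.t. the means: close to
# zero here because the descriptors match the model.
fv_mu = (gamma * (x[:, None] - mu) / sigma).sum(axis=0) / (len(x) * np.sqrt(prior))
print(fv_mu.shape)
```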
29
Porikli F. DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking. IEEE Transactions on Image Processing 2016; 25:1834-1848. [PMID: 26841390] [DOI: 10.1109/tip.2015.2510583]
Abstract
Deep neural networks, despite their great success at feature learning in various computer vision tasks, are usually considered impractical for online visual tracking because they require very long training times and a large number of training samples. In this paper, we present an efficient and very robust tracking algorithm using a single convolutional neural network (CNN) for learning effective feature representations of the target object in a purely online manner. Our contributions are multifold. First, we introduce a novel truncated structural loss function that maintains as many training samples as possible while reducing the risk of tracking error accumulation. Second, we enhance ordinary stochastic gradient descent for CNN training with a robust sample selection mechanism, which randomly draws positive and negative samples from different temporal distributions constructed by taking temporal relations and label noise into account. Finally, a lazy yet effective updating scheme is designed for CNN training. Equipped with this updating algorithm, the CNN model is robust to some long-standing difficulties in visual tracking, such as occlusion and incorrect detections, without losing effective adaptation to significant appearance changes. In our experiments, the CNN tracker outperforms all compared state-of-the-art methods on two recently proposed benchmarks that together involve over 60 video sequences. The remarkable improvement over existing trackers illustrates the superiority of the feature representations learned purely online via the proposed deep learning framework.