1. Jiao L, Wang D, Bai Y, Chen P, Liu F. Deep Learning in Visual Tracking: A Review. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5497-5516. [PMID: 34968181] [DOI: 10.1109/tnnls.2021.3136907] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5]
Abstract
Deep learning (DL) has made breakthroughs in many computer vision tasks, including visual tracking. Beginning with the automatic learning of abstract feature representations, DL has since permeated every aspect of tracking, including similarity metrics, data association, and bounding box estimation, and pure DL-based trackers now achieve state-of-the-art performance thanks to the community's sustained research. We believe it is time to comprehensively review the development of DL research in visual tracking. In this article, we survey the critical improvements DL has brought to the field: deep feature representations, network architectures, and four crucial issues in visual tracking (spatiotemporal information integration, target-specific classification, target information update, and bounding box estimation). For the first time, the survey of DL-based tracking covers both primary subtasks: single-object tracking and multiple-object tracking. We also analyze the performance of DL-based approaches and draw meaningful conclusions. Finally, we outline several promising directions and tasks in visual tracking and related fields.
2. Zhu Y, Wang M, Yin X, Zhang J, Meijering E, Hu J. Deep Learning in Diverse Intelligent Sensor Based Systems. Sensors (Basel, Switzerland) 2022; 23:62. [PMID: 36616657] [PMCID: PMC9823653] [DOI: 10.3390/s23010062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7]
Abstract
Deep learning has become a predominant method for solving data analysis problems in virtually all fields of science and engineering. The increasing complexity and the large volume of data collected by diverse sensor systems have spurred the development of deep learning methods and have fundamentally transformed the way the data are acquired, processed, analyzed, and interpreted. With the rapid development of deep learning technology and its ever-increasing range of successful applications across diverse sensor systems, there is an urgent need to provide a comprehensive investigation of deep learning in this domain from a holistic view. This survey paper aims to contribute to this by systematically investigating deep learning models/methods and their applications across diverse sensor systems. It also provides a comprehensive summary of deep learning implementation tips and links to tutorials, open-source codes, and pretrained models, which can serve as an excellent self-contained reference for deep learning practitioners and those seeking to innovate deep learning in this space. In addition, this paper provides insights into research topics in diverse sensor systems where deep learning has not yet been well-developed, and highlights challenges and future opportunities. This survey serves as a catalyst to accelerate the application and transformation of deep learning in diverse sensor systems.
Affiliation(s)
- Yanming Zhu, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
- Min Wang, School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Xuefei Yin, School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Jue Zhang, School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
- Erik Meijering, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
- Jiankun Hu, School of Engineering and Information Technology, University of New South Wales, Canberra, ACT 2612, Australia
3. Robust appearance modeling for object detection and tracking: a survey of deep learning approaches. Progress in Artificial Intelligence 2022. [DOI: 10.1007/s13748-022-00290-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
4. Tan K, Xu TB, Wei Z. IMSiam: IoU-aware Matching-adaptive Siamese network for object tracking. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
5. Saribas H, Cevikalp H, Köpüklü O, Uzun B. TRAT: Tracking by attention using spatio-temporal features. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7]
6.
Abstract
Continuous growth in software, hardware, and internet technology has enabled the spread of internet-based sensor tools that observe the physical world and measure data. The Internet of Things (IoT) is made up of billions of smart things that communicate, further extending the boundaries of the world's physical and virtual entities. These intelligent things produce or collect massive amounts of data daily across a broad range of applications and fields. Analytics on these huge data volumes is a critical tool for discovering new knowledge, forecasting future trends, and making the control decisions that make IoT a worthy business paradigm and enabling technology. Deep learning has been used in a variety of projects involving IoT and mobile apps, with encouraging early results. With its data-driven, anomaly-based methodology and capacity to detect developing, unexpected attacks, deep learning can deliver cutting-edge solutions for IoT intrusion detection. In this paper, we examine how the growing amount of gathered and produced information can be used to further develop intelligence and application capabilities through deep learning (DL) techniques. Many researchers have been drawn to the various fields of IoT, where DL and IoT techniques have been combined. Several studies suggest DL as a feasible solution for managing the data produced by IoT, because it was designed to handle large amounts of varied data requiring near-real-time processing. We start by introducing IoT, data generation, and data processing. We then discuss the various DL approaches and their procedures. We survey and summarize the major reported efforts on DL in the IoT domain across various datasets. Finally, we discuss the features, applications, and challenges of using DL to empower IoT applications, which can motivate and inspire further developments in this promising field.
7. Shi Y, Wang Z, Du X, Gong B, Lu Y, Li L. Membrane fouling diagnosis of membrane components based on multi-feature information fusion. J Memb Sci 2022. [DOI: 10.1016/j.memsci.2022.120670] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0]
8. Liu C, Ibrayim M, Hamdulla A. Multi-Feature Single Target Robust Tracking Fused with Particle Filter. Sensors 2022; 22:s22051879. [PMID: 35271025] [PMCID: PMC8914627] [DOI: 10.3390/s22051879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
To address model drift and tracking loss caused by severe deformation, occlusion, fast motion, and the target moving out of view during long-term tracking in complex scenes, this paper presents a robust multi-feature single-target tracking algorithm based on a particle filter, built on the correlation filtering framework. First, to extract more accurate appearance features, the algorithm fuses deep features from the conv3-4, conv4-4, and conv5-4 convolutional layers of VGGNet-19 with hand-crafted histogram of oriented gradients (HOG) and color histogram features. Second, we design a particle-filter-based re-detection module that recovers accurate tracking after failure, so the algorithm maintains high robustness during long-term tracking. Finally, in the model update stage, adaptive learning rate updates and adaptive filter updates improve tracking accuracy. Extensive experiments are conducted on the OTB-2015, OTB-2013, and UAV123 datasets. The results show that the proposed algorithm effectively handles long-term tracking in complex scenes while delivering more stable and accurate performance.
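To make the feature-extraction step concrete, here is a minimal sketch of pulling responses from the three VGG-19 layers named above; the torchvision layer indices, the weights argument, and the resizing to a common grid are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Assumed indices of conv3_4, conv4_4, conv5_4 in torchvision's VGG-19
# feature stack (worth checking against the printed model summary).
LAYERS = {16: "conv3_4", 25: "conv4_4", 34: "conv5_4"}

net = vgg19(weights="DEFAULT").features.eval()

def deep_features(patch, out_size=(31, 31)):
    """patch: (1, 3, H, W) normalized image tensor -> dict of same-size maps."""
    feats, x = {}, patch
    with torch.no_grad():
        for i, layer in enumerate(net):
            x = layer(x)
            if i in LAYERS:
                # resize every layer onto one grid so responses can be fused later
                feats[LAYERS[i]] = F.interpolate(
                    x, size=out_size, mode="bilinear", align_corners=False)
            if i >= max(LAYERS):
                break
    return feats
```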
9. HCDC-SRCF tracker: Learning an adaptively multi-feature fuse tracker in spatial regularized correlation filters framework. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107913] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7]
10. Zheng J, Xu Y, Xin M. Structured object tracking with discriminative patch attributed relational graph. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
11. Yao S, Zhang H, Ren W, Ma C, Han X, Cao X. Robust Online Tracking via Contrastive Spatio-Temporal Aware Network. IEEE Transactions on Image Processing 2021; 30:1989-2002. [PMID: 33444140] [DOI: 10.1109/tip.2021.3050314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3]
Abstract
Existing tracking-by-detection approaches using deep features have achieved promising results in recent years. However, these methods mainly exploit feature representations learned from individual static frames, paying little attention to the temporal smoothness between frames, which easily leads trackers to drift in the presence of large appearance variations and occlusions. To address this issue, we propose a two-stream network to learn discriminative spatio-temporal feature representations of the target objects. The proposed network consists of a Spatial ConvNet module and a Temporal ConvNet module. Specifically, the Spatial ConvNet adopts 2D convolutions to encode the target-specific appearance in static frames, while the Temporal ConvNet models temporal appearance variations using 3D convolutions and learns consistent temporal patterns in a short video clip. We then propose a proposal refinement module to adjust the predicted bounding box, making the target localization outputs more consistent across video sequences. In addition, to improve model adaptation during online updates, we propose a contrastive online hard example mining (OHEM) strategy, which selects hard negative samples and forces them to be embedded in a more discriminative feature space. Extensive experiments conducted on the OTB, Temple Color, and VOT benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
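A minimal sketch of the contrastive OHEM idea, assuming embeddings and classifier scores are already computed; the function name, k, and margin below are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def contrastive_ohem_loss(neg_scores, neg_emb, pos_emb, k=16, margin=1.0):
    """Keep only the k hardest negatives (highest classifier scores) and
    push their embeddings at least `margin` away from the target embedding.

    neg_scores: (N,) scores, neg_emb: (N, D), pos_emb: (D,)."""
    k = min(k, neg_scores.numel())
    hard = neg_emb[torch.topk(neg_scores, k).indices]       # (k, D) hard negatives
    d = F.pairwise_distance(pos_emb.expand_as(hard), hard)  # distances to target
    return F.relu(margin - d).mean()                        # hinge: penalize close ones
```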
12. Chakraborty DB, Pal SK. Rough video conceptualization for real-time event precognition with motion entropy. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.09.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5]
13. Li X, Liu Q, Fan N, Zhou Z, He Z, Jing XY. Dual-regression model for visual tracking. Neural Netw 2020; 132:364-374. [DOI: 10.1016/j.neunet.2020.09.011] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6]
14. Kamranian Z, Naghsh Nilchi AR, Sadeghian H, Tombari F, Navab N. Joint motion boundary detection and CNN-based feature visualization for video object segmentation. Neural Comput Appl 2020. [DOI: 10.1007/s00521-019-04448-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4]
15. Wang L, Zhang L, Wang J, Yi Z. Memory Mechanisms for Discriminative Visual Tracking Algorithms With Deep Neural Networks. IEEE Trans Cogn Dev Syst 2020. [DOI: 10.1109/tcds.2019.2900506] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0]
16. Zhang CL, Tang YP, Li ZX, Wang ZW. Joint spatiograms for multi-modality tracking with online update. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.02.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
18. Teng Z, Xing J, Wang Q, Zhang B, Fan J. Deep Spatial and Temporal Network for Robust Visual Object Tracking. IEEE Transactions on Image Processing 2019; 29:1762-1775. [PMID: 31562088] [DOI: 10.1109/tip.2019.2942502] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7]
Abstract
There are two key components that can be leveraged for visual tracking: (a) object appearances and (b) object motions. Many existing techniques employ deep learning to enhance visual tracking due to its superior representation power and strong learning ability, but most of them exploit object appearances and few exploit object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics across multiple frames in a video, so that it can seamlessly integrate object appearances with their motions to produce compact object representations and capture their temporal variations effectively. Our DSTN method, deployed in a tracking pipeline in a coarse-to-fine form, can perceive subtle differences in the spatial and temporal variations of the target (the object being tracked), and thus benefits from both offline training and online fine-tuning. We conduct experiments on four of the largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and the results demonstrate that our DSTN method achieves competitive performance compared with state-of-the-art techniques. The source code, trained models, and all experimental results of this work will be made publicly available to facilitate further studies on this problem.
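A toy two-stream block in this spirit, with a 2D spatial stream and a 3D temporal stream; the layer sizes and the temporal pooling are assumptions for illustration and do not reproduce the DSTN architecture.

```python
import torch
import torch.nn as nn

class TwoStreamBlock(nn.Module):
    """Spatial 2D stream for per-frame appearance, temporal 3D stream for a clip."""
    def __init__(self, c_in=3, c_out=64):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())
        self.temporal = nn.Sequential(nn.Conv3d(c_in, c_out, (3, 3, 3), padding=1),
                                      nn.ReLU())

    def forward(self, frame, clip):
        # frame: (B, C, H, W); clip: (B, C, T, H, W) with matching H, W
        app = self.spatial(frame)              # appearance features
        mot = self.temporal(clip).mean(dim=2)  # pool over time -> (B, c_out, H, W)
        return torch.cat([app, mot], dim=1)    # fused spatio-temporal descriptor
```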
19.
Abstract
Kernel correlation filters (KCF) demonstrate significant potential in visual object tracking when paired with robust descriptors. Proper selection of color and texture features provides robustness against appearance variations, but using multiple descriptors leads to a considerable feature dimension. In this paper, we propose a novel low-rank descriptor that yields better precision and success rates than state-of-the-art trackers. We accomplish this by concatenating the magnitude component of the Overlapped Multi-oriented Tri-scale Local Binary Pattern (OMTLBP), the Robustness-Driven Hybrid Descriptor (RDHD), Histogram of Oriented Gradients (HoG), and Color Naming (CN) features. We reduce the rank of the proposed multi-channel feature to lower the computational complexity, and formulate a Support Vector Machine (SVM) model utilizing the circulant matrix of the proposed feature vector in the kernel correlation filter. Using the discrete Fourier transform in the iterative learning of the SVM reduces the computational complexity of the proposed visual tracking algorithm. Extensive experimental results on the Visual Tracker Benchmark dataset show better accuracy compared with other state-of-the-art trackers.
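For orientation, the classic single-channel, linear-kernel correlation filter trains and detects in closed form in the Fourier domain, as sketched below; the paper's iterative SVM variant builds on the same circulant structure but is not reproduced here.

```python
import numpy as np

def train_kcf(x, y, lam=1e-4):
    """Minimal linear-kernel KCF. x: (H, W) feature patch, y: (H, W) Gaussian
    regression target. Circulant structure makes ridge regression diagonal
    in the Fourier domain."""
    X = np.fft.fft2(x)
    kxx = np.real(np.fft.ifft2(X * np.conj(X))) / x.size  # kernel autocorrelation
    alpha_f = np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)   # closed-form solution
    return alpha_f, X

def detect_kcf(alpha_f, X, z):
    """Locate the target in a new patch z by the peak of the response map."""
    Z = np.fft.fft2(z)
    kzx = np.real(np.fft.ifft2(Z * np.conj(X))) / z.size
    response = np.real(np.fft.ifft2(alpha_f * np.fft.fft2(kzx)))
    return np.unravel_index(response.argmax(), response.shape)
```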
20. Zhou L, Zhang J. Combined Kalman Filter and Multifeature Fusion Siamese Network for Real-Time Visual Tracking. Sensors (Basel, Switzerland) 2019; 19:s19092201. [PMID: 31086025] [PMCID: PMC6539506] [DOI: 10.3390/s19092201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
Abstract
SiamFC has a simple network structure and can be pretrained offline on a large dataset, so it has attracted the attention of many researchers. However, it has no online learning process at all, and therefore handles complex tracking scenarios such as occlusion and large target deformation poorly. To address this, we propose a method that combines a Kalman filter with fused multiresolution features producing multiple response score maps. The Kalman filter captures the target's trajectory information, which is used to handle complex tracking scenes and to change how the search area is selected; this also enables our tracker to stably track fast-moving targets. Introducing the Kalman filter compensates for SiamFC's offline-only design and gives the tracking network an online learning process. Fusing multiresolution features into multiple response score maps helps the tracker obtain robust features that adapt to a variety of targets. The proposed method reaches the state of the art on five datasets (OTB2013, OTB2015, OTB50, VOT2015, and VOT2016) and runs in real time (40 fps).
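A constant-velocity Kalman filter of the kind described above can be sketched as follows; the state model and noise settings are our assumptions, not the paper's exact design.

```python
import numpy as np

class ConstantVelocityKF:
    """State [x, y, vx, vy]: predict the next search-region center, then
    correct it with the Siamese response peak as the position measurement."""
    def __init__(self, x, y, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0  # dt = 1 frame
        self.H = np.eye(2, 4)                 # only position is measured
        self.Q = q * np.eye(4); self.R = r * np.eye(2)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                     # predicted center -> search area

    def update(self, z):
        y = z - self.H @ self.s               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```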
Affiliation(s)
- Lijun Zhou, Key Laboratory of Optical Engineering, Institute of Optics and Electronics, Chinese Academy of Sciences, No. 1, Optoelectronic Avenue, Wenxing Town, Shuangliu District, Chengdu 610200, China; University of Chinese Academy of Sciences, Beijing 100000, China
- Jianlin Zhang, Key Laboratory of Optical Engineering, Institute of Optics and Electronics, Chinese Academy of Sciences, No. 1, Optoelectronic Avenue, Wenxing Town, Shuangliu District, Chengdu 610200, China
21. Heinrich S, Springstübe P, Knöppler T, Kerzel M, Wermter S. Continuous convolutional object tracking in developmental robot scenarios. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.086] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5]
22. Gao L, Lan X, Mi H, Feng D, Xu K, Peng Y. Multistructure-Based Collaborative Online Distillation. Entropy 2019; 21:e21040357. [PMID: 33267071] [PMCID: PMC7514841] [DOI: 10.3390/e21040357] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0]
Abstract
Recently, deep learning has achieved state-of-the-art performance in more areas than traditional machine-learning methods based on shallow architectures. However, achieving higher accuracy usually requires extending the network depth or ensembling the results of different neural networks, which increases the demand for memory and computing resources. This makes it difficult to deploy deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has therefore become a hot research topic. In this paper, we propose a cross-architecture online-distillation approach that solves this problem by transmitting supplementary information between different networks. We use an ensemble method to aggregate networks of different structures, forming better teachers than traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints replaces fixed distillation to reduce the loss of information diversity during distillation. Our training method improves the distillation effect and achieves strong network-performance improvements. We used several popular models to validate the results: on the CIFAR100 dataset, AlexNet's accuracy was improved by 5.94%, VGG's by 2.88%, ResNet's by 5.07%, and DenseNet's by 1.28%. Extensive experiments on the CIFAR10, CIFAR100, and ImageNet datasets show significant improvements over traditional knowledge distillation.
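The underlying distillation objective can be sketched as below; in the online multi-structure setting, teacher_logits would come from an ensemble aggregate of the peer networks, and the temperature and blending weight shown are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KD: KL divergence between temperature-softened teacher and
    student distributions, blended with cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```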
Affiliation(s)
- Liang Gao, National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
- Xu Lan, School of Electronic Engineering and Computer Science, Queen Mary University of London, London E14NS, UK
- Haibo Mi, National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
- Dawei Feng, National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
- Kele Xu, National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China (correspondence; Tel.: +86-166-7316-1118)
- Yuxing Peng, National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
23. Han Y, Deng C, Zhao B, Tao D. State-aware Anti-drift Object Tracking. IEEE Transactions on Image Processing 2019; 28:4075-4086. [PMID: 30892207] [DOI: 10.1109/tip.2019.2905984] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0]
Abstract
Correlation filter (CF)-based trackers have attracted increasing attention in the visual tracking field due to their superior performance on several datasets while maintaining high running speed. For each frame, an ideal filter is trained to discriminate the target from its surrounding background. Considering that the target always undergoes external and internal interference during tracking, the trained tracker should not only be able to judge the current state when failure occurs, but also resist the model drift caused by challenging distractions. To this end, we present a State-aware Anti-drift Tracker (SAT), which jointly models discrimination and reliability information in filter learning. Specifically, global context patches are incorporated into the filter training stage to better distinguish the target from the background. Meanwhile, a color-based reliability mask is learned to encourage the filter to focus on regions more suitable for tracking. We show that the proposed optimization problem can be efficiently solved using the Alternating Direction Method of Multipliers and carried out entirely in the Fourier domain. Furthermore, a kurtosis-based updating scheme is advocated to reveal the tracking condition and guarantee high-confidence template updates. Extensive experiments on the OTB-100 and UAV-20L datasets compare the SAT tracker with other relevant state-of-the-art methods. Both quantitative and qualitative evaluations demonstrate the effectiveness and robustness of the proposed approach.
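A rough sketch of a kurtosis-gated template update, assuming a correlation response map is available; the threshold and learning rate are illustrative, not the paper's values.

```python
import numpy as np

def response_kurtosis(r):
    """Sample kurtosis of a correlation response map: a sharp single peak
    (confident tracking) gives high kurtosis, a diffuse map gives low values."""
    v = r.ravel()
    z = (v - v.mean()) / (v.std() + 1e-12)
    return np.mean(z ** 4)

def maybe_update(template, patch, response, lr=0.02, thresh=20.0):
    # update only on high-confidence frames, to avoid drifting onto distractors
    if response_kurtosis(response) > thresh:
        template = (1 - lr) * template + lr * patch
    return template
```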
24. A Robust Visual Tracking Algorithm Based on Spatial-Temporal Context Hierarchical Response Fusion. Algorithms 2018. [DOI: 10.3390/a12010008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4]
Abstract
Discriminative correlation filters (DCFs) have been shown to perform superbly in visual object tracking. However, tracking remains challenging when target objects undergo complex scenarios such as occlusion, deformation, scale changes, and illumination changes. In this paper, we utilize the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on the convolutional layers. Translation is then estimated by fusing the response scores of the filters on three convolutional layers. For scale estimation, we learn a discriminative correlation filter that estimates scale from the best-confidence results. Furthermore, we propose a re-detection activation discrimination method to improve robustness when tracking fails, and an adaptive model update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on the OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.
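The translation step, fusing normalized per-layer correlation responses, can be pictured as below; the fixed weights are a stand-in for the paper's fusion scheme.

```python
import numpy as np

def fuse_responses(responses, weights=(0.25, 0.5, 1.0)):
    """Coarse-to-fine fusion of per-layer correlation response maps.
    responses: list of (H, W) maps from different conv layers, resized to match."""
    fused = sum(w * r / (r.max() + 1e-12) for w, r in zip(weights, responses))
    dy, dx = np.unravel_index(np.argmax(fused), fused.shape)
    return (dy, dx), fused  # translation estimate and the fused map
```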
25. Kamranian Z, Naghsh Nilchi AR, Monadjemi A, Navab N. Iterative algorithm for interactive co-segmentation using semantic information propagation. Appl Intell 2018. [DOI: 10.1007/s10489-018-1221-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4]
26. Yun X, Sun Y, Wang S, Shi Y, Lu N. Multi-layer convolutional network-based visual tracking via important region selection. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
27. Robust Visual Tracking Based on Adaptive Convolutional Features and Offline Siamese Tracker. Sensors 2018; 18:s18072359. [PMID: 30036993] [PMCID: PMC6068628] [DOI: 10.3390/s18072359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3]
Abstract
Robust and accurate visual tracking is one of the most challenging computer vision problems. Due to the inherent lack of training data, a robust approach for constructing a target appearance model is crucial. The existing spatially regularized discriminative correlation filter (SRDCF) method learns partial-target or background information when the target experiences rotation, moves out of view, or suffers heavy occlusion. To reduce computational complexity and enhance tracking ability, we first introduce an adaptive dimensionality reduction technique to extract image features based on pre-trained VGG-Net. We then propose an adaptive model update that assigns weights during the update procedure according to the peak-to-sidelobe ratio. Finally, we combine the online SRDCF-based tracker with an offline Siamese tracker to accomplish long-term tracking. Experimental results demonstrate that the proposed tracker performs satisfactorily in a wide range of challenging tracking scenarios.
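The peak-to-sidelobe ratio that drives the adaptive update can be computed as in this sketch; the sidelobe exclusion-window size is our choice.

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio of a correlation response map; a low PSR signals
    unreliable tracking, so the model-update weight should be reduced."""
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False  # drop region around peak
    side = response[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)
```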
28. Yun S, Choi J, Yoo Y, Yun K, Choi JY. Action-Driven Visual Object Tracking With Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:2239-2252. [PMID: 29771675] [DOI: 10.1109/tnnls.2018.2801826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9]
Abstract
In this paper, we propose an efficient visual tracker that directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The proposed deep neural network controlling the tracking actions is pretrained on various training video sequences and fine-tuned during actual tracking for online adaptation to changes of target and background. The pretraining uses deep reinforcement learning (RL) as well as supervised learning; the use of RL enables even partially labeled data to be successfully exploited for semi-supervised learning. In evaluations on the object tracking benchmark dataset, the proposed tracker achieves competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on a graphics processing unit, outperforms state-of-the-art real-time trackers with an accuracy improvement of more than 8%.
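The action-driven loop can be sketched as below; the action set, step sizes, and policy interface are illustrative stand-ins rather than the paper's exact design.

```python
# Discrete actions in the spirit of action-driven tracking: the policy network
# emits a sequence of box adjustments until it chooses "stop".
ACTIONS = ["left", "right", "up", "down", "scale_up", "scale_down", "stop"]

def apply_action(box, action, step=0.03, scale=0.05):
    x, y, w, h = box  # center-x, center-y, width, height
    if action == "left":         x -= step * w
    elif action == "right":      x += step * w
    elif action == "up":         y -= step * h
    elif action == "down":       y += step * h
    elif action == "scale_up":   w, h = w * (1 + scale), h * (1 + scale)
    elif action == "scale_down": w, h = w * (1 - scale), h * (1 - scale)
    return (x, y, w, h)

def track_step(box, policy, frame, max_actions=20):
    """policy(frame, box) -> index into ACTIONS (a hypothetical interface)."""
    for _ in range(max_actions):
        a = ACTIONS[int(policy(frame, box))]
        if a == "stop":
            break
        box = apply_action(box, a)
    return box
```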
30. Gao J, Zhang T, Yang X, Xu C. P2T: Part-to-Target Tracking via Deep Regression Learning. IEEE Transactions on Image Processing 2018; 27:3074-3086. [PMID: 29994065] [DOI: 10.1109/tip.2018.2813166] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3]
Abstract
Most existing part-based tracking methods are part-to-part trackers with two separate steps: part matching and target localization. Different from existing methods, in this paper we propose a novel part-to-target (P2T) tracker that works in a unified fashion by inferring the target location directly from parts. To achieve this, we propose a novel deep regression model for part-to-target regression in an end-to-end framework via convolutional neural networks. The proposed model not only exploits part context information to preserve the object's spatial layout structure, but also learns part reliability to emphasize part importance for robust part-to-target regression. We evaluate the proposed tracker on four challenging benchmark sequences, and extensive experimental results demonstrate that our method performs favorably against state-of-the-art trackers thanks to the powerful capacity of the proposed deep regression model.
31. Li C, Wu X, Zhao N, Cao X, Tang J. Fusing two-stream convolutional neural networks for RGB-T object tracking. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.11.068] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4]
33. Target Tracking via Particle Filter and Convolutional Network. Journal of Electrical and Computer Engineering 2018. [DOI: 10.1155/2018/5381962] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7]
Abstract
We propose a more effective tracking algorithm that works robustly in complex scenes with illumination changes, appearance changes, and partial occlusion. The algorithm is based on an improved particle filter with an efficiently designed observation model. Predefined convolutional filters extract high-order features, and the global representation is generated by combining local features without changing their structures and spatial arrangements, which increases feature invariance while maintaining specificity. The observation model is constructed by fusing the target's color features with a set of template features extracted by convolutional networks that require no training. During tracking, the template is updated in real time, which improves the robustness of the algorithm. Experiments show that the algorithm achieves good tracking results when targets are in complex environments.
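One predict-weight-resample cycle of such a particle filter might look like this sketch, where observe() stands in for the convolutional/color observation model and all parameters are illustrative.

```python
import numpy as np

def particle_filter_step(particles, weights, observe, motion_std=5.0):
    """particles: (N, 2) candidate centers; observe(p) returns an
    appearance-similarity likelihood for one particle."""
    n = len(particles)
    particles = particles + np.random.normal(0, motion_std, particles.shape)  # diffuse
    weights = weights * np.array([observe(p) for p in particles])
    weights /= weights.sum() + 1e-12
    # multinomial resampling when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    estimate = np.average(particles, axis=0, weights=weights)  # weighted mean state
    return particles, weights, estimate
```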
34. Gundogdu E, Ozkan H, Alatan AA. Extending Correlation Filter-Based Visual Tracking by Tree-Structured Ensemble and Spatial Windowing. IEEE Transactions on Image Processing 2017; 26:5270-5283. [PMID: 28767369] [DOI: 10.1109/tip.2017.2733199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
Abstract
Correlation filters have been successfully used in visual tracking due to their modeling power and computational efficiency. However, state-of-the-art correlation filter-based (CFB) tracking algorithms tend to quickly discard previous poses of the target, since they consider only a single filter in their models. Our approach, by contrast, is to register multiple CFB trackers for previous poses and exploit the registered knowledge when an appearance change occurs. To this end, we propose a novel tracking algorithm of complexity O(D) based on a large ensemble of CFB trackers. The ensemble, of size O(2^D), is organized over a binary tree of depth D and learns the target appearance subspaces such that each constituent tracker becomes an expert on a certain appearance. During tracking, the proposed algorithm combines only the appearance-aware relevant experts to produce boosted tracking decisions. Additionally, we propose a versatile spatial windowing technique to enhance the individual expert trackers: spatial windows are learned for target objects as well as correlation filters, and the windowed regions are then processed for more robust correlations. In our extensive experiments on benchmark datasets, the proposed tracking algorithm together with spatial windowing achieves a substantial performance increase.
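For intuition, the simplest form of spatial windowing applies a fixed separable Hann window to the patch before correlation; the paper learns its windows, which this sketch does not do.

```python
import numpy as np

def windowed_patch(patch):
    """Apply a separable Hann window to suppress boundary effects introduced
    by the circulant (periodic) assumption behind correlation filters."""
    h = np.hanning(patch.shape[0])[:, None]
    w = np.hanning(patch.shape[1])[None, :]
    return patch * (h * w)
```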
35. Wang X, Fan B, Chang S, Wang Z, Liu X, Tao D, Huang TS. Greedy Batch-Based Minimum-Cost Flows for Tracking Multiple Objects. IEEE Transactions on Image Processing 2017; 26:4765-4776. [PMID: 28692973] [DOI: 10.1109/tip.2017.2723239] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5]
Abstract
Minimum-cost flow algorithms have recently achieved state-of-the-art results in multi-object tracking. However, they rely on the whole image sequence as input. When deployed in real-time applications or distributed settings, these algorithms first operate on short batches of frames and then stitch the results into full trajectories. This decoupled strategy is prone to errors, because batch-level tracking errors may propagate into the final trajectories and cannot be corrected by other batches. In this paper, we propose a greedy batch-based minimum-cost flow approach for tracking multiple objects. Unlike existing approaches that conduct batch-based tracking and stitching sequentially, we optimize consecutive batches jointly so that the tracking results on one batch may benefit those on another. Specifically, we apply a generalized minimum-cost flow (MCF) algorithm on each batch and generate a set of conflicting trajectories, comprising not only high-probability trajectories but also low-probability ones potentially missed by detectors and trackers. We then apply the generalized MCF again to obtain the optimal matching between trajectories from consecutive batches. Our proposed approach is simple, effective, and does not require training. We demonstrate its power on datasets of different scenarios.
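The batch-stitching step can be approximated as an assignment between trajectory tails and heads; the Hungarian solver below is a simple stand-in for the generalized min-cost flow used in the paper, and the gating cost is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def stitch_batches(tracks_a, tracks_b, max_cost=50.0):
    """Match trajectory tails from batch A to trajectory heads in batch B.
    Each track is a list of (x, y) centers; cost is endpoint distance."""
    cost = np.array([[np.linalg.norm(np.subtract(a[-1], b[0])) for b in tracks_b]
                     for a in tracks_a])
    rows, cols = linear_sum_assignment(cost)          # optimal one-to-one matching
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
```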
36. Wang L, Zhang L, Yi Z. Trajectory Predictor by Using Recurrent Neural Networks in Visual Tracking. IEEE Transactions on Cybernetics 2017; 47:3172-3183. [PMID: 28885144] [DOI: 10.1109/tcyb.2017.2705345] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4]
Abstract
Motion models have proven to be a crucial part of the visual tracking process. Recent trackers widely use particle filter and sliding-window motion models. Treating the motion model as a sequence prediction problem, we can estimate the motion of objects from their trajectories; moreover, knowledge learned from annotated trajectories can be transferred to new objects. Inspired by recent advances in deep learning for visual feature extraction and sequence prediction, we propose a trajectory predictor that learns prior knowledge from annotated trajectories and transfers it to predict the motion of target objects. In this predictor, convolutional neural networks extract the visual features of target objects, and a long short-term memory model leverages the annotated trajectory priors together with sequential visual information, including the tracked features and center locations of the target object, to predict the motion. Furthermore, to extend this method to videos where annotated trajectories are difficult to obtain, we propose a dynamic weighted motion model that combines the proposed trajectory predictor with a random sampler. To evaluate transfer performance, we annotated a real-world vehicle dataset. Experimental results on both this dataset and an online tracker benchmark indicate that the proposed method outperforms several state-of-the-art trackers.
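A minimal LSTM motion model in this spirit, consuming past center locations only; the paper additionally feeds visual features, which this sketch omits.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Consume a sequence of past target centers and regress the next one."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, centers):          # centers: (B, T, 2)
        out, _ = self.lstm(centers)
        return self.head(out[:, -1])     # predicted (x, y) for frame T+1
```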
37. Zhang L, Suganthan PN. Visual Tracking With Convolutional Random Vector Functional Link Network. IEEE Transactions on Cybernetics 2017; 47:3243-3253. [PMID: 27542188] [DOI: 10.1109/tcyb.2016.2588526] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6]
Abstract
Deep neural network-based methods have recently achieved excellent performance in visual tracking. As very few training samples are available in the tracking task, those approaches rely heavily on extremely large auxiliary datasets such as ImageNet to pretrain the model, and must be fine-tuned during tracking to address the discrepancy between the source domain (the auxiliary data) and the target domain (the object being tracked). However, such methods are sensitive to hyper-parameters such as the learning rate, the maximum number of epochs, and the mini-batch size. It is therefore worth investigating whether pretraining and fine-tuning through conventional back-propagation are essential for visual tracking. In this paper, we shed light on this line of research by proposing the convolutional random vector functional link (CRVFL) neural network, which can be regarded as a marriage of the convolutional neural network and the random vector functional link network, to simplify the visual tracking system. The parameters in the convolutional layer are randomly initialized and kept fixed; only the parameters in the fully connected layer need to be learned. We further propose an elegant approach to update the tracker. On the widely used visual tracking benchmark, without any auxiliary data, a single CRVFL model achieves 79.0% precision at a threshold of 20 pixels, and an ensemble of CRVFL models yields the best result of 86.3%.
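The key computational idea, solving only the output layer in closed form on top of fixed random convolutional features, can be sketched as ridge regression; the regularization value is illustrative.

```python
import numpy as np

def fit_output_layer(features, targets, lam=1e-2):
    """CRVFL-style training sketch: the convolutional features come from fixed,
    randomly initialized filters; only the output layer is solved in closed
    form by ridge regression, with no back-propagation."""
    X = features.reshape(features.shape[0], -1)  # (N, D) flattened feature maps
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)
    return W  # predict on new data with X_new @ W
```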
39. Oh SI, Kang HB. Multiple Objects Fusion Tracker Using a Matching Network for Adaptively Represented Instance Pairs. Sensors (Basel, Switzerland) 2017; 17:s17040883. [PMID: 28420194] [PMCID: PMC5424760] [DOI: 10.3390/s17040883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Multiple-object tracking is affected by various sources of distortion, such as occlusion, illumination variations, and motion changes. Overcoming these distortions by tracking on RGB frames alone, e.g., by shifting, has limitations because of material distortions in the RGB data. To overcome them, we propose a multiple-object fusion tracker (MOFT), which uses a combination of 3D point clouds and corresponding RGB frames. MOFT uses a matching function, initialized on large-scale external sequences, to determine which candidates in the current frame match the target object in the previous frame. After tracking over a few frames, the initialized matching function is fine-tuned according to the appearance models of the target objects; the fine-tuning is structured with diverse matching-function branches. In general multiple-object tracking situations, scale variations occur depending on the distance between the target objects and the sensors, and representing targets at different scales with the same fixed strategy loses information. In this paper, the output map of a convolutional layer from a pre-trained convolutional neural network is used to adaptively represent instances without information loss. In addition, MOFT fuses the tracking results obtained from each modality at the decision level, compensating for the tracking failures of each modality using basic belief assignment rather than selectively using the features of each modality. Experimental results indicate that the proposed tracker provides state-of-the-art performance on the MOT and KITTI benchmarks.
Affiliation(s)
- Sang-Il Oh, Department of Media Engineering, Catholic University of Korea, 43-1, Yeoggok 2-dong, Wonmmi-gu, Bucheon-si, Gyeonggi-do 14662, Korea
- Hang-Bong Kang, Department of Media Engineering, Catholic University of Korea, 43-1, Yeoggok 2-dong, Wonmmi-gu, Bucheon-si, Gyeonggi-do 14662, Korea
40. Visual Object Tracking Based on Cross-Modality Gaussian-Bernoulli Deep Boltzmann Machines with RGB-D Sensors. Sensors 2017; 17:s17010121. [PMID: 28075373] [PMCID: PMC5298694] [DOI: 10.3390/s17010121] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0]
Abstract
Visual object tracking technology is one of the key issues in computer vision. In this paper, we propose a visual object tracking algorithm based on cross-modality deep feature learning using Gaussian-Bernoulli deep Boltzmann machines (DBMs) with RGB-D sensors. First, a cross-modality feature learning network based on a Gaussian-Bernoulli DBM is constructed, which extracts cross-modality features of the samples in RGB-D video data. Second, the cross-modality features of the samples are input to a logistic regression classifier, and the observation likelihood model is established from the confidence score of the classifier. Finally, the object tracking results over the RGB-D data are obtained using a Bayesian maximum a posteriori (MAP) probability estimation algorithm. Experimental results show that the proposed method is highly robust to abnormal changes (e.g., occlusion, rotation, and illumination change), can steadily track multiple targets, and achieves higher accuracy.
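The candidate-selection step can be sketched as below, where the logistic-regression confidence serves as the observation likelihood and a motion prior weights the candidates; the interfaces are our assumptions for illustration.

```python
import numpy as np

def map_estimate(candidates, features, clf_w, clf_b, prior):
    """Pick the candidate maximizing posterior = likelihood * motion prior.
    features: (N, D) cross-modality features for N candidates; clf_w, clf_b:
    logistic-regression weights; prior: (N,) motion-model probabilities."""
    scores = 1.0 / (1.0 + np.exp(-(features @ clf_w + clf_b)))  # sigmoid confidence
    post = scores * prior
    return candidates[int(np.argmax(post))]
```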