1. Han W, Dong X, Zhang Y, Crandall D, Xu CZ, Shen J. Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:7363-7376. [PMID: 38743545] [DOI: 10.1109/tpami.2024.3400873]
Abstract
Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusion methods are often space- and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations of both kinds of fusion methods. Based on our findings, we propose a generalized module named the Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features, which can be widely applied to fuse feature maps of different shapes. Furthermore, unlike parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module brings significant improvements with noteworthy efficiency.
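The "mathematically equivalent" replacement mentioned above rests on a standard identity: convolving a channel-wise concatenation equals summing convolutions with the corresponding channel slices of the kernel. A minimal PyTorch sketch verifying that identity (shapes and names are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

a = torch.randn(1, 8, 16, 16)   # feature map A with 8 channels
b = torch.randn(1, 4, 16, 16)   # feature map B with 4 channels
w = torch.randn(32, 12, 3, 3)   # one kernel over the 12 concatenated channels
bias = torch.randn(32)

# Convolution on the concatenated features...
full = F.conv2d(torch.cat([a, b], dim=1), w, bias, padding=1)
# ...equals the sum of convolutions with the kernel's channel slices
# (the bias is added only once).
split = F.conv2d(a, w[:, :8], bias, padding=1) + F.conv2d(b, w[:, 8:], None, padding=1)

print(torch.allclose(full, split, atol=1e-5))  # True
```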

2. Kang B, Liang D, Mei J, Tan X, Zhou Q, Zhang D. Robust RGB-T Tracking via Graph Attention-Based Bilinear Pooling. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9900-9911. [PMID: 35417355] [DOI: 10.1109/tnnls.2022.3161969]
Abstract
RGB-T trackers possess a strong capability for fusing two different yet complementary target observations, providing a promising solution for all-weather tracking in intelligent transportation systems. Existing convolutional neural network (CNN)-based RGB-T tracking methods often consider multisource-oriented deep feature fusion from a global viewpoint, but fail to yield satisfactory performance when the target pair contains only partially useful information. To solve this problem, we propose a four-stream oriented Siamese network (FS-Siamese) for RGB-T tracking. The key innovation of our network structure lies in formulating multidomain multilayer feature map fusion as a multiple graph learning problem, based on which we develop a graph attention-based bilinear pooling module to explore partial feature interactions between the RGB and thermal targets. This effectively prevents uninformative image blocks from disturbing the feature embedding fusion. To enhance the efficiency of the proposed Siamese network structure, we adopt meta-learning to incorporate category information when updating the bilinear pooling results, which online enforces the exemplar and the current target appearance to obtain similar semantic representations. Extensive experiments on the grayscale-thermal object tracking (GTOT) and RGBT234 datasets demonstrate that the proposed method outperforms state-of-the-art methods for the task of RGB-T tracking.
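For readers unfamiliar with bilinear pooling, a common generic form takes the outer product of two feature vectors followed by signed square-root and l2 normalization. The sketch below shows that generic operation only; it is an assumption for illustration, not the paper's graph attention-based variant:

```python
import torch
import torch.nn.functional as F

def bilinear_pool(x, y):
    # x, y: (batch, dim) feature vectors, e.g. from RGB and thermal branches.
    z = torch.einsum('bi,bj->bij', x, y).flatten(1)  # pairwise feature interactions
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-8)   # signed square-root
    return F.normalize(z, dim=1)                     # l2 normalization
```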

3. Yang Y, Gu X. Joint Correlation and Attention Based Feature Fusion Network for Accurate Visual Tracking. IEEE Transactions on Image Processing 2023; 32:1705-1715. [PMID: 37028050] [DOI: 10.1109/tip.2023.3251027]
Abstract
Correlation operations and attention mechanisms are two popular feature fusion approaches that play an important role in visual object tracking. However, correlation-based tracking networks are sensitive to location information but lose some contextual semantics, while attention-based tracking networks can make full use of rich semantic information but ignore the position distribution of the tracked object. Therefore, in this paper, we propose a novel tracking framework based on joint correlation and attention networks, termed JCAT, which effectively combines the advantages of these two complementary feature fusion approaches. Concretely, the proposed JCAT approach adopts parallel correlation and attention branches to generate position and semantic features. The fused features are then obtained by directly adding the position and semantic features. Finally, the fused features are fed into a segmentation network to generate a pixel-wise state estimation of the object. Furthermore, we develop a segmentation memory bank and an online sample filtering mechanism for robust segmentation and tracking. Extensive experimental results on eight challenging visual tracking benchmarks show that the proposed JCAT tracker achieves very promising tracking performance and sets a new state-of-the-art on the VOT2018 benchmark.

4. Xu T, Feng Z, Wu XJ, Kittler J. Toward Robust Visual Object Tracking With Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction. IEEE Transactions on Image Processing 2023; 32:1541-1554. [PMID: 37027596] [DOI: 10.1109/tip.2023.3246800]
Abstract
Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression, and have achieved promising results in recent benchmarks and competitions. However, existing methods suffer from two limitations. First, though the Siamese structure can estimate the target state in an instance frame provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks work collaboratively to estimate the final target location. To address these issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module to ensure consistent supervision of the classification and regression branches, improving the synergy of the different branches. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module as well as the cross-task interaction, exhibiting superior tracking performance compared with state-of-the-art tracking methods.

5. Li S, Zhao S, Cheng B, Chen J. Part-Aware Framework for Robust Object Tracking. IEEE Transactions on Image Processing 2023; 32:750-763. [PMID: 37018334] [DOI: 10.1109/tip.2022.3232941]
Abstract
The local parts of the target are vitally important for robust object tracking. Nevertheless, existing context regression methods, involving Siamese networks and discriminative correlation filters, mostly represent the target appearance with a holistic model, making them highly sensitive to scenarios with partial occlusion and drastic appearance changes. In this paper, we address this issue by proposing a novel part-aware framework based on context regression, which simultaneously considers the global and local parts of the target and fully exploits their relationship to collaboratively infer the target state online. To this end, a spatial-temporal measure among context regressors corresponding to multiple parts is designed to evaluate the tracking quality of each part regressor, addressing the imbalance between global and local parts. The coarse target locations provided by the part regressors are further aggregated, with their measures as weights, to refine the final target location. Furthermore, the divergence of multiple part regressors in each frame reveals the degree of interference from background noise, which is quantified to control the proposed combination window functions in the part regressors to adaptively filter redundant noise. Besides, the spatial-temporal information among part regressors is also leveraged to assist in accurately estimating the target scale. Extensive evaluations demonstrate that the proposed framework helps many context regression trackers achieve performance improvements and perform favorably against state-of-the-art methods on the popular OTB, TC128, UAV, UAVDT, VOT, TrackingNet, GOT-10k, and LaSOT benchmarks.

6. Wei B, Chen H, Cao S, Ding Q, Luo H. An IoU-aware Siamese network for real-time visual tracking. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.041]

7. Fan N, Liu Q, Li X, Zhou Z, He Z. Siamese Residual Network for Efficient Visual Tracking. Information Sciences 2023. [DOI: 10.1016/j.ins.2022.12.082]

8. Nai K, Li Z, Gan Y, Wang Q. Robust Visual Tracking via Multitask Sparse Correlation Filters Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:502-515. [PMID: 34310327] [DOI: 10.1109/tnnls.2021.3097498]
Abstract
In this article, a novel multitask sparse correlation filters (MTSCF) model, which introduces multitask sparse learning into the correlation filters (CFs) framework, is proposed for visual tracking. Specifically, the proposed MTSCF method exploits multitask learning to take the interdependencies among different visual features (e.g., histogram of oriented gradients (HOG), color names, and CNN features) into account to simultaneously learn the CFs, letting the learned filters enhance and complement each other to boost tracking performance. Moreover, it also performs feature selection to dynamically select discriminative spatial features from the target region to distinguish the target object from the background. An l2,1 regularization term is adopted to realize multitask sparse learning. To solve the objective model, the alternating direction method of multipliers (ADMM) is utilized for learning the CFs. By considering multitask sparse learning, the proposed MTSCF model can fully utilize the strengths of different visual features and select effective spatial features to better model the appearance of the target object. Extensive experimental results on multiple tracking benchmarks demonstrate that our MTSCF tracker achieves competitive tracking performance in comparison with several state-of-the-art trackers.
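For reference, the l2,1 norm behind this regularization term is standard: the sum of the l2 norms of the rows of the filter matrix, which induces row-wise sparsity shared across tasks and thereby performs the joint feature selection described above (the exact matrix layout here is our assumption based on common usage):

```latex
\|\mathbf{W}\|_{2,1} \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{k} W_{ij}^{2}}
\;=\; \sum_{i=1}^{d} \big\|\mathbf{w}^{i}\big\|_{2}
```

where each row vector w^i collects one spatial feature's coefficients across the k tasks; a row driven to zero deselects that feature for all tasks simultaneously.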

9. Liang T, Li B, Wang M, Tan H, Luo Z. A Closer Look at the Joint Training of Object Detection and Re-Identification in Multi-Object Tracking. IEEE Transactions on Image Processing 2022; 32:267-280. [PMID: 37015359] [DOI: 10.1109/tip.2022.3227814]
Abstract
Unifying object detection and re-identification (ReID) in a single network enables faster multi-object tracking (MOT), but this multi-task setting poses challenges for training. In this work, we dissect the joint training of detection and ReID along two dimensions: label assignment and loss function. We find that previous works generally overlook these and directly borrow practices from object detection, inevitably causing inferior performance. Specifically, we identify that a qualified label assignment for MOT should: 1) make the assignment cost aware of the ReID cost, not just the detection cost; and 2) provide sufficient positive samples for robust feature learning while avoiding ambiguous positives (i.e., positives shared by different ground-truth objects). To achieve these goals, we first propose Identity-aware Label Assignment, which jointly considers the assignment cost of detection and ReID to select positive samples for each instance without ambiguities. Moreover, we propose a novel Discriminative Focal Loss that integrates ReID predictions with the Focal Loss to focus the training on discriminative samples. Finally, we upgrade the strong baseline FairMOT with our techniques and achieve improvements of up to 7.0 MOTA / 54.1% IDs on the MOT16/17/20 benchmarks at favorable inference speed, verifying that our tailored label assignment and loss function for MOT are superior to those inherited from object detection.

10. Shen J, Liu Y, Dong X, Lu X, Khan FS, Hoi S. Distilled Siamese Networks for Visual Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8896-8909. [PMID: 34762585] [DOI: 10.1109/tpami.2021.3127492]
Abstract
In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) through a teacher-students knowledge distillation model. This model is intuitively inspired by the one-teacher-versus-multiple-students learning method typically employed in schools. In particular, our model contains a single teacher-student distillation module and a student-student knowledge sharing mechanism. The former is designed using a tracking-specific distillation strategy to transfer knowledge from a teacher to students. The latter is utilized for mutual learning between students to enable in-depth knowledge understanding. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on five tracking benchmarks show that the proposed distilled trackers achieve compression rates of up to 18× and frame rates of 265 FPS, while obtaining tracking accuracy comparable to the base models.
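As a reference point for the distillation idea described above, a generic softened-label knowledge distillation term (Hinton-style) looks as follows; the paper's tracking-specific strategy and its student-student sharing mechanism are more elaborate than this sketch:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, tau=4.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by tau^2 to keep gradient magnitudes stable.
    p_teacher = F.softmax(teacher_logits / tau, dim=1)
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * tau ** 2
```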

11. Bin L, Ying T, Haoyang D, Jiafan Z, Menghui Q, Zengquan Z. Semi-supervised LDA pedestrian re-identification algorithm based on K-nearest neighbor resampling. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-220924]
Abstract
Person re-identification has been a challenging task in the field of computer vision in recent years. Image samples of pedestrians undergo drastic appearance variations across camera views. The training data of existing datasets cannot describe these complex appearance changes, which leads to over-fitting of the metric model. To solve this problem, based on the statistical and topological characteristics of multi-view paired pedestrian images, a resampled linear discriminant analysis (LDA) method is proposed. This method utilizes sample normality and k-nearest neighbours to form potential positive pairs. The potential positive pairs are used to improve the metric model and generalize it to the test data. By optimizing the inter-class divergence, a semi-supervised re-sampling LDA person re-identification algorithm is established. It is then tested on the VIPeR, CUHK01 and Market-1501 datasets. The results show that the proposed method achieves the best performance compared to several available methods. In particular, the proposed method outperforms the best comparison method by 0.6% and 5.76% in rank-1 identification rate on the VIPeR and CUHK01 datasets, respectively. At the same time, the improved LDA algorithm improves the rank-1 identification accuracy of the traditional LDA method by 9.36% and 32.11% on these two datasets, respectively. However, the proposed method's performance is limited on the Market-1501 dataset when the test data is large.
Affiliation(s)
- Li Bin, Tian Ying, Ding Haoyang, Zhang Jiafan: School of Mechanical Engineering, Wuhan Polytechnic University, Wuhan, Hubei, China
- Qi Menghui, Zheng Zengquan: Hubei Key Laboratory of Theory and Application of Advanced Materials Mechanics, Wuhan University of Technology, Wuhan, Hubei, China

12. Risnandar. DeSa COVID-19: Deep salient COVID-19 image-based quality assessment. Journal of King Saud University - Computer and Information Sciences 2022; 34:9501-9512. [PMID: 38620925] [PMCID: PMC8647162] [DOI: 10.1016/j.jksuci.2021.11.013]
Abstract
This study offers an advanced method to evaluate coronavirus disease 2019 (COVID-19) image quality. A salient COVID-19 image map is incorporated into a deep convolutional neural network (DCNN), named DeSa COVID-19, which employs an n-convex method for full-reference image quality assessment (FR-IQA). The results show that DeSa COVID-19 and the proposed DCNN architecture perform remarkably well on the COVID-chestxray and COVID-CT datasets, respectively. The salient COVID-19 image map is also evaluated on small COVID-19 image patches. The experimental results confirm that DeSa COVID-19 and the proposed DCNN outperform other state-of-the-art methods on the COVID-chestxray and COVID-CT datasets, and the proposed DCNN also achieves improved results against several state-of-the-art full-reference medical image quality assessment (FR-MIQA) techniques under fast fading (FF), blocking artifact (BA), white Gaussian noise (WG), JPEG, and JPEG2000 (JP2K) distortions in distorted and undistorted COVID-19 images. Spearman's rank-order correlation coefficient (SROCC) and the linear correlation coefficient (LCC) are used to compare DeSa COVID-19 and the proposed DCNN against recent FR-MIQA methods. DeSa COVID-19 scores 2.63% and 2.62% higher than the proposed DCNN, and 28.53% and 29.01% higher than all the compared state-of-the-art FR-MIQA methods, on the SROCC and LCC measures, respectively. The shift-add operations of trigonometric, logarithmic, and exponential functions are reduced in the computational complexity of DeSa COVID-19 and the proposed DCNN. Overall, DeSa COVID-19 is superior to the proposed DCNN as well as to other recent full-reference medical image quality assessment methods.
Affiliation(s)
- Risnandar: The Intelligent Systems Research Group, School of Computing, Telkom University, Jl. Telekomunikasi No. 1, Terusan Buahbatu-Dayeuhkolot, Bandung, West Java 40257, Indonesia; The Computer Vision Research Group, Research Center for Informatics, Indonesian Institute of Sciences (LIPI) and the National Research and Innovation Agency (BRIN), Jl. Sangkuriang/Cisitu No. 21/154D, LIPI Building 20th, 3rd Floor, Bandung, West Java 40135, Indonesia

13. Wang X, Chen Z, Jiang B, Tang J, Luo B, Tao D. Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-Based Beam Search. IEEE Transactions on Image Processing 2022; 31:6239-6254. [PMID: 36166563] [DOI: 10.1109/tip.2022.3208437]
Abstract
To track the target in a video, current visual trackers usually adopt greedy search for target object localization in each frame; that is, the candidate region with the maximum response score is selected as the tracking result of each frame. However, we found that this may not be an optimal choice, especially in challenging tracking scenarios such as heavy occlusion and fast motion. In particular, if a tracker drifts, errors accumulate and make the response scores estimated by the tracker unreliable in future frames. To address this issue, we propose to maintain multiple tracking trajectories and apply a beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using a beam search algorithm. We thus formulate tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims to pick out one sample as its tracking result in each frame. Each maintained trajectory is associated with an agent that performs the decision-making and determines what actions should be taken to update related information. More specifically, using a classification-based tracker as the baseline, we first adopt a bi-GRU to encode the target feature, proposal feature, and its response score into a unified state representation. The state feature and greedy search result are then fed into the first agent for independent action selection. Afterwards, the output action and state features are fed into the subsequent agents for diverse result prediction. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validate the effectiveness of the proposed algorithm.
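The core idea, keeping several trajectories alive rather than greedily committing per frame, can be sketched as a generic beam search; `candidates` and `score` below are assumed placeholders for illustration, not the paper's interfaces:

```python
def beam_search_track(frames, candidates, score, beam_width=3):
    # candidates(frame) -> iterable of candidate regions for that frame;
    # score(trajectory, candidate) -> float response score.
    beams = [([], 0.0)]  # (trajectory so far, accumulated score)
    for frame in frames:
        expanded = [
            (traj + [c], total + score(traj, c))
            for traj, total in beams
            for c in candidates(frame)
        ]
        # Retain only the top-B trajectories by accumulated score.
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return max(beams, key=lambda b: b[1])[0]  # best trajectory overall
```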

14. Miao J, Wu Y, Yang Y. Identifying Visible Parts via Pose Estimation for Occluded Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4624-4634. [PMID: 33651698] [DOI: 10.1109/tnnls.2021.3059515]
Abstract
We focus on the occlusion problem in person re-identification (re-id), which is one of the main challenges in real-world person retrieval scenarios. Previous methods for the occluded re-id problem usually assume that only the probes are occluded, thereby removing occlusions by manual cropping. However, this may not always hold in practice. This article relaxes this assumption and investigates a more general occlusion problem, where both the probe and gallery images could be occluded. The key to this challenging problem is suppressing the noise information by identifying bodies and occlusions. We propose to incorporate pose information into the re-id framework, which benefits the model in three aspects. First, it provides the location of the body. We then design a Pose-Masked Feature Branch to make our model focus on the body region only and filter out the noise features brought by occlusions. Second, the estimated pose reveals which body parts are visible, giving us a hint for constructing more informative person features. We propose a Pose-Embedded Feature Branch to adaptively re-calibrate channel-wise feature responses based on the visible body parts. Third, in testing, the estimated pose indicates which regions are informative and reliable for both probe and gallery images. We then explicitly split the extracted spatial feature into parts, and only part features from the commonly visible parts are utilized in the retrieval. To better evaluate performance on the occluded re-id task, we also propose a large-scale dataset for occluded re-id with more than 35,000 images, namely Occluded-DukeMTMC. Extensive experiments show our approach surpasses previous methods on occluded, partial, and non-occluded re-id datasets.

15. Cheng Y, Wang A, Wu L. A Classification Method for Electronic Components Based on Siamese Network. Sensors (Basel) 2022; 22:6478. [PMID: 36080937] [PMCID: PMC9460278] [DOI: 10.3390/s22176478]
Abstract
In the field of electronics manufacturing, electronic component classification facilitates the management and recycling of functional and valuable electronic components in electronic waste. Current electronic component classification methods are mainly based on deep learning, which requires a large number of samples to train the model. Owing to the wide variety of electronic components, collecting datasets is a time-consuming and laborious process. This study proposes a Siamese network-based classification method to solve the electronic component classification problem with few samples. First, an improved visual geometry group 16 (VGG-16) model is proposed as the feature extraction part of the Siamese neural network to improve the recognition performance of the model on small samples. Then, a novel channel correlation loss function that allows the model to learn the correlation between different channels in the feature map is designed to further improve the generalization performance of the model. Finally, the nearest neighbor algorithm is used to complete the classification. The experimental results show that the proposed method achieves high classification accuracy under small-sample conditions and is robust for electronic components with similar appearances. This improves the classification quality of electronic components and reduces the training sample collection cost.
Affiliation(s)
- Aimin Wang: Correspondence: ; Tel.: +86-135-2266-2896

16. Wang Q, Han T, Gao J, Yuan Y. Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3238-3250. [PMID: 33502985] [DOI: 10.1109/tnnls.2021.3051371]
Abstract
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between the source and target domains. Recently, typical methods attempt to extract domain-invariant features via image translation and adversarial learning. When it comes to specific tasks, we find that the domain shifts are reflected in differences between model parameters. To describe the domain gap directly at the parameter level, we propose a neuron linear transformation (NLT) method, exploiting a domain factor and bias weights to learn the domain shift. Specifically, for a specific neuron of a source model, NLT exploits a few labeled target data to learn the domain shift parameters. Finally, the target neuron is generated via a linear transformation. Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that NLT is robust and more effective than supervised training and fine-tuning. Code is available at https://github.com/taohan10200/NLT.
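The linear transformation at the heart of NLT is simple to state: each target-domain parameter is an affine function of its source-domain counterpart. A minimal sketch under that reading of the abstract, with `alpha` and `beta` as the learned domain factor and bias (names assumed, not taken from the released code):

```python
import torch

def neuron_linear_transform(theta_src, alpha, beta):
    # Generate target-domain parameters from frozen source-domain ones;
    # alpha and beta are learned from the few labeled target samples.
    return alpha * theta_src + beta

# Usage: adapt a source conv weight of shape (out, in, k, k).
theta_src = torch.randn(64, 32, 3, 3)
alpha = torch.ones_like(theta_src, requires_grad=True)   # domain factor
beta = torch.zeros_like(theta_src, requires_grad=True)   # domain bias
theta_tgt = neuron_linear_transform(theta_src, alpha, beta)
```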

17. Liu H, Zhao B, Ji M, Li M, Liu P. GreedyFool: Multi-Factor Imperceptibility and Its Application to Designing a Black-box Adversarial Attack. Information Sciences 2022. [DOI: 10.1016/j.ins.2022.08.026]

18. Zhao X, Wang G, He Z, Jiang H. A survey of moving object detection methods: a practical perspective. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.104]

19. Tan K, Xu TB, Wei Z. IMSiam: IoU-aware Matching-adaptive Siamese network for object tracking. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.003]

20. Huang Y, Li Y, Heyes T, Jourjon G, Cheng A, Seneviratne S, Thilakarathna K, Webb D, Xu RYD. Task adaptive siamese neural networks for open-set recognition of encrypted network traffic with bidirectional dropout. Pattern Recognition Letters 2022. [DOI: 10.1016/j.patrec.2022.05.011]

21. García-Pulido JA, Pajares G, Dormido S. UAV Landing Platform Recognition Using Cognitive Computation Combining Geometric Analysis and Computer Vision Techniques. Cognitive Computation 2022. [DOI: 10.1007/s12559-021-09962-2]
Abstract
Unmanned aerial vehicles (UAVs) are excellent tools with extensive demand. During the last phase of landing, they require support additional to that of GPS. This can be achieved through the UAV's perception system, based on its on-board camera and intelligence, with which decisions can be made as to how to land on a platform (target). A cognitive computation approach is proposed to recognize this target, specifically designed to translate human reasoning into computational procedures by computing two probabilities of detection, which are combined under fuzzy set theory for proper decision-making. The platform design is based on: (1) spectral information in the visible range, using colors that are uncommon in the UAV's operating environments (indoors and outdoors), and (2) specific figures in the foreground, which allow partial perception of each figure. We exploit color image properties of the specific-colored figures embedded on the platform, which are identified by applying image processing and pattern recognition techniques, including Euclidean Distance Smart Geometric Analysis, to identify the platform in a very efficient and reliable manner. The test strategy uses 800 images captured with a smartphone onboard a quad-rotor UAV. The results verify that the proposed method outperforms existing strategies, especially those that do not use color information. Platform recognition is also possible even with only a partial view of the target, due to image capture under adverse conditions. This demonstrates the effectiveness and robustness of the proposed cognitive computing-based perception system.

22. Chan S, Tao J, Zhou X, Bai C, Zhang X. Siamese Implicit Region Proposal Network With Compound Attention for Visual Tracking. IEEE Transactions on Image Processing 2022; 31:1882-1894. [PMID: 35139020] [DOI: 10.1109/tip.2022.3148876]
Abstract
Recently, Siamese-based trackers have achieved significant success. However, those trackers are restricted by the difficulty of learning a feature representation consistent with the object. To address this challenge, this paper proposes a novel Siamese implicit region proposal network with compound attention for visual tracking. First, an implicit region proposal (IRP) module is designed by combining a novel pixel-wise correlation method. This module can aggregate feature information of different regions that are similar to the pre-defined anchor boxes in a Region Proposal Network. Adaptive feature receptive fields can then be obtained by linear fusion of features from different regions. Second, a compound attention module comprising channel and non-local attention is proposed to assist the IRP module in better perceiving the scale and shape of the object. The channel attention is applied to mine the discriminative information of the object and handle background clutter in the template, while the non-local attention is trained to aggregate contextual information and learn the semantic range of the object. Finally, experimental results demonstrate that the proposed tracker achieves state-of-the-art performance on six challenging benchmarks, including VOT-2018, VOT-2019, OTB-100, GOT-10k, LaSOT, and TrackingNet, while running in real time at an average speed of 72 FPS.
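A common form of pixel-wise correlation, which the IRP module builds on, correlates every spatial position of the template with every position of the search region; the sketch below shows that generic operation only (the paper's exact variant may differ):

```python
import torch

def pixelwise_correlation(template, search):
    # template: (b, c, ht, wt); search: (b, c, hs, ws)
    b, c, ht, wt = template.shape
    hs, ws = search.shape[2:]
    t = template.flatten(2)                    # (b, c, ht*wt)
    s = search.flatten(2)                      # (b, c, hs*ws)
    corr = torch.einsum('bcm,bcn->bmn', t, s)  # similarity of every pixel pair
    return corr.view(b, ht * wt, hs, ws)       # one response channel per template pixel
```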

23. Sun B, Ren Y, Lu X. Semisupervised Consistent Projection Metric Learning for Person Reidentification. IEEE Transactions on Cybernetics 2022; 52:738-747. [PMID: 32310811] [DOI: 10.1109/tcyb.2020.2979262]
Abstract
Person reidentification is a hot topic in the computer vision field. Much effort has been devoted to modeling a discriminative distance metric. However, existing metric-learning-based methods lack generalization. In this article, the poor generalization of the metric model is attributed to a biased estimation problem in which the independent and identically distributed hypothesis is not valid. A verification experiment shows that there is a sharp difference between the training and test samples in the metric subspace. A semisupervised consistent projection metric-learning method is proposed to ease the biased estimation problem by learning a consistency-constrained metric subspace in which the identified pairs are forced to follow the distribution of the positive training pairs. First, a semisupervised method is proposed to generate potential matching pairs from the k-nearest neighbors of test samples. The potential matching pairs are used to estimate the distribution center of the distances of the positive test pairs. Second, the metric subspace is improved by forcing this estimate to be close to the center of the positive training pairs. Finally, extensive experiments are conducted on five datasets, and the results demonstrate that the proposed method achieves the best performance, especially on the rank-1 identification rate.

24. Li J, Wang D, Liu X, Shi Z, Wang M. Two-Branch Attention Network via Efficient Semantic Coupling for One-Shot Learning. IEEE Transactions on Image Processing 2021; 31:341-351. [PMID: 34748491] [DOI: 10.1109/tip.2021.3124668]
Abstract
Over the past few years, Convolutional Neural Networks (CNNs) have achieved remarkable advances in one-shot image classification. However, the lack of effective attention modeling has limited their performance. In this paper, we propose a Two-branch (Content-aware and Position-aware) Attention (CPA) Network via an Efficient Semantic Coupling module for attention modeling. Specifically, we harness content-aware attention to model characteristic features (e.g., color, shape, texture) and position-aware attention to model spatial position weights. In addition, we exploit support images to improve the learning of attention for the query images; similarly, we use query images to enhance the attention model of the support set. Furthermore, we design a local-global optimizing framework that further improves recognition accuracy. Extensive experiments on four common datasets (miniImageNet, tieredImageNet, CUB-200-2011, CIFAR-FS) with three popular networks (DPGN, RelationNet and IFSL) demonstrate that our CPA module, equipped with the local-global two-stream framework (CPAT), achieves state-of-the-art performance, with a significant accuracy improvement of 3.16% on CUB-200-2011 in particular.

25. Learning spatial-channel regularization jointly with correlation filter for visual tracking. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.04.146]

26. Zhang Y, Wang T, Liu K, Zhang B, Chen L. Recent advances of single-object tracking methods: A brief survey. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.011]

27. Lin G, Zhao S, Shen J. Video person re-identification with global statistic pooling and self-attention distillation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.05.111]

28. Towards accurate estimation for visual object tracking with multi-hierarchy feature aggregation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.075]

29. Yang Y, Xing W, Wang D, Zhang S, Yu Q, Wang L. AEVRNet: Adaptive exploration network with variance reduced optimization for visual tracking. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.118]

30. Carranza-García M, Lara-Benítez P, García-Gutiérrez J, Riquelme JC. Enhancing object detection for autonomous driving by optimizing anchor generation and addressing class imbalance. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.001]

31. Zhang S, Gao H, Rao Q. Defense Against Adversarial Attacks by Reconstructing Images. IEEE Transactions on Image Processing 2021; 30:6117-6129. [PMID: 34197323] [DOI: 10.1109/tip.2021.3092582]
Abstract
Convolutional neural networks (CNNs) are vulnerable to being deceived by adversarial examples generated by adding small, human-imperceptible perturbations to a clean image. In this paper, we propose an image reconstruction network that reconstructs an input adversarial example into a clean output image to defend against such adversarial attacks. Due to the powerful learning capabilities of the residual block structure, our model can learn a precise mapping from adversarial examples to reconstructed examples. The use of a perceptual loss greatly suppresses the error amplification effect and improves the performance of our reconstruction network. In addition, by adding randomization layers to the end of the network, the effects of additional noise are further suppressed, especially for iterative attacks. Our model has four advantages: 1) it greatly reduces the impact of adversarial perturbations while having little influence on the prediction performance for clean images; 2) during the inference phase, it performs better than most existing model-agnostic defense methods; 3) it has better generalization capability; and 4) it can be flexibly combined with other methods, such as adversarially trained models.

32. Zhang D, Zheng Z, Li M, Liu R. CSART: Channel and spatial attention-guided residual learning for real-time object tracking. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.11.046]

33. Dong X, Shen J, Wang W, Shao L, Ling H, Porikli F. Dynamical Hyperparameter Optimization via Deep Reinforcement Learning in Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:1515-1529. [PMID: 31796388] [DOI: 10.1109/tpami.2019.2956703]
Abstract
Hyperparameters are numerical pre-sets whose values are assigned prior to the commencement of a learning process. Selecting appropriate hyperparameters is often critical for achieving satisfactory performance in many vision problems, such as deep learning-based visual object tracking. However, it is often difficult to determine their optimal values, especially if they are specific to each video input. Most hyperparameter optimization algorithms tend to search a generic range and are imposed blindly on all sequences. In this paper, we propose a novel dynamical hyperparameter optimization method that adaptively optimizes hyperparameters for a given sequence using an action-prediction network built on continuous deep Q-learning. Since the observation space for object tracking is significantly more complex than those in traditional control problems, existing continuous deep Q-learning algorithms cannot be directly applied. To overcome this challenge, we introduce an efficient heuristic strategy to handle the high-dimensional state space while also accelerating convergence. The proposed algorithm is applied to improve two representative trackers, a Siamese-based one and a correlation-filter-based one, to evaluate its generalizability. Their superior performance on several popular benchmarks is clearly demonstrated. Our source code is available at https://github.com/shenjianbing/dqltracking.

34. A Robust Quadruplet and Faster Region-Based CNN for UAV Video-Based Multiple Object Tracking in Crowded Environment. Electronics 2021. [DOI: 10.3390/electronics10070795]
Abstract
Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) videos faces several challenges, such as motion capture and appearance, clustering, object variation, high altitudes, and abrupt motion. Consequently, the objects captured by the UAV are usually quite small, and the target object appearance information is not always reliable. To solve these issues, a new deep learning-based technique is presented to track objects, attaining state-of-the-art performance on standard datasets such as the Stanford Drone and Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking (UAVDT) datasets. The proposed faster RCNN (region-based convolutional neural network) framework is enhanced by integrating a series of improvements, including the proper calibration of key parameters, multi-scale training, hard negative mining, and feature collection, to improve the region-based CNN baseline. Furthermore, a deep quadruplet network (DQN) is applied to track the movement of the captured objects in crowded environments, and it is modelled to utilize a new quadruplet loss function in order to learn the feature space. A deep convolution with six rectified linear units (ReLU) is used in the faster RCNN to mine spatial-spectral features. The experimental results on the standard datasets demonstrate high accuracy. Thus, the proposed method can be used to detect multiple objects and track their trajectories with high accuracy.

38. Guan Q, Huang Y, Luo Y, Liu P, Xu M, Yang Y. Discriminative Feature Learning for Thorax Disease Classification in Chest X-ray Images. IEEE Transactions on Image Processing 2021; 30:2476-2487. [PMID: 33497335] [DOI: 10.1109/tip.2021.3052711]
Abstract
This paper focuses on the thorax disease classification problem in chest X-ray (CXR) images. Different from the generic image classification task, a robust and stable CXR image analysis system should consider the unique characteristics of CXR images. In particular, it should be able to: 1) automatically focus on the disease-critical regions, which are usually small; and 2) adaptively capture the intrinsic relationships among different disease features and jointly utilize them to boost multi-label disease recognition rates. In this paper, we propose to learn discriminative features with a two-branch architecture, named ConsultNet, to achieve both purposes simultaneously. ConsultNet consists of two components. First, an information bottleneck constrained feature selector extracts critical disease-specific features according to feature importance. Second, a spatial-and-channel encoding based feature integrator enhances the latent semantic dependencies in the feature space. ConsultNet fuses these discriminative features to improve thorax disease classification in CXRs. Experiments conducted on the ChestX-ray14 and CheXpert datasets demonstrate the effectiveness of the proposed method.

39. Yang F, Li X, Shen J. MSB-FCN: Multi-Scale Bidirectional FCN for Object Skeleton Extraction. IEEE Transactions on Image Processing 2021; 30:2301-2312. [PMID: 33226943] [DOI: 10.1109/tip.2020.3038483]
Abstract
The performance of state-of-the-art object skeleton detection (OSD) methods has been greatly boosted by Convolutional Neural Networks (CNNs). However, most existing CNN-based OSD methods rely on a 'skip-layer' structure where low-level and high-level features are combined to gather multi-level contextual information. Unfortunately, as shallow features tend to be noisy and lack semantic knowledge, they cause errors and inaccuracies. Therefore, to improve the accuracy of object skeleton detection, we propose a novel network architecture, the Multi-Scale Bidirectional Fully Convolutional Network (MSB-FCN), to better gather and enhance multi-scale high-level contextual information. The advantage is that only deep features are used to construct multi-scale feature representations, along with a bidirectional structure for better capturing contextual knowledge. This enables the proposed MSB-FCN to learn semantic-level information from different sub-regions. Moreover, we introduce dense connections into the bidirectional structure to ensure that the learning process at each scale can directly encode information from all other scales. An attention pyramid is also integrated into our MSB-FCN to dynamically control information propagation and reduce unreliable features. Extensive experiments on various benchmarks demonstrate that the proposed MSB-FCN achieves significant improvements over state-of-the-art algorithms.

40. Guo Q, Feng W, Gao R, Liu Y, Wang S. Exploring the Effects of Blur and Deblurring to Visual Object Tracking. IEEE Transactions on Image Processing 2021; 30:1812-1824. [PMID: 33417542] [DOI: 10.1109/tip.2020.3045630]
Abstract
The existence of motion blur can inevitably influence the performance of visual object tracking. However, in contrast to the rapid development of visual trackers, the quantitative effects of increasing levels of motion blur on the performance of visual trackers remain unstudied. Meanwhile, although image deblurring can produce visually sharp videos for pleasant visual perception, it is also unknown whether visual object tracking can benefit from it. In this paper, we present a Blurred Video Tracking (BVT) benchmark to address these two problems; it contains a large variety of videos with different levels of motion blur, as well as ground-truth tracking results. To explore the effects of blur and deblurring on visual object tracking, we extensively evaluate 25 trackers on the proposed BVT benchmark and obtain several new and interesting findings. Specifically, we find that light motion blur may improve the accuracy of many trackers, but heavy blur usually hurts tracking performance. We also observe that image deblurring helps improve tracking accuracy on heavily-blurred videos but hurts performance on lightly-blurred videos. According to these observations, we propose a new general GAN-based scheme to improve a tracker's robustness to motion blur, in which a fine-tuned discriminator effectively serves as an adaptive blur assessor to enable selective frame deblurring during the tracking process. We use this scheme to successfully improve the accuracy of six state-of-the-art trackers on motion-blurred videos.

41. Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency. Computational Intelligence and Neuroscience 2020; 2020:8841681. [PMID: 33293945] [PMCID: PMC7700038] [DOI: 10.1155/2020/8841681]
Abstract
Visual saliency prediction for RGB-D images is more challenging than that for their RGB counterparts, and very few investigations have been undertaken concerning RGB-D saliency prediction. This study presents a method based on a hierarchical multimodal adaptive fusion (HMAF) network to facilitate end-to-end prediction of RGB-D saliency. In the proposed method, hierarchical (multilevel) multimodal features are first extracted from an RGB image and a depth map using a VGG-16-based two-stream network. Subsequently, the most significant hierarchical features of the RGB image and depth map are selected using three two-input attention modules. Furthermore, adaptive fusion of the resulting hierarchical fusion saliency features of different levels is accomplished using a three-input attention module to facilitate high-accuracy RGB-D visual saliency prediction. Comparisons of the proposed HMAF-based approach against other state-of-the-art techniques on two challenging RGB-D datasets demonstrate that the proposed method consistently outperforms competing approaches by a considerable margin.

42. Wang W, Pei W, Cao Q, Liu S, Lu G, Tai YW. Push for Center Learning via Orthogonalization and Subspace Masking for Person Re-Identification. IEEE Transactions on Image Processing 2020; 30:907-920. [PMID: 33259297] [DOI: 10.1109/tip.2020.3036720]
Abstract
Person re-identification aims to identify whether pairs of images belong to the same person. This problem is challenging due to large differences in camera views, lighting and background. One mainstream approach to learning CNN features is to design loss functions that reinforce both class separation and intra-class compactness. In this paper, we propose a novel Orthogonal Center Learning method with Subspace Masking for person re-identification. We make the following contributions: 1) we develop a center learning module that learns the class centers by simultaneously reducing intra-class differences and inter-class correlations through orthogonalization; 2) we introduce a subspace masking mechanism to enhance the generalization of the learned class centers; and 3) we propose to integrate average pooling and max pooling in a regularizing manner that fully exploits their strengths. Extensive experiments show that our proposed method consistently outperforms state-of-the-art methods on large-scale ReID datasets, including Market-1501, DukeMTMC-ReID, CUHK03 and MSMT17.

43. Gultekin S, Saha A, Ratnaparkhi A, Paisley J. MBA: Mini-Batch AUC Optimization. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5561-5574. [PMID: 32142457] [DOI: 10.1109/tnnls.2020.2969527]
Abstract
The area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of machine-learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge for this problem. This article proposes a novel approach to AUC maximization based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of pairs available, which is a quadratic function of the numbers of positive and negative instances. Extensive experiments show the practical utility of the proposed method.
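The pairwise U-statistic underlying this approach is easy to state: AUC is the probability that a randomly chosen positive scores above a randomly chosen negative, so a mini-batch estimate averages the indicator over sampled pairs, and a sigmoid surrogate makes it differentiable. A minimal sketch under those assumptions (not the paper's exact formulation):

```python
import torch

def minibatch_auc(pos_scores, neg_scores):
    # U-statistic estimate of AUC over all pos/neg pairs in the mini-batch:
    # the fraction of pairs ranked correctly.
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)  # (P, N) margins
    return (diff > 0).float().mean()

def soft_auc_loss(pos_scores, neg_scores):
    # Differentiable surrogate: maximize the sigmoid of pairwise margins.
    diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)
    return -torch.sigmoid(diff).mean()
```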

45. Wu D, Dong X, Shen J, Hoi SCH. Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4933-4945. [PMID: 31940565] [DOI: 10.1109/tnnls.2019.2959129]
Abstract
The overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially in single-critic models, and leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has been largely left untouched. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic actor-critic algorithm and theoretically demonstrate its existence. We also observe that this underestimation bias does indeed hurt performance in various experiments. Considering the opposite properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes the weighted action value of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we suggest averaging previous target values to reduce the per-update error and further improve performance. Extensive empirical results over various continuous control tasks in OpenAI Gym show that our approach outperforms state-of-the-art methods.
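The bias-reduction device described above amounts to replacing the single (or min-of-two) target value with a weighted average over three target critics. A minimal sketch of such a target computation, with the weighting scheme assumed rather than taken from the paper:

```python
def triplet_average_target(reward, gamma, done, target_qs, weights=(1/3, 1/3, 1/3)):
    # target_qs: action values Q'_1, Q'_2, Q'_3 from three target critics,
    # each evaluated at (s', pi(s')); 'done' is 1.0 at terminal states.
    q_avg = sum(w * q for w, q in zip(weights, target_qs))
    return reward + gamma * (1.0 - done) * q_avg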

46. Hui L, Bo Z, Linquan H, Jiabao G, Yifan L. FoolChecker: A platform to evaluate the robustness of images against adversarial attacks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.062]

47. Wang F, Xu Z, Gan Y, Vong CM, Liu Q. SCNet: Scale-aware coupling-structure network for efficient video object detection. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.110]

49. Henschel R, Von Marcard T, Rosenhahn B. Accurate Long-Term Multiple People Tracking using Video and Body-Worn IMUs. IEEE Transactions on Image Processing 2020; 29:8476-8489. [PMID: 32790627] [DOI: 10.1109/tip.2020.3013801]
Abstract
Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors.

50. Attention shake siamese network with auxiliary relocation branch for visual object tracking. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.120]