51
Zhang YF, Zheng J, Li L, Liu N, Jia W, Fan X, Xu C, He X. Rethinking feature aggregation for deep RGB-D salient object detection. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.10.079
52
Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency. Computational Intelligence and Neuroscience 2020; 2020:8841681. PMID: 33293945; PMCID: PMC7700038; DOI: 10.1155/2020/8841681
Abstract
Visual saliency prediction for RGB-D images is more challenging than for their RGB counterparts, and few investigations have addressed RGB-D saliency prediction. This study presents a hierarchical multimodal adaptive fusion (HMAF) network for end-to-end prediction of RGB-D saliency. Hierarchical (multilevel) multimodal features are first extracted from an RGB image and a depth map using a VGG-16-based two-stream network. The most significant hierarchical features of the RGB image and depth map are then fused using three two-input attention modules. Finally, a three-input attention module adaptively fuses the resulting hierarchical fusion saliency features across levels to produce a high-accuracy RGB-D visual saliency prediction. Comparisons against state-of-the-art techniques on two challenging RGB-D datasets demonstrate that the proposed method consistently outperforms competing approaches by a considerable margin.
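The abstract comes with no reference code; as a rough illustration only, here is a minimal numpy sketch of one plausible two-input attention fusion step, in which per-modality weights are predicted from globally pooled features and softmax-normalized. The function names, pooling choice, and learned projection are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_input_attention_fusion(rgb_feat, depth_feat, w):
    """Fuse RGB and depth feature maps (C, H, W) with scalar attention
    weights predicted from globally pooled per-modality descriptors."""
    pooled = np.stack([rgb_feat.mean(axis=(1, 2)),
                       depth_feat.mean(axis=(1, 2))])   # (2, C)
    scores = pooled @ w                                  # (2,) modality logits
    alpha = softmax(scores)                              # attention over modalities
    return alpha[0] * rgb_feat + alpha[1] * depth_feat

rng = np.random.default_rng(0)
rgb, depth = rng.standard_normal((2, 64, 28, 28))
w = rng.standard_normal(64)          # stand-in for a learned projection
fused = two_input_attention_fusion(rgb, depth, w)
print(fused.shape)                   # (64, 28, 28)
```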
53
Tian Y, Zhang Y, Zhou D, Cheng G, Chen WG, Wang R. Triple attention network for video segmentation. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.07.078
54
Gultekin S, Saha A, Ratnaparkhi A, Paisley J. MBA: Mini-Batch AUC Optimization. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5561-5574. PMID: 32142457; DOI: 10.1109/tnnls.2020.2969527
Abstract
The area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of machine-learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large data sets remains an open challenge. This article proposes a novel approach to AUC maximization based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of available pairs, which grows quadratically with the numbers of positive and negative instances. Extensive experiments show the practical utility of the proposed method.
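As an illustration of the pair-sampling idea (not the authors' full algorithm, which optimizes a surrogate risk rather than merely evaluating it), the following sketch estimates AUC from a mini-batch of sampled positive/negative pairs via the corresponding U-statistic:

```python
import numpy as np

def minibatch_auc(scores_pos, scores_neg, batch_size, rng):
    """U-statistic estimate of AUC from a sampled mini-batch of
    positive/negative score pairs, instead of all O(n_pos * n_neg) pairs."""
    i = rng.integers(0, len(scores_pos), size=batch_size)
    j = rng.integers(0, len(scores_neg), size=batch_size)
    diff = scores_pos[i] - scores_neg[j]
    # P(score_pos > score_neg), ties counted as 1/2
    return np.mean((diff > 0) + 0.5 * (diff == 0))

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=100_000)   # scores of positive instances
neg = rng.normal(0.0, 1.0, size=100_000)   # scores of negative instances
print(minibatch_auc(pos, neg, batch_size=4096, rng=rng))  # ~0.76
```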
55
Li C, Cong R, Guo C, Li H, Zhang C, Zheng F, Zhao Y. A parallel down-up fusion network for salient object detection in optical remote sensing images. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.05.108
56
57
Wu D, Dong X, Shen J, Hoi SCH. Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:4933-4945. PMID: 31940565; DOI: 10.1109/tnnls.2019.2959129
Abstract
Overestimation caused by function approximation is a well-known property of Q-learning algorithms, especially single-critic models, and leads to poor performance in practical tasks. However, the opposite property, underestimation, which often occurs in Q-learning methods with double critics, has been largely left untouched. In this article, we investigate the underestimation phenomenon in the recent twin delayed deep deterministic actor-critic algorithm and theoretically demonstrate its existence. We also observe that this underestimation bias does indeed hurt performance in various experiments. Given the opposite properties of single-critic and double-critic methods, we propose a novel triplet-average deep deterministic policy gradient algorithm that takes the weighted action value of three target critics to reduce the estimation bias. Given the connection between estimation bias and approximation error, we further suggest averaging previous target values to reduce per-update error and improve performance. Extensive empirical results over various continuous control tasks in OpenAI Gym show that our approach outperforms state-of-the-art methods.
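A minimal sketch of the target-value computation the abstract describes: Bellman targets built from a convex combination of three target critics' estimates. The uniform weights and the omission of target-value averaging are simplifications, not the paper's exact scheme.

```python
import numpy as np

def triplet_average_target(q_targets, rewards, dones, gamma=0.99,
                           weights=(1/3, 1/3, 1/3)):
    """Bellman targets from the weighted value of three target critics,
    intended to sit between single-critic overestimation and
    clipped-double-critic underestimation."""
    q1, q2, q3 = q_targets                      # each: (batch,) Q'(s', a')
    q_mix = weights[0]*q1 + weights[1]*q2 + weights[2]*q3
    return rewards + gamma * (1.0 - dones) * q_mix

rng = np.random.default_rng(0)
qs = rng.normal(10.0, 1.0, size=(3, 256))       # three target critics' estimates
r = rng.normal(0.0, 0.1, size=256)
d = np.zeros(256)
y = triplet_average_target(qs, r, d)
print(y.shape)                                   # (256,)
```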
58
Abstract
Kidney tumors represent a type of cancer that people of advanced age are more likely to develop, so it is important to exercise caution and provide diagnostic tests in the later stages of life. Medical imaging and deep learning methods are becoming increasingly attractive in this regard, and developing deep learning models that help physicians identify tumors through successful segmentation is of great importance. However, few successful systems exist for soft-tissue organs such as the kidneys and the prostate, which are relatively difficult to segment; in such cases, V-Net-based models are mostly used. This paper proposes a new hybrid model combining the superior features of existing V-Net models, with improvements in the encoder and decoder phases not previously applied. We believe this hybrid V-Net model could help the majority of physicians, particularly those focused on kidney and kidney-tumor segmentation. The proposed model outperformed existing segmentation models and can be easily integrated into other systems owing to its flexible structure and applicability. It achieved average Dice coefficients of 97.7% and 86.5% for kidney and tumor segmentation, respectively, and could therefore serve as a reliable method for soft-tissue organ segmentation.
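For reference, the Dice coefficient quoted above measures overlap between a predicted mask A and a ground-truth mask B as 2|A∩B| / (|A| + |B|); a minimal sketch:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice score between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt   = np.zeros((64, 64), dtype=np.uint8); gt[15:45, 15:45]  = 1
print(round(dice_coefficient(pred, gt), 3))   # two shifted squares, ~0.694
```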
59
Fan ZB, Zhang K. Visual order of Chinese ink paintings. Visual Computing for Industry, Biomedicine, and Art 2020; 3:23. PMID: 33043407; PMCID: PMC7548264; DOI: 10.1186/s42492-020-00059-5
Abstract
Visual order is one of the key factors influencing the aesthetic judgment of artworks. This paper reports the results of evaluating the influence of extracted features on visual order in Chinese ink paintings, using a regression model. We use nine contemporary artists’ paintings as examples and extract features related to the visual order of their paintings. A questionnaire survey is conducted to collect people’s rating scores on the visual order. Via regression modeling, our research analyzes the significance of each feature and validates the influences of the features on the visual order.
Affiliation(s)
- Zhen-Bao Fan, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75082-3021, USA
- Kang Zhang, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75082-3021, USA
60
Li A, Qi J, Lu H. Multi-attention guided feature fusion network for salient object detection. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.06.021
61
Zhang Q, Cui W, Shi Y, Zhang X, Liu Y. Attentive feature integration network for detecting salient objects in images. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.05.083
62
63
Jiao L, Zhang S, Dong S, Wang H. RFP-Net: Receptive field-based proposal generation network for object detection. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.106
64
Ma H, Liao Q, Zhang J, Liu S, Xue JH. An α-Matte Boundary Defocus Model-Based Cascaded Network for Multi-focus Image Fusion. IEEE Transactions on Image Processing 2020; 29:8668-8679. PMID: 32845840; DOI: 10.1109/tip.2020.3018261
Abstract
Capturing an all-in-focus image with a single camera is difficult since the depth of field of the camera is usually limited. An alternative method to obtain the all-in-focus image is to fuse several images that are focused at different depths. However, existing multi-focus image fusion methods cannot obtain clear results for areas near the focused/defocused boundary (FDB). In this paper, a novel α-matte boundary defocus model is proposed to generate realistic training data with the defocus spread effect precisely modeled, especially for areas near the FDB. Based on this α-matte defocus model and the generated data, a cascaded boundary-aware convolutional network termed MMF-Net is proposed and trained, aiming to achieve clearer fusion results around the FDB. Specifically, the MMF-Net consists of two cascaded subnets for initial fusion and boundary fusion. These two subnets are designed to first obtain a guidance map of FDB and then refine the fusion near the FDB. Experiments demonstrate that with the help of the new α-matte boundary defocus model, the proposed MMF-Net outperforms the state-of-the-art methods both qualitatively and quantitatively.
65
Wang W, Shen J, Dong X, Borji A, Yang R. Inferring Salient Objects from Human Fixations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:1913-1927. PMID: 30892201; DOI: 10.1109/tpami.2019.2905607
Abstract
Previous research in visual saliency has focused on two major types of models, namely fixation prediction and salient object detection, but the relationship between the two has been less explored. In this work, we propose to employ the former type of model to identify salient objects. We build a novel Attentive Saliency Network (ASNet, available at https://github.com/wenguanwang/ASNet) that learns to detect salient objects from fixations. The fixation map, derived at the upper network layers, mimics human visual attention mechanisms and captures a high-level understanding of the scene from a global view. Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner. ASNet is based on a hierarchy of convLSTMs that offers an efficient recurrent mechanism for sequentially refining the saliency features over multiple steps. Several loss functions, derived from existing saliency evaluation metrics, are incorporated to further boost performance. Extensive experiments on several challenging datasets show that ASNet outperforms existing methods and can generate accurate segmentation maps with the help of the computed fixation prior. Our work offers a deeper insight into the mechanisms of attention and narrows the gap between salient object detection and fixation prediction.
66
67
Shen J, Tang X, Dong X, Shao L. Visual Object Tracking by Hierarchical Attention Siamese Network. IEEE Transactions on Cybernetics 2020; 50:3068-3080. PMID: 31536029; DOI: 10.1109/tcyb.2019.2936503
Abstract
Visual tracking addresses the problem of localizing an arbitrary target in video according to an annotated bounding box. In this article, we present a novel tracking method that introduces an attention mechanism into the Siamese network to increase its matching discrimination. We propose a new way to compute attention weights through a sub-Siamese network, Attention Net (A-Net), which locates attentive parts to improve matching performance. In addition, since features in higher layers preserve more semantic information while features in lower layers preserve more location information, we fully exploit both by fusing multiscale response maps from each layer, obtaining a more accurate position estimate and resolving tracking failures that arise when relying on higher-layer features alone. We further propose a hierarchical attention Siamese network that combines the attention weights with multilayer integration for tracking. Our method is implemented with a pretrained network and outperforms most well-trained Siamese trackers even without any fine-tuning or online updating. Comparisons with state-of-the-art methods on popular tracking benchmarks show that our method achieves better performance. Our source code and results will be available at https://github.com/shenjianbing/HASN.
68
Li T, Song H, Zhang K, Liu Q. Recurrent reverse attention guided residual learning for saliency object detection. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.12.109
69
Tang Y, Yang X, Wang N, Song B, Gao X. CGAN-TM: A novel domain-to-domain transferring method for person re-identification. IEEE Transactions on Image Processing 2020; 29:5641-5651. PMID: 32286985; DOI: 10.1109/tip.2020.2985545
Abstract
Person re-identification (re-ID) aims to recognize a person across different cameras. Although some supervised methods have achieved favorable performance, they are far from practical application owing to the lack of labeled data, so unsupervised person re-ID methods are in urgent need. The approach commonly used in existing unsupervised methods is to first train a model on a labeled source image dataset in a supervised manner and then transfer the source image domain to the target image domain. However, images may lose their identity information after translation, and the distributions of the two domains remain far apart. To solve these problems, we propose an image domain-to-domain translation method for unsupervised person re-ID that preserves pedestrian identity information while pulling the domain distributions closer. Our work exploits CycleGAN to transfer the labeled source domain to the unlabeled target domain. Specifically, a Self-labeled Triplet Net is proposed to maintain pedestrian identity information, and maximum mean discrepancy is introduced to pull the domain distributions closer. Extensive experiments demonstrate that the proposed method performs better than state-of-the-art unsupervised methods on DukeMTMC-reID and Market-1501.
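The maximum mean discrepancy term mentioned in the abstract can be estimated with a kernel two-sample statistic. Below is a minimal Gaussian-kernel sketch; the kernel choice, bandwidth, and feature dimensions are assumptions rather than the paper's settings.

```python
import numpy as np

def gaussian_mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and Y (m, d)
    under a Gaussian RBF kernel; driving this toward zero pulls the two
    feature distributions together."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(128, 16))   # e.g. source-domain features
tgt = rng.normal(0.5, 1.0, size=(128, 16))   # e.g. target-domain features
print(round(float(gaussian_mmd2(src, tgt)), 4))  # > 0: distributions differ
```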
70
71
Yan M, Wang J, Li J, Zhang K, Yang Z. Traffic scene semantic segmentation using self-attention mechanism and bi-directional GRU to correlate context. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.12.007
72
Du Y, Yan Y, Chen S, Hua Y. Object-adaptive LSTM network for real-time visual tracking with adversarial data augmentation. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.12.022
73
Ruan W, Liang C, Yu Y, Chen J, Hu R. SIST: Online Scale-Adaptive Object Tracking with Stepwise Insight. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.11.102
74
Cheng Y, Zhang X, Lu F, Sato Y. Gaze Estimation by Exploring Two-Eye Asymmetry. IEEE Transactions on Image Processing 2020; 29:5259-5272. PMID: 32224460; DOI: 10.1109/tip.2020.2982828
Abstract
Eye gaze estimation is increasingly demanded by recent intelligent systems to facilitate a range of interactive applications. Unfortunately, learning the highly complicated regression from a single eye image to the gaze direction is not trivial, and the problem is yet to be solved efficiently. Inspired by the two-eye asymmetry observation that the two eyes of the same person may appear uneven, we propose the face-based asymmetric regression-evaluation network (FARE-Net) to optimize gaze estimation results by considering the difference between the left and right eyes. The proposed method includes a face-based asymmetric regression network (FAR-Net) and an evaluation network (E-Net). The FAR-Net predicts 3D gaze directions for both eyes and is trained with an asymmetric mechanism that asymmetrically weights and sums the losses generated by the two eyes' gaze directions. With this mechanism, the FAR-Net relies on whichever eye achieves higher performance to optimize the network. The E-Net learns the reliabilities of the two eyes to balance the learning of the asymmetric and symmetric mechanisms. Our FARE-Net achieves leading performance on the MPIIGaze, EyeDiap, and RT-Gene datasets. Additionally, we investigate the effectiveness of FARE-Net by analyzing the distribution of errors and an ablation study.
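A sketch of the asymmetric-weighting idea: per-eye losses are combined with per-sample weights that favor the more reliable eye. The softmax-over-negated-losses weighting below is an illustrative stand-in, not the paper's exact mechanism.

```python
import numpy as np

def asymmetric_two_eye_loss(loss_left, loss_right, temperature=1.0):
    """Weight the per-eye losses so the eye with the smaller error
    dominates the update, instead of an even 0.5/0.5 split."""
    losses = np.stack([loss_left, loss_right])            # (2, batch)
    w = np.exp(-losses / temperature)
    w = w / w.sum(axis=0, keepdims=True)                  # per-sample eye weights
    return (w * losses).sum(axis=0).mean()

rng = np.random.default_rng(0)
l_left = rng.gamma(2.0, 1.0, size=32)     # angular error of left-eye prediction
l_right = rng.gamma(2.0, 2.0, size=32)    # right eye noisier on average
print(round(float(asymmetric_two_eye_loss(l_left, l_right)), 3))
```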
75
Yang Z, Yu H, Feng M, Sun W, Lin X, Sun M, Mao ZH, Mian A. Small Object Augmentation of Urban Scenes for Real-Time Semantic Segmentation. IEEE Transactions on Image Processing 2020; 29:5175-5190. PMID: 32191886; DOI: 10.1109/tip.2020.2976856
Abstract
Semantic segmentation is a key step in scene understanding for autonomous driving. Although deep learning has significantly improved segmentation accuracy, current high-quality models such as PSPNet and DeepLabV3 are inefficient given their complex architectures and reliance on multi-scale inputs, making them difficult to apply in real-time or practical settings. On the other hand, existing real-time methods cannot yet produce satisfactory results on small objects such as traffic lights, which are imperative to safe autonomous driving. In this paper, we improve the performance of real-time semantic segmentation from two perspectives: methodology and data. Specifically, we propose a real-time segmentation model coined Narrow Deep Network (NDNet) and build a synthetic dataset by inserting additional small objects into the training images. The proposed method achieves 65.7% mean intersection over union (mIoU) on the Cityscapes test set with only 8.4G floating-point operations (FLOPs) on 1024×2048 inputs. Furthermore, by retraining the existing PSPNet and DeepLabV3 models on our synthetic dataset, we obtain an average 2% mIoU improvement on small objects.
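The data-side idea, synthesizing extra training samples by pasting small-object crops into existing scenes, can be sketched as follows; the placement policy and the hard (non-blended) compositing are simplifying assumptions.

```python
import numpy as np

def paste_small_object(image, label, obj_rgb, obj_mask, class_id, top, left):
    """Insert a small object crop (and its mask) into a training image and
    its segmentation label, producing one synthetic sample."""
    h, w = obj_mask.shape
    region = (slice(top, top + h), slice(left, left + w))
    m = obj_mask.astype(bool)
    image[region][m] = obj_rgb[m]          # overwrite pixels under the mask
    label[region][m] = class_id            # keep the label map consistent
    return image, label

img = np.zeros((512, 1024, 3), dtype=np.uint8)          # stand-in scene
lbl = np.zeros((512, 1024), dtype=np.int64)
obj = np.full((20, 12, 3), 200, dtype=np.uint8)          # e.g. a traffic-light crop
msk = np.ones((20, 12), dtype=np.uint8)
img, lbl = paste_small_object(img, lbl, obj, msk, class_id=19, top=100, left=300)
print((lbl == 19).sum())                                 # 240 pasted pixels
```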
76
77
Wu B, Akinola I, Gupta A, Xu F, Varley J, Watkins-Valls D, Allen PK. Generative Attention Learning: a “GenerAL” framework for high-performance multi-fingered grasping in clutter. Autonomous Robots 2020. DOI: 10.1007/s10514-020-09907-y
78
Tang Y, Zou W, Hua Y, Jin Z, Li X. Video salient object detection via spatiotemporal attention neural networks. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.09.064
79
80
81
Wang Z, Tian G. Integrating manifold ranking with boundary expansion and corners clustering for saliency detection of home scene. Neurocomputing 2020. DOI: 10.1016/j.neucom.2019.10.063
82
Li L, Zhu H, Zhao S, Ding G, Lin W. Personality-Assisted Multi-Task Learning for Generic and Personalized Image Aesthetics Assessment. IEEE Transactions on Image Processing 2020; 29:3898-3910. PMID: 31995495; DOI: 10.1109/tip.2020.2968285
Abstract
Traditional image aesthetics assessment (IAA) approaches mainly predict the average aesthetic score of an image. However, people have different tastes in image aesthetics, determined largely by their subjective preferences, and personality, as an important subjective trait, is believed to be a key factor in modeling an individual's preference. In this paper, we present a personality-assisted multi-task deep learning framework for both generic and personalized image aesthetics assessment. The framework comprises two stages. In the first stage, a multi-task learning network with shared weights predicts the aesthetics distribution of an image and the Big-Five (BF) personality traits of people who like the image; the generic aesthetics score of the image is generated from the predicted distribution. To capture the common representation of generic image aesthetics and personality traits, a Siamese network is trained jointly on aesthetics and personality data. In the second stage, based on the predicted personality traits and generic aesthetics of an image, an inter-task fusion generates an individual's personalized aesthetic score for the image. The performance of the proposed method is evaluated on two public image aesthetics databases. The experimental results demonstrate that the proposed method outperforms the state of the art in both generic and personalized IAA tasks.
83
84
85
Cong R, Lei J, Fu H, Porikli F, Huang Q, Hou C. Video Saliency Detection via Sparsity-Based Reconstruction and Propagation. IEEE Transactions on Image Processing 2019; 28:4819-4831. PMID: 31059438; DOI: 10.1109/tip.2019.2910377
Abstract
Video saliency detection aims to continuously discover the motion-related salient objects from the video sequences. Since it needs to consider the spatial and temporal constraints jointly, video saliency detection is more challenging than image saliency detection. In this paper, we propose a new method to detect the salient objects in video based on sparse reconstruction and propagation. With the assistance of novel static and motion priors, a single-frame saliency model is first designed to represent the spatial saliency in each individual frame via the sparsity-based reconstruction. Then, through a progressive sparsity-based propagation, the sequential correspondence in the temporal space is captured to produce the inter-frame saliency map. Finally, these two maps are incorporated into a global optimization model to achieve spatio-temporal smoothness and global consistency of the salient object in the whole video. The experiments on three large-scale video saliency datasets demonstrate that the proposed method outperforms the state-of-the-art algorithms both qualitatively and quantitatively.
86
Huang M, Liu Z, Ye L, Zhou X, Wang Y. Saliency detection via multi-level integration and multi-scale fusion neural networks. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.07.054
87
Mustafa A, Khan SH, Hayat M, Shen J, Shao L. Image Super-Resolution as a Defense Against Adversarial Attacks. IEEE Transactions on Image Processing 2019; 29:1711-1724. PMID: 31545722; DOI: 10.1109/tip.2019.2940533
Abstract
Convolutional neural networks have achieved significant success across multiple computer vision tasks. However, they are vulnerable to carefully crafted, human-imperceptible adversarial noise patterns, which constrains their deployment in critical security-sensitive systems. This paper proposes a computationally efficient image enhancement approach that provides a strong defense mechanism to effectively mitigate the effect of such adversarial perturbations. We show that deep image restoration networks learn mapping functions that can bring off-the-manifold adversarial samples back onto the natural image manifold, restoring classification toward the correct classes. A distinguishing feature of our approach is that, in addition to providing robustness against attacks, it simultaneously enhances image quality and retains model performance on clean images. Furthermore, the proposed method neither modifies the classifier nor requires a separate mechanism to detect adversarial images. Its effectiveness has been demonstrated through extensive experiments, where it proves a strong defense in gray-box settings. The proposed scheme is simple and has the following advantages: (1) it does not require any model training or parameter optimization; (2) it complements other existing defense mechanisms; (3) it is agnostic to the attacked model and attack type; and (4) it provides superior performance across all popular attack algorithms. Our code is publicly available at https://github.com/aamir-mustafa/super-resolution-adversarial-defense.
88
Xiao Y, Li J, Du B, Wu J, Li X, Chang J, Zhou Y. Robust correlation filter tracking with multi-scale spatial view. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.05.017
89
Shen J, Dong X, Peng J, Jin X, Shao L, Porikli F. Submodular Function Optimization for Motion Clustering and Image Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2637-2649. PMID: 30624228; DOI: 10.1109/tnnls.2018.2885591
Abstract
In this paper, we propose a framework for approximately maximizing quadratic submodular energies with a knapsack constraint in order to solve certain computer vision problems. The proposed submodular maximization problem can be viewed as a generalization of the classic 0/1 knapsack problem. Importantly, maximization of our knapsack-constrained submodular energy function can be solved via dynamic programming. We further introduce a range-reduction step prior to dynamic programming as a two-stage procedure for more efficient maximization. To demonstrate the effectiveness of the proposed energy function and its maximization algorithm, we apply it to two representative computer vision tasks: image segmentation and motion trajectory clustering. Experimental results on image segmentation demonstrate that our method outperforms the classic graph-cuts and random-walks segmentation algorithms. Moreover, our framework achieves better performance than state-of-the-art methods on the motion trajectory clustering task.
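The classic 0/1 knapsack that this energy generalizes is solvable by the same dynamic-programming pattern; a minimal sketch of that special case (the paper's quadratic submodular terms and range-reduction step are not reproduced here):

```python
def knapsack_01(values, weights, capacity):
    """Classic 0/1 knapsack via dynamic programming over (items, capacity).
    dp[c] = best value achievable with capacity c using items seen so far."""
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # iterate capacity downward so each item is used at most once
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

print(knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
```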
90
91
Chen C, Wang G, Peng C, Zhang X, Qin H. Improved Robust Video Saliency Detection Based on Long-Term Spatial-Temporal Information. IEEE Transactions on Image Processing 2019; 29:1090-1100. PMID: 31449017; DOI: 10.1109/tip.2019.2934350
Abstract
This paper proposes to utilize supervised deep convolutional neural networks to take full advantage of long-term spatial-temporal information in order to improve video saliency detection performance. Conventional methods, which use only temporally neighboring frames, can easily encounter transient failure cases when the spatial-temporal saliency clues are untrustworthy for a long period. To tackle this limitation, we first identify beyond-scope frames with trustworthy long-term saliency clues and then align them with the current problem domain for improved video saliency detection.
92
Lai Q, Wang W, Sun H, Shen J. Video Saliency Prediction Using Spatiotemporal Residual Attentive Networks. IEEE Transactions on Image Processing 2019; 29:1113-1126. PMID: 31449021; DOI: 10.1109/tip.2019.2936112
Abstract
This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model emphasizes two essential issues: effective spatiotemporal feature integration and multi-scale saliency learning. For the first, appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Unlike traditional two-stream models that learn appearance and motion features separately, this design allows early, multi-path information exchange between the two domains, leading to a unified and powerful spatiotemporal learning architecture. For the second, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end; it enhances the fused spatiotemporal features by emphasizing important features at multiple scales. A lightweight convolutional gated recurrent unit (convGRU), which is flexible in small-training-data situations, is used for modeling long-term temporal characteristics. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over its competitors and the effectiveness of each component of our network. Our code and all results will be available at https://github.com/ashleylqx/STRA-Net.
93
Xie H, Lei H, He Y, Lei B. Deeply supervised full convolution network for HEp-2 specimen image segmentation. Neurocomputing 2019. DOI: 10.1016/j.neucom.2019.03.067
94
Dong X, Shen J, Wu D, Guo K, Jin X, Porikli F. Quadruplet Network With One-Shot Learning for Fast Visual Object Tracking. IEEE Transactions on Image Processing 2019; 28:3516-3527. PMID: 30762546; DOI: 10.1109/tip.2019.2898567
Abstract
In the same vein as discriminative one-shot learning, Siamese networks allow recognizing an object from a single exemplar with the same class label. However, they do not take advantage of the underlying structure of the data and the relationships among the multitude of samples, as they rely only on pairs of instances for training. In this paper, we propose a new quadruplet deep network that examines the potential connections among the training instances, aiming to achieve a more powerful representation. We design a shared network with four branches that receive a multi-tuple of instances as inputs and are connected by a novel loss function consisting of a pair loss and a triplet loss. According to the similarity metric, we select the most similar and the most dissimilar instances from each multi-tuple as the positive and negative inputs of the triplet loss. We show that this scheme improves training performance. Furthermore, we introduce a new weight layer to automatically select suitable combination weights, which avoids the conflict between the triplet and pair losses that would otherwise degrade performance. We evaluate our quadruplet framework by model-free tracking-by-detection of objects from a single initial exemplar on several visual object tracking benchmarks. Our extensive experimental analysis demonstrates that our tracker achieves superior performance with a real-time processing speed of 78 frames/s. Our source code is available.
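As a rough sketch of combining a pair term with a triplet term over a multi-tuple, the snippet below treats the closest non-positive candidate as the hardest negative. The distance metric, margin, and fixed combination weight are assumptions, not the paper's trained weight layer.

```python
import numpy as np

def quadruplet_style_loss(anchor, candidates, pos_idx, margin=1.0, alpha=0.5):
    """Pair loss pulls the anchor toward its labeled positive; triplet loss
    additionally pushes away the hardest (closest) non-positive candidate."""
    d = np.linalg.norm(candidates - anchor, axis=1)     # anchor-candidate distances
    d_pos = d[pos_idx]
    d_neg = np.delete(d, pos_idx).min()                 # hardest negative
    pair_loss = d_pos**2
    triplet_loss = max(0.0, d_pos - d_neg + margin)
    return alpha * pair_loss + (1 - alpha) * triplet_loss

rng = np.random.default_rng(0)
anchor = rng.standard_normal(128)                       # exemplar embedding
cands = rng.standard_normal((8, 128))                   # one multi-tuple of instances
print(round(float(quadruplet_style_loss(anchor, cands, pos_idx=0)), 3))
```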
95
Xu W, Zhu L, Huang DS. DCDE: An Efficient Deep Convolutional Divergence Encoding Method for Human Promoter Recognition. IEEE Transactions on NanoBioscience 2019; 18:136-145. PMID: 30624223; DOI: 10.1109/tnb.2019.2891239
Abstract
Efficient human promoter feature extraction is still a major challenge in genome analysis, as it can improve our understanding of human gene regulation and provide experimental guidance. Although many machine learning algorithms have been developed for eukaryotic gene recognition, performance on promoters is unsatisfactory owing to their diverse nature. To extract discriminative features from human promoters, an efficient deep convolutional divergence encoding method (DCDE) is proposed based on statistical divergence (SD) and convolutional neural networks (CNNs). SD helps optimize kmer feature extraction for human promoters, while a CNN can automatically extract features in gene analysis. In DCDE, we first encode the original gene sequences using informative kmers, where a series of SD methods optimize the most discriminative kmer distributions while maintaining important positional information. A CNN is then utilized to extract lower-dimensional deep features through secondary encoding. Finally, we construct a hybrid recognition architecture with multiple support vector machines and a bilayer decision method. The architecture is flexible enough to accommodate new features or models and can be extended to identify other genomic functional elements. Extensive experiments demonstrate that DCDE is effective in promoter encoding and can significantly improve promoter recognition performance.
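The kmer encoding DCDE starts from can be illustrated by a simple frequency count over a DNA sequence; the divergence-based selection of the most discriminative kmers is not reproduced here.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=3):
    """Frequency vector over all 4**k possible k-mers of a DNA sequence,
    in a fixed lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total for m in vocab]

promoter = "TATAAAAGGCGCGTATATACG"   # toy sequence, not a real promoter
vec = kmer_frequencies(promoter, k=3)
print(len(vec), round(sum(vec), 3))  # 64 1.0
```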
96
Zhang Z, Fu H, Dai H, Shen J, Pang Y, Shao L. ET-Net: A Generic Edge-aTtention Guidance Network for Medical Image Segmentation. Lecture Notes in Computer Science 2019. DOI: 10.1007/978-3-030-32239-7_49
97