51. Yue H, Guo J, Yin X, Zhang Y, Zheng S, Zhang Z, Li C. Salient object detection in low-light images via functional optimization-inspired feature polishing. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109938]
52. Zhang J, Fan DP, Dai Y, Anwar S, Saleh F, Aliakbarian S, Barnes N. Uncertainty Inspired RGB-D Saliency Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:5761-5779. [PMID: 33856982] [DOI: 10.1109/tpami.2021.3073564]
Abstract
We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem by predicting a single saliency map following a deterministic learning pipeline. We argue, however, that the deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection, which utilizes a latent variable to model the labeling variations. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to a stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: https://github.com/JingZhang617/UCNet.
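The latent-variable idea behind this generative formulation can be sketched in a few lines of PyTorch. The module below is an illustrative toy, not the authors' UCNet code: a posterior head conditioned on hypothetical fused features and the ground-truth mask produces a Gaussian latent code, a reparameterized sample is injected into the saliency decoder, and a KL term keeps the posterior close to the N(0, I) prior; at test time, sampling different codes from the prior yields different plausible saliency maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticSaliencyHead(nn.Module):
    """Toy CVAE-style head: image features + latent code -> saliency logits."""
    def __init__(self, feat_ch=64, z_dim=8):
        super().__init__()
        # posterior q(z | features, ground truth), conditioned on features and the GT mask
        self.post = nn.Sequential(nn.Conv2d(feat_ch + 1, 32, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, 2 * z_dim))
        self.dec = nn.Conv2d(feat_ch + z_dim, 1, 3, padding=1)
        self.z_dim = z_dim

    def forward(self, feat, gt=None):
        if gt is not None:                       # training: sample from the posterior
            mu, logvar = self.post(torch.cat([feat, gt], 1)).chunk(2, dim=1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        else:                                    # testing: sample from the prior N(0, I)
            z = torch.randn(feat.size(0), self.z_dim, device=feat.device)
            kl = feat.new_zeros(())
        z_map = z[:, :, None, None].expand(-1, -1, feat.size(2), feat.size(3))
        return self.dec(torch.cat([feat, z_map], 1)), kl

feat = torch.randn(2, 64, 56, 56)                # dummy fused RGB-D features
gt = torch.rand(2, 1, 56, 56)                    # dummy ground-truth saliency
head = StochasticSaliencyHead()
logits, kl = head(feat, gt)
loss = F.binary_cross_entropy_with_logits(logits, gt) + 0.1 * kl
print(logits.shape, float(kl))
```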
53. Zhu J, Zhang X, Fang X, Rahman MRU, Dong F, Li Y, Yan S, Tan P. Boosting RGB-D salient object detection with adaptively cooperative dynamic fusion network. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109205]
54. Ren G, Yu Y, Liu H, Stathaki T. Dynamic Knowledge Distillation with Noise Elimination for RGB-D Salient Object Detection. Sensors (Basel) 2022; 22:6188. [PMID: 36015947] [PMCID: PMC9416116] [DOI: 10.3390/s22166188]
Abstract
RGB-D salient object detection (SOD) demonstrates its superiority in detecting salient objects in complex environments due to the additional depth information introduced in the data. Inevitably, an independent stream is introduced to extract features from depth images, leading to extra computation and parameters. This methodology sacrifices model size to improve detection accuracy, which may impede the practical application of SOD. To tackle this dilemma, we propose a dynamic knowledge distillation (DKD) method, along with a lightweight structure, which significantly reduces the computational burden while maintaining detection performance. This method considers both teacher and student performance during training and dynamically assigns the distillation weight instead of applying a fixed weight to the student model. We also investigate the issue of the RGB-D early fusion strategy in distillation and propose a simple noise elimination method to mitigate the impact of distorted training data caused by low-quality depth maps. Extensive experiments are conducted on five public datasets to demonstrate that our method can achieve competitive performance with a fast inference speed (136 FPS) compared to 12 prior methods.
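The dynamic weighting idea can be illustrated with a small sketch. The rule below (a weight proportional to how much the teacher currently outperforms the student) is an assumed, simplified stand-in for the paper's actual scheme:

```python
import torch
import torch.nn.functional as F

def dynamic_distill_loss(student_logits, teacher_logits, gt):
    """Per-sample distillation weight derived from the teacher/student loss gap (toy rule)."""
    bce = lambda p: F.binary_cross_entropy_with_logits(p, gt, reduction="none").mean(dim=(1, 2, 3))
    loss_s, loss_t = bce(student_logits), bce(teacher_logits.detach())
    # weight grows when the teacher beats the student, vanishes when it does not
    w = torch.clamp(loss_s - loss_t, min=0.0) / (loss_s + 1e-6)
    distill = F.mse_loss(torch.sigmoid(student_logits),
                         torch.sigmoid(teacher_logits.detach()),
                         reduction="none").mean(dim=(1, 2, 3))
    return (loss_s + w.detach() * distill).mean()

s = torch.randn(4, 1, 64, 64, requires_grad=True)     # student predictions (dummy)
t = torch.randn(4, 1, 64, 64)                          # teacher predictions (dummy)
gt = (torch.rand(4, 1, 64, 64) > 0.5).float()
print(dynamic_distill_loss(s, t, gt))
```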
Affiliation(s)
- Guangyu Ren: Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
- Yinxiao Yu: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Hengyan Liu: Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
- Tania Stathaki: Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, UK
55. Modal complementary fusion network for RGB-T salient object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03950-1]
56. Yan Y, Liu B, Lin W, Chen Y, Li K, Ou J, Fan C. MCCP: Multi-Collaboration Channel Pruning for Model Compression. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10984-6]
57. Fan DP, Li T, Lin Z, Ji GP, Zhang D, Cheng MM, Fu H, Shen J. Re-Thinking Co-Salient Object Detection. IEEE Trans Pattern Anal Mach Intell 2022; 44:4339-4354. [PMID: 33600309] [DOI: 10.1109/tpami.2021.3060412]
Abstract
In this article, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearance. This bias results in idealized settings, and the effectiveness of models trained on existing datasets may be impaired in real-life situations, where the similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark collected in the wild, called CoSOD3k, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate the existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves model scalability and stability. Third, we comprehensively summarize 40 cutting-edge algorithms, benchmarking 18 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future work of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community. The benchmark toolbox and results are available on our project page at https://dpfan.net/CoSOD3K.
58. RGB-D saliency detection via complementary and selective learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03612-2]
59. Giang TTH, Khai TQ, Im DY, Ryoo YJ. Fast Detection of Tomato Sucker Using Semantic Segmentation Neural Networks Based on RGB-D Images. Sensors (Basel) 2022; 22:5140. [PMID: 35890823] [PMCID: PMC9320735] [DOI: 10.3390/s22145140]
Abstract
Tomato suckers, or axillary shoots, should be removed to increase yield and reduce disease on tomato plants. This is an essential step in tomato plant care and is usually performed manually by farmers, so an automated approach can save a lot of time and labor. Semantic segmentation is the process of recognizing or classifying each pixel in an image, and it can help machines recognize and localize tomato suckers. This paper proposes a semantic segmentation neural network that can detect tomato suckers quickly from tomato plant images. We choose RGB-D images, which capture not only the visual appearance of objects but also the distance from objects to the camera. We built a tomato RGB-D image dataset for training and evaluating the proposed neural network. The proposed semantic segmentation neural network runs in real time at 138.2 frames per second. Its number of parameters is 680,760, much smaller than that of other semantic segmentation neural networks. It correctly detects suckers at a rate of 80.2%. It requires low system resources and is suitable for the tomato dataset. We compare it to other popular non-real-time and real-time networks in terms of accuracy, execution time, and sucker detection to prove its better performance.
Affiliation(s)
- Truong Thi Huong Giang: Department of Electrical Engineering, Mokpo National University, Muan 58554, Korea
- Tran Quoc Khai: Department of Electrical Engineering, Mokpo National University, Muan 58554, Korea
- Dae-Young Im: Components & Materials R&D Group, Korea Institute of Industrial Technology, Gwangju 61012, Korea
- Young-Jae Ryoo (corresponding author): Department of Electrical and Control Engineering, Mokpo National University, Muan 58554, Korea
60. A2TPNet: Alternate Steered Attention and Trapezoidal Pyramid Fusion Network for RGB-D Salient Object Detection. Electronics 2022. [DOI: 10.3390/electronics11131968]
Abstract
RGB-D salient object detection (SOD) aims at locating the most eye-catching object in visual input by fusing complementary information of RGB modality and depth modality. Most of the existing RGB-D SOD methods integrate multi-modal features to generate the saliency map indiscriminately, ignoring the ambiguity between different modalities. To better use multi-modal complementary information and alleviate the negative impact of ambiguity among different modalities, this paper proposes a novel Alternate Steered Attention and Trapezoidal Pyramid Fusion Network (A2TPNet) for RGB-D SOD composed of Cross-modal Alternate Fusion Module (CAFM) and Trapezoidal Pyramid Fusion Module (TPFM). CAFM is focused on fusing cross-modal features, taking full consideration of the ambiguity between cross-modal data by an Alternate Steered Attention (ASA), and it reduces the interference of redundant information and non-salient features in the interactive process through a collaboration mechanism containing channel attention and spatial attention. TPFM endows the RGB-D SOD model with more powerful feature expression capabilities by combining multi-scale features to enhance the expressive ability of contextual semantics of the model. Extensive experimental results on five publicly available datasets demonstrate that the proposed model consistently outperforms 17 state-of-the-art methods.
61. Zhao Z, Huang Z, Chai X, Wang J. Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10886-7]
62. FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.015]
63. CSA-Net: Deep Cross-Complementary Self Attention and Modality-Specific Preservation for Saliency Detection. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10875-w]
64. Salient and consensus representation learning based incomplete multiview clustering. Appl Intell 2022. [DOI: 10.1007/s10489-022-03530-3]
65. Yu H, Li X, Feng Y, Han S. Multiple attentional path aggregation network for marine object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03622-0]
66. Ji W, Yan G, Li J, Piao Y, Yao S, Zhang M, Cheng L, Lu H. DMRA: Depth-Induced Multi-Scale Recurrent Attention Network for RGB-D Saliency Detection. IEEE Trans Image Process 2022; 31:2321-2336. [PMID: 35245195] [DOI: 10.1109/tip.2022.3154931]
Abstract
In this work, we propose a novel depth-induced multi-scale recurrent attention network for RGB-D saliency detection, named DMRA. It achieves strong performance, especially in complex scenarios. There are four main contributions of our network that are experimentally demonstrated to have significant practical merits. First, we design an effective depth refinement block using residual connections to fully extract and fuse cross-modal complementary cues from the RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale contextual features for accurately locating salient objects. Third, a novel recurrent attention module inspired by the Internal Generative Mechanism of the human brain is designed to generate more accurate saliency results via comprehensively learning the internal semantic relations of the fused features and progressively optimizing local details with memory-oriented scene understanding. Finally, a cascaded hierarchical feature fusion strategy is designed to promote efficient information interaction of multi-level contextual features and further improve the contextual representation ability of the model. In addition, we introduce a new real-life RGB-D saliency dataset containing a variety of complex scenarios that has been widely used as a benchmark in recent RGB-D saliency detection research. Extensive empirical experiments demonstrate that our method can accurately identify salient objects and achieves appealing performance against 18 state-of-the-art RGB-D saliency models on nine benchmark datasets.
67. Xu Y, Yu X, Zhang J, Zhu L, Wang D. Weakly Supervised RGB-D Salient Object Detection With Prediction Consistency Training and Active Scribble Boosting. IEEE Trans Image Process 2022; 31:2148-2161. [PMID: 35196231] [DOI: 10.1109/tip.2022.3151999]
Abstract
RGB-D salient object detection (SOD) has attracted increasingly more attention as it shows more robust results in complex scenes compared with RGB SOD. However, state-of-the-art RGB-D SOD approaches heavily rely on a large amount of pixel-wise annotated data for training. Such densely labeled annotations are often labor-intensive and costly. To reduce the annotation burden, we investigate RGB-D SOD from a weakly supervised perspective. More specifically, we use annotator-friendly scribble annotations as supervision signals for model training. Since scribble annotations are much sparser compared to ground-truth masks, some critical object structure information might be neglected. To preserve such structure information, we explicitly exploit the complementary edge information from two modalities (i.e., RGB and depth). Specifically, we leverage the dual-modal edge guidance and introduce a new network architecture with a dual-edge detection module and a modality-aware feature fusion module. In order to use the useful information of unlabeled pixels, we introduce a prediction consistency training scheme by comparing the predictions of two networks optimized by different strategies. Moreover, we develop an active scribble boosting strategy to provide extra supervision signals with negligible annotation cost, leading to significant SOD performance improvement. Extensive experiments on seven benchmarks validate the superiority of our proposed method. Remarkably, the proposed method with scribble annotations achieves competitive performance in comparison to fully supervised state-of-the-art methods.
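A minimal sketch of scribble supervision plus prediction consistency, with hypothetical tensor names and a simple L1 consistency term standing in for the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def scribble_consistency_loss(pred_a, pred_b, scribble, mask):
    """
    pred_a / pred_b : saliency logits from two networks optimized with different strategies
    scribble        : sparse labels in {0, 1}, valid only where mask == 1
    mask            : 1 on annotated (scribbled) pixels, 0 elsewhere
    """
    # partial cross-entropy: supervise only the scribbled pixels
    ce = F.binary_cross_entropy_with_logits(pred_a, scribble, reduction="none")
    sup = (ce * mask).sum() / mask.sum().clamp(min=1.0)
    # prediction-consistency term on the remaining (unlabeled) pixels
    cons = (torch.sigmoid(pred_a) - torch.sigmoid(pred_b)).abs()
    unsup = (cons * (1.0 - mask)).sum() / (1.0 - mask).sum().clamp(min=1.0)
    return sup + 0.5 * unsup

a = torch.randn(2, 1, 32, 32, requires_grad=True)
b = torch.randn(2, 1, 32, 32)
scr = (torch.rand(2, 1, 32, 32) > 0.5).float()
m = (torch.rand(2, 1, 32, 32) > 0.9).float()   # roughly 10% of pixels carry scribble labels
print(scribble_consistency_loss(a, b, scr, m))
```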
68. Wang F, Pan J, Xu S, Tang J. Learning Discriminative Cross-Modality Features for RGB-D Saliency Detection. IEEE Trans Image Process 2022; 31:1285-1297. [PMID: 35015637] [DOI: 10.1109/tip.2022.3140606]
Abstract
How to explore useful information from depth is the key to the success of RGB-D saliency detection methods. Because the RGB and depth images come from different domains, the modality gap leads to unsatisfactory results when features are simply concatenated. For better performance, most methods focus on bridging this gap by designing different cross-modal fusion modules for features, while ignoring the explicit extraction of useful consistent information from them. To overcome this problem, we develop a simple yet effective RGB-D saliency detection method by learning discriminative cross-modality features based on a deep neural network. The proposed method first learns modality-specific features for the RGB and depth inputs. We then separately calculate the correlations of every pixel pair in a cross-modality-consistent way, i.e., the distribution ranges are consistent for the correlations calculated on features extracted from RGB inputs (RGB correlation) or depth inputs (depth correlation). From different perspectives, color or spatial, the RGB and depth correlations arrive at the same point: depicting how tightly each pixel pair is related. Second, to gather RGB and depth information complementarily, we propose a novel correlation fusion that combines the RGB and depth correlations into a cross-modality correlation. Finally, the features are refined with both long-range cross-modality correlations and local depth correlations to predict saliency maps, in which the long-range cross-modality correlation provides context information for accurate localization, and the local depth correlation preserves subtle structures for fine segmentation. In addition, a lightweight DepthNet is designed for efficient depth feature extraction. The proposed network is trained in an end-to-end manner. Both quantitative and qualitative experimental results demonstrate that the proposed algorithm achieves favorable performance against state-of-the-art methods.
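The cross-modality-consistent correlation idea can be sketched as cosine affinities computed per modality (so both correlation maps share the same [-1, 1] range) and then fused; the fusion rule and shapes below are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def pixel_pair_correlation(feat):
    """Normalized pixel-pair affinity of shape (B, HW, HW), values in [-1, 1]."""
    b, c, h, w = feat.shape
    x = F.normalize(feat.flatten(2), dim=1)      # unit-length descriptor per pixel
    return torch.bmm(x.transpose(1, 2), x)       # cosine similarity of every pixel pair

def fuse_correlations(rgb_feat, depth_feat, alpha=0.5):
    """Toy correlation fusion: both affinities share the same cosine range."""
    return alpha * pixel_pair_correlation(rgb_feat) + (1 - alpha) * pixel_pair_correlation(depth_feat)

rgb = torch.randn(2, 64, 14, 14)
dep = torch.randn(2, 64, 14, 14)
corr = fuse_correlations(rgb, dep)
print(corr.shape)                                # torch.Size([2, 196, 196])
# the fused correlation can then re-weight features non-locally:
out = torch.bmm(rgb.flatten(2), corr.softmax(dim=-1).transpose(1, 2)).view_as(rgb)
print(out.shape)
```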
69. RGB-T salient object detection via CNN feature and result saliency map fusion. Appl Intell 2022. [DOI: 10.1007/s10489-021-02984-1]
70. Wang X, Zhu L, Tang S, Fu H, Li P, Wu F, Yang Y, Zhuang Y. Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images. IEEE Trans Image Process 2022; 31:1107-1119. [PMID: 34990359] [DOI: 10.1109/tip.2021.3139232]
Abstract
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.
71. Chen T, Hu X, Xiao J, Zhang G, Wang S. CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06845-3]
72. Wen H, Yan C, Zhou X, Cong R, Sun Y, Zheng B, Zhang J, Bao Y, Ding G. Dynamic Selective Network for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:9179-9192. [PMID: 34739374] [DOI: 10.1109/tip.2021.3123548]
Abstract
RGB-D saliency detection has received more and more attention in recent years. Many efforts have been devoted to this area, most of which try to integrate multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of them ignore the inherent difference between the two modalities, which leads to performance degradation when handling challenging scenes. Therefore, in this paper, we propose a novel RGB-D saliency model, namely Dynamic Selective Network (DSNet), to perform salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire high-level semantic information, which can be used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling-based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive and excellent performance against 17 current state-of-the-art RGB-D SOD models.
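A toy version of the gated selection idea (per-channel gates computed from pooled cross-modal context choosing between RGB and depth features); module names and shapes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class GatedSelection(nn.Module):
    """Toy dynamic selection: per-channel gates choose between RGB and depth features."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(2 * ch, ch, 1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        g = self.gate(torch.cat([f_rgb, f_depth], dim=1))   # (B, C, 1, 1) gates in [0, 1]
        return g * f_rgb + (1.0 - g) * f_depth              # convex, channel-wise selection

m = GatedSelection(64)
print(m(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)).shape)
```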
73. Zhai Y, Fan DP, Yang J, Borji A, Shao L, Han J, Wang L. Bifurcated Backbone Strategy for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:8727-8742. [PMID: 34613915] [DOI: 10.1109/tip.2021.3116793]
Abstract
Multi-level feature fusion is a fundamental topic in computer vision. It has been exploited to detect, segment, and classify objects at various scales. When multi-level features meet multi-modal cues, the optimal feature aggregation and multi-modal learning strategy become a hot research topic. In this paper, we leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to devise a novel Bifurcated Backbone Strategy Network (BBS-Net). Our architecture is simple, efficient, and backbone-independent. In particular, we first propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS). Second, we introduce a depth-enhanced module (DEM) to excavate informative depth cues from the channel and spatial views. Then, RGB and depth modalities are fused in a complementary way. Extensive experiments show that BBS-Net significantly outperforms 18 state-of-the-art (SOTA) models on eight challenging datasets under five evaluation measures, demonstrating the superiority of our approach (~4% improvement in S-measure vs. the top-ranked model DMRA). In addition, we provide a comprehensive analysis of the generalization ability of different RGB-D datasets and provide a powerful training set for future research. The complete algorithm, benchmark results, and post-processing toolbox are publicly available at https://github.com/zyjwuyan/BBS-Net.
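The depth-enhanced module can be sketched as channel attention followed by spatial attention applied to the depth stream before it is added to the RGB stream; the layer sizes below are illustrative, not the released BBS-Net code:

```python
import torch
import torch.nn as nn

class DepthEnhancedModule(nn.Module):
    """Toy DEM: channel attention then spatial attention on the depth stream,
    after which the enhanced depth features are added to the RGB features."""
    def __init__(self, ch):
        super().__init__()
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch // 4, 1), nn.ReLU(),
                                nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        f = f_depth * self.ca(f_depth)                    # channel view
        f = f * self.sa(f.mean(dim=1, keepdim=True))      # spatial view
        return f_rgb + f                                  # complementary fusion

dem = DepthEnhancedModule(64)
print(dem(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)).shape)
```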
74. Zhao Y, Zhao J, Li J, Chen X. RGB-D Salient Object Detection With Ubiquitous Target Awareness. IEEE Trans Image Process 2021; 30:7717-7731. [PMID: 34478368] [DOI: 10.1109/tip.2021.3108412]
Abstract
Conventional RGB-D salient object detection methods aim to leverage depth as complementary information to find the salient regions in both modalities. However, the salient object detection results heavily rely on the quality of the captured depth data, which is sometimes unavailable. In this work, we make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework. This framework only relies on RGB data in the testing phase, utilizing captured depth data as supervision for representation learning. To construct our framework and achieve accurate saliency detection results, we propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task: 1) a depth awareness module to excavate depth information and mine ambiguous regions via adaptive depth-error weights, 2) a spatial-aware cross-modal interaction and a channel-aware cross-level interaction, exploiting low-level boundary cues and amplifying high-level salient channels, and 3) a gated multi-scale predictor module to perceive object saliency at different contextual scales. Besides its high performance, our proposed UTA network is depth-free for inference and runs in real time at 43 FPS. Experimental evidence demonstrates that our proposed network not only surpasses the state-of-the-art methods on five public RGB-D SOD benchmarks by a large margin, but also verifies its extensibility on five public RGB SOD benchmarks.
75. Zhang Q, Wang S, Wang X, Sun Z, Kwong S, Jiang J. Geometry Auxiliary Salient Object Detection for Light Fields via Graph Neural Networks. IEEE Trans Image Process 2021; 30:7578-7592. [PMID: 34469299] [DOI: 10.1109/tip.2021.3108018]
Abstract
Light field imaging, which originated from the availability of light field capture technology, offers a wide range of applications in computational vision. Predicting the salient objects of light fields remains technologically challenging due to their complicated geometric structure. In this paper, we propose a light field salient object detection approach that formulates the geometric coherence among the multiple views of a light field as graphs, where the angular/central views represent the nodes and their relations compose the edges. The spatial and disparity correlations between multiple views are effectively explored through multi-scale graph neural networks, enabling a more comprehensive understanding of light field content and the generation of more representative and discriminative saliency features. Moreover, a multi-scale saliency feature consistency learning module is embedded to enhance the saliency features. Finally, an accurate salient object map is produced for the light field based upon the extracted features. In addition, we establish a new light field salient object detection dataset (CITYU-Lytro) that contains 817 light fields with diverse contents and their corresponding annotations, aiming to further promote research on light field salient object detection. Quantitative and qualitative experiments demonstrate that the proposed method performs favorably compared with the state-of-the-art methods on the benchmark datasets.
76. Zhou M, Cheng W, Huang H, Chen J. A Novel Approach to Automated 3D Spalling Defects Inspection in Railway Tunnel Linings Using Laser Intensity and Depth Information. Sensors (Basel) 2021; 21:5725. [PMID: 34502618] [PMCID: PMC8434528] [DOI: 10.3390/s21175725]
Abstract
The detection of concrete spalling is critical for tunnel inspectors to assess structural risks and guarantee the daily operation of a railway tunnel. However, traditional spalling detection methods mostly rely on visual inspection or camera images taken manually, which are inefficient and unreliable. In this study, an integrated approach based on laser intensity and depth features is proposed for the automated detection and quantification of concrete spalling. The Railway Tunnel Spalling Defects (RTSD) database, containing intensity images and depth images of tunnel linings, is established via mobile laser scanning (MLS), and the Spalling Intensity Depurator Network (SIDNet) model is proposed for automatic extraction of concrete spalling features. The proposed model is trained, validated and tested on the established RTSD database with impressive results. Comparison with several other spalling detection models shows that the proposed model performs better in terms of various indicators such as MPA (0.985) and MIoU (0.925). The extra depth information obtained from MLS allows for accurate evaluation of the volume of detected spalling defects, which is beyond the reach of traditional methods. In addition, a triangulation mesh method is implemented to reconstruct the 3D tunnel lining model and visualize the 3D inspection results. As a result, a 3D inspection report containing quantified spalling defect information along with the relevant spatial coordinates can be generated automatically. The proposed approach has been applied to several railway tunnels in Yunnan province, China, and the experimental results have proved its validity and feasibility.
Affiliation(s)
- Wen Cheng: Key Laboratory of Geotechnical and Underground Engineering, Department of Geotechnical Engineering, Tongji University, Siping Road 1239, Shanghai 200092, China; (M.Z.); (H.H.); (J.C.)
77. Chen Z, Cong R, Xu Q, Huang Q. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:7012-7024. [PMID: 33141667] [DOI: 10.1109/tip.2020.3028289]
Abstract
There are two main issues in RGB-D salient object detection: (1) how to effectively integrate the complementarity of the cross-modal RGB-D data; and (2) how to prevent contamination from an unreliable depth map. In fact, these two problems are linked and intertwined, but previous methods tend to focus only on the first problem and ignore the quality of the depth map, which may cause the model to fall into a sub-optimal state. In this paper, we address these two issues in a holistic model synergistically, and propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity. By introducing depth potentiality perception, the network can perceive the potentiality of depth information in a learning-based manner and guide the fusion process of the two modalities to prevent contamination. The gated multi-modality attention module in the fusion process exploits the attention mechanism with a gate controller to capture long-range dependencies from a cross-modal perspective. Experimental results compared with 16 state-of-the-art methods on 8 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively. https://github.com/JosephChenHub/DPANet.
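A minimal sketch of the depth-potentiality idea: a global score predicted from the depth features scales the depth contribution during fusion. The specific layers are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class DepthPotentialityGate(nn.Module):
    """Toy depth-potentiality perception: a global score in [0, 1] estimated from the
    depth features scales how much depth is allowed to contribute to the fusion."""
    def __init__(self, ch):
        super().__init__()
        self.score = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(ch, ch // 4), nn.ReLU(),
                                   nn.Linear(ch // 4, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        p = self.score(f_depth).view(-1, 1, 1, 1)   # estimated potentiality of this depth map
        return f_rgb + p * f_depth                   # unreliable depth is suppressed

gate = DepthPotentialityGate(64)
print(gate(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)).shape)
```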
78. Wu Z, Su L, Huang Q. Decomposition and Completion Network for Salient Object Detection. IEEE Trans Image Process 2021; 30:6226-6239. [PMID: 34242166] [DOI: 10.1109/tip.2021.3093380]
Abstract
Recently, fully convolutional networks (FCNs) have made great progress in the task of salient object detection, and existing state-of-the-art methods mainly focus on how to integrate edge information in deep aggregation models. In this paper, we propose a novel Decomposition and Completion Network (DCN), which integrates edge and skeleton cues as complementary information and models the integrity of salient objects in two stages. In the decomposition network, we propose a cross multi-branch decoder, which iteratively takes advantage of cross-task aggregation and cross-layer aggregation to integrate multi-level multi-task features and predict saliency, edge, and skeleton maps simultaneously. In the completion network, edge and skeleton maps are further utilized to fill flaws and suppress noise in the saliency maps via hierarchical structure-aware feature learning and multi-scale feature completion. Through jointly learning with edge and skeleton information for localizing the boundaries and interiors of salient objects, respectively, the proposed network generates precise saliency maps with uniformly and completely segmented salient objects. Experiments conducted on five benchmark datasets demonstrate that the proposed model outperforms existing networks. Furthermore, we extend the proposed model to the task of RGB-D salient object detection, where it also achieves state-of-the-art performance. The code is available at https://github.com/wuzhe71/DCN.
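The decomposition stage can be sketched as three parallel heads (saliency, edge, skeleton) on a shared decoder feature, with a completion step that refines saliency from all three cues; this is a simplified stand-in for the cross multi-branch decoder, with assumed layer shapes:

```python
import torch
import torch.nn as nn

class DecompositionHeads(nn.Module):
    """Toy decomposition stage: one shared decoder feature predicts saliency,
    edge, and skeleton maps; a completion step then refines saliency with both cues."""
    def __init__(self, ch):
        super().__init__()
        self.sal = nn.Conv2d(ch, 1, 3, padding=1)
        self.edge = nn.Conv2d(ch, 1, 3, padding=1)
        self.skel = nn.Conv2d(ch, 1, 3, padding=1)
        self.complete = nn.Conv2d(3, 1, 3, padding=1)   # fills flaws using edge + skeleton cues

    def forward(self, feat):
        s, e, k = self.sal(feat), self.edge(feat), self.skel(feat)
        refined = self.complete(torch.cat([s, e, k], dim=1))
        return refined, s, e, k

heads = DecompositionHeads(64)
refined, s, e, k = heads(torch.randn(2, 64, 56, 56))
print(refined.shape, s.shape, e.shape, k.shape)
```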
79. Shi X, Yang Y, Liu Q. I Understand You: Blind 3D Human Attention Inference From the Perspective of Third-Person. IEEE Trans Image Process 2021; 30:6212-6225. [PMID: 34214041] [DOI: 10.1109/tip.2021.3092842]
Abstract
Inferring object-wise human attention in 3D space from a third-person perspective (e.g., a camera) is crucial to many visual tasks and applications, including human-robot collaboration, unmanned vehicle driving, etc. Challenges arise, relative to classical human attention estimation, when the human's eyes are not visible to the camera, the gaze point is outside the field of view, or the gazed object is occluded by others in the 3D space. In this case, blind 3D human attention inference brings a new paradigm to the community. In this paper, we address these challenges by proposing a scene-behavior associated mechanism, in which both the 3D scene and the temporal behavior of the human are adopted to infer object-wise human attention and its transitions. Specifically, a point cloud is reconstructed and used for the spatial representation of the 3D scene, which helps handle the blind problem from the perspective of a camera. Based on this, in order to address blind human attention inference without eye information, we propose a Sequential Skeleton Based Attention Network (S2BAN) for behavior-based attention modeling. Embedded in the scene-behavior associated mechanism, the proposed S2BAN is built under the temporal architecture of Long Short-Term Memory (LSTM). Our network employs the human skeleton as the behavior representation and maps it to the attention direction frame by frame, which makes attention inference a temporally correlated issue. With the help of S2BAN, the 3D gaze spot, and further the attended objects, can be obtained frame by frame via intersection and segmentation on the previously reconstructed point cloud. Finally, we conduct experiments from various aspects to verify the object-wise attention localization accuracy, the angular error of attention direction calculation, as well as the subjective results. The experimental results show that the proposed method outperforms other competitors.
80. CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01452-0]
81.
82. Tu Z, Li Z, Li C, Lang Y, Tang J. Multi-Interactive Dual-Decoder for RGB-Thermal Salient Object Detection. IEEE Trans Image Process 2021; 30:5678-5691. [PMID: 34125680] [DOI: 10.1109/tip.2021.3087412]
Abstract
RGB-thermal salient object detection (SOD) aims to segment the common prominent regions of a visible image and the corresponding thermal infrared image, which we call RGB-T SOD. Existing methods do not fully explore and exploit the potential complementarity of different modalities and the multi-type cues of image contents, which play a vital role in achieving accurate results. In this paper, we propose a multi-interactive dual-decoder to mine and model the multi-type interactions for accurate RGB-T SOD. Specifically, we first encode the two modalities into multi-level multi-modal feature representations. Then, we design a novel dual-decoder to conduct the interactions of multi-level features, two modalities, and global contexts. With these interactions, our method works well in diversely challenging scenarios, even in the presence of an invalid modality. Finally, we carry out extensive experiments on public RGB-T and RGB-D SOD datasets, and the results show that the proposed method achieves outstanding performance against state-of-the-art algorithms. The source code has been released at: https://github.com/lz118/Multi-interactive-Dual-decoder.
83. Semantic-Guided Attention Refinement Network for Salient Object Detection in Optical Remote Sensing Images. Remote Sens 2021. [DOI: 10.3390/rs13112163]
Abstract
Although remarkable progress has been made in salient object detection (SOD) in natural scene images (NSI), the SOD of optical remote sensing images (RSI) still faces significant challenges due to various spatial resolutions, cluttered backgrounds, and complex imaging conditions, mainly for two reasons: (1) the accurate location of salient objects; and (2) the subtle boundaries of salient objects. This paper explores the inherent properties of multi-level features to develop a novel semantic-guided attention refinement network (SARNet) for SOD of RSI. Specifically, the proposed semantic-guided decoder (SGD) roughly but accurately locates the multi-scale object by aggregating multiple high-level features, and then this global semantic information guides the integration of subsequent features in a step-by-step feedback manner to make full use of deep multi-level features. Simultaneously, the proposed parallel attention fusion (PAF) module combines cross-level features and semantic-guided information to refine the object's boundary and highlight the entire object area gradually. Finally, the proposed network architecture is trained through an end-to-end fully supervised model. Quantitative and qualitative evaluations on two public RSI datasets and additional NSI datasets across five metrics show that our SARNet is superior to 14 state-of-the-art (SOTA) methods without any post-processing.
84. Fu K, Fan DP, Ji GP, Zhao Q, Shen J, Zhu C. Siamese Network for RGB-D Salient Object Detection and Beyond. IEEE Trans Pattern Anal Mach Intell 2021. [PMID: 33861691] [DOI: 10.1109/tpami.2021.3073689]
Abstract
Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using 5 popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the SOTAs by an average of ~2.0% (F-measure) across 7 challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T SOD and video SOD, achieving comparable or better performance.
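The shared-backbone (Siamese) idea can be shown in a few lines: the same weights encode both the RGB image and the depth map replicated to three channels. The tiny encoder and additive fusion below are placeholders for the paper's backbone and densely cooperative fusion:

```python
import torch
import torch.nn as nn

# Minimal Siamese sketch: the SAME weights encode both RGB and (3-channel
# replicated) depth, so cross-modal commonality is learned in one backbone.
shared = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                       nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 1, 224, 224).repeat(1, 3, 1, 1)   # depth treated as a 3-channel image

f_rgb, f_depth = shared(rgb), shared(depth)               # identical parameters for both modalities
fused = f_rgb + f_depth                                    # stand-in for densely cooperative fusion
print(f_rgb.shape, fused.shape)
```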
85. Wang X, Li S, Chen C, Hao A, Qin H. Depth quality-aware selective saliency fusion for RGB-D image salient object detection. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.071]
86. Li L, Zhao S, Sun R, Chai X, Zheng S, Chen X, Lv Z. AFI-Net: Attention-Guided Feature Integration Network for RGBD Saliency Detection. Comput Intell Neurosci 2021; 2021:8861446. [PMID: 33859681] [PMCID: PMC8026315] [DOI: 10.1155/2021/8861446]
Abstract
This article proposes an innovative RGB-D saliency model, namely, an attention-guided feature integration network (AFI-Net), which can extract and fuse features and perform saliency inference. Specifically, the model first extracts multi-modal and multi-level deep features. Then, a series of attention modules are deployed on the multi-level RGB and depth features, yielding enhanced deep features. Next, the enhanced multi-modal deep features are hierarchically fused. Lastly, the RGB and depth boundary features, i.e., low-level spatial details, are added to the integrated features to perform saliency inference. The key points of the AFI-Net are the attention-guided feature enhancement and the boundary-aware saliency inference, where the attention module indicates salient objects coarsely, and the boundary information is used to equip the deep features with more spatial details. Therefore, salient objects are well characterized and well highlighted. The comprehensive experiments on five challenging public RGB-D datasets clearly exhibit the superiority and effectiveness of the proposed AFI-Net.
Affiliation(s)
- Liming Li: School of Information Science and Technology, Donghua University, Shanghai 201620, China; School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
- Shuguang Zhao: School of Information Science and Technology, Donghua University, Shanghai 201620, China
- Rui Sun: School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
- Xiaodong Chai: School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
- Shubin Zheng: School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
- Xingjie Chen: School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
- Zhaomin Lv: School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China
87. Li G, Liu Z, Chen M, Bai Z, Lin W, Ling H. Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:3528-3542. [PMID: 33667161] [DOI: 10.1109/tip.2021.3062689]
Abstract
Existing RGB-D Salient Object Detection (SOD) methods take advantage of depth cues to improve detection accuracy, while paying insufficient attention to the quality of the depth information. In practice, a depth map is often of uneven quality and sometimes suffers from distractors, due to various factors in the acquisition procedure. In this article, to mitigate distractors in depth maps and highlight salient objects in RGB images, we propose a Hierarchical Alternate Interaction Network (HAINet) for RGB-D SOD. Specifically, HAINet consists of three key stages: feature encoding, cross-modal alternate interaction, and saliency reasoning. The main innovation in HAINet is the Hierarchical Alternate Interaction Module (HAIM), which plays a key role in the second stage for cross-modal feature interaction. HAIM first uses RGB features to filter distractors in depth features, and then the purified depth features are exploited to enhance the RGB features in turn. The alternate RGB-depth-RGB interaction proceeds in a hierarchical manner, which progressively integrates local and global contexts within a single feature scale. In addition, we adopt a hybrid loss function to facilitate the training of HAINet. Extensive experiments on seven datasets demonstrate that our HAINet not only achieves competitive performance compared with 19 relevant state-of-the-art methods, but also reaches a real-time processing speed of 43 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/HAINet.
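The alternate RGB-depth-RGB interaction can be sketched as two attention steps: RGB attention first purifies the depth features, and the purified depth attention then enhances the RGB features. The 1x1 attention layers below are illustrative assumptions, not the released HAINet code:

```python
import torch
import torch.nn as nn

class AlternateInteraction(nn.Module):
    """Toy alternate interaction: RGB attention first purifies the depth features,
    then the purified depth attention enhances the RGB features in turn."""
    def __init__(self, ch):
        super().__init__()
        self.rgb_att = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.dep_att = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        f_depth = f_depth * self.rgb_att(f_rgb)        # step 1: filter depth distractors
        f_rgb = f_rgb * (1.0 + self.dep_att(f_depth))  # step 2: purified depth re-enhances RGB
        return f_rgb, f_depth

hai = AlternateInteraction(64)
r, d = hai(torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28))
print(r.shape, d.shape)
```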
88. Jin WD, Xu J, Han Q, Zhang Y, Cheng MM. CDNet: Complementary Depth Network for RGB-D Salient Object Detection. IEEE Trans Image Process 2021; 30:3376-3390. [PMID: 33646949] [DOI: 10.1109/tip.2021.3060167]
Abstract
Current RGB-D salient object detection (SOD) methods utilize the depth stream as complementary information to the RGB stream. However, the depth maps are usually of low quality in existing RGB-D SOD datasets, and most RGB-D SOD networks trained with these datasets produce error-prone results. In this paper, we propose a novel Complementary Depth Network (CDNet) to exploit saliency-informative depth features for RGB-D SOD. To alleviate the influence of low-quality depth maps on RGB-D SOD, we propose to select saliency-informative depth maps as the training targets and leverage RGB features to estimate meaningful depth maps. Besides, to learn robust depth features for accurate prediction, we propose a new dynamic scheme to fuse the depth features extracted from the original and estimated depth maps with adaptive weights. Moreover, we design a two-stage cross-modal feature fusion scheme to integrate the depth features with the RGB ones, further improving the performance of our CDNet on RGB-D SOD. Experiments on seven benchmark datasets demonstrate that our CDNet outperforms state-of-the-art RGB-D SOD methods. The code is publicly available at https://github.com/blanclist/CDNet.
89. Zhang P, Liu W, Zeng Y, Lei Y, Lu H. Looking for the Detail and Context Devils: High-Resolution Salient Object Detection. IEEE Trans Image Process 2021; 30:3204-3216. [PMID: 33621174] [DOI: 10.1109/tip.2020.3045624]
Abstract
In recent years, Salient Object Detection (SOD) has shown great success thanks to large-scale benchmarks and deep learning techniques. However, existing SOD methods mainly focus on natural images with low resolutions, e.g., 400×400 or less. This drawback hinders them in advanced practical applications, which require high-resolution, detail-aware results. Moreover, the lack of boundary detail and semantic context of salient objects is also a key concern for accurate SOD. To address these issues, in this work we focus on the High-Resolution Salient Object Detection (HRSOD) task. Technically, we propose the first end-to-end learnable framework, named Dual ReFinement Network (DRFNet), for fully automatic HRSOD. More specifically, the proposed DRFNet consists of a shared feature extractor and two effective refinement heads. By decoupling the detail and context information, one refinement head adopts a global-aware feature pyramid. Without adding too much computational burden, it can boost the spatial detail information, which narrows the gap between high-level semantics and low-level details. In parallel, the other refinement head adopts hybrid dilated convolutional blocks and group-wise upsampling, which are very efficient in extracting contextual information. Based on the dual refinements, our approach can enlarge receptive fields and obtain more discriminative features from high-resolution images. Experimental results on high-resolution benchmarks (the public DUT-HRSOD and the proposed DAVIS-SOD) demonstrate that our method is not only efficient but also more accurate than other state-of-the-art methods. Besides, our method generalizes well on typical low-resolution benchmarks.
90. Chen C, Wei J, Peng C, Qin H. Depth-Quality-Aware Salient Object Detection. IEEE Trans Image Process 2021; 30:2350-2363. [PMID: 33481710] [DOI: 10.1109/tip.2021.3052069]
Abstract
The existing fusion-based RGB-D salient object detection methods usually adopt the bistream structure to strike a balance in the fusion trade-off between RGB and depth (D). While the D quality usually varies among the scenes, the state-of-the-art bistream approaches are depth-quality-unaware, resulting in substantial difficulties in achieving complementary fusion status between RGB and D and leading to poor fusion results for low-quality D. Thus, this paper attempts to integrate a novel depth-quality-aware subnet into the classic bistream structure in order to assess the depth quality prior to conducting the selective RGB-D fusion. Compared to the SOTA bistream methods, the major advantage of our method is its ability to lessen the importance of the low-quality, no-contribution, or even negative-contribution D regions during RGB-D fusion, achieving a much improved complementary status between RGB and D. Our source code and data are available online at https://github.com/qdu1995/DQSD.
91. Wu J, Han G, Liu P, Yang H, Luo H, Li Q. Saliency Detection with Bilateral Absorbing Markov Chain Guided by Depth Information. Sensors (Basel) 2021; 21:838. [PMID: 33513849] [PMCID: PMC7865590] [DOI: 10.3390/s21030838]
Abstract
The effectiveness of depth information in saliency detection has been fully proved, but it is still worth exploring how to utilize depth information more efficiently. Erroneous depth information may cause detection failure, while non-salient objects may be closer to the camera, which also leads to erroneous emphasis on non-salient regions. Moreover, most of the existing RGB-D saliency detection models have poor robustness when the salient object touches the image boundaries. To mitigate these problems, we propose a multi-stage saliency detection model with a bilateral absorbing Markov chain guided by depth information. The proposed model progressively extracts saliency cues in three stages (low-, mid-, and high-level). First, we generate low-level saliency cues by explicitly combining color and depth information. Then, we design a bilateral absorbing Markov chain to calculate mid-level saliency maps. At the mid-level, to suppress the boundary-touch problem, we present a background seed screening mechanism (BSSM) for improving the construction of the two-layer sparse graph and better selecting background-based absorbing nodes. Furthermore, a cross-modal multi-graph learning model (CMLM) is designed to fully explore the intrinsic complementary relationship between color and depth information. Finally, to obtain a more highlighted and homogeneous saliency map at the high level, we construct a depth-guided optimization module which combines cellular automata with a suppression-enhancement function pair. This optimization module refines the saliency map in color space and depth space, respectively. Comprehensive experiments on three challenging benchmark datasets demonstrate the effectiveness of our proposed method both qualitatively and quantitatively.
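The absorbing-Markov-chain step can be illustrated with the standard fundamental-matrix computation on a toy superpixel graph: background seeds are made absorbing, and the expected absorption time of each transient node serves as its saliency. The affinity matrix and seed choice below are placeholders, not the paper's actual graph construction:

```python
import numpy as np

def absorbed_time_saliency(affinity, absorbing_idx):
    """
    Toy absorbing-Markov-chain saliency on a superpixel graph.
    affinity      : (n, n) nonnegative similarity between superpixels
    absorbing_idx : indices of background (boundary) seed nodes, made absorbing
    Returns the normalized expected absorption time of the transient nodes:
    nodes that take long to be absorbed by the background are considered salient.
    """
    n = affinity.shape[0]
    P = affinity / affinity.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    transient = np.setdiff1d(np.arange(n), absorbing_idx)
    Q = P[np.ix_(transient, transient)]                    # transient-to-transient block
    N = np.linalg.inv(np.eye(len(transient)) - Q)          # fundamental matrix (I - Q)^-1
    t = N.sum(axis=1)                                      # expected steps before absorption
    sal = np.zeros(n)
    sal[transient] = (t - t.min()) / (t.max() - t.min() + 1e-12)
    return sal

rng = np.random.default_rng(0)
A = rng.random((8, 8))
A = (A + A.T) / 2                                          # symmetric toy affinities
print(absorbed_time_saliency(A, absorbing_idx=np.array([0, 7])))
```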
Collapse
Affiliation(s)
- Jiajia Wu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
- Guangliang Han (Correspondence): Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Peixun Liu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hang Yang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Huiyuan Luo: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
- Qingqing Li: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; School of Optoelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
Collapse
|
92
|
Zhang Z, Lin Z, Xu J, Jin WD, Lu SP, Fan DP. Bilateral Attention Network for RGB-D Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:1949-1961. [PMID: 33439842 DOI: 10.1109/tip.2021.3049959] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
RGB-D salient object detection (SOD) aims to segment the most attractive objects in a pair of cross-modal RGB and depth images. Currently, most existing RGB-D SOD methods focus on the foreground region when utilizing the depth images. However, in traditional SOD methods the background also provides important information for strong performance. To better explore salient information in both foreground and background regions, this paper proposes a Bilateral Attention Network (BiANet) for the RGB-D SOD task. Specifically, we introduce a Bilateral Attention Module (BAM) with a complementary attention mechanism: foreground-first (FF) attention and background-first (BF) attention. The FF attention focuses on the foreground region with a gradual refinement style, while the BF attention recovers potentially useful salient information in the background region. Benefiting from the proposed BAM, our BiANet can capture more meaningful foreground and background cues, and shift more attention to refining the uncertain details between foreground and background regions. Additionally, we extend our BAM by leveraging multi-scale techniques for better SOD performance. Extensive experiments on six benchmark datasets demonstrate that our BiANet outperforms other state-of-the-art RGB-D SOD methods in terms of objective metrics and subjective visual comparison. Our BiANet runs at up to 80 fps on 224×224 RGB-D images with an NVIDIA GeForce RTX 2080 Ti GPU. Comprehensive ablation studies also validate our contributions.
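A minimal sketch of the bilateral-attention idea, foreground-first and background-first gating driven by a coarse prediction, is shown below; it is an assumption-laden simplification of BAM for illustration, not the released BiANet code.

```python
# Sketch: a coarse saliency logit gates features from the foreground side,
# and its complement gates them from the background side; both are re-fused.
import torch
import torch.nn as nn

class BilateralAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ff_branch = nn.Conv2d(channels, channels, 3, padding=1)  # foreground-first
        self.bf_branch = nn.Conv2d(channels, channels, 3, padding=1)  # background-first
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feat: torch.Tensor, coarse_logit: torch.Tensor) -> torch.Tensor:
        fg = torch.sigmoid(coarse_logit)   # foreground attention map, shape (B, 1, H, W)
        bg = 1.0 - fg                      # complementary background attention
        ff = self.ff_branch(feat * fg)
        bf = self.bf_branch(feat * bg)
        return self.out(torch.cat([ff, bf], dim=1))
```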
Collapse
|
93
|
Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: A survey. COMPUTATIONAL VISUAL MEDIA 2021; 7:37-69. [PMID: 33432275 PMCID: PMC7788385 DOI: 10.1007/s41095-020-0199-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Accepted: 10/07/2020] [Indexed: 06/12/2023]
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Collapse
Affiliation(s)
- Tao Zhou: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
Collapse
|
94
|
Abdel-Basset M, Chang V, Hawash H, Chakrabortty RK, Ryan M. FSS-2019-nCov: A deep learning architecture for semi-supervised few-shot segmentation of COVID-19 infection. Knowl Based Syst 2021; 212:106647. [PMID: 33519100 PMCID: PMC7836902 DOI: 10.1016/j.knosys.2020.106647] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 10/20/2020] [Accepted: 11/30/2020] [Indexed: 01/06/2023]
Abstract
The newly discovered coronavirus (COVID-19) pneumonia poses major challenges for research in terms of diagnosis and disease quantification. Deep-learning (DL) techniques allow extremely precise image segmentation; yet, they require huge volumes of manually labeled data to be trained in a supervised manner. Few-Shot Learning (FSL) paradigms tackle this issue by learning a novel category from a small number of annotated instances. We present an innovative semi-supervised few-shot segmentation (FSS) approach for efficient segmentation of 2019-nCov infection (FSS-2019-nCov) from only a small number of annotated lung CT scans. The key challenge of this study is to provide accurate segmentation of COVID-19 infection from a limited number of annotated instances. For that purpose, we propose a novel dual-path deep-learning architecture for FSS. Each path contains an encoder-decoder (E-D) architecture to extract high-level information while maintaining the channel information of COVID-19 CT slices. The E-D architecture primarily consists of three main modules: a feature encoder module, a context enrichment (CE) module, and a feature decoder module. We utilize the pre-trained ResNet34 as an encoder backbone for feature extraction. The CE module consists of a newly proposed Smoothed Atrous Convolution (SAC) block and a Multi-scale Pyramid Pooling (MPP) block. The conditioner path takes pairs of CT images and their labels as input and produces a relevant knowledge representation that is transferred to the segmentation path for segmenting new images. To enable effective collaboration between the two paths, we propose an adaptive recombination and recalibration (RR) module that permits intensive knowledge exchange between paths with a trivial increase in computational complexity. The model is extended to multi-class labeling for various types of lung infections. This contribution mitigates the lack of large numbers of annotated COVID-19 CT scans. It also provides a general framework for lung disease diagnosis in limited-data situations.
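The SAC and MPP blocks are specific to the paper and are not reproduced here; as a loose, generic stand-in for a multi-scale context-enrichment module of this kind, a dilated-convolution block in the ASPP spirit might be sketched as follows. All names, channel counts, and dilation rates are illustrative assumptions.

```python
# Generic multi-scale context block: parallel dilated convolutions plus a
# global-pooling branch, fused by a 1x1 convolution. Not the paper's CE module.
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    def __init__(self, channels: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fuse = nn.Conv2d(channels * (len(rates) + 1), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        g = self.pool(x).expand_as(x)   # broadcast global context to the spatial size
        return self.fuse(torch.cat(feats + [g], dim=1))
```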
Collapse
Affiliation(s)
- Mohamed Abdel-Basset: Faculty of Computers and Informatics, Zagazig University, Zagazig, Sharqiyah, 44519, Egypt
- Victor Chang: School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Hossam Hawash: Faculty of Computers and Informatics, Zagazig University, Zagazig, Sharqiyah, 44519, Egypt
- Ripon K Chakrabortty: Capability Systems Centre, School of Engineering and IT, UNSW Canberra, Australia
- Michael Ryan: Capability Systems Centre, School of Engineering and IT, UNSW Canberra, Australia
Collapse
|
95
|
Zhou W, Pan S, Lei J, Yu L. TMFNet: Three-Input Multilevel Fusion Network for Detecting Salient Objects in RGB-D Images. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2021. [DOI: 10.1109/tetci.2021.3097393] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
96
|
Ma G, Li S, Chen C, Hao A, Qin H. Stage-wise Salient Object Detection in 360° Omnidirectional Image via Object-level Semantical Saliency Ranking. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3535-3545. [PMID: 32941153 DOI: 10.1109/tvcg.2020.3023636] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
The 2D image based salient object detection (SOD) task has been extensively explored, while 360° omnidirectional image based SOD has received less research attention, and three major bottlenecks limit its performance. Firstly, the currently available training data is insufficient for training a deep 360° SOD model. Secondly, the visual distortions in 360° omnidirectional images usually result in a large feature gap between 360° and 2D images; consequently, stage-wise training, a widely used remedy for the training data shortage problem, becomes infeasible when conducting SOD in 360° omnidirectional images. Thirdly, the existing 360° SOD approach follows a multi-task methodology that performs salient object localization and segmentation-like saliency refinement at the same time, facing an extremely large problem domain and making the training data shortage dilemma even worse. To tackle all these issues, this paper divides 360° SOD into a multi-stage task, the key rationale of which is to decompose the original complex problem domain into a sequence of easier sub-problems that demand only small-scale training data. Meanwhile, we learn how to rank "object-level semantical saliency", aiming to locate salient viewpoints and objects accurately. Specifically, to alleviate the training data shortage problem, we have released a novel dataset named 360-SSOD, containing 1,105 360° omnidirectional images with manually annotated object-level saliency ground truth, whose semantic distribution is more balanced than that of the existing dataset. Also, we have compared the proposed method with 13 SOTA methods, and all quantitative results demonstrate its superior performance.
Collapse
|
97
|
Zhang K, Chen Z, Liu S. A Spatial-Temporal Recurrent Neural Network for Video Saliency Prediction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:572-587. [PMID: 33206602 DOI: 10.1109/tip.2020.3036749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In this paper, a recurrent neural network is designed for video saliency prediction considering spatial-temporal features. In our work, video frames are routed through a static network for spatial features and a dynamic network for temporal features. For spatial-temporal feature integration, a novel select and re-weight fusion model is proposed, which automatically learns and adjusts the fusion weights based on the spatial and temporal features of different scenes. Finally, an attention-aware convolutional long short-term memory (ConvLSTM) network is developed to predict salient regions based on the features extracted from consecutive frames and to generate the ultimate saliency map for each video frame. The proposed method is compared with state-of-the-art saliency models on five public video saliency benchmark datasets. The experimental results demonstrate that our model achieves advanced performance on video saliency prediction.
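The select and re-weight fusion can be pictured as predicting per-location weights over the spatial and temporal streams and taking a convex combination of the two; the sketch below is an assumed simplification of that idea, not the authors' model.

```python
# Sketch: predict two per-pixel weights from the concatenated streams, then
# combine spatial and temporal features as a softmax-weighted sum.
import torch
import torch.nn as nn

class SelectReweightFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.weight_head(torch.cat([spatial, temporal], dim=1)), dim=1)
        return spatial * w[:, 0:1] + temporal * w[:, 1:2]  # per-pixel convex combination
```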
Collapse
|
98
|
Wang X, Li S, Chen C, Fang Y, Hao A, Qin H. Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; 30:458-471. [PMID: 33201813 DOI: 10.1109/tip.2020.3037470] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Existing RGB-D salient object detection methods treat depth information as an independent component to complement RGB and widely follow the bistream parallel network architecture. To selectively fuse the CNN features extracted from both RGB and depth into a final result, the state-of-the-art (SOTA) bistream networks usually consist of two independent subbranches: one subbranch is used for RGB saliency, and the other aims for depth saliency. However, depth saliency is persistently inferior to RGB saliency because the RGB component is intrinsically more informative than the depth component. The bistream architecture thus easily biases its subsequent fusion procedure toward the RGB subbranch, leading to a performance bottleneck. In this paper, we propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction, where we cyclically convert the original 4-dimensional RGB-D into DGB, RDB and RGD. Then, a newly designed lightweight triple-stream network is applied to these newly formulated data to achieve an optimal channel-wise complementary fusion status between RGB and D, achieving a new SOTA performance.
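The data-level recombination itself is straightforward to sketch. Assuming the depth map has been rescaled to the same value range as the RGB channels, each recombined input simply substitutes depth for one color channel; the helper below is an illustrative sketch rather than the paper's preprocessing code.

```python
# Minimal sketch of the recombination described above: replace one RGB channel
# at a time with the depth map to form the DGB, RDB and RGD inputs.
import numpy as np

def recombine_rgbd(rgb: np.ndarray, depth: np.ndarray):
    """rgb: (H, W, 3), depth: (H, W), both scaled to the same range.
    Returns the three recombined 3-channel images (DGB, RDB, RGD)."""
    dgb = rgb.copy(); dgb[..., 0] = depth   # depth replaces R
    rdb = rgb.copy(); rdb[..., 1] = depth   # depth replaces G
    rgd = rgb.copy(); rgd[..., 2] = depth   # depth replaces B
    return dgb, rdb, rgd
```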
Collapse
|
99
|
Gupta AK, Seal A, Prasad M, Khanna P. Salient Object Detection Techniques in Computer Vision-A Survey. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E1174. [PMID: 33286942 PMCID: PMC7597345 DOI: 10.3390/e22101174] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Revised: 10/03/2020] [Accepted: 10/13/2020] [Indexed: 01/12/2023]
Abstract
Detection and localization of image regions that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability to automatically identify and segment such salient image regions has immediate consequences for applications in computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect salient regions in images. These methods can be broadly categorized into two classes based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and deep learning-based categories are reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics considered for assessing the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
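Two of the standard metrics such surveys cover, mean absolute error and the F-measure (conventionally with beta squared set to 0.3 and an adaptive threshold at twice the mean saliency), can be written compactly. The snippet below is a generic reference sketch rather than any particular benchmark's code.

```python
# Standard SOD evaluation metrics in NumPy: MAE and adaptive-threshold F-measure.
import numpy as np

def mae(saliency: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error; both maps in [0, 1] with the same shape."""
    return float(np.mean(np.abs(saliency - gt)))

def f_measure(saliency: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """Binarize the prediction at twice its mean value (a common adaptive threshold)."""
    pred = saliency >= min(2.0 * saliency.mean(), 1.0)
    gt_bin = gt >= 0.5
    tp = np.logical_and(pred, gt_bin).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt_bin.sum(), 1)
    denom = beta2 * precision + recall
    return float((1 + beta2) * precision * recall / max(denom, 1e-12))
```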
Collapse
Affiliation(s)
- Ashish Kumar Gupta: PDPM-Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, Dumna Airport Road, Jabalpur 482005, India
- Ayan Seal: PDPM-Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, Dumna Airport Road, Jabalpur 482005, India
- Mukesh Prasad: Centre for Artificial Intelligence, School of Computer Science, FEIT, University of Technology Sydney, Broadway, Sydney, NSW 2007, Australia
- Pritee Khanna: PDPM-Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, Dumna Airport Road, Jabalpur 482005, India
Collapse
|
100
|
Abstract
Depth information has been widely used to improve RGB-D salient object detection by extracting attention maps that determine the position of objects in an image. However, non-salient objects may be close to the depth sensor and present high pixel intensities in the depth maps, which inevitably leads to erroneous emphasis on non-salient areas and may have a negative impact on the saliency results. To mitigate this problem, we propose a hybrid attention neural network that fuses middle- and high-level RGB features with depth features to generate a hybrid attention map that removes background information. The proposed network extracts multilevel features from RGB images using the Res2Net architecture and then integrates high-level features from depth maps using the Inception-v4-ResNet2 architecture. The mixed high-level RGB and depth features generate the hybrid attention map, which is then multiplied with the low-level RGB features. After decoding by several convolutions and upsampling, we obtain the final saliency prediction, achieving state-of-the-art performance on the NJUD and NLPR datasets. Moreover, the proposed network has good generalization ability compared with other methods. An ablation study demonstrates that the proposed network performs effective saliency prediction even when non-salient objects interfere with detection. In fact, after removing the branch with high-level RGB features, the RGB attention map that guides the network for saliency prediction is lost, and all the performance measures decline. The resulting prediction map from the ablation study shows the effect of non-salient objects close to the depth sensor; this effect is not present when using the complete hybrid attention network. Therefore, RGB information can correct and supplement depth information, and the corresponding hybrid attention map is more robust than a conventional attention map constructed only with depth information.
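As a loose sketch of the hybrid-attention idea, high-level RGB and depth features jointly producing an attention map that re-weights low-level RGB features, one could write something like the following. The channel sizes and the single-map attention head are assumptions for illustration, not the paper's architecture.

```python
# Sketch: fuse high-level RGB and depth features into one attention map and
# use it to suppress background responses in the low-level RGB features.
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, rgb_ch: int, depth_ch: int, low_ch: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(rgb_ch + depth_ch, low_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(low_ch, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_high: torch.Tensor, depth_high: torch.Tensor,
                rgb_low: torch.Tensor) -> torch.Tensor:
        # Assumes all feature maps were resized to the spatial size of rgb_low.
        a = self.attn(torch.cat([rgb_high, depth_high], dim=1))  # hybrid attention map
        return rgb_low * a                                        # background suppressed
```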
Collapse
|