1. Ren P, Bai T, Sun F. Bio-inspired two-stage network for efficient RGB-D salient object detection. Neural Netw 2025;185:107244. PMID: 39933318. DOI: 10.1016/j.neunet.2025.107244
Abstract
Recently, with the development of Convolutional Neural Networks and Vision Transformers, the detection accuracy of RGB-D salient object detection (SOD) models has improved greatly. However, most existing methods cannot balance computational efficiency and performance well. In this paper, inspired by the P and M visual pathways of the primate biological visual system, we propose a Bio-inspired Two-stage Network for efficient RGB-D SOD, named BTNet, which simulates the visual information processing of these two pathways. Specifically, BTNet contains two stages: region locking and object refinement. The region locking stage simulates the processing of the M visual pathway to obtain a coarse-grained visual representation, while the object refinement stage simulates the processing of the P visual pathway to obtain a fine-grained visual representation. Experimental results show that BTNet outperforms other state-of-the-art methods on six mainstream benchmark datasets while significantly reducing parameters and processing 384 × 384 images at 175.4 frames per second (FPS). Compared with the cutting-edge method CPNet, BTNet reduces parameters by 93.6% and is nearly 7.2 times faster. The source code is available at https://github.com/ROC-Star/BTNet.
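For illustration only, a minimal PyTorch sketch of the coarse-to-fine idea summarized above (a lightweight low-resolution locking stage followed by a full-resolution refinement stage) might look like the following; it is not the released BTNet implementation, and the module names, channel widths, and gating scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionLockingStage(nn.Module):
    """Coarse stage (M-pathway analogue): cheap localization at reduced resolution."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, rgbd):
        # Work on a downsampled input to keep the coarse stage fast.
        small = F.interpolate(rgbd, scale_factor=0.25, mode="bilinear", align_corners=False)
        coarse = torch.sigmoid(self.net(small))
        # Upsample the coarse saliency map back to the input resolution.
        return F.interpolate(coarse, size=rgbd.shape[-2:], mode="bilinear", align_corners=False)

class ObjectRefinementStage(nn.Module):
    """Fine stage (P-pathway analogue): full-resolution refinement guided by the coarse map."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4 + 1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, rgbd, coarse):
        # The coarse map both gates the input and is concatenated as guidance.
        gated = rgbd * coarse
        return torch.sigmoid(self.net(torch.cat([gated, coarse], dim=1)))

class TwoStageSOD(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = RegionLockingStage()
        self.stage2 = ObjectRefinementStage()

    def forward(self, rgb, depth):
        rgbd = torch.cat([rgb, depth], dim=1)   # (B, 4, H, W)
        coarse = self.stage1(rgbd)              # region locking
        fine = self.stage2(rgbd, coarse)        # object refinement
        return coarse, fine

if __name__ == "__main__":
    model = TwoStageSOD()
    rgb, depth = torch.rand(1, 3, 384, 384), torch.rand(1, 1, 384, 384)
    coarse, fine = model(rgb, depth)
    print(coarse.shape, fine.shape)             # both torch.Size([1, 1, 384, 384])
```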
Affiliation(s)
- Peng Ren
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Tian Bai
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
- Fuming Sun
- School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
2. Zhang Y, Chen F, Peng Z, Zou W, Nie M, Zhang C. Two-way focal stack fusion for light field saliency detection. Appl Opt 2023;62:9057-9065. PMID: 38108742. DOI: 10.1364/ao.500999
Abstract
To improve the accuracy of saliency detection in challenging scenes such as those with small objects, multiple objects, and blur, we propose a light field saliency detection method based on two-way focal stack fusion. The first way extracts latent depth features by calculating the transmittance of the focal stack to avoid interference from out-of-focus regions. The second way analyzes the focus distribution and calculates the background probability of each slice, which distinguishes the foreground from the background. Extracting the latent cues of the focal stack through these two complementary ways improves saliency detection in complex scenes. Finally, a multi-layer cellular automaton optimizer incorporates compactness, focus, center prior, and depth features to obtain the final saliency result. Comparison and ablation experiments verify the effectiveness of the proposed method: it performs well in challenging scenarios and outperforms state-of-the-art methods. They also show that the depth and focus cues of the focal stack can enhance the performance of previous methods.
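The transmittance computation and background-probability model of this paper are not reproduced here, but the general recipe of deriving per-slice focus cues from a focal stack can be sketched with a simple Laplacian-based focus measure; the sharpness proxy and the border-based background heuristic below are assumptions, not the authors' formulas.

```python
import cv2
import numpy as np

def focus_measure(gray, ksize=9):
    """Per-pixel focus response: local energy of the Laplacian (a common sharpness proxy)."""
    lap = cv2.Laplacian(gray.astype(np.float64), cv2.CV_64F)
    return cv2.boxFilter(lap ** 2, -1, (ksize, ksize))

def focal_stack_cues(slices, border=0.1):
    """slices: list of HxW grayscale focal slices from the same light field."""
    fm = np.stack([focus_measure(s) for s in slices], axis=0)   # (N, H, W)
    in_focus_idx = fm.argmax(axis=0)                             # sharpest slice per pixel
    # Background prior: the slice whose strongly focused pixels concentrate near the
    # image border is treated as the background slice; its focus map then serves as a
    # rough per-pixel background probability.
    h, w = fm.shape[1:]
    bh, bw = max(1, int(h * border)), max(1, int(w * border))
    ring = np.zeros((h, w), bool)
    ring[:bh, :] = True
    ring[-bh:, :] = True
    ring[:, :bw] = True
    ring[:, -bw:] = True
    focused = fm > np.percentile(fm, 90, axis=(1, 2), keepdims=True)
    border_ratio = (focused & ring).sum(axis=(1, 2)) / (focused.sum(axis=(1, 2)) + 1e-8)
    bg_slice = int(border_ratio.argmax())
    bg_prob = fm[bg_slice] / (fm[bg_slice].max() + 1e-8)
    return in_focus_idx, bg_prob

if __name__ == "__main__":
    stack = [np.random.randint(0, 256, (128, 128), np.uint8) for _ in range(5)]
    idx, bg = focal_stack_cues(stack)
    print(idx.shape, bg.shape, float(bg.min()), float(bg.max()))
```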
3. Zhang Y, Chen F, Peng Z, Zou W, Zhang C. Exploring focus and depth-induced saliency detection for light field. Entropy (Basel) 2023;25:1336. PMID: 37761635. PMCID: PMC10530224. DOI: 10.3390/e25091336
Abstract
An abundance of features in the light field has been demonstrated to be useful for saliency detection in complex scenes. However, bottom-up saliency detection models are limited in their ability to explore light field features. In this paper, we propose a light field saliency detection method that focuses on depth-induced saliency, which can more deeply explore the interactions between different cues. First, we localize a rough saliency region based on the compactness of color and depth. Then, the relationships among depth, focus, and salient objects are carefully investigated, and the focus cue of the focal stack is used to highlight the foreground objects. Meanwhile, the depth cue is utilized to refine the coarse salient objects. Furthermore, considering the consistency of color smoothing and depth space, an optimization model referred to as color and depth-induced cellular automata is improved to increase the accuracy of saliency maps. Finally, to avoid interference of redundant information, the mean absolute error is chosen as the indicator of the filter to obtain the best results. The experimental results on three public light field datasets show that the proposed method performs favorably against the state-of-the-art conventional light field saliency detection approaches and even light field saliency detection approaches based on deep learning.
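The color- and depth-induced cellular automata itself is not reproduced here; the sketch below shows only the generic single-layer cellular-automata refinement it builds on (a synchronous update of superpixel saliency over an affinity graph), with the affinity construction and coherence weights chosen as simplifying assumptions.

```python
import numpy as np

def cellular_automata_refine(saliency, features, n_iters=20, sigma=0.1, b=(0.6, 0.2)):
    """
    saliency: (N,) initial saliency of N superpixels in [0, 1].
    features: (N, D) per-superpixel descriptors (e.g. mean Lab color and mean depth).
    Each cell is updated as a convex combination of its own state and the states of
    similar neighbours, so saliency propagates across coherent regions.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    aff = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(aff, 0.0)
    aff /= aff.sum(axis=1, keepdims=True) + 1e-12       # row-normalized impact matrix
    # Coherence: a cell with a dominant, very similar neighbour keeps more of its own state.
    c = b[0] + b[1] * aff.max(axis=1)                    # self-trust weights in (0, 1)
    s = saliency.astype(np.float64).copy()
    for _ in range(n_iters):
        s = c * s + (1.0 - c) * (aff @ s)                # synchronous update rule
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)  # rescale to [0, 1]
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((200, 4))        # e.g. mean colour (3) + mean depth (1) per superpixel
    init = rng.random(200)
    print(cellular_automata_refine(init, feats)[:5])
```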
Affiliation(s)
- Yani Zhang
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Fen Chen
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China;
- Zongju Peng
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China;
- Wenhui Zou
- Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China;
- Changhe Zhang
- School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; (Y.Z.); (Z.P.); (C.Z.)
4. Transformers and CNNs fusion network for salient object detection. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.10.081
5. Duan F, Wu Y, Guan H, Wu C. Saliency detection of light field images by fusing focus degree and GrabCut. Sensors (Basel) 2022;22:7411. PMID: 36236507. PMCID: PMC9573000. DOI: 10.3390/s22197411
Abstract
In the light field image saliency detection task, redundant cues are introduced by the way the cues are computed, which inevitably leads to inaccurate boundary segmentation and block-effect artifacts in the detection results. To tackle this issue, we propose a method for salient object detection (SOD) in light field images that fuses the focus degree with GrabCut. The method improves spatial-domain light field focus computation by applying a secondary blurring step to the focal slices, which effectively suppresses the focus responses of out-of-focus areas across the different focal slices. To avoid the redundant focus cues produced by multiple foreground images, we use the single optimal foreground image to generate the focus cue. In addition, to fuse the various light field cues in complex scenes, the GrabCut algorithm is combined with the focus cue to guide the generation of the color cue, which realizes automatic segmentation of the salient foreground. Extensive experiments on a light field dataset demonstrate that our algorithm can effectively separate the salient target region from the background in light field images and produces clear object outlines. Compared with the traditional GrabCut algorithm, the focus degree replaces manual interactive initialization, achieving fully automatic saliency segmentation.
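As a sketch of how a focus cue can replace GrabCut's manual initialization (a generic illustration rather than the paper's exact pipeline; the focus map, thresholds, and seeding rules are assumptions), OpenCV's grabCut can be run in mask mode as follows.

```python
import cv2
import numpy as np

def grabcut_from_focus(bgr, focus_map, iters=5, fg_thr=0.7, bg_thr=0.3):
    """
    bgr:       HxWx3 uint8 image (e.g. the all-in-focus or central view).
    focus_map: HxW float map in [0, 1], high where the foreground is in focus.
    Instead of a user-drawn rectangle, the focus map seeds GrabCut's mask.
    """
    mask = np.full(bgr.shape[:2], cv2.GC_PR_BGD, np.uint8)   # default: probable background
    mask[focus_map >= fg_thr] = cv2.GC_PR_FGD                # sharp regions: probable foreground
    mask[focus_map <= bg_thr] = cv2.GC_BGD                   # very blurred regions: sure background
    bgd_model = np.zeros((1, 65), np.float64)                 # GMM state buffers required by grabCut
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr, mask, None, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_MASK)
    # Pixels labelled (probable) foreground form the binary saliency segmentation.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)

if __name__ == "__main__":
    img = np.random.randint(0, 256, (120, 160, 3), np.uint8)
    focus = np.zeros((120, 160), np.float32)
    focus[40:80, 60:110] = 1.0                                # pretend this region is in focus
    seg = grabcut_from_focus(img, focus)
    print(seg.shape, seg.dtype, np.unique(seg))
```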
Affiliation(s)
- Fuzhou Duan
- Engineering Research Center of Spatial Information Technology, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Key Lab of 3D Information Acquisition and Application, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Yanyan Wu
- Engineering Research Center of Spatial Information Technology, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Key Lab of 3D Information Acquisition and Application, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Hongliang Guan
- Engineering Research Center of Spatial Information Technology, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Key Lab of 3D Information Acquisition and Application, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Academy for Multidisciplinary Studies, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Chenbo Wu
- Engineering Research Center of Spatial Information Technology, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
- Key Lab of 3D Information Acquisition and Application, MOE, Capital Normal University, 105 West Third Ring North Road, Haidian District, Beijing 100048, China
6. Progressive multi-scale fusion network for light field super-resolution. Appl Sci (Basel) 2022. DOI: 10.3390/app12147135
Abstract
Light field (LF) cameras can record multi-view images of a single scene, and these images provide spatial and angular information that can improve the performance of image super-resolution (SR). However, it is challenging to incorporate the distinctive information from different LF views. At the same time, due to the limited resolution of the image sensor, there is an inherent trade-off between spatial and angular resolution. In this paper, we propose a progressive multi-scale fusion network (PMFN) to improve LF SR performance. Specifically, a progressive feature fusion block (PFFB) based on an encoder-decoder structure is designed to implicitly align disparities and integrate complementary information across views. The core module of the PFFB is a dual-branch multi-scale fusion module (DMFM), which integrates information from a reference view and auxiliary views to produce a fused feature. Each DMFM consists of two parallel branches with different receptive fields that fuse hierarchical features from the complementary views. Three densely connected DMFMs are used in the PFFB, which fully exploit multi-level features to improve SR performance. Experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves state-of-the-art performance among existing methods. Moreover, quantitative results show that our method can also generate faithful details.
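A minimal sketch of the dual-branch multi-scale fusion idea (two parallel branches with different receptive fields fusing a reference-view feature with auxiliary-view features) is given below; the channel sizes, the use of dilated convolutions for the larger receptive field, and the residual merge are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DualBranchMultiScaleFusion(nn.Module):
    """Fuse a reference-view feature with auxiliary-view features at two receptive fields."""
    def __init__(self, ch=64, n_aux=8):
        super().__init__()
        in_ch = ch * (1 + n_aux)
        # Branch 1: small receptive field (plain 3x3 convolutions).
        self.branch_small = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        # Branch 2: larger receptive field via dilated 3x3 convolutions.
        self.branch_large = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=2, dilation=2), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),
        )
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, ref_feat, aux_feats):
        # ref_feat: (B, C, H, W); aux_feats: (B, n_aux, C, H, W) from the other views.
        b, n, c, h, w = aux_feats.shape
        x = torch.cat([ref_feat, aux_feats.reshape(b, n * c, h, w)], dim=1)
        fused = self.merge(torch.cat([self.branch_small(x), self.branch_large(x)], dim=1))
        return ref_feat + fused                  # residual connection onto the reference view

if __name__ == "__main__":
    m = DualBranchMultiScaleFusion(ch=32, n_aux=4)
    ref = torch.rand(1, 32, 24, 24)
    aux = torch.rand(1, 4, 32, 24, 24)
    print(m(ref, aux).shape)                     # torch.Size([1, 32, 24, 24])
```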
7. High edge-quality light-field salient object detection using convolutional neural network. Electronics 2022. DOI: 10.3390/electronics11071054
Abstract
The detection results of current light-field salient object detection methods suffer from a loss of edge details, which significantly limits the performance of subsequent computer vision tasks. To solve this problem, we propose a novel convolutional neural network that accurately detects salient objects by mining effective edge information from light-field data. In particular, our method consists of four steps. First, the network extracts multi-level saliency features from the light-field data. Second, edge features are extracted from the low-level saliency features and optimized under ground-truth guidance. Then, to sufficiently leverage the high-level saliency features and the edge features, the network hierarchically fuses them in a complementary manner. Finally, spatial correlations between different levels of fused features are considered to detect salient objects. Our method can accurately locate salient objects with exquisite edge details by extracting clear edge information and accurate saliency information and fully fusing them. We conduct extensive evaluations on three widely used benchmark datasets. The experimental results demonstrate the effectiveness of our method, which is superior to eight state-of-the-art methods.
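A compact sketch of the edge-guidance idea described above (an edge branch built from low-level features, supervised with ground-truth edges, then fused with high-level saliency features) follows; the layer counts and fusion details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeGuidedHead(nn.Module):
    def __init__(self, low_ch=64, high_ch=256, ch=64):
        super().__init__()
        self.edge_branch = nn.Sequential(          # edge features from low-level features
            nn.Conv2d(low_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.edge_pred = nn.Conv2d(ch, 1, 1)       # supervised with ground-truth edge maps
        self.fuse = nn.Sequential(                  # complementary fusion of edge + saliency cues
            nn.Conv2d(high_ch + ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, low_feat, high_feat):
        edge_feat = self.edge_branch(low_feat)
        edge_logit = self.edge_pred(edge_feat)
        # Bring high-level saliency features to the finer low-level resolution and fuse.
        high_up = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        sal_logit = self.fuse(torch.cat([high_up, edge_feat], dim=1))
        return sal_logit, edge_logit                # both maps can be trained with BCE losses

if __name__ == "__main__":
    head = EdgeGuidedHead()
    low, high = torch.rand(1, 64, 96, 96), torch.rand(1, 256, 24, 24)
    sal, edge = head(low, high)
    print(sal.shape, edge.shape)
```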
8.
9. Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: a survey. Comput Vis Media 2021;7:37-69. PMID: 33432275. PMCID: PMC7788385. DOI: 10.1007/s41095-020-0199-z
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Affiliation(s)
- Tao Zhou
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao
- Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
10. Wang Y, Yang J, Wang L, Ying X, Wu T, An W, Guo Y. Light field image super-resolution using deformable convolution. IEEE Trans Image Process 2020;30:1057-1071. PMID: 33290218. DOI: 10.1109/tip.2020.3042059
Abstract
Light field (LF) cameras can record scenes from multiple perspectives, and thus introduce beneficial angular information for image super-resolution (SR). However, it is challenging to incorporate angular information due to disparities among LF images. In this paper, we propose a deformable convolution network (i.e., LF-DFnet) to handle the disparity problem for LF image SR. Specifically, we design an angular deformable alignment module (ADAM) for feature-level alignment. Based on ADAM, we further propose a collect-and-distribute approach to perform bidirectional alignment between the center-view feature and each side-view feature. Using our approach, angular information can be well incorporated and encoded into the features of each view, which benefits the SR reconstruction of all LF images. Moreover, we develop a baseline-adjustable LF dataset to evaluate SR performance under different disparity variations. Experiments on both public and our self-developed datasets have demonstrated the superiority of our method. Our LF-DFnet can generate high-resolution images with more faithful details and achieves state-of-the-art reconstruction accuracy. Besides, our LF-DFnet is more robust to disparity variations, which has not been well addressed in the literature.
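A minimal sketch of feature-level alignment with deformable convolution, in the spirit of the alignment module described above, is shown below using torchvision's DeformConv2d; predicting offsets from the concatenated centre- and side-view features is the general recipe, while the channel sizes and single-layer design are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AngularDeformableAlign(nn.Module):
    """Warp a side-view feature towards the centre view with learned per-pixel offsets."""
    def __init__(self, ch=64, ksize=3):
        super().__init__()
        # Two offset coordinates are predicted per kernel sample from the feature pair.
        self.offset_pred = nn.Conv2d(2 * ch, 2 * ksize * ksize, 3, padding=1)
        self.deform = DeformConv2d(ch, ch, ksize, padding=ksize // 2)

    def forward(self, side_feat, center_feat):
        offset = self.offset_pred(torch.cat([side_feat, center_feat], dim=1))
        return self.deform(side_feat, offset)    # side-view feature aligned to the centre view

if __name__ == "__main__":
    align = AngularDeformableAlign(ch=32)
    side = torch.rand(1, 32, 48, 48)
    center = torch.rand(1, 32, 48, 48)
    print(align(side, center).shape)             # torch.Size([1, 32, 48, 48])
```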
11. Wang X, Li S, Chen C, Fang Y, Hao A, Qin H. Data-level recombination and lightweight fusion scheme for RGB-D salient object detection. IEEE Trans Image Process 2020;30:458-471. PMID: 33201813. DOI: 10.1109/tip.2020.3037470
Abstract
Existing RGB-D salient object detection methods treat depth information as an independent component to complement RGB and widely follow the bistream parallel network architecture. To selectively fuse the CNN features extracted from both RGB and depth as a final result, the state-of-the-art (SOTA) bistream networks usually consist of two independent subbranches: one subbranch is used for RGB saliency, and the other aims for depth saliency. However, depth saliency is persistently inferior to the RGB saliency because the RGB component is intrinsically more informative than the depth component. The bistream architecture easily biases its subsequent fusion procedure to the RGB subbranch, leading to a performance bottleneck. In this paper, we propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction, where we cyclically convert the original 4-dimensional RGB-D into DGB, RDB and RGD. Then, a newly lightweight designed triple-stream network is applied over these novel formulated data to achieve an optimal channel-wise complementary fusion status between the RGB and D, achieving a new SOTA performance.
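The cyclic channel recombination described above is simple to express directly; a small sketch (assuming depth has been normalized to the same value range as the colour channels) follows.

```python
import torch

def recombine_rgbd(rgb, depth):
    """
    rgb:   (B, 3, H, W) tensor with channels ordered R, G, B.
    depth: (B, 1, H, W) tensor, normalized to the same range as the colour channels.
    Cyclically substitute D for one colour channel, yielding DGB, RDB and RGD inputs
    that can be fed to a triple-stream network in place of the raw 4-channel RGB-D.
    """
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    dgb = torch.cat([depth, g, b], dim=1)
    rdb = torch.cat([r, depth, b], dim=1)
    rgd = torch.cat([r, g, depth], dim=1)
    return dgb, rdb, rgd

if __name__ == "__main__":
    rgb = torch.rand(2, 3, 224, 224)
    depth = torch.rand(2, 1, 224, 224)
    for t in recombine_rgbd(rgb, depth):
        print(t.shape)    # each is torch.Size([2, 3, 224, 224])
```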