1. Ma S, Zhu X, Xu L, Zhou L, Chen D. LRNet: lightweight attention-oriented residual fusion network for light field salient object detection. Sci Rep 2024;14:26030. PMID: 39472603; PMCID: PMC11522285; DOI: 10.1038/s41598-024-76874-0.
Abstract
Light field imaging captures abundant scene structure information, which can improve the accuracy of salient object detection in challenging scenes, and it has therefore received widespread attention. However, how to exploit this abundant information for salient object detection remains a significant challenge. In this paper, a lightweight attention and residual ConvLSTM network is proposed to address this issue. It is mainly composed of a lightweight attention-based feature enhancement module (LFM) and a residual ConvLSTM-based feature integration module (RFM). The LFM provides an attention map for each focal slice through an attention mechanism so that the network focuses on object-related features, thereby enhancing saliency features. The RFM leverages a residual mechanism and ConvLSTM to fully exploit the spatial structure of focal slices, thereby achieving high-precision feature fusion. Experimental results on three publicly available light field datasets show that the proposed method surpasses 17 existing state-of-the-art methods and achieves the best score on five quantitative metrics.
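The paper does not publish the LFM here, but the core idea (a learned spatial attention map gating each focal slice) can be sketched minimally in PyTorch. The module name SliceAttention and all shapes are illustrative assumptions, not the authors' code; a ConvLSTM cell of the kind the RFM would build on is sketched under entry 17.

```python
import torch
import torch.nn as nn

class SliceAttention(nn.Module):
    """Hypothetical sketch: per-focal-slice spatial attention gating."""
    def __init__(self, channels: int):
        super().__init__()
        # Small conv head mapping slice features to a 1-channel attention map.
        self.att = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, S, C, H, W) features for S focal slices.
        b, s, c, h, w = x.shape
        flat = x.view(b * s, c, h, w)
        maps = self.att(flat)        # (B*S, 1, H, W) attention per slice
        gated = flat * maps          # emphasize object-related regions
        return gated.view(b, s, c, h, w)

feats = torch.randn(2, 12, 64, 32, 32)   # 12 focal slices
print(SliceAttention(64)(feats).shape)   # torch.Size([2, 12, 64, 32, 32])
```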
Affiliation(s)
- Shuai Ma: ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu 610092, Sichuan, China
- Xusheng Zhu: ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu 610092, Sichuan, China
- Long Xu: ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu 610092, Sichuan, China
- Li Zhou: ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu 610092, Sichuan, China
- Daixin Chen: ChengDu Aircraft Industrial (Group) Co., Ltd., Qingyang, Chengdu 610092, Sichuan, China
2. Zhang Y, Chen F, Peng Z, Zou W, Nie M, Zhang C. Two-way focal stack fusion for light field saliency detection. Appl Opt 2023;62:9057-9065. PMID: 38108742; DOI: 10.1364/ao.500999.
Abstract
To improve the accuracy of saliency detection in challenging scenes, such as those with small objects, multiple objects, or blur, we propose a light field saliency detection method based on two-way focal stack fusion. The first way extracts latent depth features by calculating the transmittance of the focal stack, avoiding interference from out-of-focus regions. The second way analyzes the focus distribution and calculates the background probability of each slice, which distinguishes the foreground from the background. Extracting the latent cues of the focal stack in these two different ways improves saliency detection in complex scenes. Finally, a multi-layer cellular automaton optimizer incorporates compactness, focus, center prior, and depth features to obtain the final saliency result. Comparison and ablation experiments verify the effectiveness of the proposed method. Experimental results show that it is effective in challenging scenarios and outperforms state-of-the-art methods. They also verify that the depth and focus cues of the focal stack can enhance the performance of previous methods.
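The multi-layer cellular automaton optimizer is described only at a high level. The sketch below shows one common synchronous update of this kind (in the spirit of cellular-automata saliency optimization), assuming a stack of cue maps (compactness, focus, center prior, depth) that vote on each pixel in log-odds space; the thresholds and the strength constant lam are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def multi_layer_ca(cue_maps, steps=5, lam=0.5):
    """Sketch: fuse M cue maps (each HxW in [0, 1]) by a synchronous
    cellular-automaton update in log-odds space."""
    maps = np.stack(cue_maps).astype(np.float64)       # (M, H, W)
    eps = 1e-6
    logit = np.log((maps + eps) / (1.0 - maps + eps))  # per-layer log-odds
    for _ in range(steps):
        thresh = maps.mean(axis=(1, 2), keepdims=True) # adaptive threshold per layer
        votes = np.where(maps > thresh, 1.0, -1.0)     # each layer votes salient / not
        # every pixel receives the votes of the other layers at the same location
        logit = logit + lam * (votes.sum(axis=0, keepdims=True) - votes)
        maps = 1.0 / (1.0 + np.exp(-logit))
    return maps.mean(axis=0)                           # fused saliency map

fused = multi_layer_ca([np.random.rand(64, 64) for _ in range(4)])
print(fused.shape)  # (64, 64)
```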
3. Zhang Y, Chen F, Peng Z, Zou W, Zhang C. Exploring focus and depth-induced saliency detection for light field. Entropy (Basel) 2023;25:1336. PMID: 37761635; PMCID: PMC10530224; DOI: 10.3390/e25091336.
Abstract
An abundance of features in the light field has been demonstrated to be useful for saliency detection in complex scenes. However, bottom-up saliency detection models are limited in their ability to explore light field features. In this paper, we propose a light field saliency detection method that focuses on depth-induced saliency and can more deeply explore the interactions between different cues. First, we localize a rough saliency region based on the compactness of color and depth. Then, the relationships among depth, focus, and salient objects are carefully investigated, and the focus cue of the focal stack is used to highlight the foreground objects. Meanwhile, the depth cue is utilized to refine the coarse salient objects. Furthermore, considering the consistency of color smoothing and depth space, an optimization model referred to as color- and depth-induced cellular automata is improved to increase the accuracy of saliency maps. Finally, to avoid interference from redundant information, the mean absolute error is chosen as the indicator for filter selection to obtain the best results. Experimental results on three public light field datasets show that the proposed method performs favorably against state-of-the-art conventional light field saliency detection approaches and even against deep-learning-based ones.
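The compactness cue used here for rough localization is only named, not specified. The following is a minimal sketch of a standard color-compactness computation under assumed inputs: per-superpixel mean colors and normalized centroid positions. Salient objects tend to have compact spatial support, so a low color-weighted spatial variance maps to high saliency; the value of sigma is an illustrative assumption.

```python
import numpy as np

def compactness_saliency(colors, positions, sigma=0.2):
    """Sketch: color-compactness cue over N superpixels.
    colors: (N, 3) mean colors; positions: (N, 2) centroids in [0, 1]."""
    diff = colors[:, None, :] - colors[None, :, :]
    w = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))  # color affinity
    w = w / w.sum(axis=1, keepdims=True)
    mu = w @ positions                     # color-weighted spatial mean per superpixel
    spread = np.sum(w * np.sum((positions[None, :, :] - mu[:, None, :]) ** 2, axis=2), axis=1)
    spread = (spread - spread.min()) / (np.ptp(spread) + 1e-8)
    return 1.0 - spread                    # compact color distribution => salient

sal = compactness_saliency(np.random.rand(200, 3), np.random.rand(200, 2))
print(sal.shape)  # (200,)
```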
Affiliation(s)
- Yani Zhang: School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
- Fen Chen: School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Zongju Peng: School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China; Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Wenhui Zou: Faculty of Information Science and Engineering, Ningbo University, No. 818, Ningbo 315211, China
- Changhe Zhang: School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing 400054, China
4. Wang S, Sheng H, Yang D, Cui Z, Cong R, Ke W. MFSRNet: spatial-angular correlation retaining for light field super-resolution. Appl Intell 2023. DOI: 10.1007/s10489-023-04558-9.
5. Salem A, Ibrahem H, Kang HS. Light field image super-resolution using deep residual networks on lenslet images. Sensors (Basel) 2023;23:2018. PMID: 36850618; PMCID: PMC9968150; DOI: 10.3390/s23042018.
Abstract
Light field (LF) imaging is widely used in many applications, and numerous deep learning algorithms have been proposed to overcome its inherent trade-off: the sensor's limited resolution forces a compromise between angular and spatial resolution. To mitigate this problem, a method should fully model the non-local properties of the 4D LF data. This paper therefore proposes a different approach that increases the interaction between spatial and angular information for LF image super-resolution (SR). We achieve this by processing the LF sub-aperture images (SAIs) independently to extract spatial information, and the LF macro-pixel image (MPI) to extract angular information. The MPI, or lenslet LF image, is characterized by its ability to integrate more complementary information between different viewpoints (SAIs). In particular, we extract initial features and then process the MPI and SAIs alternately to incorporate angular and spatial information. Finally, the interacted features are added to the initial features to reconstruct the final output. We trained the proposed network to minimize the sum of absolute errors between the reconstructed high-resolution (HR) output and the ground-truth HR images, given low-resolution (LR) inputs. Experimental results demonstrate the high performance of our method over state-of-the-art LFSR methods on small-baseline LF images.
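The SAI and MPI representations are two tensor layouts of the same 4D light field, so converting between them is a deterministic rearrangement. Below is a minimal PyTorch sketch under the common convention that an LF is stored as (U, V, C, H, W) with U x V angular views; the function names are mine, not the paper's.

```python
import torch

def sai_to_mpi(lf: torch.Tensor) -> torch.Tensor:
    """(U, V, C, H, W) sub-aperture stack -> (C, H*U, W*V) macro-pixel image,
    where each spatial position becomes a UxV block of angular samples."""
    u, v, c, h, w = lf.shape
    x = lf.permute(2, 3, 0, 4, 1)      # (C, H, U, W, V)
    return x.reshape(c, h * u, w * v)

def mpi_to_sai(mpi: torch.Tensor, u: int, v: int) -> torch.Tensor:
    """Inverse rearrangement: macro-pixel image back to sub-aperture stack."""
    c, hu, wv = mpi.shape
    h, w = hu // u, wv // v
    x = mpi.reshape(c, h, u, w, v)
    return x.permute(2, 4, 0, 1, 3)    # (U, V, C, H, W)

lf = torch.randn(5, 5, 3, 32, 32)      # 5x5 angular views
assert torch.equal(mpi_to_sai(sai_to_mpi(lf), 5, 5), lf)  # lossless round trip
```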
Affiliation(s)
- Ahmed Salem: School of Information and Communication Engineering, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea; Electrical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71515, Egypt
- Hatem Ibrahem: School of Information and Communication Engineering, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea
- Hyun-Soo Kang: School of Information and Communication Engineering, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea
6. Piao Y, Jiang Y, Zhang M, Wang J, Lu H. PANet: patch-aware network for light field salient object detection. IEEE Trans Cybern 2023;53:379-391. PMID: 34406954; DOI: 10.1109/tcyb.2021.3095512.
Abstract
Most existing light field saliency detection methods have achieved great success by exploiting focus information in focal slices, a cue unique to light field data. However, they process light field data in a slice-wise way, leading to suboptimal results because the relative contribution of different regions within focal slices is ignored. How can we comprehensively explore and integrate the focused saliency regions that positively contribute to accurate saliency detection? Answering this question inspired us to develop a new insight. In this article, we propose a patch-aware network to explore light field data in a region-wise way. First, we mine focused salient regions with a proposed multi-source learning module (MSLM), which generates a filtering strategy for integration, guided by saliency, boundary, and position cues. Second, we design a sharpness recognition module (SRM) to refine and update this strategy and perform feature integration. With the proposed MSLM and SRM, we obtain more accurate and complete saliency maps. Comprehensive experiments on three benchmark datasets show that our method achieves competitive performance against 2D, 3D, and 4D salient object detection methods. The code and results of our method are available at https://github.com/OIPLab-DUT/IEEE-TCYB-PANet.
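The sharpness recognition module is described only functionally. A common proxy for per-patch focus is the local variance of the Laplacian response, sketched below in PyTorch; the patch size and grayscale input are my assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def patch_sharpness(img: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Sketch: per-patch focus score via variance of the Laplacian.
    img: (B, 1, H, W) grayscale focal slice -> (B, 1, H//patch, W//patch)."""
    lap_kernel = torch.tensor([[0., 1., 0.],
                               [1., -4., 1.],
                               [0., 1., 0.]]).view(1, 1, 3, 3)
    lap = F.conv2d(img, lap_kernel, padding=1)
    # Var[x] = E[x^2] - E[x]^2 over non-overlapping patches.
    mean = F.avg_pool2d(lap, patch)
    mean_sq = F.avg_pool2d(lap ** 2, patch)
    return mean_sq - mean ** 2

slices = torch.rand(4, 1, 64, 64)    # four focal slices
print(patch_sharpness(slices).shape) # torch.Size([4, 1, 4, 4])
```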
7. Li J, Ji W, Zhang M, Piao Y, Lu H, Cheng L. Delving into calibrated depth for accurate RGB-D salient object detection. Int J Comput Vis 2022. DOI: 10.1007/s11263-022-01734-1.
8. Liang Z, Wang P, Xu K, Zhang P, Lau RWH. Weakly-supervised salient object detection on light fields. IEEE Trans Image Process 2022;31:6295-6305. PMID: 36149997; DOI: 10.1109/tip.2022.3207605.
Abstract
Most existing salient object detection (SOD) methods are designed for RGB images and do not take advantage of the abundant information provided by light fields. Hence, they may fail to detect salient objects with complex structures or to delineate their boundaries. Although some methods have explored multi-view information of light field images for saliency detection, they require tedious pixel-level manual annotation of ground truths. In this paper, we propose a novel weakly-supervised learning framework for salient object detection on light field images based on bounding-box annotations. Our method has two major novelties. First, given an input light field image and a bounding-box annotation indicating the salient object, we propose a ground-truth label hallucination method that generates a pixel-level pseudo saliency map, avoiding the heavy cost of pixel-level annotation. This method exploits information obtained from the light field (including depths and RGB images) to generate high-quality pseudo ground-truth saliency maps that help supervise the training. Second, to exploit the multi-view nature of light field data in learning, we propose a fusion attention module to calibrate the spatial and channel-wise light field representations. It learns to focus on informative features and suppress redundant information from the multi-view inputs. Based on these two novelties, we train a new salient object detector with two branches in a weakly-supervised manner: the RGB branch models the color contrast in the all-in-focus image to locate salient objects, while the Focal branch exploits the depth and the background spatial redundancy of focal slices to eliminate background distractions. Extensive experiments show that our method outperforms existing weakly-supervised methods and most fully supervised methods.
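The fusion attention module is described as calibrating spatial and channel-wise representations. A generic squeeze-and-excitation channel gate followed by a spatial gate is one minimal way to realize that idea; the sketch below is my reconstruction of the concept, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    """Sketch: channel-wise then spatial calibration of multi-view features."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.channel_gate = nn.Sequential(   # squeeze-and-excitation style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(   # 1-channel spatial attention map
            nn.Conv2d(channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)         # suppress redundant channels
        return x * self.spatial_gate(x)      # focus on informative locations

fused = FusionAttention(64)(torch.randn(2, 64, 32, 32))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```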
9. Zhang M, Xu S, Piao Y, Lu H. Exploring spatial correlation for light field saliency detection: expansion from a single view. IEEE Trans Image Process 2022;31:6152-6163. PMID: 36112561; DOI: 10.1109/tip.2022.3205749.
Abstract
Previous 2D saliency detection methods extract salient cues from a single view and directly predict the expected results. Neither traditional nor deep-learning-based 2D methods consider the geometric information of 3D scenes, so the relationship between scene understanding and salient objects cannot be effectively established. This limits the performance of 2D saliency detection in challenging scenes. In this paper, we show for the first time that the saliency detection problem can be reformulated as two sub-problems: light field synthesis from a single view and light-field-driven saliency detection. We first introduce a high-quality light field synthesis network to produce reliable 4D light field information. We then propose a novel light-field-driven saliency detection network, in which a Direction-specific Screening Unit (DSU) is tailored to exploit the spatial correlation among multiple viewpoints. The whole pipeline can be trained end to end. Experimental results demonstrate that the proposed method outperforms state-of-the-art 2D, 3D, and 4D saliency detection methods. Our code is publicly available at https://github.com/OIPLab-DUT/ESCNet.
10.
11. High edge-quality light-field salient object detection using convolutional neural network. Electronics 2022. DOI: 10.3390/electronics11071054.
Abstract
The detection results of current light-field salient object detection methods suffer from a loss of edge detail, which significantly limits the performance of subsequent computer vision tasks. To solve this problem, we propose a novel convolutional neural network that accurately detects salient objects by mining effective edge information from light-field data. Our method consists of four steps. First, the network extracts multi-level saliency features from light-field data. Second, edge features are extracted from low-level saliency features and optimized under ground-truth guidance. Then, to sufficiently leverage high-level saliency features and edge features, the network fuses them hierarchically in a complementary manner. Finally, spatial correlations between different levels of fused features are considered to detect salient objects. By extracting clear edge information and accurate saliency information and fully fusing them, our method can accurately locate salient objects with fine edge details. We conduct extensive evaluations on three widely used benchmark datasets. The experimental results demonstrate the effectiveness of our method and its superiority over eight state-of-the-art methods.
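The edge branch is optimized under ground-truth guidance; one standard way to derive edge labels from a binary saliency mask is a morphological gradient (dilation minus erosion), sketched below with max-pooling. The 3x3 structuring element and the BCE supervision are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def edge_from_mask(mask: torch.Tensor) -> torch.Tensor:
    """Sketch: edge ground truth from a binary saliency mask (B, 1, H, W)
    via a morphological gradient: dilation(mask) - erosion(mask)."""
    dilated = F.max_pool2d(mask, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-mask, 3, stride=1, padding=1)
    return (dilated - eroded).clamp(0, 1)

mask = torch.zeros(1, 1, 32, 32)
mask[..., 8:24, 8:24] = 1.0                 # a square salient object
edge_gt = edge_from_mask(mask)
pred_edge = torch.rand(1, 1, 32, 32)        # stand-in for the edge branch output
loss = F.binary_cross_entropy(pred_edge, edge_gt)
print(edge_gt.sum().item(), loss.item())
```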
12. Ji W, Yan G, Li J, Piao Y, Yao S, Zhang M, Cheng L, Lu H. DMRA: depth-induced multi-scale recurrent attention network for RGB-D saliency detection. IEEE Trans Image Process 2022;31:2321-2336. PMID: 35245195; DOI: 10.1109/tip.2022.3154931.
Abstract
In this work, we propose a novel depth-induced multi-scale recurrent attention network for RGB-D saliency detection, named DMRA. It performs particularly well in complex scenarios. Our network makes four main contributions that are experimentally demonstrated to have significant practical merit. First, we design an effective depth refinement block using residual connections to fully extract and fuse cross-modal complementary cues from the RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale contextual features for accurately locating salient objects. Third, a novel recurrent attention module, inspired by the Internal Generative Mechanism of the human brain, is designed to generate more accurate saliency results by comprehensively learning the internal semantic relations of the fused features and progressively optimizing local details with memory-oriented scene understanding. Finally, a cascaded hierarchical feature fusion strategy is designed to promote efficient information interaction among multi-level contextual features and further improve the contextual representation ability of the model. In addition, we introduce a new real-life RGB-D saliency dataset containing a variety of complex scenarios, which has been widely used as a benchmark in recent RGB-D saliency detection research. Extensive experiments demonstrate that our method can accurately identify salient objects and achieves appealing performance against 18 state-of-the-art RGB-D saliency models on nine benchmark datasets.
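The depth refinement block is characterized only as residual cross-modal fusion. A minimal sketch of that pattern, with hypothetical layer sizes, is shown below: depth features are concatenated with RGB features, refined by convolutions, and added back through a residual connection.

```python
import torch
import torch.nn as nn

class DepthRefinementBlock(nn.Module):
    """Sketch: residual fusion of RGB and depth feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the RGB stream intact while adding
        # complementary cues distilled from the depth stream.
        return rgb + self.fuse(torch.cat([rgb, depth], dim=1))

rgb = torch.randn(2, 64, 40, 40)
depth = torch.randn(2, 64, 40, 40)
print(DepthRefinementBlock(64)(rgb, depth).shape)  # torch.Size([2, 64, 40, 40])
```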
13. Luo H, Han G, Wu X, Liu P, Yang H, Zhang X. LF3Net: leader-follower feature fusing network for fast saliency detection. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.03.080.
14. CNN-based RGB-D salient object detection: learn, select, and fuse. Int J Comput Vis 2021. DOI: 10.1007/s11263-021-01452-0.
15. Zhang P, Liu W, Zeng Y, Lei Y, Lu H. Looking for the detail and context devils: high-resolution salient object detection. IEEE Trans Image Process 2021;30:3204-3216. PMID: 33621174; DOI: 10.1109/tip.2020.3045624.
Abstract
In recent years, Salient Object Detection (SOD) has shown great success thanks to large-scale benchmarks and deep learning techniques. However, existing SOD methods mainly focus on natural images at low resolutions, e.g., 400×400 or less. This drawback hinders their use in advanced practical applications, which need high-resolution, detail-aware results. Moreover, the lack of boundary detail and semantic context of salient objects is also a key concern for accurate SOD. To address these issues, in this work we focus on the High-Resolution Salient Object Detection (HRSOD) task. Technically, we propose the first end-to-end learnable framework for fully automatic HRSOD, named Dual ReFinement Network (DRFNet). The proposed DRFNet consists of a shared feature extractor and two effective refinement heads. By decoupling detail and context information, one refinement head adopts a global-aware feature pyramid that boosts spatial detail information without adding much computational burden, narrowing the gap between high-level semantics and low-level details. In parallel, the other refinement head adopts hybrid dilated convolutional blocks and group-wise upsampling, which are very efficient for extracting contextual information. Based on these dual refinements, our approach enlarges receptive fields and obtains more discriminative features from high-resolution images. Experimental results on high-resolution benchmarks (the public DUT-HRSOD and the proposed DAVIS-SOD) demonstrate that our method is not only efficient but also more accurate than other state-of-the-art methods. It also generalizes well on typical low-resolution benchmarks.
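The context head relies on hybrid dilated convolutions. The sketch below shows the standard pattern of stacking 3x3 convolutions with co-prime dilation rates (here 1, 2, 5) to enlarge the receptive field while avoiding gridding artifacts; the rates and channel counts are my assumptions.

```python
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    """Sketch: stacked 3x3 convolutions with dilations 1, 2, 5.
    Co-prime rates cover the receptive field densely (no gridding)."""
    def __init__(self, channels: int, rates=(1, 2, 5)):
        super().__init__()
        layers = []
        for r in rates:
            layers += [nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)   # residual keeps low-level detail

x = torch.randn(1, 32, 64, 64)
print(HybridDilatedBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```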
16. Zhou T, Fan DP, Cheng MM, Shen J, Shao L. RGB-D salient object detection: a survey. Comput Vis Media 2021;7:37-69. PMID: 33432275; PMCID: PMC7788385; DOI: 10.1007/s41095-020-0199-z.
Abstract
Salient object detection, which simulates human visual perception in locating the most significant object(s) in a scene, has been widely applied to various computer vision tasks. Now, the advent of depth sensors means that depth maps can easily be captured; this additional spatial information can boost the performance of salient object detection. Although various RGB-D based salient object detection models with promising performance have been proposed over the past several years, an in-depth understanding of these models and the challenges in this field remains lacking. In this paper, we provide a comprehensive survey of RGB-D based salient object detection models from various perspectives, and review related benchmark datasets in detail. Further, as light fields can also provide depth maps, we review salient object detection models and popular benchmark datasets from this domain too. Moreover, to investigate the ability of existing models to detect salient objects, we have carried out a comprehensive attribute-based evaluation of several representative RGB-D based salient object detection models. Finally, we discuss several challenges and open directions of RGB-D based salient object detection for future research. All collected models, benchmark datasets, datasets constructed for attribute-based evaluation, and related code are publicly available at https://github.com/taozh2017/RGBD-SODsurvey.
Affiliation(s)
- Tao Zhou: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Deng-Ping Fan: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Jianbing Shen: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
- Ling Shao: Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, United Arab Emirates
17. Zhang K, Chen Z, Liu S. A spatial-temporal recurrent neural network for video saliency prediction. IEEE Trans Image Process 2020;30:572-587. PMID: 33206602; DOI: 10.1109/tip.2020.3036749.
Abstract
In this paper, a recurrent neural network is designed for video saliency prediction that considers spatial-temporal features. In our work, video frames are routed through a static network for spatial features and a dynamic network for temporal features. For spatial-temporal feature integration, a novel select-and-reweight fusion model is proposed, which automatically learns and adjusts the fusion weights based on the spatial and temporal features of different scenes. Finally, an attention-aware convolutional long short-term memory (ConvLSTM) network is developed to predict salient regions based on features extracted from consecutive frames and to generate the final saliency map for each video frame. The proposed method is compared with state-of-the-art saliency models on five public video saliency benchmark datasets. The experimental results demonstrate that our model achieves advanced performance on video saliency prediction.
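ConvLSTM is the core recurrent component here (and the basis of the RFM in entry 1); since PyTorch does not ship one, a minimal single-cell sketch is given below. The gate layout follows the standard ConvLSTM formulation (Shi et al., 2015); the sizes are illustrative, not from this paper.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch: a standard ConvLSTM cell."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        # One conv produces all four gates from the [input, hidden] concatenation.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g              # update cell memory
        h = o * c.tanh()               # emit new hidden state
        return h, (h, c)

cell = ConvLSTMCell(8, 16)
h = c = torch.zeros(1, 16, 24, 24)
for frame in torch.randn(5, 1, 8, 24, 24):   # five consecutive frames
    out, (h, c) = cell(frame, (h, c))
print(out.shape)                              # torch.Size([1, 16, 24, 24])
```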